
Voice Chat Hooks

Custom hooks for voice chat functionality

The Voice Chat feature uses two main custom hooks: one for fetching video transcripts and another for handling audio recording and processing.
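
Both hooks read and write a shared store, useVoiceChatStore. The store itself is not shown in this section; the sketch below is only an assumed minimal shape, inferred from the fields the two hooks use, together with the Message type they pass around.

// Assumed shapes only -- inferred from how the hooks in this section use the store.
export interface Message {
  id: string;
  role: 'system' | 'user' | 'assistant';
  content: string;
  audioUrl?: string; // set on assistant messages that carry a voice reply
  language?: string;
}

interface VoiceChatState {
  videoUrl: string;
  transcript: string;
  hasTranscript: boolean;
  isLoading: boolean;
  isRecording: boolean;
  isProcessingAudio: boolean;
  isPlayingAudio: boolean;
  audioLevel: number;
  selectedLanguage: string;
  messages: Message[];
  setIsLoading: (value: boolean) => void;
  setTranscript: (transcript: string) => void;
  setHasTranscript: (value: boolean) => void;
  setIsRecording: (value: boolean) => void;
  setIsProcessingAudio: (value: boolean) => void;
  setIsPlayingAudio: (value: boolean) => void;
  setAudioLevel: (level: number) => void;
  setMessages: (messages: Message[]) => void;
  addMessage: (message: Message) => void;
}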

useFetchVideoTranscript Hook

This hook handles fetching YouTube video transcripts to provide context for voice conversations.

Implementation

import { useCallback } from 'react';
import { useToast } from '@/hooks/use-toast';
import { useVoiceChatStore } from '../store/voice-chat-store';

export const useFetchVideoTranscript = () => {
  const { toast } = useToast();
  const {
    videoUrl,
    setIsLoading,
    setTranscript,
    setHasTranscript,
    setMessages
  } = useVoiceChatStore();

  const fetchTranscript = useCallback(async () => {
    if (!videoUrl.trim()) {
      toast({
        title: 'Error',
        description: 'Please enter a YouTube URL',
        variant: 'destructive'
      });
      return false;
    }

    setIsLoading(true);
    setMessages([]);
    setTranscript('');
    setHasTranscript(false);

    try {
      const response = await fetch('/api/transcribe', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ videoUrl })
      });

      const data = await response.json();

      if (!response.ok) {
        throw new Error(data.error || 'Failed to fetch transcript');
      }

      if (!data?.transcript?.fullTranscript) {
        throw new Error('No transcript available for this video');
      }

      setTranscript(data.transcript.fullTranscript);
      setHasTranscript(true);

      // Add initial system message
      setMessages([
        {
          id: Date.now().toString(),
          role: 'system',
          content:
            "Voice agent is ready! You can now speak and I'll respond with voice in your selected language."
        }
      ]);

      toast({
        title: 'Success',
        description:
          'Voice Agent Ready. Click the microphone to start speaking!',
        variant: 'default'
      });

      return true;
    } catch (error: any) {
      console.error('Error fetching transcript:', error);
      toast({
        title: 'Error',
        description: error.message || 'Failed to fetch transcript',
        variant: 'destructive'
      });
      return false;
    } finally {
      setIsLoading(false);
    }
  }, [
    videoUrl,
    toast,
    setIsLoading,
    setTranscript,
    setHasTranscript,
    setMessages
  ]);

  return { fetchTranscript };
};

How It Works

Step 1: Validation

  • Checks if videoUrl is not empty
  • Shows error toast if URL missing
  • Returns false if validation fails

Step 2: Reset State

  • Sets isLoading to true (shows loading UI)
  • Clears previous messages array
  • Clears previous transcript
  • Sets hasTranscript to false

Step 3: Fetch Transcript

  • POSTs to /api/transcribe endpoint
  • Sends video URL in request body
  • Waits for response with transcript data

Step 4: Process Response

  • Checks if response is successful
  • Validates transcript exists in response
  • Throws error if transcript missing

Step 5: Update State

  • Stores transcript text in store
  • Sets hasTranscript to true
  • Creates initial system message welcoming user
  • Shows success toast notification

Step 6: Cleanup

  • Sets isLoading to false in finally block
  • Returns true on success, false on failure
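
The hook only relies on two fields of the /api/transcribe response. A minimal response shape, inferred from the code above (the endpoint may return additional fields that are not used here):

// Inferred from the hook above; additional fields returned by /api/transcribe are ignored.
interface TranscribeResponse {
  transcript?: {
    fullTranscript: string; // plain-text transcript used as conversation context
  };
  error?: string; // present when the request fails
}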

Usage

function VoiceChatSetup() {
  const { fetchTranscript } = useFetchVideoTranscript();

  const handleSubmit = async () => {
    const success = await fetchTranscript();
    if (success) {
      // Transcript loaded, enable voice chat
    }
  };

  return (
    <button onClick={handleSubmit}>
      Activate Voice Agent
    </button>
  );
}

useAudioRecording Hook

This hook manages the entire audio recording and processing pipeline, from capturing microphone input to playing AI voice responses.

Hook Structure

import { useCallback, useRef } from 'react';
import { useToast } from '@/hooks/use-toast';
import { useVoiceChatStore } from '../store/voice-chat-store';

export const useAudioRecording = () => {
  const { toast } = useToast();
  const {
    hasTranscript,
    isRecording,
    isProcessingAudio,
    transcript,
    messages,
    selectedLanguage,
    setIsRecording,
    setIsProcessingAudio,
    setAudioLevel,
    addMessage,
    setMessages,
    setIsPlayingAudio
  } = useVoiceChatStore();

  const mediaRecorderRef = useRef<MediaRecorder | null>(null);
  const audioChunksRef = useRef<Blob[]>([]);
  const audioRef = useRef<HTMLAudioElement | null>(null);
  const analyserRef = useRef<AnalyserNode | null>(null);
  const animationFrameRef = useRef<number | null>(null);
  const recordingStartTimeRef = useRef<number>(0);

  // ... methods
};

Refs Used

mediaRecorderRef: Stores MediaRecorder instance for recording audio

audioChunksRef: Accumulates audio data chunks during recording

audioRef: Stores Audio element for playing AI responses

analyserRef: AnalyserNode for real-time audio level monitoring

animationFrameRef: Stores animation frame ID for cancellation

recordingStartTimeRef: Records start time to validate minimum duration

Audio Processing Functions

monitorAudioLevel

const monitorAudioLevel = useCallback(() => {
  if (!analyserRef.current) return;

  const bufferLength = analyserRef.current.frequencyBinCount;
  const dataArray = new Uint8Array(bufferLength);
  analyserRef.current.getByteTimeDomainData(dataArray);

  let sum = 0;
  for (let i = 0; i < bufferLength; i++) {
    const sample = (dataArray[i] - 128) / 128;
    sum += sample * sample;
  }
  const rms = Math.sqrt(sum / bufferLength);
  const audioLevel = rms * 255;

  setAudioLevel(audioLevel);

  if (isRecording) {
    animationFrameRef.current = requestAnimationFrame(monitorAudioLevel);
  }
}, [isRecording, setAudioLevel]);

How It Works:

  1. Reads time-domain waveform data from the AnalyserNode
  2. Computes the RMS (root mean square) of the normalized samples
  3. Scales the RMS to an audio level between 0 and 255
  4. Updates store with current level
  5. Recursively calls itself via requestAnimationFrame while recording
  6. Drives visual feedback animation in UI
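
The audioLevel written to the store on every animation frame is what drives the recording visualizer. A hypothetical consumer, shown only to illustrate how the value can be read (the actual UI component is not part of this section):

import { useVoiceChatStore } from '../store/voice-chat-store';

// Hypothetical level meter: renders a bar proportional to the 0-255 audioLevel
// that monitorAudioLevel writes while recording.
function AudioLevelMeter() {
  const { audioLevel, isRecording } = useVoiceChatStore();

  if (!isRecording) return null;

  return (
    <div style={{ width: 200, height: 8, background: '#e5e7eb' }}>
      <div
        style={{
          width: `${Math.min(100, (audioLevel / 255) * 100)}%`,
          height: '100%',
          background: '#22c55e'
        }}
      />
    </div>
  );
}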

audioBufferToWav

const audioBufferToWav = async (audioBuffer: AudioBuffer): Promise<Blob> => {
  const numberOfChannels = 1; // only the first channel is written below, so the header declares mono
  const sampleRate = audioBuffer.sampleRate;
  const format = 1; // PCM
  const bitDepth = 16;

  const bytesPerSample = bitDepth / 8;
  const blockAlign = numberOfChannels * bytesPerSample;
  const byteRate = sampleRate * blockAlign;
  const dataSize = audioBuffer.length * blockAlign;
  const bufferSize = 44 + dataSize; // 44 bytes for WAV header

  const arrayBuffer = new ArrayBuffer(bufferSize);
  const view = new DataView(arrayBuffer);

  const writeString = (offset: number, string: string) => {
    for (let i = 0; i < string.length; i++) {
      view.setUint8(offset + i, string.charCodeAt(i));
    }
  };

  // Write WAV header
  writeString(0, 'RIFF');
  view.setUint32(4, bufferSize - 8, true);
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);
  view.setUint16(20, format, true);
  view.setUint16(22, numberOfChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitDepth, true);
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  // Write audio samples
  const channelData = audioBuffer.getChannelData(0);
  let offset = 44;
  for (let i = 0; i < channelData.length; i++) {
    const sample = Math.max(-1, Math.min(1, channelData[i]));
    view.setInt16(offset, sample * 0x7fff, true);
    offset += 2;
  }

  return new Blob([arrayBuffer], { type: 'audio/wav' });
};

How It Works:

  1. Creates a new ArrayBuffer for WAV file
  2. Writes WAV header with correct format specifications
  3. Converts floating-point audio samples to 16-bit PCM
  4. Returns Blob suitable for API upload
  5. Used by processAudioInput after decodeAudioData to turn decoded WebM/OGG/MP4 recordings into a WAV Blob (mono: only the first channel is written)

processAudioInput

const processAudioInput = useCallback(
  async (audioBlob: Blob, mimeType: string) => {
    setIsProcessingAudio(true);

    try {
      let finalBlob = audioBlob;
      let fileName = 'recording.wav';

      // Convert to WAV if needed
      if (
        mimeType.includes('webm') ||
        mimeType.includes('ogg') ||
        mimeType.includes('mp4')
      ) {
        console.log('Converting audio to WAV format...');
        const arrayBuffer = await audioBlob.arrayBuffer();
        const audioContext = new AudioContext();
        const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
        finalBlob = await audioBufferToWav(audioBuffer);
        audioContext.close();
      }

      // Step 1: Speech to Text
      const formData = new FormData();
      formData.append('audio', finalBlob, fileName);
      formData.append('language', selectedLanguage);

      const sttResponse = await fetch('/api/voice-chat/speech-to-text', {
        method: 'POST',
        body: formData
      });

      const sttData = await sttResponse.json();

      if (!sttResponse.ok) {
        throw new Error(sttData.error || 'Failed to transcribe audio');
      }

      const userText = sttData.text;
      if (!userText.trim()) {
        toast({
          title: 'No Speech Detected',
          description: 'Please try speaking more clearly.',
          variant: 'destructive'
        });
        return;
      }

      // Step 2: Add user message
      const userMessage: Message = {
        id: Date.now().toString(),
        role: 'user',
        content: userText,
        language: selectedLanguage
      };
      addMessage(userMessage);

      // Step 3: Get AI response
      const chatResponse = await fetch('/api/voice-chat/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: userText,
          transcript,
          language: selectedLanguage,
          previousMessages: messages.filter((m) => m.role !== 'system')
        })
      });

      const chatData = await chatResponse.json();

      if (!chatResponse.ok) {
        throw new Error(chatData.error || 'Failed to get AI response');
      }

      // Step 4: Convert response to speech
      const ttsResponse = await fetch('/api/voice-chat/text-to-speech', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          text: chatData.response,
          language: selectedLanguage
        })
      });

      if (!ttsResponse.ok) {
        throw new Error('Failed to generate speech');
      }

      const audioBuffer = await ttsResponse.arrayBuffer();
      const responseBlobAudio = new Blob([audioBuffer], {
        type: 'audio/wav'
      });
      const audioUrl = URL.createObjectURL(responseBlobAudio);

      // Step 5: Add assistant message with audio
      const assistantMessage: Message = {
        id: (Date.now() + 1).toString(),
        role: 'assistant',
        content: chatData.response,
        audioUrl,
        language: selectedLanguage
      };
      addMessage(assistantMessage);

      // Step 6: Auto-play response
      playAudio(audioUrl);
    } catch (error: any) {
      console.error('Error processing audio:', error);
      toast({
        title: 'Processing Error',
        description: error.message || 'Failed to process audio input',
        variant: 'destructive'
      });
    } finally {
      setIsProcessingAudio(false);
    }
  },
  [
    selectedLanguage,
    transcript,
    messages,
    toast,
    setIsProcessingAudio,
    addMessage,
    audioBufferToWav
  ]
);

Processing Pipeline:

  1. Format Conversion: Converts WebM/OGG to WAV if needed
  2. Speech-to-Text: Sends audio to STT API, gets transcribed text
  3. Validation: Checks if text is non-empty
  4. User Message: Adds user's question to message history
  5. AI Chat: Sends question + transcript context to chat API
  6. Text-to-Speech: Converts AI response to audio
  7. Assistant Message: Adds AI response with audio URL to history
  8. Auto-play: Automatically plays audio response

Recording Control Functions

initializeRecording

const initializeRecording = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        echoCancellation: true,
        noiseSuppression: true,
        sampleRate: 44100
      }
    });

    // Create audio context for level monitoring
    const audioContext = new AudioContext();
    const source = audioContext.createMediaStreamSource(stream);
    const analyser = audioContext.createAnalyser();
    analyser.fftSize = 2048;
    analyser.smoothingTimeConstant = 0.8;
    source.connect(analyser);
    analyserRef.current = analyser;

    // Determine MIME type
    let mimeType = 'audio/wav';
    const supportsWav = MediaRecorder.isTypeSupported('audio/wav');
    if (!supportsWav) {
      mimeType = 'audio/webm;codecs=opus';
    }

    const mediaRecorder = new MediaRecorder(stream, { mimeType });
    mediaRecorderRef.current = mediaRecorder;
    audioChunksRef.current = [];

    mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        audioChunksRef.current.push(event.data);
      }
    };

    mediaRecorder.onstop = async () => {
      const audioBlob = new Blob(audioChunksRef.current, { type: mimeType });
      stream.getTracks().forEach((track) => track.stop());
      audioContext.close();

      if (audioBlob.size === 0) {
        toast({
          title: 'Recording Error',
          description: 'No audio data captured. Please try again.',
          variant: 'destructive'
        });
        setIsProcessingAudio(false);
        return;
      }

      await processAudioInput(audioBlob, mimeType);
    };

    return true;
  } catch (error) {
    console.error('Error accessing microphone:', error);
    toast({
      title: 'Microphone Error',
      description: 'Unable to access microphone. Please check permissions.',
      variant: 'destructive'
    });
    return false;
  }
};

Initialization Steps:

  1. Requests microphone access with audio constraints
  2. Creates AudioContext for level monitoring
  3. Sets up AnalyserNode for waveform analysis
  4. Determines best supported MIME type
  5. Creates MediaRecorder with chosen format
  6. Sets up event handlers for data and stop events
  7. Returns success/failure status
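
Very few browsers can record audio/wav directly; Chromium and Firefox typically record WebM/Opus, and Safari records MP4/AAC, which is why processAudioInput also checks for mp4. A slightly broader fallback chain, sketched as an alternative to the two-way check above:

// Sketch: pick the first container the browser can actually record, instead of assuming WebM.
const candidates = [
  'audio/wav',
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/mp4' // Safari
];
const mimeType =
  candidates.find((type) => MediaRecorder.isTypeSupported(type)) ?? '';
const mediaRecorder = mimeType
  ? new MediaRecorder(stream, { mimeType })
  : new MediaRecorder(stream); // let the browser choose its default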

startRecording

const startRecording = async () => {
  if (!hasTranscript || isRecording || isProcessingAudio) return;

  const initialized = await initializeRecording();
  if (!initialized) return;

  setIsRecording(true);
  setAudioLevel(0);
  recordingStartTimeRef.current = Date.now();
  mediaRecorderRef.current?.start(100); // Request data every 100ms
  monitorAudioLevel();

  toast({
    title: 'Recording Started',
    description: 'Speak now... Click the microphone again to stop.',
    variant: 'default'
  });
};

Start Process:

  1. Validates transcript exists and no recording active
  2. Initializes recording (microphone access, setup)
  3. Sets recording state to true
  4. Resets audio level
  5. Records start time for duration validation
  6. Starts MediaRecorder with 100ms timeslice
  7. Begins audio level monitoring
  8. Shows toast notification

stopRecording

const stopRecording = () => {
  if (mediaRecorderRef.current && isRecording) {
    try {
      const recordingDuration = Date.now() - recordingStartTimeRef.current;

      if (recordingDuration < 500) {
        toast({
          title: 'Recording Too Short',
          description: 'Please speak for at least half a second.',
          variant: 'destructive'
        });
        setIsRecording(false);
        setAudioLevel(0);
        if (mediaRecorderRef.current.state === 'recording') {
          mediaRecorderRef.current.stop();
        }
        return;
      }

      if (mediaRecorderRef.current.state === 'recording') {
        mediaRecorderRef.current.stop();
      }
      setIsRecording(false);
      setAudioLevel(0);

      if (animationFrameRef.current) {
        cancelAnimationFrame(animationFrameRef.current);
        animationFrameRef.current = null;
      }
    } catch (error) {
      console.error('Error stopping recording:', error);
      setIsRecording(false);
      setAudioLevel(0);
    }
  }
};

Stop Process:

  1. Calculates recording duration
  2. Validates minimum duration (500ms)
  3. Stops MediaRecorder
  4. Resets recording state and audio level
  5. Cancels audio level monitoring animation
  6. Triggers onstop event which processes audio

toggleRecording

const toggleRecording = () => {
  if (isRecording) {
    stopRecording();
  } else {
    startRecording();
  }
};

Convenience function for single-button record/stop control.

forceStop

const forceStop = () => {
  if (mediaRecorderRef.current) {
    try {
      mediaRecorderRef.current.ondataavailable = null;
      mediaRecorderRef.current.onstop = null;

      if (mediaRecorderRef.current.state === 'recording') {
        mediaRecorderRef.current.stop();
      }

      const stream = mediaRecorderRef.current.stream;
      stream?.getTracks().forEach((track) => track.stop());
      mediaRecorderRef.current = null;
    } catch (error) {
      console.error('Error force stopping:', error);
    }
  }

  if (animationFrameRef.current) {
    cancelAnimationFrame(animationFrameRef.current);
    animationFrameRef.current = null;
  }

  if (audioRef.current) {
    audioRef.current.pause();
    audioRef.current = null;
  }

  audioChunksRef.current = [];
  setIsRecording(false);
  setIsProcessingAudio(false);
  setIsPlayingAudio(false);
  setAudioLevel(0);

  toast({
    title: 'Stopped',
    description: 'Voice chat has been stopped.',
    variant: 'default'
  });
};

Force Stop Actions:

  1. Removes MediaRecorder event listeners (prevents processing)
  2. Stops MediaRecorder
  3. Stops all media stream tracks
  4. Cancels animation frame
  5. Stops audio playback
  6. Clears audio chunks
  7. Resets all states
  8. Shows notification

This is used for immediate cancellation without processing recorded audio.

Audio Playback Function

playAudio

const playAudio = (audioUrl: string) => {
  if (audioRef.current) {
    audioRef.current.pause();
  }

  const audio = new Audio(audioUrl);
  audioRef.current = audio;

  audio.onplay = () => setIsPlayingAudio(true);
  audio.onended = () => setIsPlayingAudio(false);
  audio.onerror = () => {
    setIsPlayingAudio(false);
    toast({
      title: 'Audio Error',
      description: 'Failed to play audio response',
      variant: 'destructive'
    });
  };

  audio.play().catch(console.error);
};

Playback Flow:

  1. Pauses any currently playing audio
  2. Creates new Audio element with URL
  3. Sets up event listeners for play, end, and error
  4. Updates isPlayingAudio state based on events
  5. Starts playback
  6. Shows error toast if playback fails
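
One caveat the code above does not address: each call to URL.createObjectURL keeps the underlying Blob alive until URL.revokeObjectURL is called. Because the URL is also stored on the assistant message (so it can be replayed later), it cannot simply be revoked on ended; a reasonable place to release the URLs is when the conversation is cleared or the component unmounts. A hedged sketch of that bookkeeping inside the hook, using hypothetical helper names:

// Hypothetical bookkeeping inside useAudioRecording: track every object URL created
// for an assistant reply so they can all be revoked when the chat is torn down.
const createdAudioUrlsRef = useRef<string[]>([]);

const registerAudioUrl = (url: string) => {
  createdAudioUrlsRef.current.push(url); // call right after URL.createObjectURL(...)
};

const revokeAllAudioUrls = () => {
  createdAudioUrlsRef.current.forEach((url) => URL.revokeObjectURL(url));
  createdAudioUrlsRef.current = [];
};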

Cleanup Function

cleanup

const cleanup = () => {
  if (mediaRecorderRef.current?.state === 'recording') {
    mediaRecorderRef.current.stop();
  }
  if (animationFrameRef.current) {
    cancelAnimationFrame(animationFrameRef.current);
  }
  if (audioRef.current) {
    audioRef.current.pause();
  }
};

Called on component unmount to clean up resources.

Hook Return Value

return {
  toggleRecording,
  forceStop,
  playAudio,
  cleanup,
  audioRef
};

Usage Example

function VoiceChat() {
  const { toggleRecording, forceStop, playAudio, cleanup, audioRef } = useAudioRecording();
  const { isRecording, isProcessingAudio } = useVoiceChatStore();

  useEffect(() => {
    return cleanup; // Cleanup on unmount (cleanup should have a stable identity, e.g. wrapped in useCallback, or this will re-run on every render)
  }, [cleanup]);

  return (
    <div>
      <button 
        onClick={toggleRecording}
        disabled={isProcessingAudio}
      >
        {isRecording ? 'Stop Recording' : 'Start Recording'}
      </button>
      
      {(isRecording || isProcessingAudio) && (
        <button onClick={forceStop}>Force Stop</button>
      )}
      
      <audio ref={audioRef} style={{ display: 'none' }} />
    </div>
  );
}
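
Messages added by processAudioInput carry an audioUrl, so the same playAudio function can replay earlier answers. A hypothetical message list, receiving playAudio as a prop, to illustrate:

import { useVoiceChatStore } from '../store/voice-chat-store';

// Hypothetical message list: replays stored assistant audio via the playAudio
// function returned by useAudioRecording.
function MessageList({ playAudio }: { playAudio: (url: string) => void }) {
  const { messages } = useVoiceChatStore();

  return (
    <ul>
      {messages.map((message) => (
        <li key={message.id}>
          <strong>{message.role}:</strong> {message.content}
          {message.audioUrl && (
            <button onClick={() => playAudio(message.audioUrl!)}>Replay</button>
          )}
        </li>
      ))}
    </ul>
  );
}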