
Voice Chat Hooks

Custom hooks for voice chat functionality

The Voice Chat feature uses two main custom hooks: one for fetching video transcripts and another for handling audio recording and processing.
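
Both hooks read and write a shared store, useVoiceChatStore. The store itself is not shown in this section; the sketch below is only an assumed minimal shape, inferred from the fields the two hooks use, together with the Message type they pass around.

// Assumed shapes only -- inferred from how the hooks in this section use the store.
export interface Message {
  id: string;
  role: 'system' | 'user' | 'assistant';
  content: string;
  audioUrl?: string; // set on assistant messages that carry a voice reply
  language?: string;
}

interface VoiceChatState {
  videoUrl: string;
  transcript: string;
  hasTranscript: boolean;
  isLoading: boolean;
  isRecording: boolean;
  isProcessingAudio: boolean;
  isPlayingAudio: boolean;
  audioLevel: number;
  selectedLanguage: string;
  messages: Message[];
  setIsLoading: (value: boolean) => void;
  setTranscript: (transcript: string) => void;
  setHasTranscript: (value: boolean) => void;
  setIsRecording: (value: boolean) => void;
  setIsProcessingAudio: (value: boolean) => void;
  setIsPlayingAudio: (value: boolean) => void;
  setAudioLevel: (level: number) => void;
  setMessages: (messages: Message[]) => void;
  addMessage: (message: Message) => void;
}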

useFetchVideoTranscript Hook

This hook handles fetching YouTube video transcripts to provide context for voice conversations.

Implementation

import { useCallback } from 'react';
import { useToast } from '@/hooks/use-toast';
import { useVoiceChatStore } from '../store/voice-chat-store';

export const useFetchVideoTranscript = () => {
  const { toast } = useToast();
  const {
    videoUrl,
    setIsLoading,
    setTranscript,
    setHasTranscript,
    setMessages
  } = useVoiceChatStore();

  const fetchTranscript = useCallback(async () => {
    if (!videoUrl.trim()) {
      toast({
        title: 'Error',
        description: 'Please enter a YouTube URL',
        variant: 'destructive'
      });
      return false;
    }

    setIsLoading(true);
    setMessages([]);
    setTranscript('');
    setHasTranscript(false);

    try {
      const response = await fetch('/api/transcribe', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ videoUrl })
      });

      const data = await response.json();

      if (!response.ok) {
        throw new Error(data.error || 'Failed to fetch transcript');
      }

      if (!data?.transcript?.fullTranscript) {
        throw new Error('No transcript available for this video');
      }

      setTranscript(data.transcript.fullTranscript);
      setHasTranscript(true);

      // Add initial system message
      setMessages([
        {
          id: Date.now().toString(),
          role: 'system',
          content:
            "Voice agent is ready! You can now speak and I'll respond with voice in your selected language."
        }
      ]);

      toast({
        title: 'Success',
        description:
          'Voice Agent Ready. Click the microphone to start speaking!',
        variant: 'default'
      });

      return true;
    } catch (error: any) {
      console.error('Error fetching transcript:', error);
      toast({
        title: 'Error',
        description: error.message || 'Failed to fetch transcript',
        variant: 'destructive'
      });
      return false;
    } finally {
      setIsLoading(false);
    }
  }, [
    videoUrl,
    toast,
    setIsLoading,
    setTranscript,
    setHasTranscript,
    setMessages
  ]);

  return { fetchTranscript };
};

How It Works

Step 1: Validation

  • Checks if videoUrl is not empty
  • Shows error toast if URL missing
  • Returns false if validation fails

Step 2: Reset State

  • Sets isLoading to true (shows loading UI)
  • Clears previous messages array
  • Clears previous transcript
  • Sets hasTranscript to false

Step 3: Fetch Transcript

  • POSTs to /api/transcribe endpoint
  • Sends video URL in request body
  • Waits for response with transcript data

Step 4: Process Response

  • Checks if response is successful
  • Validates transcript exists in response
  • Throws error if transcript missing

Step 5: Update State

  • Stores transcript text in store
  • Sets hasTranscript to true
  • Creates initial system message welcoming user
  • Shows success toast notification

Step 6: Cleanup

  • Sets isLoading to false in finally block
  • Returns true on success, false on failure
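
The hook only relies on two fields of the /api/transcribe response. A minimal response shape, inferred from the code above (the endpoint may return additional fields that are not used here):

// Inferred from the hook above; additional fields returned by /api/transcribe are ignored.
interface TranscribeResponse {
  transcript?: {
    fullTranscript: string; // plain-text transcript used as conversation context
  };
  error?: string; // present when the request fails
}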

Usage

function VoiceChatSetup() {
  const { fetchTranscript } = useFetchVideoTranscript();

  const handleSubmit = async () => {
    const success = await fetchTranscript();
    if (success) {
      // Transcript loaded, enable voice chat
    }
  };

  return (
    <button onClick={handleSubmit}>
      Activate Voice Agent
    </button>
  );
}

useAudioRecording Hook

This hook manages the entire audio recording and processing pipeline, from capturing microphone input to playing AI voice responses.

Hook Structure

import { useCallback, useRef } from 'react';
import { useToast } from '@/hooks/use-toast';
import { useVoiceChatStore } from '../store/voice-chat-store';

export const useAudioRecording = () => {
  const { toast } = useToast();
  const {
    hasTranscript,
    isRecording,
    isProcessingAudio,
    transcript,
    messages,
    selectedLanguage,
    setIsRecording,
    setIsProcessingAudio,
    setAudioLevel,
    addMessage,
    setMessages,
    setIsPlayingAudio
  } = useVoiceChatStore();

  const mediaRecorderRef = useRef<MediaRecorder | null>(null);
  const audioChunksRef = useRef<Blob[]>([]);
  const audioRef = useRef<HTMLAudioElement | null>(null);
  const analyserRef = useRef<AnalyserNode | null>(null);
  const animationFrameRef = useRef<number | null>(null);
  const recordingStartTimeRef = useRef<number>(0);

  // ... methods
};

Refs Used

mediaRecorderRef: Stores MediaRecorder instance for recording audio

audioChunksRef: Accumulates audio data chunks during recording

audioRef: Stores Audio element for playing AI responses

analyserRef: AnalyserNode for real-time audio level monitoring

animationFrameRef: Stores animation frame ID for cancellation

recordingStartTimeRef: Records start time to validate minimum duration

Audio Processing Functions

monitorAudioLevel

const monitorAudioLevel = useCallback(() => {
  if (!analyserRef.current) return;

  const bufferLength = analyserRef.current.frequencyBinCount;
  const dataArray = new Uint8Array(bufferLength);
  analyserRef.current.getByteTimeDomainData(dataArray);

  let sum = 0;
  for (let i = 0; i < bufferLength; i++) {
    const sample = (dataArray[i] - 128) / 128;
    sum += sample * sample;
  }
  const rms = Math.sqrt(sum / bufferLength);
  const audioLevel = rms * 255;

  setAudioLevel(audioLevel);

  if (isRecording) {
    animationFrameRef.current = requestAnimationFrame(monitorAudioLevel);
  }
}, [isRecording, setAudioLevel]);

How It Works:

  1. Reads time-domain waveform data from the AnalyserNode
  2. Computes the RMS (root mean square) of the normalized samples
  3. Scales the RMS to an audio level between 0 and 255
  4. Updates store with current level
  5. Recursively calls itself via requestAnimationFrame while recording
  6. Drives visual feedback animation in UI
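
The audioLevel written to the store on every animation frame is what drives the recording visualizer. A hypothetical consumer, shown only to illustrate how the value can be read (the actual UI component is not part of this section):

import { useVoiceChatStore } from '../store/voice-chat-store';

// Hypothetical level meter: renders a bar proportional to the 0-255 audioLevel
// that monitorAudioLevel writes while recording.
function AudioLevelMeter() {
  const { audioLevel, isRecording } = useVoiceChatStore();

  if (!isRecording) return null;

  return (
    <div style={{ width: 200, height: 8, background: '#e5e7eb' }}>
      <div
        style={{
          width: `${Math.min(100, (audioLevel / 255) * 100)}%`,
          height: '100%',
          background: '#22c55e'
        }}
      />
    </div>
  );
}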

audioBufferToWav

const audioBufferToWav = async (audioBuffer: AudioBuffer): Promise<Blob> => {
  const numberOfChannels = 1; // only the first channel is written below, so the header declares mono
  const sampleRate = audioBuffer.sampleRate;
  const format = 1; // PCM
  const bitDepth = 16;

  const bytesPerSample = bitDepth / 8;
  const blockAlign = numberOfChannels * bytesPerSample;
  const byteRate = sampleRate * blockAlign;
  const dataSize = audioBuffer.length * blockAlign;
  const bufferSize = 44 + dataSize; // 44 bytes for WAV header

  const arrayBuffer = new ArrayBuffer(bufferSize);
  const view = new DataView(arrayBuffer);

  const writeString = (offset: number, string: string) => {
    for (let i = 0; i < string.length; i++) {
      view.setUint8(offset + i, string.charCodeAt(i));
    }
  };

  // Write WAV header
  writeString(0, 'RIFF');
  view.setUint32(4, bufferSize - 8, true);
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);
  view.setUint16(20, format, true);
  view.setUint16(22, numberOfChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitDepth, true);
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  // Write audio samples
  const channelData = audioBuffer.getChannelData(0);
  let offset = 44;
  for (let i = 0; i < channelData.length; i++) {
    const sample = Math.max(-1, Math.min(1, channelData[i]));
    view.setInt16(offset, sample * 0x7fff, true);
    offset += 2;
  }

  return new Blob([arrayBuffer], { type: 'audio/wav' });
};

How It Works:

  1. Creates a new ArrayBuffer for WAV file
  2. Writes WAV header with correct format specifications
  3. Converts floating-point audio samples to 16-bit PCM
  4. Returns Blob suitable for API upload
  5. Used by processAudioInput after decodeAudioData to turn decoded WebM/OGG/MP4 recordings into a WAV Blob (mono: only the first channel is written)

processAudioInput

const processAudioInput = useCallback(
  async (audioBlob: Blob, mimeType: string) => {
    setIsProcessingAudio(true);

    try {
      let finalBlob = audioBlob;
      let fileName = 'recording.wav';

      // Convert to WAV if needed
      if (
        mimeType.includes('webm') ||
        mimeType.includes('ogg') ||
        mimeType.includes('mp4')
      ) {
        console.log('Converting audio to WAV format...');
        const arrayBuffer = await audioBlob.arrayBuffer();
        const audioContext = new AudioContext();
        const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
        finalBlob = await audioBufferToWav(audioBuffer);
        audioContext.close();
      }

      // Step 1: Speech to Text
      const formData = new FormData();
      formData.append('audio', finalBlob, fileName);
      formData.append('language', selectedLanguage);

      const sttResponse = await fetch('/api/voice-chat/speech-to-text', {
        method: 'POST',
        body: formData
      });

      const sttData = await sttResponse.json();

      if (!sttResponse.ok) {
        throw new Error(sttData.error || 'Failed to transcribe audio');
      }

      const userText = sttData.text;
      if (!userText.trim()) {
        toast({
          title: 'No Speech Detected',
          description: 'Please try speaking more clearly.',
          variant: 'destructive'
        });
        return;
      }

      // Step 2: Add user message
      const userMessage: Message = {
        id: Date.now().toString(),
        role: 'user',
        content: userText,
        language: selectedLanguage
      };
      addMessage(userMessage);

      // Step 3: Get AI response
      const chatResponse = await fetch('/api/voice-chat/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: userText,
          transcript,
          language: selectedLanguage,
          previousMessages: messages.filter((m) => m.role !== 'system')
        })
      });

      const chatData = await chatResponse.json();

      if (!chatResponse.ok) {
        throw new Error(chatData.error || 'Failed to get AI response');
      }

      // Step 4: Convert response to speech
      const ttsResponse = await fetch('/api/voice-chat/text-to-speech', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          text: chatData.response,
          language: selectedLanguage
        })
      });

      if (!ttsResponse.ok) {
        throw new Error('Failed to generate speech');
      }

      const audioBuffer = await ttsResponse.arrayBuffer();
      const responseBlobAudio = new Blob([audioBuffer], {
        type: 'audio/wav'
      });
      const audioUrl = URL.createObjectURL(responseBlobAudio);

      // Step 5: Add assistant message with audio
      const assistantMessage: Message = {
        id: (Date.now() + 1).toString(),
        role: 'assistant',
        content: chatData.response,
        audioUrl,
        language: selectedLanguage
      };
      addMessage(assistantMessage);

      // Step 6: Auto-play response
      playAudio(audioUrl);
    } catch (error: any) {
      console.error('Error processing audio:', error);
      toast({
        title: 'Processing Error',
        description: error.message || 'Failed to process audio input',
        variant: 'destructive'
      });
    } finally {
      setIsProcessingAudio(false);
    }
  },
  [
    selectedLanguage,
    transcript,
    messages,
    toast,
    setIsProcessingAudio,
    addMessage,
    audioBufferToWav
  ]
);

Processing Pipeline:

  1. Format Conversion: Converts WebM/OGG to WAV if needed
  2. Speech-to-Text: Sends audio to STT API, gets transcribed text
  3. Validation: Checks if text is non-empty
  4. User Message: Adds user's question to message history
  5. AI Chat: Sends question + transcript context to chat API
  6. Text-to-Speech: Converts AI response to audio
  7. Assistant Message: Adds AI response with audio URL to history
  8. Auto-play: Automatically plays audio response

Recording Control Functions

initializeRecording

const initializeRecording = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        echoCancellation: true,
        noiseSuppression: true,
        sampleRate: 44100
      }
    });

    // Create audio context for level monitoring
    const audioContext = new AudioContext();
    const source = audioContext.createMediaStreamSource(stream);
    const analyser = audioContext.createAnalyser();
    analyser.fftSize = 2048;
    analyser.smoothingTimeConstant = 0.8;
    source.connect(analyser);
    analyserRef.current = analyser;

    // Determine MIME type
    let mimeType = 'audio/wav';
    const supportsWav = MediaRecorder.isTypeSupported('audio/wav');
    if (!supportsWav) {
      mimeType = 'audio/webm;codecs=opus';
    }

    const mediaRecorder = new MediaRecorder(stream, { mimeType });
    mediaRecorderRef.current = mediaRecorder;
    audioChunksRef.current = [];

    mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        audioChunksRef.current.push(event.data);
      }
    };

    mediaRecorder.onstop = async () => {
      const audioBlob = new Blob(audioChunksRef.current, { type: mimeType });
      stream.getTracks().forEach((track) => track.stop());
      audioContext.close();

      if (audioBlob.size === 0) {
        toast({
          title: 'Recording Error',
          description: 'No audio data captured. Please try again.',
          variant: 'destructive'
        });
        setIsProcessingAudio(false);
        return;
      }

      await processAudioInput(audioBlob, mimeType);
    };

    return true;
  } catch (error) {
    console.error('Error accessing microphone:', error);
    toast({
      title: 'Microphone Error',
      description: 'Unable to access microphone. Please check permissions.',
      variant: 'destructive'
    });
    return false;
  }
};

Initialization Steps:

  1. Requests microphone access with audio constraints
  2. Creates AudioContext for level monitoring
  3. Sets up AnalyserNode for waveform analysis
  4. Determines best supported MIME type
  5. Creates MediaRecorder with chosen format
  6. Sets up event handlers for data and stop events
  7. Returns success/failure status
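
Very few browsers can record audio/wav directly; Chromium and Firefox typically record WebM/Opus, and Safari records MP4/AAC, which is why processAudioInput also checks for mp4. A slightly broader fallback chain, sketched as an alternative to the two-way check above:

// Sketch: pick the first container the browser can actually record, instead of assuming WebM.
const candidates = [
  'audio/wav',
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/mp4' // Safari
];
const mimeType =
  candidates.find((type) => MediaRecorder.isTypeSupported(type)) ?? '';
const mediaRecorder = mimeType
  ? new MediaRecorder(stream, { mimeType })
  : new MediaRecorder(stream); // let the browser choose its default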

startRecording

const startRecording = async () => {
  if (!hasTranscript || isRecording || isProcessingAudio) return;

  const initialized = await initializeRecording();
  if (!initialized) return;

  setIsRecording(true);
  setAudioLevel(0);
  recordingStartTimeRef.current = Date.now();
  mediaRecorderRef.current?.start(100); // Request data every 100ms
  monitorAudioLevel();

  toast({
    title: 'Recording Started',
    description: 'Speak now... Click the microphone again to stop.',
    variant: 'default'
  });
};

Start Process:

  1. Validates transcript exists and no recording active
  2. Initializes recording (microphone access, setup)
  3. Sets recording state to true
  4. Resets audio level
  5. Records start time for duration validation
  6. Starts MediaRecorder with 100ms timeslice
  7. Begins audio level monitoring
  8. Shows toast notification

stopRecording

const stopRecording = () => {
  if (mediaRecorderRef.current && isRecording) {
    try {
      const recordingDuration = Date.now() - recordingStartTimeRef.current;

      if (recordingDuration < 500) {
        toast({
          title: 'Recording Too Short',
          description: 'Please speak for at least half a second.',
          variant: 'destructive'
        });
        setIsRecording(false);
        setAudioLevel(0);
        if (mediaRecorderRef.current.state === 'recording') {
          mediaRecorderRef.current.stop();
        }
        return;
      }

      if (mediaRecorderRef.current.state === 'recording') {
        mediaRecorderRef.current.stop();
      }
      setIsRecording(false);
      setAudioLevel(0);

      if (animationFrameRef.current) {
        cancelAnimationFrame(animationFrameRef.current);
        animationFrameRef.current = null;
      }
    } catch (error) {
      console.error('Error stopping recording:', error);
      setIsRecording(false);
      setAudioLevel(0);
    }
  }
};

Stop Process:

  1. Calculates recording duration
  2. Validates minimum duration (500ms)
  3. Stops MediaRecorder
  4. Resets recording state and audio level
  5. Cancels audio level monitoring animation
  6. Triggers onstop event which processes audio

toggleRecording

const toggleRecording = () => {
  if (isRecording) {
    stopRecording();
  } else {
    startRecording();
  }
};

Convenience function for single-button record/stop control.

forceStop

const forceStop = () => {
  if (mediaRecorderRef.current) {
    try {
      mediaRecorderRef.current.ondataavailable = null;
      mediaRecorderRef.current.onstop = null;

      if (mediaRecorderRef.current.state === 'recording') {
        mediaRecorderRef.current.stop();
      }

      const stream = mediaRecorderRef.current.stream;
      stream?.getTracks().forEach((track) => track.stop());
      mediaRecorderRef.current = null;
    } catch (error) {
      console.error('Error force stopping:', error);
    }
  }

  if (animationFrameRef.current) {
    cancelAnimationFrame(animationFrameRef.current);
    animationFrameRef.current = null;
  }

  if (audioRef.current) {
    audioRef.current.pause();
    audioRef.current = null;
  }

  audioChunksRef.current = [];
  setIsRecording(false);
  setIsProcessingAudio(false);
  setIsPlayingAudio(false);
  setAudioLevel(0);

  toast({
    title: 'Stopped',
    description: 'Voice chat has been stopped.',
    variant: 'default'
  });
};

Force Stop Actions:

  1. Removes MediaRecorder event listeners (prevents processing)
  2. Stops MediaRecorder
  3. Stops all media stream tracks
  4. Cancels animation frame
  5. Stops audio playback
  6. Clears audio chunks
  7. Resets all states
  8. Shows notification

This is used for immediate cancellation without processing recorded audio.

Audio Playback Function

playAudio

const playAudio = (audioUrl: string) => {
  if (audioRef.current) {
    audioRef.current.pause();
  }

  const audio = new Audio(audioUrl);
  audioRef.current = audio;

  audio.onplay = () => setIsPlayingAudio(true);
  audio.onended = () => setIsPlayingAudio(false);
  audio.onerror = () => {
    setIsPlayingAudio(false);
    toast({
      title: 'Audio Error',
      description: 'Failed to play audio response',
      variant: 'destructive'
    });
  };

  audio.play().catch(console.error);
};

Playback Flow:

  1. Pauses any currently playing audio
  2. Creates new Audio element with URL
  3. Sets up event listeners for play, end, and error
  4. Updates isPlayingAudio state based on events
  5. Starts playback
  6. Shows error toast if playback fails
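
One caveat the code above does not address: each call to URL.createObjectURL keeps the underlying Blob alive until URL.revokeObjectURL is called. Because the URL is also stored on the assistant message (so it can be replayed later), it cannot simply be revoked on ended; a reasonable place to release the URLs is when the conversation is cleared or the component unmounts. A hedged sketch of that bookkeeping inside the hook, using hypothetical helper names:

// Hypothetical bookkeeping inside useAudioRecording: track every object URL created
// for an assistant reply so they can all be revoked when the chat is torn down.
const createdAudioUrlsRef = useRef<string[]>([]);

const registerAudioUrl = (url: string) => {
  createdAudioUrlsRef.current.push(url); // call right after URL.createObjectURL(...)
};

const revokeAllAudioUrls = () => {
  createdAudioUrlsRef.current.forEach((url) => URL.revokeObjectURL(url));
  createdAudioUrlsRef.current = [];
};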

Cleanup Function

cleanup

const cleanup = () => {
  if (mediaRecorderRef.current?.state === 'recording') {
    mediaRecorderRef.current.stop();
  }
  if (animationFrameRef.current) {
    cancelAnimationFrame(animationFrameRef.current);
  }
  if (audioRef.current) {
    audioRef.current.pause();
  }
};

Called on component unmount to clean up resources.

Hook Return Value

return {
  toggleRecording,
  forceStop,
  playAudio,
  cleanup,
  audioRef
};

Usage Example

function VoiceChat() {
  const { toggleRecording, forceStop, playAudio, cleanup, audioRef } = useAudioRecording();
  const { isRecording, isProcessingAudio } = useVoiceChatStore();

  useEffect(() => {
    return cleanup; // Cleanup on unmount (cleanup should have a stable identity, e.g. wrapped in useCallback, or this will re-run on every render)
  }, [cleanup]);

  return (
    <div>
      <button 
        onClick={toggleRecording}
        disabled={isProcessingAudio}
      >
        {isRecording ? 'Stop Recording' : 'Start Recording'}
      </button>
      
      {(isRecording || isProcessingAudio) && (
        <button onClick={forceStop}>Force Stop</button>
      )}
      
      <audio ref={audioRef} style={{ display: 'none' }} />
    </div>
  );
}
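
Messages added by processAudioInput carry an audioUrl, so the same playAudio function can replay earlier answers. A hypothetical message list, receiving playAudio as a prop, to illustrate:

import { useVoiceChatStore } from '../store/voice-chat-store';

// Hypothetical message list: replays stored assistant audio via the playAudio
// function returned by useAudioRecording.
function MessageList({ playAudio }: { playAudio: (url: string) => void }) {
  const { messages } = useVoiceChatStore();

  return (
    <ul>
      {messages.map((message) => (
        <li key={message.id}>
          <strong>{message.role}:</strong> {message.content}
          {message.audioUrl && (
            <button onClick={() => playAudio(message.audioUrl!)}>Replay</button>
          )}
        </li>
      ))}
    </ul>
  );
}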