Voice Chat Hooks
Custom hooks for voice chat functionality
The Voice Chat feature uses two main custom hooks: one for fetching video transcripts and another for handling audio recording and processing.
useFetchVideoTranscript Hook
This hook handles fetching YouTube video transcripts to provide context for voice conversations.
Implementation
```tsx
import { useCallback } from 'react';
import { useToast } from '@/hooks/use-toast';
import { useVoiceChatStore } from '../store/voice-chat-store';
export const useFetchVideoTranscript = () => {
const { toast } = useToast();
const {
videoUrl,
setIsLoading,
setTranscript,
setHasTranscript,
setMessages
} = useVoiceChatStore();
const fetchTranscript = useCallback(async () => {
if (!videoUrl.trim()) {
toast({
title: 'Error',
description: 'Please enter a YouTube URL',
variant: 'destructive'
});
return false;
}
setIsLoading(true);
setMessages([]);
setTranscript('');
setHasTranscript(false);
try {
const response = await fetch('/api/transcribe', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ videoUrl })
});
const data = await response.json();
if (!response.ok) {
throw new Error(data.error || 'Failed to fetch transcript');
}
if (!data?.transcript?.fullTranscript) {
throw new Error('No transcript available for this video');
}
setTranscript(data.transcript.fullTranscript);
setHasTranscript(true);
// Add initial system message
setMessages([
{
id: Date.now().toString(),
role: 'system',
content:
"Voice agent is ready! You can now speak and I'll respond with voice in your selected language."
}
]);
toast({
title: 'Success',
description:
'Voice Agent Ready. Click the microphone to start speaking!',
variant: 'default'
});
return true;
} catch (error: any) {
console.error('Error fetching transcript:', error);
toast({
title: 'Error',
description: error.message || 'Failed to fetch transcript',
variant: 'destructive'
});
return false;
} finally {
setIsLoading(false);
}
}, [
videoUrl,
toast,
setIsLoading,
setTranscript,
setHasTranscript,
setMessages
]);
return { fetchTranscript };
};
```

How It Works
Step 1: Validation
- Checks that `videoUrl` is not empty
- Shows an error toast if the URL is missing
- Returns `false` if validation fails
Step 2: Reset State
- Sets `isLoading` to `true` (shows the loading UI)
- Clears the previous messages array
- Clears the previous transcript
- Sets `hasTranscript` to `false`
Step 3: Fetch Transcript
- POSTs to the `/api/transcribe` endpoint
- Sends the video URL in the request body
- Waits for a response containing the transcript data
Step 4: Process Response
- Checks whether the response is successful
- Validates that a transcript exists in the response (its expected shape is sketched after Step 6)
- Throws an error if the transcript is missing
Step 5: Update State
- Stores the transcript text in the store
- Sets `hasTranscript` to `true`
- Creates an initial system message welcoming the user
- Shows a success toast notification
Step 6: Cleanup
- Sets `isLoading` to `false` in the `finally` block
- Returns `true` on success, `false` on failure
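For reference, the response shape the hook depends on can be read directly off the checks above. A minimal sketch covering just the fields the hook touches (any other fields the endpoint may return are not assumed here):

```tsx
// Shape implied by the hook's handling of /api/transcribe responses.
interface TranscribeResponse {
  transcript?: {
    fullTranscript: string; // must be present and non-empty for success
  };
  error?: string; // surfaced in the error toast on non-OK responses
}
```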
Usage
```tsx
function VoiceChatSetup() {
const { fetchTranscript } = useFetchVideoTranscript();
const handleSubmit = async () => {
const success = await fetchTranscript();
if (success) {
// Transcript loaded, enable voice chat
}
};
return (
<button onClick={handleSubmit}>
Activate Voice Agent
</button>
);
}
```

useAudioRecording Hook
This hook manages the entire audio recording and processing pipeline, from capturing microphone input to playing AI voice responses.
Hook Structure
```tsx
export const useAudioRecording = () => {
const { toast } = useToast();
const {
hasTranscript,
isRecording,
isProcessingAudio,
transcript,
messages,
selectedLanguage,
setIsRecording,
setIsProcessingAudio,
setAudioLevel,
addMessage,
setMessages,
setIsPlayingAudio
} = useVoiceChatStore();
const mediaRecorderRef = useRef<MediaRecorder | null>(null);
const audioChunksRef = useRef<Blob[]>([]);
const audioRef = useRef<HTMLAudioElement | null>(null);
const analyserRef = useRef<AnalyserNode | null>(null);
const animationFrameRef = useRef<number | null>(null);
const recordingStartTimeRef = useRef<number>(0);
// ... methods
}
```

Refs Used
- `mediaRecorderRef`: Stores the MediaRecorder instance used for recording audio
- `audioChunksRef`: Accumulates audio data chunks during recording
- `audioRef`: Holds the Audio element that plays AI responses
- `analyserRef`: AnalyserNode used for real-time audio level monitoring
- `animationFrameRef`: Stores the animation frame ID so monitoring can be cancelled
- `recordingStartTimeRef`: Records the start time used to validate minimum duration
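These values live in refs rather than store state because they change many times per second while recording (audio chunks, animation frames), and mutating a ref does not trigger a re-render the way a state update would.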
Audio Processing Functions
monitorAudioLevel
```tsx
const monitorAudioLevel = useCallback(() => {
if (!analyserRef.current) return;
const bufferLength = analyserRef.current.frequencyBinCount;
const dataArray = new Uint8Array(bufferLength);
analyserRef.current.getByteTimeDomainData(dataArray);
let sum = 0;
for (let i = 0; i < bufferLength; i++) {
const sample = (dataArray[i] - 128) / 128;
sum += sample * sample;
}
const rms = Math.sqrt(sum / bufferLength);
const audioLevel = rms * 255;
setAudioLevel(audioLevel);
if (isRecording) {
animationFrameRef.current = requestAnimationFrame(monitorAudioLevel);
}
}, [isRecording, setAudioLevel]);
```

How It Works:
- Reads time-domain waveform data from the AnalyserNode
- Computes the RMS (root mean square) of the normalized samples (a standalone sketch follows this list)
- Scales the RMS to an audio level in the 0-255 range
- Updates the store with the current level
- Schedules itself again via `requestAnimationFrame` while recording
- Drives the visual feedback animation in the UI
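To make the level math concrete, here is the same RMS calculation as a standalone, hypothetical helper operating on 8-bit time-domain samples (bytes centered on 128 for silence):

```tsx
// Hypothetical helper mirroring the hook's RMS math.
function levelFromSamples(samples: Uint8Array): number {
  let sum = 0;
  for (const byte of samples) {
    const s = (byte - 128) / 128; // normalize to roughly [-1, 1]
    sum += s * s;
  }
  return Math.sqrt(sum / samples.length) * 255; // RMS scaled to 0-255
}

levelFromSamples(new Uint8Array([128, 128, 128]));  // silence -> 0
levelFromSamples(new Uint8Array([0, 255, 0, 255])); // near full scale -> ~254
```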
audioBufferToWav
```tsx
const audioBufferToWav = async (audioBuffer: AudioBuffer): Promise<Blob> => {
// Only channel 0 is written below, so declare the file as mono;
// using audioBuffer.numberOfChannels here would corrupt stereo input.
const numberOfChannels = 1;
const sampleRate = audioBuffer.sampleRate;
const format = 1; // PCM
const bitDepth = 16;
const bytesPerSample = bitDepth / 8;
const blockAlign = numberOfChannels * bytesPerSample;
const byteRate = sampleRate * blockAlign;
const dataSize = audioBuffer.length * blockAlign;
const bufferSize = 44 + dataSize; // 44 bytes for WAV header
const arrayBuffer = new ArrayBuffer(bufferSize);
const view = new DataView(arrayBuffer);
const writeString = (offset: number, string: string) => {
for (let i = 0; i < string.length; i++) {
view.setUint8(offset + i, string.charCodeAt(i));
}
};
// Write WAV header
writeString(0, 'RIFF');
view.setUint32(4, bufferSize - 8, true);
writeString(8, 'WAVE');
writeString(12, 'fmt ');
view.setUint32(16, 16, true);
view.setUint16(20, format, true);
view.setUint16(22, numberOfChannels, true);
view.setUint32(24, sampleRate, true);
view.setUint32(28, byteRate, true);
view.setUint16(32, blockAlign, true);
view.setUint16(34, bitDepth, true);
writeString(36, 'data');
view.setUint32(40, dataSize, true);
// Write audio samples
const channelData = audioBuffer.getChannelData(0);
let offset = 44;
for (let i = 0; i < channelData.length; i++) {
const sample = Math.max(-1, Math.min(1, channelData[i]));
view.setInt16(offset, sample * 0x7fff, true);
offset += 2;
}
return new Blob([arrayBuffer], { type: 'audio/wav' });
};
```

How It Works:
- Allocates an ArrayBuffer sized for the 44-byte WAV header plus sample data (the size arithmetic is worked through after this list)
- Writes the WAV header with the correct format fields
- Converts floating-point audio samples to 16-bit PCM
- Writes only channel 0, so the output is mono
- Returns a Blob suitable for API upload
- Used by `processAudioInput` to turn decoded WebM/OGG audio into WAV
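As a quick sanity check on the size arithmetic, one second of mono 16-bit audio at 44.1 kHz works out as follows (the numbers fall directly out of the formulas above, with `audioBuffer.length` being the frame count):

```tsx
const sampleRate = 44100;                 // frames per second
const blockAlign = 1 * (16 / 8);          // channels * bytesPerSample = 2
const dataSize = sampleRate * blockAlign; // 88,200 bytes of PCM data
const fileSize = 44 + dataSize;           // + 44-byte header = 88,244 bytes
```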
processAudioInput
```tsx
const processAudioInput = useCallback(
async (audioBlob: Blob, mimeType: string) => {
setIsProcessingAudio(true);
try {
let finalBlob = audioBlob;
let fileName = 'recording.wav';
// Convert to WAV if needed
if (
mimeType.includes('webm') ||
mimeType.includes('ogg') ||
mimeType.includes('mp4')
) {
console.log('Converting audio to WAV format...');
const arrayBuffer = await audioBlob.arrayBuffer();
const audioContext = new AudioContext();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
finalBlob = await audioBufferToWav(audioBuffer);
audioContext.close();
}
// Step 1: Speech to Text
const formData = new FormData();
formData.append('audio', finalBlob, fileName);
formData.append('language', selectedLanguage);
const sttResponse = await fetch('/api/voice-chat/speech-to-text', {
method: 'POST',
body: formData
});
const sttData = await sttResponse.json();
if (!sttResponse.ok) {
throw new Error(sttData.error || 'Failed to transcribe audio');
}
const userText = sttData.text;
if (!userText.trim()) {
toast({
title: 'No Speech Detected',
description: 'Please try speaking more clearly.',
variant: 'destructive'
});
return;
}
// Step 2: Add user message
const userMessage: Message = {
id: Date.now().toString(),
role: 'user',
content: userText,
language: selectedLanguage
};
addMessage(userMessage);
// Step 3: Get AI response
const chatResponse = await fetch('/api/voice-chat/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: userText,
transcript,
language: selectedLanguage,
previousMessages: messages.filter((m) => m.role !== 'system')
})
});
const chatData = await chatResponse.json();
if (!chatResponse.ok) {
throw new Error(chatData.error || 'Failed to get AI response');
}
// Step 4: Convert response to speech
const ttsResponse = await fetch('/api/voice-chat/text-to-speech', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
text: chatData.response,
language: selectedLanguage
})
});
if (!ttsResponse.ok) {
throw new Error('Failed to generate speech');
}
const audioBuffer = await ttsResponse.arrayBuffer();
const responseBlobAudio = new Blob([audioBuffer], {
type: 'audio/wav'
});
const audioUrl = URL.createObjectURL(responseBlobAudio);
// Step 5: Add assistant message with audio
const assistantMessage: Message = {
id: (Date.now() + 1).toString(),
role: 'assistant',
content: chatData.response,
audioUrl,
language: selectedLanguage
};
addMessage(assistantMessage);
// Step 6: Auto-play response
playAudio(audioUrl);
} catch (error: any) {
console.error('Error processing audio:', error);
toast({
title: 'Processing Error',
description: error.message || 'Failed to process audio input',
variant: 'destructive'
});
} finally {
setIsProcessingAudio(false);
}
},
[
selectedLanguage,
transcript,
messages,
toast,
setIsProcessingAudio,
addMessage,
audioBufferToWav
]
);
```

Processing Pipeline:
- Format Conversion: Converts WebM/OGG to WAV if needed
- Speech-to-Text: Sends audio to STT API, gets transcribed text
- Validation: Checks if text is non-empty
- User Message: Adds user's question to message history
- AI Chat: Sends question + transcript context to chat API
- Text-to-Speech: Converts AI response to audio
- Assistant Message: Adds AI response with audio URL to history
- Auto-play: Automatically plays audio response
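Sketched as types, the request and response contracts this pipeline implies look roughly like the following. Field names come from the calls above; anything not visible in the hook (extra fields, exact audio encoding) is an assumption, not an authoritative API spec:

```tsx
// Inferred from processAudioInput; only the fields the hook reads are shown.
interface SttResponse {
  text: string;   // transcribed user speech
  error?: string; // present on failure
}

interface ChatRequest {
  message: string;             // the user's transcribed question
  transcript: string;          // full video transcript for context
  language: string;            // selected response language
  previousMessages: Message[]; // history, minus system messages
}

interface ChatResponse {
  response: string; // AI answer, later passed to text-to-speech
  error?: string;
}

// /api/voice-chat/text-to-speech returns raw audio bytes (read with
// arrayBuffer()), which the hook wraps in an audio/wav Blob.
```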
Recording Control Functions
initializeRecording
```tsx
const initializeRecording = async () => {
try {
const stream = await navigator.mediaDevices.getUserMedia({
audio: {
echoCancellation: true,
noiseSuppression: true,
sampleRate: 44100
}
});
// Create audio context for level monitoring
const audioContext = new AudioContext();
const source = audioContext.createMediaStreamSource(stream);
const analyser = audioContext.createAnalyser();
analyser.fftSize = 2048;
analyser.smoothingTimeConstant = 0.8;
source.connect(analyser);
analyserRef.current = analyser;
// Determine MIME type
let mimeType = 'audio/wav';
const supportsWav = MediaRecorder.isTypeSupported('audio/wav');
if (!supportsWav) {
mimeType = 'audio/webm;codecs=opus';
}
const mediaRecorder = new MediaRecorder(stream, { mimeType });
mediaRecorderRef.current = mediaRecorder;
audioChunksRef.current = [];
mediaRecorder.ondataavailable = (event) => {
if (event.data.size > 0) {
audioChunksRef.current.push(event.data);
}
};
mediaRecorder.onstop = async () => {
const audioBlob = new Blob(audioChunksRef.current, { type: mimeType });
stream.getTracks().forEach((track) => track.stop());
audioContext.close();
if (audioBlob.size === 0) {
toast({
title: 'Recording Error',
description: 'No audio data captured. Please try again.',
variant: 'destructive'
});
setIsProcessingAudio(false);
return;
}
await processAudioInput(audioBlob, mimeType);
};
return true;
} catch (error) {
console.error('Error accessing microphone:', error);
toast({
title: 'Microphone Error',
description: 'Unable to access microphone. Please check permissions.',
variant: 'destructive'
});
return false;
}
};
```

Initialization Steps:
- Requests microphone access with audio constraints
- Creates AudioContext for level monitoring
- Sets up AnalyserNode for waveform analysis
- Determines best supported MIME type
- Creates MediaRecorder with chosen format
- Sets up event handlers for data and stop events
- Returns success/failure status
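Note that few browsers can record `audio/wav` directly (Chrome and Firefox produce WebM/Opus), so the fallback branch above is the common path. If more formats mattered, a candidate-list approach would be a natural extension; this is a sketch, not part of the hook, and the candidate list is an assumption:

```tsx
// Pick the first recording MIME type the current browser supports.
function pickRecordingMimeType(): string | undefined {
  const candidates = [
    'audio/wav',
    'audio/webm;codecs=opus',
    'audio/webm',
    'audio/mp4'
  ];
  return candidates.find((type) => MediaRecorder.isTypeSupported(type));
}

// Usage sketch:
// const mimeType = pickRecordingMimeType();
// const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```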
startRecording
```tsx
const startRecording = async () => {
if (!hasTranscript || isRecording || isProcessingAudio) return;
const initialized = await initializeRecording();
if (!initialized) return;
setIsRecording(true);
setAudioLevel(0);
recordingStartTimeRef.current = Date.now();
mediaRecorderRef.current?.start(100); // Request data every 100ms
monitorAudioLevel();
toast({
title: 'Recording Started',
description: 'Speak now... Click the microphone again to stop.',
variant: 'default'
});
};
```

Start Process:
- Validates that a transcript exists and that no recording or processing is already active
- Initializes recording (microphone access, setup)
- Sets recording state to true
- Resets audio level
- Records start time for duration validation
- Starts MediaRecorder with 100ms timeslice
- Begins audio level monitoring
- Shows toast notification
stopRecording
```tsx
const stopRecording = () => {
if (mediaRecorderRef.current && isRecording) {
try {
const recordingDuration = Date.now() - recordingStartTimeRef.current;
if (recordingDuration < 500) {
toast({
title: 'Recording Too Short',
description: 'Please speak for at least half a second.',
variant: 'destructive'
});
setIsRecording(false);
setAudioLevel(0);
if (mediaRecorderRef.current.state === 'recording') {
mediaRecorderRef.current.stop();
}
return;
}
if (mediaRecorderRef.current.state === 'recording') {
mediaRecorderRef.current.stop();
}
setIsRecording(false);
setAudioLevel(0);
if (animationFrameRef.current) {
cancelAnimationFrame(animationFrameRef.current);
animationFrameRef.current = null;
}
} catch (error) {
console.error('Error stopping recording:', error);
setIsRecording(false);
setAudioLevel(0);
}
}
};
```

Stop Process:
- Calculates recording duration
- Validates minimum duration (500ms)
- Stops MediaRecorder
- Resets recording state and audio level
- Cancels audio level monitoring animation
- Triggers the `onstop` handler, which processes the audio
toggleRecording
```tsx
const toggleRecording = () => {
if (isRecording) {
stopRecording();
} else {
startRecording();
}
};
```

A convenience function for single-button record/stop control.
forceStop
```tsx
const forceStop = () => {
if (mediaRecorderRef.current) {
try {
mediaRecorderRef.current.ondataavailable = null;
mediaRecorderRef.current.onstop = null;
if (mediaRecorderRef.current.state === 'recording') {
mediaRecorderRef.current.stop();
}
const stream = mediaRecorderRef.current.stream;
stream?.getTracks().forEach((track) => track.stop());
mediaRecorderRef.current = null;
} catch (error) {
console.error('Error force stopping:', error);
}
}
if (animationFrameRef.current) {
cancelAnimationFrame(animationFrameRef.current);
animationFrameRef.current = null;
}
if (audioRef.current) {
audioRef.current.pause();
audioRef.current = null;
}
audioChunksRef.current = [];
setIsRecording(false);
setIsProcessingAudio(false);
setIsPlayingAudio(false);
setAudioLevel(0);
toast({
title: 'Stopped',
description: 'Voice chat has been stopped.',
variant: 'default'
});
};
```

Force Stop Actions:
- Removes MediaRecorder event listeners (prevents processing)
- Stops MediaRecorder
- Stops all media stream tracks
- Cancels animation frame
- Stops audio playback
- Clears audio chunks
- Resets all states
- Shows notification
This is used for immediate cancellation without processing recorded audio.
Audio Playback Function
playAudio
```tsx
const playAudio = (audioUrl: string) => {
if (audioRef.current) {
audioRef.current.pause();
}
const audio = new Audio(audioUrl);
audioRef.current = audio;
audio.onplay = () => setIsPlayingAudio(true);
audio.onended = () => setIsPlayingAudio(false);
audio.onerror = () => {
setIsPlayingAudio(false);
toast({
title: 'Audio Error',
description: 'Failed to play audio response',
variant: 'destructive'
});
};
audio.play().catch(console.error);
};
```

Playback Flow:
- Pauses any currently playing audio
- Creates new Audio element with URL
- Sets up event listeners for play, end, and error
- Updates the `isPlayingAudio` state based on those events
- Starts playback
- Shows error toast if playback fails
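One caveat: each response creates an object URL via `URL.createObjectURL` that is never revoked, so long sessions slowly leak Blob memory. If replay from the message list were not needed, a variant like this sketch could release the URL after playback (hypothetical, not part of the hook):

```tsx
// Hypothetical variant that frees the Blob once playback finishes.
// Not suitable here if messages keep audioUrl around for replay.
const playAudioOnce = (audioUrl: string) => {
  const audio = new Audio(audioUrl);
  audio.onended = () => URL.revokeObjectURL(audioUrl); // release the Blob
  audio.play().catch(console.error);
};
```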
Cleanup Function
cleanup
```tsx
const cleanup = () => {
if (mediaRecorderRef.current?.state === 'recording') {
mediaRecorderRef.current.stop();
}
if (animationFrameRef.current) {
cancelAnimationFrame(animationFrameRef.current);
}
if (audioRef.current) {
audioRef.current.pause();
}
};
```

Called on component unmount to clean up resources.
Hook Return Value
```tsx
return {
toggleRecording,
forceStop,
playAudio,
cleanup,
audioRef
};
```

Usage Example
```tsx
function VoiceChat() {
const { toggleRecording, forceStop, playAudio, cleanup, audioRef } = useAudioRecording();
const { isRecording, isProcessingAudio } = useVoiceChatStore();
useEffect(() => {
    return cleanup; // run cleanup on unmount
    // cleanup is recreated on every render, so an empty dependency array
    // prevents the effect (and its cleanup) from re-running each render
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, []);
return (
<div>
<button
onClick={toggleRecording}
disabled={isProcessingAudio}
>
{isRecording ? 'Stop Recording' : 'Start Recording'}
</button>
{(isRecording || isProcessingAudio) && (
<button onClick={forceStop}>Force Stop</button>
)}
<audio ref={audioRef} style={{ display: 'none' }} />
</div>
);
}
```