Voice Chat Component
Main UI component for voice chat interface
The Voice Chat component provides the complete UI for voice-enabled conversations with YouTube videos. It handles both the initial setup phase and the active chat interface.
Component Structure
export default function VoiceChatViewPage() {
// State
const [videoUrl, setVideoUrl] = useState('');
const [transcript, setTranscript] = useState('');
const [messages, setMessages] = useState<Message[]>([]);
const [loading, setLoading] = useState(false);
const [isRecording, setIsRecording] = useState(false);
const [isProcessingAudio, setIsProcessingAudio] = useState(false);
const [isPlayingAudio, setIsPlayingAudio] = useState(false);
const [hasTranscript, setHasTranscript] = useState(false);
const [selectedLanguage, setSelectedLanguage] = useState('en-US');
const [audioLevel, setAudioLevel] = useState(0);
// Refs
const scrollAreaRef = useRef<HTMLDivElement>(null);
const mediaRecorderRef = useRef<MediaRecorder | null>(null);
const audioChunksRef = useRef<Blob[]>([]);
const audioRef = useRef<HTMLAudioElement | null>(null);
const analyserRef = useRef<AnalyserNode | null>(null);
const animationFrameRef = useRef<number | null>(null);
// ... handler functions and effects
}
Two-Phase UI
The component renders different UIs based on whether a transcript is loaded:
Phase 1: Setup View (No Transcript)
Shown when hasTranscript === false.
if (!hasTranscript) {
return (
<PageContainer scrollable>
<div className='flex min-h-[calc(100vh-8rem)] w-full flex-col'>
<Heading
title='Talk with Video'
description='Voice-enabled conversations with YouTube videos in multiple languages'
/>
<div className='mt-8 flex flex-1 flex-col items-center justify-center'>
<motion.div
initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }}
className='w-full max-w-2xl space-y-6'
>
<FeatureCard type='voice-chat' />
<Card>
<CardContent className='p-4'>
<form onSubmit={handleVideoSubmit} className='space-y-4'>
<div className='flex flex-col gap-3 sm:flex-row'>
<Input
type='url'
placeholder='Enter YouTube URL...'
value={videoUrl}
onChange={(e) => setVideoUrl(e.target.value)}
className='flex-1'
disabled={loading}
/>
<Select
value={selectedLanguage}
onValueChange={setSelectedLanguage}
>
<SelectTrigger className='w-full sm:w-48'>
<SelectValue />
</SelectTrigger>
<SelectContent>
<SelectGroup>
{LANGUAGES.map((lang) => (
<SelectItem key={lang.code} value={lang.code}>
{lang.name}
</SelectItem>
))}
</SelectGroup>
</SelectContent>
</Select>
<FancyButton
onClick={(e) => {
e.preventDefault();
handleVideoSubmit(e);
}}
loading={loading}
label='Activate Agent'
/>
</div>
</form>
</CardContent>
</Card>
</motion.div>
</div>
</div>
</PageContainer>
);
}
Setup View Components:
- Feature Card: Shows feature description and benefits
- URL Input: Text field for YouTube video URL
- Language Selector: Dropdown with 11 supported languages
- Activate Button: Triggers transcript fetch and agent activation
How Setup Works:
- User enters YouTube URL
- User selects preferred language
- User clicks "Activate Agent"
- handleVideoSubmit is called
- Validates URL using validateYoutubeVideoUrl()
- Fetches transcript via /api/transcribe
- Updates state: hasTranscript = true
- Adds system welcome message
- UI switches to Chat View
Phase 2: Chat View (With Transcript)
Shown when hasTranscript === true.
return (
<PageContainer scrollable>
<div className='w-full space-y-4'>
<div className='flex items-start justify-between'>
<Heading
title='Voice Chat'
description={`Speak and chat in ${LANGUAGES.find((l) => l.code === selectedLanguage)?.name}`}
/>
</div>
<Card>
<CardContent className='flex h-[calc(100vh-16rem)] flex-col p-4 md:p-6'>
{/* Messages Area */}
<div className='flex-1 overflow-hidden'>
<ScrollArea className='h-full pr-4' ref={scrollAreaRef}>
<div className='space-y-4 pb-4'>
<AnimatePresence>
{messages.map((message) => (
// Message bubbles
))}
</AnimatePresence>
{isProcessingAudio && (
// Processing indicator
)}
</div>
</ScrollArea>
</div>
{/* Voice Controls */}
<div className='mt-4 space-y-4 border-t pt-4'>
{/* Microphone button and controls */}
</div>
</CardContent>
</Card>
</div>
<audio ref={audioRef} style={{ display: 'none' }} />
</PageContainer>
);
Chat View Sections:
- Header: Shows "Voice Chat" title with selected language
- Messages Area: Scrollable conversation history
- Voice Controls: Microphone button and status indicators
- Hidden Audio Element: For playing responses
Message Rendering
Each message is rendered based on its role (user/assistant/system):
{messages.map((message) => (
<motion.div
key={message.id}
initial={{ opacity: 0, y: 10 }}
animate={{ opacity: 1, y: 0 }}
exit={{ opacity: 0, y: -10 }}
className={`flex ${message.role === 'user' ? 'justify-end' : 'justify-start'}`}
>
<div
className={`flex max-w-[80%] items-start gap-3 ${
message.role === 'user' ? 'flex-row-reverse' : ''
}`}
>
{/* Avatar Icon */}
<div
className={`flex h-8 w-8 flex-shrink-0 items-center justify-center rounded-full ${
message.role === 'user'
? 'bg-primary'
: message.role === 'system'
? 'bg-muted'
: 'bg-primary/80'
}`}
>
{message.role === 'user' ? (
<User className='text-primary-foreground h-4 w-4' />
) : message.role === 'system' ? (
<MessageSquare className='h-4 w-4' />
) : (
<Bot className='text-primary-foreground h-4 w-4' />
)}
</div>
{/* Message Content */}
<div
className={`rounded-lg px-4 py-2 ${
message.role === 'user'
? 'bg-primary text-primary-foreground'
: message.role === 'system'
? 'bg-muted text-muted-foreground'
: 'bg-secondary'
}`}
>
<Markdown className='prose prose-sm dark:prose-invert max-w-none'>
{message.content}
</Markdown>
{/* Play Audio Button (for assistant messages) */}
{message.audioUrl && (
<div className='mt-2 flex items-center gap-2'>
<Button
size='sm'
variant='outline'
onClick={() => playAudio(message.audioUrl!)}
className='h-8'
>
<Volume2 className='mr-2 h-3 w-3' />
Play Audio
</Button>
</div>
)}
</div>
</div>
</motion.div>
))}
Message Styling:
User Messages:
- Aligned right
- Primary color background
- User icon avatar
- No audio playback
Assistant Messages:
- Aligned left
- Secondary color background
- Bot icon avatar
- Play Audio button (if audioUrl exists)
System Messages:
- Aligned left
- Muted background
- MessageSquare icon
- Welcome/status messages
Animation:
- Fade in from bottom on mount
- Fade out upward on unmount
- Smooth transitions with Framer Motion
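The rendering above reads four fields from each message (id, role, content, audioUrl). A Message shape consistent with that usage might look like the following sketch; the component's actual interface may differ:

```typescript
// Hypothetical Message shape inferred from the rendering code above;
// the component's real interface may differ.
interface Message {
  id: string;                             // unique key for React lists
  role: 'user' | 'assistant' | 'system';  // drives alignment, colors, and icons
  content: string;                        // rendered as Markdown
  audioUrl?: string;                      // only assistant messages with TTS audio
}

// Example: the system welcome message added after the transcript loads
const welcome: Message = {
  id: Date.now().toString(),
  role: 'system',
  content:
    "Voice agent is ready! You can now speak and I'll respond with voice in your selected language."
};
```

The optional audioUrl field is what gates the "Play Audio" button in the JSX above.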
Voice Controls Section
The bottom control panel manages recording and playback:
<div className='mt-4 space-y-4 border-t pt-4'>
<div className='flex items-center justify-center gap-4'>
{/* Microphone Button */}
<div className='relative'>
<Button
onClick={toggleRecording}
disabled={isProcessingAudio}
size='lg'
className={`relative h-16 w-16 rounded-full transition-all ${
isRecording
? 'bg-destructive hover:bg-destructive/90 animate-pulse'
: 'bg-primary hover:bg-primary/90'
}`}
>
{isProcessingAudio ? (
<Loader2 className='h-6 w-6 animate-spin' />
) : isRecording ? (
<MicOff className='h-6 w-6' />
) : (
<Mic className='h-6 w-6' />
)}
</Button>
{/* Pulse Animation */}
{isRecording && (
<div className='border-destructive absolute -inset-2 animate-ping rounded-full border-2 opacity-75' />
)}
{/* Audio Level Indicator */}
{isRecording && audioLevel > 0 && (
<div
className='border-primary absolute -inset-1 rounded-full border-2 transition-all duration-100'
style={{
transform: `scale(${1 + (audioLevel / 255) * 0.5})`,
opacity: 0.7 + (audioLevel / 255) * 0.3
}}
/>
)}
</div>
{/* Force Stop Button */}
{(isRecording || isProcessingAudio) && (
<Button
onClick={forceStop}
size='lg'
variant='destructive'
className='h-12 px-6'
>
Stop
</Button>
)}
</div>
{/* Status Text */}
<div className='text-center'>
<p className='text-muted-foreground text-sm'>
{isRecording
? 'Speaking... Click microphone to stop'
: isProcessingAudio
? 'Processing your speech...'
: 'Click microphone to start speaking'}
</p>
{isRecording && (
<p className='text-muted-foreground mt-1 text-xs'>
Audio level: {Math.round(audioLevel)}
</p>
)}
</div>
{/* Load New Video */}
<div className='text-muted-foreground flex items-center justify-between text-xs'>
<span />
<Button variant='ghost' size='sm' onClick={resetChat}>
Load new video
</Button>
</div>
</div>
Microphone Button States:
- Idle: Primary color, shows Mic icon, enabled
- Recording: Destructive (red), shows MicOff icon, pulsing animation, audio level ring
- Processing: Primary color, shows Loader2 spinner, disabled
Audio Level Visualization:
- Scales border ring based on audio level (0-255)
- Opacity increases with louder input
- Updates in real-time during recording
- Provides visual feedback that microphone is working
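The ring's inline style maps the raw level (0-255) to a scale factor and an opacity. Extracted as a pure helper for illustration (same arithmetic as the inline style above):

```typescript
// Mirrors the inline style computation on the audio level ring:
//   level 0   -> scale 1.0, opacity 0.7  (silence)
//   level 255 -> scale 1.5, opacity 1.0  (maximum input)
function ringStyle(audioLevel: number): { scale: number; opacity: number } {
  const normalized = audioLevel / 255; // 0..1
  return {
    scale: 1 + normalized * 0.5,
    opacity: 0.7 + normalized * 0.3
  };
}
```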
Force Stop Button:
- Only visible when recording or processing
- Immediately cancels current operation
- Stops recording without processing audio
- Useful if user wants to abort
Handler Functions
handleVideoSubmit
const handleVideoSubmit = async (e: React.FormEvent) => {
e.preventDefault();
if (!videoUrl.trim()) {
toast.error('Please enter a YouTube URL');
return;
}
// Validate YouTube URL
const validation = validateYoutubeVideoUrl(videoUrl);
if (!validation.isValid) {
toast.error(validation.error || 'Please enter a valid YouTube video URL');
return;
}
setLoading(true);
setMessages([]);
setTranscript('');
setHasTranscript(false);
try {
const response = await fetch('/api/transcribe', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ videoUrl })
});
const data = await response.json();
if (!response.ok) {
throw new Error(data.error || 'Failed to fetch transcript');
}
if (!data?.transcript?.fullTranscript) {
throw new Error('No transcript available for this video');
}
setTranscript(data.transcript.fullTranscript);
setHasTranscript(true);
setMessages([
{
id: Date.now().toString(),
role: 'system',
content:
"Voice agent is ready! You can now speak and I'll respond with voice in your selected language."
}
]);
toast.success('Voice Agent Ready. Click the microphone to start speaking!');
} catch (error: any) {
console.error('Error fetching transcript:', error);
toast.error(error.message || 'Failed to fetch transcript');
} finally {
setLoading(false);
}
};
Submit Flow:
- Validates URL is non-empty
- Validates YouTube URL format and video ID
- Resets previous state (messages, transcript)
- Fetches transcript from API
- Updates state with transcript
- Adds welcome system message
- Shows success notification
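validateYoutubeVideoUrl() is imported from elsewhere and not shown here. A hypothetical sketch of what such a check might do (the real helper's rules and error messages may differ):

```typescript
// Hypothetical sketch of a YouTube URL validator; the real
// validateYoutubeVideoUrl() used by the component may differ.
function validateYoutubeVideoUrl(url: string): { isValid: boolean; error?: string } {
  // Accept watch URLs, short youtu.be links, and /embed/ links,
  // requiring an 11-character video ID.
  const pattern =
    /^https?:\/\/(www\.)?(youtube\.com\/(watch\?v=|embed\/)|youtu\.be\/)[A-Za-z0-9_-]{11}/;
  return pattern.test(url)
    ? { isValid: true }
    : { isValid: false, error: 'Please enter a valid YouTube video URL' };
}
```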
toggleRecording
const toggleRecording = () => {
if (isRecording) {
stopRecording();
} else {
startRecording();
}
};
A single handler that toggles between starting and stopping recording when the microphone button is clicked.
startRecording
const startRecording = async () => {
if (!hasTranscript || isRecording || isProcessingAudio) return;
const initialized = await initializeRecording();
if (!initialized) return;
setIsRecording(true);
setAudioLevel(0);
mediaRecorderRef.current?.start();
monitorAudioLevel();
toast.info('Recording... Click the microphone again to stop.');
};
Start Actions:
- Guard: Check transcript exists and not already recording
- Initialize MediaRecorder and audio context
- Set recording state to true
- Reset audio level
- Start MediaRecorder
- Begin audio level monitoring
- Show recording notification
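monitorAudioLevel() is defined elsewhere in the component. Typically it reads a frame of byte data from the AnalyserNode on each animation frame and averages it into the single audioLevel value the UI displays. A minimal sketch of that averaging step (an assumption, not the component's exact code):

```typescript
// Assumed sketch: collapse one frame of AnalyserNode byte data (0-255
// per bin) into the single audioLevel value shown by the UI. The real
// monitorAudioLevel() would also reschedule itself via
// requestAnimationFrame while isRecording is true.
function computeAudioLevel(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  const sum = samples.reduce((acc, value) => acc + value, 0);
  return sum / samples.length; // 0 (silence) to 255 (maximum)
}
```

In the component, this value would feed setAudioLevel(...) on every animation frame during recording.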
stopRecording
const stopRecording = () => {
if (mediaRecorderRef.current && isRecording) {
try {
if (mediaRecorderRef.current.state === 'recording') {
mediaRecorderRef.current.stop();
}
setIsRecording(false);
setAudioLevel(0);
if (animationFrameRef.current) {
cancelAnimationFrame(animationFrameRef.current);
animationFrameRef.current = null;
}
} catch (error) {
console.error('Error stopping recording:', error);
setIsRecording(false);
setAudioLevel(0);
}
}
};
Stop Actions:
- Stop MediaRecorder (triggers onstop event)
- Reset recording state
- Reset audio level
- Cancel audio monitoring animation
- Triggers audio processing pipeline
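The onstop handler itself is not shown in this section. Typically it concatenates the chunks buffered in audioChunksRef into a single Blob before sending it off for transcription. A hedged sketch of that assembly step:

```typescript
// Assumed sketch of the step the onstop handler performs: combine the
// chunks collected in audioChunksRef.current into one Blob for upload.
// The MIME type here is an assumption; it should match whatever the
// MediaRecorder was actually configured to produce.
function assembleRecording(chunks: Blob[], mimeType = 'audio/webm'): Blob {
  return new Blob(chunks, { type: mimeType });
}
```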
resetChat
const resetChat = () => {
setHasTranscript(false);
setMessages([]);
setTranscript('');
setVideoUrl('');
};
Resets the entire component back to the setup view for a new video.
Effects
Auto-scroll Messages
const scrollToBottom = useCallback(() => {
if (scrollAreaRef.current) {
const scrollContainer = scrollAreaRef.current.querySelector(
'[data-radix-scroll-area-viewport]'
);
if (scrollContainer) {
scrollContainer.scrollTop = scrollContainer.scrollHeight;
}
}
}, []);
useEffect(() => {
scrollToBottom();
}, [messages, scrollToBottom]);
Automatically scrolls to the bottom when new messages are added.
Cleanup Effect
useEffect(() => {
return () => {
if (
mediaRecorderRef.current &&
mediaRecorderRef.current.state === 'recording'
) {
mediaRecorderRef.current.stop();
}
if (animationFrameRef.current) {
cancelAnimationFrame(animationFrameRef.current);
}
if (audioRef.current) {
audioRef.current.pause();
}
};
}, []);
Cleans up resources on component unmount:
- Stops active recording
- Cancels animation frames
- Pauses audio playback
Language Configuration
const LANGUAGES = [
{ code: 'en-US', name: 'English' },
{ code: 'hi-IN', name: 'Hindi' },
{ code: 'bn-IN', name: 'Bengali' },
{ code: 'ta-IN', name: 'Tamil' },
{ code: 'te-IN', name: 'Telugu' },
{ code: 'mr-IN', name: 'Marathi' },
{ code: 'gu-IN', name: 'Gujarati' },
{ code: 'kn-IN', name: 'Kannada' },
{ code: 'ml-IN', name: 'Malayalam' },
{ code: 'od-IN', name: 'Odia' },
{ code: 'pa-IN', name: 'Punjabi' }
];
The language selector uses this array for its dropdown options.
Responsive Design
Mobile (< 640px):
- Vertical layout for input and selectors
- Full-width language selector
- Message bubbles capped at 80% of the available width
- Touch-optimized button sizes
Desktop (≥ 640px):
- Horizontal layout for controls
- Fixed-width language selector (192px)
- Larger message area
- Mouse-optimized interactions
Accessibility
- Proper ARIA labels on buttons
- Keyboard navigation support
- Screen reader friendly status messages
- Color contrast meets WCAG AA standards
- Focus indicators on interactive elements