Voice Chat Component

Main UI component for the voice chat interface

The Voice Chat component provides the complete UI for voice-enabled conversations with YouTube videos. It handles both the initial setup phase and the active chat interface.

Component Structure

export default function VoiceChatViewPage() {
  // State
  const [videoUrl, setVideoUrl] = useState('');
  const [transcript, setTranscript] = useState('');
  const [messages, setMessages] = useState<Message[]>([]);
  const [loading, setLoading] = useState(false);
  const [isRecording, setIsRecording] = useState(false);
  const [isProcessingAudio, setIsProcessingAudio] = useState(false);
  const [isPlayingAudio, setIsPlayingAudio] = useState(false);
  const [hasTranscript, setHasTranscript] = useState(false);
  const [selectedLanguage, setSelectedLanguage] = useState('en-US');
  const [audioLevel, setAudioLevel] = useState(0);

  // Refs
  const scrollAreaRef = useRef<HTMLDivElement>(null);
  const mediaRecorderRef = useRef<MediaRecorder | null>(null);
  const audioChunksRef = useRef<Blob[]>([]);
  const audioRef = useRef<HTMLAudioElement | null>(null);
  const analyserRef = useRef<AnalyserNode | null>(null);
  const animationFrameRef = useRef<number | null>(null);

  // ... handler functions and effects
}
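
The Message type referenced in the state above isn't defined on this page. Inferred from how messages are created and rendered below, it presumably looks like this:

// Message shape inferred from usage in this component -- the actual
// source definition may differ.
interface Message {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  audioUrl?: string; // present on assistant messages with generated audio
}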

Two-Phase UI

The component renders different UIs based on whether a transcript is loaded:

Phase 1: Setup View (No Transcript)

Shown when hasTranscript === false.

if (!hasTranscript) {
  return (
    <PageContainer scrollable>
      <div className='flex min-h-[calc(100vh-8rem)] w-full flex-col'>
        <Heading
          title='Talk with Video'
          description='Voice-enabled conversations with YouTube videos in multiple languages'
        />

        <div className='mt-8 flex flex-1 flex-col items-center justify-center'>
          <motion.div
            initial={{ opacity: 0, y: 20 }}
            animate={{ opacity: 1, y: 0 }}
            className='w-full max-w-2xl space-y-6'
          >
            <FeatureCard type='voice-chat' />

            <Card>
              <CardContent className='p-4'>
                <form onSubmit={handleVideoSubmit} className='space-y-4'>
                  <div className='flex flex-col gap-3 sm:flex-row'>
                    <Input
                      type='url'
                      placeholder='Enter YouTube URL...'
                      value={videoUrl}
                      onChange={(e) => setVideoUrl(e.target.value)}
                      className='flex-1'
                      disabled={loading}
                    />
                    <Select
                      value={selectedLanguage}
                      onValueChange={setSelectedLanguage}
                    >
                      <SelectTrigger className='w-full sm:w-48'>
                        <SelectValue />
                      </SelectTrigger>
                      <SelectContent>
                        <SelectGroup>
                          {LANGUAGES.map((lang) => (
                            <SelectItem key={lang.code} value={lang.code}>
                              {lang.name}
                            </SelectItem>
                          ))}
                        </SelectGroup>
                      </SelectContent>
                    </Select>
                    <FancyButton
                      onClick={(e) => {
                        e.preventDefault();
                        handleVideoSubmit(e);
                      }}
                      loading={loading}
                      label='Activate Agent'
                    />
                  </div>
                </form>
              </CardContent>
            </Card>
          </motion.div>
        </div>
      </div>
    </PageContainer>
  );
}

Setup View Components:

  1. Feature Card: Shows feature description and benefits
  2. URL Input: Text field for YouTube video URL
  3. Language Selector: Dropdown with 11 supported languages
  4. Activate Button: Triggers transcript fetch and agent activation

How Setup Works:

  • User enters YouTube URL
  • User selects preferred language
  • User clicks "Activate Agent"
  • handleVideoSubmit is called
  • Validates URL using validateYoutubeVideoUrl()
  • Fetches transcript via /api/transcribe
  • Updates state: hasTranscript = true
  • Adds system welcome message
  • UI switches to Chat View

Phase 2: Chat View (With Transcript)

Shown when hasTranscript === true.

return (
  <PageContainer scrollable>
    <div className='w-full space-y-4'>
      <div className='flex items-start justify-between'>
        <Heading
          title='Voice Chat'
          description={`Speak and chat in ${LANGUAGES.find((l) => l.code === selectedLanguage)?.name}`}
        />
      </div>

      <Card>
        <CardContent className='flex h-[calc(100vh-16rem)] flex-col p-4 md:p-6'>
          {/* Messages Area */}
          <div className='flex-1 overflow-hidden'>
            <ScrollArea className='h-full pr-4' ref={scrollAreaRef}>
              <div className='space-y-4 pb-4'>
                <AnimatePresence>
                  {/* Message bubbles -- one per entry in messages; see
                      "Message Rendering" below for the full markup */}
                </AnimatePresence>

                {/* Processing indicator, rendered while isProcessingAudio */}
              </div>
            </ScrollArea>
          </div>

          {/* Voice Controls */}
          <div className='mt-4 space-y-4 border-t pt-4'>
            {/* Microphone button and controls */}
          </div>
        </CardContent>
      </Card>
    </div>

    <audio ref={audioRef} style={{ display: 'none' }} />
  </PageContainer>
);

Chat View Sections:

  1. Header: Shows "Voice Chat" title with selected language
  2. Messages Area: Scrollable conversation history
  3. Voice Controls: Microphone button and status indicators
  4. Hidden Audio Element: Plays voice responses via playAudio (sketched below)
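
playAudio isn't shown on this page. Given the hidden audio element and the isPlayingAudio state, a minimal sketch of what it likely does:

// Hypothetical sketch of playAudio -- assumed from the hidden <audio>
// element and the isPlayingAudio state; the real handler may differ.
const playAudio = (url: string) => {
  const audio = audioRef.current;
  if (!audio) return;

  audio.src = url;
  audio.onended = () => setIsPlayingAudio(false);
  setIsPlayingAudio(true);
  audio.play().catch((error) => {
    console.error('Audio playback failed:', error);
    setIsPlayingAudio(false);
  });
};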

Message Rendering

Each message is rendered based on its role (user/assistant/system):

{messages.map((message) => (
  <motion.div
    key={message.id}
    initial={{ opacity: 0, y: 10 }}
    animate={{ opacity: 1, y: 0 }}
    exit={{ opacity: 0, y: -10 }}
    className={`flex ${message.role === 'user' ? 'justify-end' : 'justify-start'}`}
  >
    <div
      className={`flex max-w-[80%] items-start gap-3 ${
        message.role === 'user' ? 'flex-row-reverse' : ''
      }`}
    >
      {/* Avatar Icon */}
      <div
        className={`flex h-8 w-8 flex-shrink-0 items-center justify-center rounded-full ${
          message.role === 'user'
            ? 'bg-primary'
            : message.role === 'system'
              ? 'bg-muted'
              : 'bg-primary/80'
        }`}
      >
        {message.role === 'user' ? (
          <User className='text-primary-foreground h-4 w-4' />
        ) : message.role === 'system' ? (
          <MessageSquare className='h-4 w-4' />
        ) : (
          <Bot className='text-primary-foreground h-4 w-4' />
        )}
      </div>

      {/* Message Content */}
      <div
        className={`rounded-lg px-4 py-2 ${
          message.role === 'user'
            ? 'bg-primary text-primary-foreground'
            : message.role === 'system'
              ? 'bg-muted text-muted-foreground'
              : 'bg-secondary'
        }`}
      >
        <Markdown className='prose prose-sm dark:prose-invert max-w-none'>
          {message.content}
        </Markdown>

        {/* Play Audio Button (for assistant messages) */}
        {message.audioUrl && (
          <div className='mt-2 flex items-center gap-2'>
            <Button
              size='sm'
              variant='outline'
              onClick={() => playAudio(message.audioUrl!)}
              className='h-8'
            >
              <Volume2 className='mr-2 h-3 w-3' />
              Play Audio
            </Button>
          </div>
        )}
      </div>
    </div>
  </motion.div>
))}

Message Styling:

User Messages:

  • Aligned right
  • Primary color background
  • User icon avatar
  • No audio playback

Assistant Messages:

  • Aligned left
  • Secondary color background
  • Bot icon avatar
  • Play Audio button (if audioUrl exists)

System Messages:

  • Aligned left
  • Muted background
  • MessageSquare icon
  • Welcome/status messages

Animation:

  • Fade in from bottom on mount
  • Fade out upward on unmount
  • Smooth transitions with Framer Motion

Voice Controls Section

The bottom control panel manages recording and playback:

<div className='mt-4 space-y-4 border-t pt-4'>
  <div className='flex items-center justify-center gap-4'>
    {/* Microphone Button */}
    <div className='relative'>
      <Button
        onClick={toggleRecording}
        disabled={isProcessingAudio}
        size='lg'
        className={`relative h-16 w-16 rounded-full transition-all ${
          isRecording
            ? 'bg-destructive hover:bg-destructive/90 animate-pulse'
            : 'bg-primary hover:bg-primary/90'
        }`}
      >
        {isProcessingAudio ? (
          <Loader2 className='h-6 w-6 animate-spin' />
        ) : isRecording ? (
          <MicOff className='h-6 w-6' />
        ) : (
          <Mic className='h-6 w-6' />
        )}
      </Button>

      {/* Pulse Animation */}
      {isRecording && (
        <div className='border-destructive absolute -inset-2 animate-ping rounded-full border-2 opacity-75' />
      )}

      {/* Audio Level Indicator */}
      {isRecording && audioLevel > 0 && (
        <div
          className='border-primary absolute -inset-1 rounded-full border-2 transition-all duration-100'
          style={{
            transform: `scale(${1 + (audioLevel / 255) * 0.5})`,
            opacity: 0.7 + (audioLevel / 255) * 0.3
          }}
        />
      )}
    </div>

    {/* Force Stop Button */}
    {(isRecording || isProcessingAudio) && (
      <Button
        onClick={forceStop}
        size='lg'
        variant='destructive'
        className='h-12 px-6'
      >
        Stop
      </Button>
    )}
  </div>

  {/* Status Text */}
  <div className='text-center'>
    <p className='text-muted-foreground text-sm'>
      {isRecording
        ? 'Speaking... Click microphone to stop'
        : isProcessingAudio
          ? 'Processing your speech...'
          : 'Click microphone to start speaking'}
    </p>
    {isRecording && (
      <p className='text-muted-foreground mt-1 text-xs'>
        Audio level: {Math.round(audioLevel)}
      </p>
    )}
  </div>

  {/* Load New Video */}
  <div className='text-muted-foreground flex items-center justify-between text-xs'>
    <span />
    <Button variant='ghost' size='sm' onClick={resetChat}>
      Load new video
    </Button>
  </div>
</div>

Microphone Button States:

  1. Idle: primary color (bg-primary), Mic icon, enabled
  2. Recording: destructive color (bg-destructive), MicOff icon, pulsing ping ring plus audio level ring
  3. Processing: Loader2 spinner, button disabled

Audio Level Visualization:

  • Scales a border ring with the audio level (0-255, from the Web Audio analyser)
  • Opacity increases with louder input
  • Updates in real time during recording
  • Confirms visually that the microphone is picking up sound (a sketch of the monitor follows)
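
monitorAudioLevel isn't shown on this page. A minimal sketch, assuming analyserRef is wired up during recording initialization and a requestAnimationFrame loop drives the updates:

// Hypothetical sketch of monitorAudioLevel -- the real implementation is
// not shown here; this assumes analyserRef was connected to the mic stream.
const monitorAudioLevel = () => {
  const analyser = analyserRef.current;
  if (!analyser) return;

  const data = new Uint8Array(analyser.frequencyBinCount);

  const tick = () => {
    analyser.getByteFrequencyData(data);
    // Average magnitude across frequency bins gives a rough 0-255 loudness
    const avg = data.reduce((sum, v) => sum + v, 0) / data.length;
    setAudioLevel(avg);
    animationFrameRef.current = requestAnimationFrame(tick);
  };

  tick();
};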

Force Stop Button:

  • Only visible while recording or processing
  • Immediately cancels the current operation
  • Stops recording without processing the audio
  • Useful when the user wants to abort (a possible implementation is sketched below)
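
forceStop isn't shown on this page either. A minimal sketch, assuming it tears down the recorder without running the processing pipeline:

// Hypothetical sketch of forceStop -- assumed behavior: abort recording
// and processing without sending anything to the pipeline.
const forceStop = () => {
  const recorder = mediaRecorderRef.current;
  if (recorder && recorder.state === 'recording') {
    recorder.onstop = null; // detach the handler so no processing runs
    recorder.stop();
  }
  if (animationFrameRef.current) {
    cancelAnimationFrame(animationFrameRef.current);
    animationFrameRef.current = null;
  }
  setIsRecording(false);
  setIsProcessingAudio(false);
  setAudioLevel(0);
};

Detaching onstop before calling stop() is what would distinguish an abort from a normal stop, which deliberately triggers processing.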

Handler Functions

handleVideoSubmit

const handleVideoSubmit = async (e: React.FormEvent) => {
  e.preventDefault();
  if (!videoUrl.trim()) {
    toast.error('Please enter a YouTube URL');
    return;
  }

  // Validate YouTube URL
  const validation = validateYoutubeVideoUrl(videoUrl);
  if (!validation.isValid) {
    toast.error(validation.error || 'Please enter a valid YouTube video URL');
    return;
  }

  setLoading(true);
  setMessages([]);
  setTranscript('');
  setHasTranscript(false);

  try {
    const response = await fetch('/api/transcribe', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ videoUrl })
    });

    const data = await response.json();

    if (!response.ok) {
      throw new Error(data.error || 'Failed to fetch transcript');
    }

    if (!data?.transcript?.fullTranscript) {
      throw new Error('No transcript available for this video');
    }

    setTranscript(data.transcript.fullTranscript);
    setHasTranscript(true);

    setMessages([
      {
        id: Date.now().toString(),
        role: 'system',
        content:
          "Voice agent is ready! You can now speak and I'll respond with voice in your selected language."
      }
    ]);

    toast.success('Voice Agent Ready. Click the microphone to start speaking!');
  } catch (error: any) {
    console.error('Error fetching transcript:', error);
    toast.error(error.message || 'Failed to fetch transcript');
  } finally {
    setLoading(false);
  }
};

Submit Flow:

  1. Validates URL is non-empty
  2. Validates YouTube URL format and video ID via validateYoutubeVideoUrl (sketched below)
  3. Resets previous state (messages, transcript)
  4. Fetches transcript from API
  5. Updates state with transcript
  6. Adds welcome system message
  7. Shows success notification
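
validateYoutubeVideoUrl is a shared helper defined elsewhere in the codebase. A minimal sketch of what it presumably checks, matching the { isValid, error } shape used above:

// Hypothetical sketch -- the real helper is defined elsewhere. Assumed to
// look for an 11-character video ID in common YouTube URL formats.
function validateYoutubeVideoUrl(url: string): {
  isValid: boolean;
  error?: string;
} {
  const pattern =
    /(?:youtube\.com\/(?:watch\?v=|shorts\/|embed\/)|youtu\.be\/)([\w-]{11})/;
  return pattern.test(url)
    ? { isValid: true }
    : { isValid: false, error: 'Please enter a valid YouTube video URL' };
}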

toggleRecording

const toggleRecording = () => {
  if (isRecording) {
    stopRecording();
  } else {
    startRecording();
  }
};

A single click handler for the microphone button: it stops the active recording, or starts a new one.

startRecording

const startRecording = async () => {
  if (!hasTranscript || isRecording || isProcessingAudio) return;

  const initialized = await initializeRecording();
  if (!initialized) return;

  setIsRecording(true);
  setAudioLevel(0);
  mediaRecorderRef.current?.start();
  monitorAudioLevel();

  toast.info('Recording... Click the microphone again to stop.');
};

Start Actions:

  1. Guard: Check transcript exists and not already recording
  2. Initialize MediaRecorder and audio context (sketched below)
  3. Set recording state to true
  4. Reset audio level
  5. Start MediaRecorder
  6. Begin audio level monitoring
  7. Show recording notification
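
initializeRecording isn't shown on this page. A minimal sketch of its assumed responsibilities, based on the refs and handlers described above (processAudio is a placeholder name for the processing step):

// Hypothetical sketch of initializeRecording -- the real implementation
// is not shown here; this covers the responsibilities the doc implies.
const initializeRecording = async (): Promise<boolean> => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

    // Feed the mic stream into an analyser for the level indicator
    const audioContext = new AudioContext();
    const source = audioContext.createMediaStreamSource(stream);
    const analyser = audioContext.createAnalyser();
    source.connect(analyser);
    analyserRef.current = analyser;

    // Collect chunks while recording; hand them off when recording stops
    const recorder = new MediaRecorder(stream);
    audioChunksRef.current = [];
    recorder.ondataavailable = (event) => {
      if (event.data.size > 0) audioChunksRef.current.push(event.data);
    };
    recorder.onstop = () => {
      const blob = new Blob(audioChunksRef.current, { type: 'audio/webm' });
      processAudio(blob); // placeholder name for the STT -> chat -> TTS step
    };
    mediaRecorderRef.current = recorder;
    return true;
  } catch (error) {
    console.error('Microphone access denied:', error);
    toast.error('Could not access the microphone');
    return false;
  }
};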

stopRecording

const stopRecording = () => {
  if (mediaRecorderRef.current && isRecording) {
    try {
      if (mediaRecorderRef.current.state === 'recording') {
        mediaRecorderRef.current.stop();
      }
      setIsRecording(false);
      setAudioLevel(0);

      if (animationFrameRef.current) {
        cancelAnimationFrame(animationFrameRef.current);
        animationFrameRef.current = null;
      }
    } catch (error) {
      console.error('Error stopping recording:', error);
      setIsRecording(false);
      setAudioLevel(0);
    }
  }
};

Stop Actions:

  1. Stop the MediaRecorder if still recording (fires its onstop event)
  2. Reset recording state
  3. Reset audio level
  4. Cancel the audio-level monitoring animation frame
  5. The onstop handler then runs the audio processing pipeline (sketched below)
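
The audio processing pipeline itself isn't shown on this page. A heavily simplified, hypothetical sketch of its shape; the endpoint names here are placeholders, not the documented routes:

// Hypothetical sketch of the audio processing pipeline. '/api/stt' and
// '/api/voice-chat' are placeholder names, not actual routes.
const processAudio = async (blob: Blob) => {
  setIsProcessingAudio(true);
  try {
    // 1. Speech-to-text: send the recording, get back the user's words
    const form = new FormData();
    form.append('audio', blob);
    form.append('language', selectedLanguage);
    const sttRes = await fetch('/api/stt', { method: 'POST', body: form });
    const { text } = await sttRes.json();

    setMessages((prev) => [
      ...prev,
      { id: Date.now().toString(), role: 'user', content: text }
    ]);

    // 2. Answer against the transcript and synthesize speech
    const chatRes = await fetch('/api/voice-chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        question: text,
        transcript,
        language: selectedLanguage
      })
    });
    const { answer, audioUrl } = await chatRes.json();

    setMessages((prev) => [
      ...prev,
      {
        id: (Date.now() + 1).toString(),
        role: 'assistant',
        content: answer,
        audioUrl
      }
    ]);
    if (audioUrl) playAudio(audioUrl);
  } finally {
    setIsProcessingAudio(false);
  }
};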

resetChat

const resetChat = () => {
  setHasTranscript(false);
  setMessages([]);
  setTranscript('');
  setVideoUrl('');
};

Resets the component back to the setup view so a new video can be loaded (the selected language is preserved).

Effects

Auto-scroll Messages

const scrollToBottom = useCallback(() => {
  if (scrollAreaRef.current) {
    const scrollContainer = scrollAreaRef.current.querySelector(
      '[data-radix-scroll-area-viewport]'
    );
    if (scrollContainer) {
      scrollContainer.scrollTop = scrollContainer.scrollHeight;
    }
  }
}, []);

useEffect(() => {
  scrollToBottom();
}, [messages, scrollToBottom]);

Automatically scrolls to bottom when new messages are added.

Cleanup Effect

useEffect(() => {
  return () => {
    if (
      mediaRecorderRef.current &&
      mediaRecorderRef.current.state === 'recording'
    ) {
      mediaRecorderRef.current.stop();
    }
    if (animationFrameRef.current) {
      cancelAnimationFrame(animationFrameRef.current);
    }
    if (audioRef.current) {
      audioRef.current.pause();
    }
  };
}, []);

Cleans up resources on component unmount:

  • Stops active recording
  • Cancels animation frames
  • Pauses audio playback

Language Configuration

const LANGUAGES = [
  { code: 'en-US', name: 'English' },
  { code: 'hi-IN', name: 'Hindi' },
  { code: 'bn-IN', name: 'Bengali' },
  { code: 'ta-IN', name: 'Tamil' },
  { code: 'te-IN', name: 'Telugu' },
  { code: 'mr-IN', name: 'Marathi' },
  { code: 'gu-IN', name: 'Gujarati' },
  { code: 'kn-IN', name: 'Kannada' },
  { code: 'ml-IN', name: 'Malayalam' },
  { code: 'od-IN', name: 'Odia' },
  { code: 'pa-IN', name: 'Punjabi' }
];

The language selector uses this array to populate its dropdown options.

Responsive Design

Mobile (< 640px):

  • Input and language selector stack vertically (flex-col)
  • Full-width language selector
  • Message bubbles capped at 80% of the container width
  • Touch-friendly button sizes

Desktop (≥ 640px):

  • Controls laid out in a row (sm:flex-row)
  • Fixed-width language selector (w-48, 192px)
  • More horizontal room for the message area
  • Mouse-optimized interactions

Accessibility

  • Proper ARIA labels on buttons
  • Keyboard navigation support
  • Screen reader friendly status messages
  • Color contrast meets WCAG AA standards
  • Focus indicators on interactive elements