Korai Docs

Chat with Video - API Route

Next.js API route for streaming AI chat responses

Overview

The chat API route handles streaming AI responses using the Vercel AI SDK. It supports multiple AI providers, web search, rate limiting, and returns streaming responses in real-time to the client.

API Route File

Location: src/app/api/chat/route.ts

Endpoint: POST /api/chat

Max Duration: 30 seconds

Configuration

export const maxDuration = 30;

Allows streaming responses up to 30 seconds.

Request Flow

1. Authentication

const { userId } = await auth();

if (!userId) {
  return NextResponse.json(
    { error: 'Authentication required' },
    { status: 401 }
  );
}

Uses Clerk's auth() to verify that the user is authenticated. Returns 401 if not logged in.

2. Rate Limiting

Check Rate Limit

let rateLimitResult;
try {
  rateLimitResult = await userChatLimiter.limit(`chat_${userId}`);
} catch (error) {
  console.error('Rate limiter error:', error);
  return NextResponse.json(
    {
      error: 'Service temporarily unavailable. Please try again in a moment.'
    },
    { status: 503 }
  );
}

Checks rate limit using Upstash Redis. Uses identifier chat_{userId} for per-user limits.

Extract Rate Limit Info

const { success, limit, remaining, reset } = rateLimitResult;

Destructures the rate limit status from the result: the success flag, the configured limit, the remaining attempts, and the reset timestamp.

Handle Rate Limit Exceeded

if (!success) {
  return NextResponse.json(
    {
      error: 'Chat limit exceeded. You have used all 30 chat attempts for today.',
      limit,
      remaining: 0,
      reset
    },
    {
      status: 429,
      headers: {
        'X-RateLimit-Limit': limit.toString(),
        'X-RateLimit-Remaining': '0',
        'X-RateLimit-Reset': reset.toString()
      }
    }
  );
}

Returns a 429 status with rate limit headers when the user exceeds the daily limit of 30 chats.

3. Parse Request Body

const {
  messages,
  model,
  webSearch,
  system
}: {
  messages: UIMessage[] | any[];
  model: string;
  webSearch: boolean;
  system?: string;
} = await req.json();

Request Parameters

  • messages: Array of chat messages (can be UIMessage format or simple format)
  • model: AI model identifier (e.g., 'meta-llama/llama-3.1-70b-instruct')
  • webSearch: Boolean flag to enable web search via Perplexity
  • system: Optional system prompt (includes video transcript as context)
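A request body matching these parameters might look like the following sketch (the ChatRequestBody interface and the transcript text are illustrative, not part of the route's source):

```typescript
// Illustrative shape of the POST /api/chat request body.
interface ChatRequestBody {
  messages: { role: 'user' | 'assistant'; content: string }[];
  model: string;
  webSearch: boolean;
  system?: string;
}

const body: ChatRequestBody = {
  messages: [{ role: 'user', content: 'What is this video about?' }],
  model: 'gemini-2.5-flash',
  webSearch: false,
  system: 'You are an AI assistant. Transcript: <transcript text here>'
};

// Serialized payload, as it would be sent with fetch().
const payload = JSON.stringify(body);
```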

4. Model Selection

const selectedModel = getModel(DEFAULT_MODEL);

Gets the default model from the provider configuration. DEFAULT_MODEL is set to 'gemini-2.5-flash'. Note that the model field from the request body is not passed to getModel here; the route always resolves DEFAULT_MODEL, and web search (step 6) can override it.

5. Message Processing

let processedMessages;
if (messages && messages.length > 0 && 'parts' in messages[0]) {
  // UIMessage format from useChat
  processedMessages = convertToModelMessages(messages as UIMessage[]);
} else {
  // Simple format from thread generation or other sources
  processedMessages = messages;
}

Message Format Detection

  • UIMessage format: Messages from useChat hook with parts property
  • Simple format: Plain message arrays from thread generation

Converts UIMessage format to model-compatible format using convertToModelMessages().
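The branch condition above can be shown in isolation. This is a minimal sketch; isUIMessageFormat is a hypothetical helper that mirrors the 'parts' in messages[0] check, not part of the route:

```typescript
// Detects whether a message array is in UIMessage format
// (objects carrying a `parts` array) or the simple
// { role, content } format.
function isUIMessageFormat(messages: Record<string, unknown>[]): boolean {
  return messages.length > 0 && 'parts' in messages[0];
}

// UIMessage-style message, as produced by the useChat hook.
const uiMessages = [
  { id: 'msg-1', role: 'user', parts: [{ type: 'text', text: 'Hi' }] }
];

// Simple-format message, as used by thread generation.
const simpleMessages = [{ role: 'user', content: 'Hi' }];
```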

6. Generate Streaming Response

const result = streamText({
  model: webSearch ? 'perplexity/sonar' : (selectedModel as any),
  messages: processedMessages,
  system:
    system ||
    'You are a helpful assistant that can answer questions and help with tasks',
  experimental_transform: smoothStream()
});

Parameters

  • model: Uses Perplexity Sonar if webSearch is true, otherwise uses selected model
  • messages: Processed message array
  • system: Custom system prompt (includes transcript) or default assistant prompt
  • experimental_transform: smoothStream() for smoother streaming experience

7. Return Streaming Response

return result.toUIMessageStreamResponse({
  sendSources: true,
  sendReasoning: true,
  headers: {
    'X-RateLimit-Limit': limit.toString(),
    'X-RateLimit-Remaining': remaining.toString(),
    'X-RateLimit-Reset': reset.toString()
  }
});

Response Configuration

  • sendSources: Includes source URLs in response (for web search)
  • sendReasoning: Includes reasoning parts in response
  • headers: Rate limit information for client

8. Error Handling

catch (error) {
  console.error('Error in chat route:', error);
  return NextResponse.json(
    { error: 'Failed to process chat request' },
    { status: 500 }
  );
}

Catches any errors thrown during processing and returns a generic 500 response.

System Prompt Format

The system prompt includes the video transcript for context:

system: `You are an AI assistant helping users understand video content. You have access to the following video transcript:

${transcript}

Answer questions based on this transcript. Be conversational, helpful, and accurate. If something is not mentioned in the transcript, say so.`

This prompt:

  1. Defines the AI's role
  2. Provides the full transcript as context
  3. Sets guidelines for responses
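Assembling this prompt from a raw transcript can be sketched as follows (buildSystemPrompt is a hypothetical helper; the route itself receives the finished string in the request's system field):

```typescript
// Hypothetical helper that assembles the system prompt shown
// above from a raw transcript string.
function buildSystemPrompt(transcript: string): string {
  return `You are an AI assistant helping users understand video content. You have access to the following video transcript:

${transcript}

Answer questions based on this transcript. Be conversational, helpful, and accurate. If something is not mentioned in the transcript, say so.`;
}

const prompt = buildSystemPrompt('Speaker: Welcome to the show...');
```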

Response Format

Success Response (Streaming)

Returns a streaming response with:

  • Content-Type: text/event-stream
  • Headers: Rate limit information
  • Body: Streaming message parts
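The event-stream body can be consumed line by line. Below is a minimal parser sketch assuming standard SSE "data:" framing; the exact chunk shapes (the "text-delta" objects here are illustrative) are defined by the AI SDK's UI message stream protocol:

```typescript
// Parses `data:` lines from an event-stream body into JSON chunks,
// skipping the terminating [DONE] marker.
function parseSseChunks(raw: string): unknown[] {
  return raw
    .split('\n')
    .filter((line) => line.startsWith('data: ') && !line.includes('[DONE]'))
    .map((line) => JSON.parse(line.slice('data: '.length)));
}

// Illustrative stream content; real chunk shapes come from the SDK.
const sample = [
  'data: {"type":"text-delta","delta":"Hel"}',
  'data: {"type":"text-delta","delta":"lo"}',
  'data: [DONE]'
].join('\n');

const chunks = parseSseChunks(sample);
```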

UIMessage Parts

{
  id: "msg-123",
  role: "assistant",
  parts: [
    { type: "text", text: "Response content..." },
    { type: "reasoning", text: "I analyzed..." },
    { type: "source-url", url: "https://..." }
  ]
}

Error Responses

401 Unauthorized

{
  "error": "Authentication required"
}

429 Rate Limit Exceeded

{
  "error": "Chat limit exceeded. You have used all 30 chat attempts for today.",
  "limit": 30,
  "remaining": 0,
  "reset": 1234567890
}

500 Internal Server Error

{
  "error": "Failed to process chat request"
}

503 Service Unavailable

{
  "error": "Service temporarily unavailable. Please try again in a moment."
}
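On the client, these status codes can be mapped to user-facing behavior. A sketch (describeChatError is a hypothetical helper; the wording of each message is illustrative):

```typescript
// Maps the error statuses documented above to user-facing messages.
function describeChatError(status: number): string {
  switch (status) {
    case 401:
      return 'Please sign in to use chat.';
    case 429:
      return 'Daily chat limit reached. Try again tomorrow.';
    case 503:
      return 'Service temporarily unavailable. Please retry shortly.';
    default:
      return 'Failed to process chat request.';
  }
}
```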

AI Provider Integration

Provider Configuration

File: src/lib/providers.ts

import { createGroq } from '@ai-sdk/groq';
import { createGoogleGenerativeAI } from '@ai-sdk/google';
import { createCerebras } from '@ai-sdk/cerebras';

export const groq = createGroq({
  apiKey: process.env.GROQ_API_KEY
});

const cerebras = createCerebras({
  apiKey: process.env.CEREBRAS_API_KEY
});

export const google = createGoogleGenerativeAI({
  apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY
});

export function getModel(modelName: string) {
  if (modelName.startsWith('gemini-')) {
    return google(modelName);
  } else {
    return cerebras(modelName);
  }
}

export const DEFAULT_MODEL = 'gemini-2.5-flash';

Model Selection Logic

  • Gemini models: Use Google Generative AI provider
  • Other models: Use Cerebras provider (Llama models)
  • Web search: Always use Perplexity Sonar

Available Models

  1. Gemini 2.5 Flash (default): Fast, efficient Google model
  2. Llama 3.1 70B: Large Llama model via Cerebras
  3. Llama 4 Maverick: Latest Llama model via Cerebras
  4. Perplexity Sonar: Web search enabled model
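The routing among these models follows the prefix check in getModel plus the web search override. In isolation (providerFor is an illustrative helper returning provider names, not part of src/lib/providers.ts):

```typescript
// Illustrative mirror of the provider routing: Gemini model names go
// to Google, everything else to Cerebras, and web search switches to
// Perplexity Sonar regardless of the selected model.
function providerFor(modelName: string, webSearch: boolean): string {
  if (webSearch) return 'perplexity';
  return modelName.startsWith('gemini-') ? 'google' : 'cerebras';
}
```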

Rate Limiting Details

Configuration

  • Limiter: userChatLimiter from @/lib/ratelimit
  • Identifier: chat_{userId} (per-user limit)
  • Limit: 30 chat attempts per day
  • Backend: Upstash Redis

Rate Limit Headers

X-RateLimit-Limit: 30
X-RateLimit-Remaining: 25
X-RateLimit-Reset: 1234567890

Client Usage

Client can read these headers to show remaining attempts:

const remaining = response.headers.get('X-RateLimit-Remaining');
// Display to user: "25 chats remaining today"

Web Search Mode

Activation

When webSearch is true, the route uses Perplexity's Sonar model:

model: webSearch ? 'perplexity/sonar' : selectedModel

Features

  • Real-time web information
  • Source URLs included in response
  • Useful for current events or facts not in transcript

Response with Sources

{
  id: "msg-123",
  role: "assistant",
  parts: [
    { type: "text", text: "Based on recent data..." },
    { type: "source-url", url: "https://example.com/source" },
    { type: "source-url", url: "https://example.com/another-source" }
  ]
}

Security Considerations

Authentication Required

All requests must be authenticated via Clerk. The user ID is used for:

  • Rate limiting
  • Logging (if implemented)
  • Potential future features (conversation history, etc.)

Rate Limiting

Prevents abuse by limiting each user to 30 chats per day.

Error Message Safety

Generic error messages prevent leaking system details:

{ error: 'Failed to process chat request' }

Usage Example

Client Request

const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'What is this video about?' }
    ],
    model: 'meta-llama/llama-3.1-70b-instruct',
    webSearch: false,
    system: `You are an AI assistant helping users understand video content. 
             Transcript: ${transcript}`
  })
});

With useChat Hook

const { messages, sendMessage } = useChat();

sendMessage(
  { text: 'Summarize this video' },
  {
    body: {
      model: 'meta-llama/llama-3.1-70b-instruct',
      webSearch: true,
      system: `Transcript: ${transcript}`
    }
  }
);