Chat with Video - API Route
Next.js API route for streaming AI chat responses
Overview
The chat API route handles streaming AI responses using the Vercel AI SDK. It supports multiple AI providers, web search, rate limiting, and returns streaming responses in real-time to the client.
API Route File
Location: src/app/api/chat/route.ts
Endpoint: POST /api/chat
Max Duration: 30 seconds
Configuration
export const maxDuration = 30;
Allows streaming responses up to 30 seconds.
Request Flow
1. Authentication
const { userId } = await auth();
if (!userId) {
return NextResponse.json(
{ error: 'Authentication required' },
{ status: 401 }
);
}
Uses Clerk's auth() to verify the user is authenticated. Returns 401 if not logged in.
2. Rate Limiting
Check Rate Limit
let rateLimitResult;
try {
rateLimitResult = await userChatLimiter.limit(`chat_${userId}`);
} catch (error) {
console.error('Rate limiter error:', error);
return NextResponse.json(
{
error: 'Service temporarily unavailable. Please try again in a moment.'
},
{ status: 503 }
);
}
Checks the rate limit using Upstash Redis, with the identifier chat_{userId} for per-user limits.
Extract Rate Limit Info
const { success, limit, remaining, reset } = rateLimitResult;
Gets rate limit status from result.
Handle Rate Limit Exceeded
if (!success) {
return NextResponse.json(
{
error: 'Chat limit exceeded. You have used all 30 chat attempts for today.',
limit,
remaining: 0,
reset
},
{
status: 429,
headers: {
'X-RateLimit-Limit': limit.toString(),
'X-RateLimit-Remaining': '0',
'X-RateLimit-Reset': reset.toString()
}
}
);
}
Returns 429 status with rate limit headers when user exceeds limit (30 chats per day).
3. Parse Request Body
const {
messages,
model,
webSearch,
system
}: {
messages: UIMessage[] | any[];
model: string;
webSearch: boolean;
system?: string;
} = await req.json();
Request Parameters
- messages: Array of chat messages (can be UIMessage format or simple format)
- model: AI model identifier (e.g., 'meta-llama/llama-3.1-70b-instruct')
- webSearch: Boolean flag to enable web search via Perplexity
- system: Optional system prompt (includes video transcript as context)
4. Model Selection
const selectedModel = getModel(DEFAULT_MODEL);
Gets the default model from the provider configuration. DEFAULT_MODEL is set to 'gemini-2.5-flash'.
5. Message Processing
let processedMessages;
if (messages && messages.length > 0 && 'parts' in messages[0]) {
// UIMessage format from useChat
processedMessages = convertToModelMessages(messages as UIMessage[]);
} else {
// Simple format from thread generation or other sources
processedMessages = messages;
}
Message Format Detection
- UIMessage format: Messages from the useChat hook with a parts property
- Simple format: Plain message arrays from thread generation
Converts UIMessage-format messages to a model-compatible format using convertToModelMessages().
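To make the detection concrete, here is an illustrative sketch of the check the route performs. The message shapes and variable names below are examples for clarity, not taken verbatim from the app:

```typescript
// Example message shapes (illustrative, not the app's exact types).
type SimpleMessage = { role: string; content: string };
type PartMessage = { role: string; parts: { type: string; text: string }[] };

// Mirrors the route's detection: UIMessage objects carry a `parts` array.
function isUIMessageFormat(messages: unknown[]): boolean {
  return (
    messages.length > 0 &&
    typeof messages[0] === 'object' &&
    messages[0] !== null &&
    'parts' in (messages[0] as object)
  );
}

const fromUseChat: PartMessage[] = [
  { role: 'user', parts: [{ type: 'text', text: 'What is this video about?' }] }
];
const fromThreadGen: SimpleMessage[] = [
  { role: 'user', content: 'What is this video about?' }
];

console.log(isUIMessageFormat(fromUseChat)); // true
console.log(isUIMessageFormat(fromThreadGen)); // false
```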
6. Generate Streaming Response
const result = streamText({
model: webSearch ? 'perplexity/sonar' : (selectedModel as any),
messages: processedMessages,
system:
system ||
'You are a helpful assistant that can answer questions and help with tasks',
experimental_transform: smoothStream()
});
Parameters
- model: Uses Perplexity Sonar if webSearch is true, otherwise the selected model
- messages: Processed message array
- system: Custom system prompt (includes transcript) or default assistant prompt
- experimental_transform: smoothStream() for a smoother streaming experience
7. Return Streaming Response
return result.toUIMessageStreamResponse({
sendSources: true,
sendReasoning: true,
headers: {
'X-RateLimit-Limit': limit.toString(),
'X-RateLimit-Remaining': remaining.toString(),
'X-RateLimit-Reset': reset.toString()
}
});
Response Configuration
- sendSources: Includes source URLs in response (for web search)
- sendReasoning: Includes reasoning parts in response
- headers: Rate limit information for client
8. Error Handling
catch (error) {
console.error('Error in chat route:', error);
return NextResponse.json(
{ error: 'Failed to process chat request' },
{ status: 500 }
);
}
Catches any errors during processing and returns 500 status.
System Prompt Format
The system prompt includes the video transcript for context:
system: `You are an AI assistant helping users understand video content. You have access to the following video transcript:
${transcript}
Answer questions based on this transcript. Be conversational, helpful, and accurate. If something is not mentioned in the transcript, say so.`
This prompt:
- Defines the AI's role
- Provides the full transcript as context
- Sets guidelines for responses
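Since the system prompt is assembled on the client before being sent to the route, a small helper keeps the template in one place. This is a hypothetical sketch; the function name buildSystemPrompt is illustrative and not part of the actual codebase:

```typescript
// Hypothetical helper for assembling the system prompt from a transcript.
// The template text matches the prompt format shown above.
function buildSystemPrompt(transcript: string): string {
  return `You are an AI assistant helping users understand video content. You have access to the following video transcript:

${transcript}

Answer questions based on this transcript. Be conversational, helpful, and accurate. If something is not mentioned in the transcript, say so.`;
}

// Usage: pass the full transcript fetched for the current video.
const prompt = buildSystemPrompt('Welcome to the video...');
```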
Response Format
Success Response (Streaming)
Returns a streaming response with:
- Content-Type: text/event-stream
- Headers: Rate limit information
- Body: Streaming message parts
UIMessage Parts
{
id: "msg-123",
role: "assistant",
parts: [
{ type: "text", text: "Response content..." },
{ type: "reasoning", text: "I analyzed..." },
{ type: "source-url", url: "https://..." }
]
}
Error Responses
401 Unauthorized
{
"error": "Authentication required"
}
429 Rate Limit Exceeded
{
"error": "Chat limit exceeded. You have used all 30 chat attempts for today.",
"limit": 30,
"remaining": 0,
"reset": 1234567890
}
500 Internal Server Error
{
"error": "Failed to process chat request"
}
503 Service Unavailable
{
"error": "Service temporarily unavailable. Please try again in a moment."
}
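On the client, the status codes above can be mapped to user-facing messages. A minimal sketch; the function name and message wording are illustrative assumptions, not from the app:

```typescript
// Illustrative client-side handling of the route's error statuses.
function chatErrorMessage(status: number): string | null {
  switch (status) {
    case 401:
      return 'Please sign in to chat.';
    case 429:
      return 'Daily chat limit reached. Try again tomorrow.';
    case 503:
      return 'Service temporarily unavailable. Please retry shortly.';
    case 500:
      return 'Something went wrong. Please try again.';
    default:
      return null; // Successful streaming responses need no error message
  }
}
```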
AI Provider Integration
Provider Configuration
File: src/lib/providers.ts
import { createGroq } from '@ai-sdk/groq';
import { createGoogleGenerativeAI } from '@ai-sdk/google';
import { createCerebras } from '@ai-sdk/cerebras';
export const groq = createGroq({
apiKey: process.env.GROQ_API_KEY
});
const cerebras = createCerebras({
apiKey: process.env.CEREBRAS_API_KEY
});
export const google = createGoogleGenerativeAI({
apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY
});
export function getModel(modelName: string) {
if (modelName.startsWith('gemini-')) {
return google(modelName);
} else {
return cerebras(modelName);
}
}
export const DEFAULT_MODEL = 'gemini-2.5-flash';
Model Selection Logic
- Gemini models: Use Google Generative AI provider
- Other models: Use Cerebras provider (Llama models)
- Web search: Always use Perplexity Sonar
Available Models
- Gemini 2.5 Flash (default): Fast, efficient Google model
- Llama 3.1 70B: Large Llama model via Cerebras
- Llama 4 Maverick: Latest Llama model via Cerebras
- Perplexity Sonar: Web search enabled model
Rate Limiting Details
Configuration
- Limiter: userChatLimiter from @/lib/ratelimit
- Identifier: chat_{userId} (per-user limit)
- Limit: 30 chat attempts per day
- Backend: Upstash Redis
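The limiter itself lives in src/lib/ratelimit, which is not shown in this doc. A plausible sketch of that file, assuming the standard @upstash/ratelimit setup and a sliding window sized to match the route's 30-per-day behavior (the prefix and window choice are assumptions):

```typescript
// Hypothetical sketch of src/lib/ratelimit.ts (the actual file is not shown here).
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

export const userChatLimiter = new Ratelimit({
  // Reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN from the environment
  redis: Redis.fromEnv(),
  // 30 requests per rolling 24-hour window, matching the route's daily limit
  limiter: Ratelimit.slidingWindow(30, '1 d'),
  prefix: 'ratelimit:chat'
});
```

The route then calls userChatLimiter.limit(`chat_${userId}`) and reads success, limit, remaining, and reset from the result, as shown in the request flow above.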
Rate Limit Headers
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 25
X-RateLimit-Reset: 1234567890
Client Usage
Client can read these headers to show remaining attempts:
const remaining = response.headers.get('X-RateLimit-Remaining');
// Display to user: "25 chats remaining today"
Web Search Mode
Activation
When webSearch is true, the route uses Perplexity's Sonar model:
model: webSearch ? 'perplexity/sonar' : selectedModel
Features
- Real-time web information
- Source URLs included in response
- Useful for current events or facts not in transcript
Response with Sources
{
id: "msg-123",
role: "assistant",
parts: [
{ type: "text", text: "Based on recent data..." },
{ type: "source-url", url: "https://example.com/source" },
{ type: "source-url", url: "https://example.com/another-source" }
]
}
Security Considerations
Authentication Required
All requests must be authenticated via Clerk. The user ID is used for:
- Rate limiting
- Logging (if implemented)
- Potential future features (conversation history, etc.)
Rate Limiting
Prevents abuse by limiting each user to 30 chats per day.
Error Message Safety
Generic error messages prevent leaking system details:
{ error: 'Failed to process chat request' }
Usage Example
Client Request
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [
{ role: 'user', content: 'What is this video about?' }
],
model: 'meta-llama/llama-3.1-70b-instruct',
webSearch: false,
system: `You are an AI assistant helping users understand video content.
Transcript: ${transcript}`
})
});
With useChat Hook
const { messages, sendMessage } = useChat();
sendMessage(
{ text: 'Summarize this video' },
{
body: {
model: 'meta-llama/llama-3.1-70b-instruct',
webSearch: true,
system: `Transcript: ${transcript}`
}
}
);