# AI Podcast Clipper
## AiPodcastClipper

The `AiPodcastClipper` class orchestrates the entire video processing workflow. It uses a Modal `cls` to define a containerized environment with the necessary dependencies and GPU resources.

```python
@app.cls(
    gpu="L40S",
    timeout=9000,
    retries=0,
    scaledown_window=300,
    secrets=[
        modal.Secret.from_name("jif-backend"),
        modal.Secret.from_name("sarvam-ai"),
        modal.Secret.from_name("huggingface"),
        modal.Secret.from_name("openrouter-api-key"),
    ],
    volumes={mount_path: volume},
)
class AiPodcastClipper:
    # ... (class methods)
```
## load_model

The `load_model` method runs when the Modal container starts. It loads the WhisperX model for transcription and the diarization pipeline for speaker identification, and initializes the OpenRouter and Sarvam AI clients.

```python
@modal.enter()
def load_model(self):
    # ... (implementation details)
```
## manual_speaker_assignment

The `manual_speaker_assignment` method assigns speakers to word segments based on timestamp overlap with diarization segments. It serves as a fallback mechanism to improve speaker attribution.

```python
def manual_speaker_assignment(self, result, diarize_segments):
    """Manually assign speakers to word segments based on timestamp overlap"""
    # ... (implementation details)
```
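The overlap-based assignment can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the helper `assign_speaker`, the dict keys, and the `SPEAKER_*` labels are assumptions modeled on WhisperX-style output.

```python
from typing import Optional

def assign_speaker(word: dict, diarize_segments: list[dict]) -> Optional[str]:
    """Pick the diarization speaker whose segment overlaps the word the most.

    `word` carries "start"/"end" times in seconds; each diarization
    segment carries "start", "end", and "speaker".
    """
    best_speaker, best_overlap = None, 0.0
    for seg in diarize_segments:
        # length of the intersection of [word.start, word.end] and [seg.start, seg.end]
        overlap = min(word["end"], seg["end"]) - max(word["start"], seg["start"])
        if overlap > best_overlap:
            best_overlap, best_speaker = overlap, seg["speaker"]
    return best_speaker

def manual_speaker_assignment(result: dict, diarize_segments: list[dict]) -> dict:
    """Attach a "speaker" key to every word in every transcript segment."""
    for segment in result["segments"]:
        for word in segment.get("words", []):
            speaker = assign_speaker(word, diarize_segments)
            if speaker is not None:
                word["speaker"] = speaker
    return result
```

Words that overlap no diarization segment keep no speaker label, which is why this is a best-effort fallback rather than a guarantee.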
## transcribe_video_fast

The `transcribe_video_fast` method performs a fast transcription of a video, skipping diarization and alignment for speed. It is used when only the transcript is needed to identify potential clips.

```python
def transcribe_video_fast(self, base_dir: str, video_path: str) -> tuple[str, object, str]:
    """Fast transcription for identify_clips - skips diarization and alignment"""
    # ... (implementation details)
```
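Since this path skips alignment and diarization, its output is essentially the raw segment list serialized for the clip-identification prompt. A minimal sketch of that serialization step, assuming Whisper-style segments with "start", "end", and "text" keys (the output schema here is an assumption, not the project's actual format):

```python
import json

def segments_to_transcript_json(segments: list[dict]) -> str:
    """Flatten raw Whisper-style segments into a compact JSON string
    suitable for handing to an LLM prompt."""
    entries = [
        {
            "start": round(seg["start"], 2),
            "end": round(seg["end"], 2),
            "text": seg["text"].strip(),
        }
        for seg in segments
    ]
    return json.dumps(entries)
```

Keeping only start/end/text keeps the prompt small, which matters when a long podcast transcript has to fit in the model's context window.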
## transcribe_video

The `transcribe_video` method performs a full transcription of a video, including speaker diarization and alignment, for accurate timing and speaker attribution.

```python
def transcribe_video(self, base_dir: str, video_path: str, target_language: Optional[str] = None) -> tuple[str, object, str]:
    # ... (implementation details)
```
## identify_moments

The `identify_moments` method uses a large language model (via OpenRouter) to identify compelling moments in a transcript that are suitable for creating viral clips.

```python
def identify_moments(self, transcript: dict, source_language: str, custom_prompt: Optional[str] = None):
    # ... (implementation details)
```
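An LLM asked for JSON frequently wraps its answer in a markdown fence or returns malformed entries, so the reply needs defensive parsing before the moments can be used. A sketch of such a parser, assuming the model is prompted to return a list of `{"start": ..., "end": ...}` objects (this shape and the function name are illustrative, not the project's actual code):

```python
import json

def parse_moments(raw: str) -> list[dict]:
    """Defensively parse an LLM reply into a list of clip moments."""
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        # drop the surrounding markdown fence and an optional "json" tag
        cleaned = cleaned.split("```")[1]
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    try:
        moments = json.loads(cleaned)
    except json.JSONDecodeError:
        return []
    # keep only well-formed moments with positive duration
    return [
        m for m in moments
        if isinstance(m, dict) and m.get("end", 0) > m.get("start", -1) >= 0
    ]
```

Returning an empty list on parse failure lets the caller retry or fail gracefully instead of crashing mid-pipeline.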
## process_video

The `process_video` method is a FastAPI endpoint that processes a video from a YouTube URL or S3 key, identifies clips, and processes them according to the request parameters.

```python
@modal.fastapi_endpoint(method="POST")
def process_video(self, request: ProcessVideoRequest, token: HTTPAuthorizationCredentials = Depends(auth_scheme)):
    # ... (implementation details)
```
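The endpoint accepts either a YouTube URL or an S3 key and authenticates the bearer token before doing any work. A minimal sketch of that validation, with assumed names throughout: the request fields beyond `s3_key`/`youtube_url`, the `AUTH_TOKEN` environment variable, and both helper functions are illustrative, not the project's actual schema.

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProcessVideoRequest:
    """Illustrative request shape for the endpoint."""
    s3_key: Optional[str] = None
    youtube_url: Optional[str] = None
    target_language: Optional[str] = None

def check_auth(token: str) -> bool:
    """Compare the bearer token against a secret injected via the
    environment (the variable name is an assumption)."""
    expected = os.environ.get("AUTH_TOKEN", "")
    return bool(expected) and token == expected

def validate_request(req: ProcessVideoRequest) -> None:
    # exactly one video source must be provided
    if bool(req.s3_key) == bool(req.youtube_url):
        raise ValueError("Provide exactly one of s3_key or youtube_url")
```

Rejecting bad requests before launching transcription avoids burning GPU time on work that can never succeed.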
## identify_clips

The `identify_clips` method is a FastAPI endpoint that identifies potential clips from a video without processing them. It returns a list of identified moments with metadata.

```python
@modal.fastapi_endpoint(method="POST")
def identify_clips(self, request: IdentifyClipsRequest, token: HTTPAuthorizationCredentials = Depends(auth_scheme)):
    # ... (implementation details)
```
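Since this endpoint returns moments with metadata rather than rendered clips, the final step is shaping the identified moments into a response payload. One plausible shape, assuming start/end timestamps per moment (the response schema here is an assumption, not the project's actual API):

```python
def moments_to_response(moments: list[dict]) -> dict:
    """Shape identified moments into an endpoint response with
    per-clip metadata."""
    return {
        "clips": [
            {
                "index": i,
                "start": m["start"],
                "end": m["end"],
                # duration is handy for clients filtering by clip length
                "duration": round(m["end"] - m["start"], 2),
            }
            for i, m in enumerate(moments)
        ],
        "count": len(moments),
    }
```

Returning metadata only keeps this endpoint cheap; the caller can then submit chosen moments to `process_video` for actual rendering.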