FFmpeg Usage
FFmpeg is a powerful multimedia framework used extensively in the AI backend for various video and audio manipulation tasks. This document provides a detailed explanation of how and where FFmpeg is used in the main.py file.
1. Video Clip Creation
In the create_video_clip function, FFmpeg is used to combine the processed video frames (without audio) with the corresponding audio track to create the final video clip. It also applies an audio fade-out effect.
ffmpeg_command = (f"ffmpeg -y -i {temp_video_path} -i {audio_path} "
                  f"-af \"afade=t=out:st={fade_start}:d={fade_duration}\" "
                  f"-c:v h264 -preset fast -crf 23 -c:a aac -b:a 128k "
                  f"{output_path}")
subprocess.run(ffmpeg_command, shell=True, check=True, text=True)

Command Breakdown:
-y: Overwrite output file if it exists.
-i {temp_video_path}: Specifies the input video file (video only).
-i {audio_path}: Specifies the input audio file.
-af "afade=t=out:st={fade_start}:d={fade_duration}": Applies an audio fade-out effect.
-c:v h264 -preset fast -crf 23: Encodes the video using the H.264 codec with a fast preset and a Constant Rate Factor (CRF) of 23 for good quality and file size.
-c:a aac -b:a 128k: Encodes the audio using the AAC codec with a bitrate of 128 kbps.
{output_path}: Specifies the output file path.
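The command takes fade_start and fade_duration as given. As a minimal sketch of how they might be derived, assuming the fade should finish exactly when the clip ends, the hypothetical helper below (afade_out_filter is not a function from main.py) computes the start time from the clip length and formats the afade expression:

```python
def afade_out_filter(clip_duration: float, fade_duration: float = 1.0) -> str:
    """Return an afade expression that fades out over the end of the clip.

    Hypothetical helper for illustration; main.py may compute these values differently.
    """
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    # Never fade for longer than the clip itself lasts.
    fade_duration = min(fade_duration, clip_duration)
    # Start the fade so that it finishes exactly at the end of the clip.
    fade_start = max(0.0, clip_duration - fade_duration)
    return f"afade=t=out:st={fade_start:.3f}:d={fade_duration:.3f}"


print(afade_out_filter(30.0, 2.0))  # afade=t=out:st=28.000:d=2.000
```

Clamping the fade length keeps st from going negative for clips shorter than the requested fade.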
2. Subtitle Burning
In the create_subtitles_with_ffmpeg function, FFmpeg is used to burn the generated subtitles (in .ass format) onto the video clip.
ffmpeg_cmd = (f"ffmpeg -y -i {clip_video_path} -vf \"ass={subtitle_path}\" "
              f"-c:v h264 -preset fast -crf 23 {output_path}")
subprocess.run(ffmpeg_cmd, shell=True, check=True)

Command Breakdown:
-vf "ass={subtitle_path}": Applies a video filter that renders the subtitles from the specified .ass file.
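The commands in this document are built as a single string and run with shell=True, so a path containing spaces or shell metacharacters would need quoting. One alternative, sketched here under the assumption that the paths are ordinary strings, is to pass an argument list and surface the tool's stderr on failure. The names burn_subtitles_args and run_tool are hypothetical and not part of main.py.

```python
import subprocess
import sys


def burn_subtitles_args(clip_video_path: str, subtitle_path: str, output_path: str) -> list[str]:
    """Build the subtitle-burning command as an argument list (no shell involved)."""
    return [
        "ffmpeg", "-y",
        "-i", clip_video_path,
        # The filter parser still treats ':' and '\' in the path specially,
        # so unusual subtitle paths may need filter-level escaping.
        "-vf", f"ass={subtitle_path}",
        "-c:v", "h264", "-preset", "fast", "-crf", "23",
        output_path,
    ]


def run_tool(args: list[str]) -> str:
    """Run a command and return its stdout; raise with the captured stderr on failure."""
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{args[0]} exited with {result.returncode}:\n{result.stderr}")
    return result.stdout


# Demonstrated with the Python interpreter so the sketch runs without FFmpeg installed.
print(run_tool([sys.executable, "-c", "print('ok')"]).strip())
```

With FFmpeg available, run_tool(burn_subtitles_args("clip.mp4", "subs.ass", "out.mp4")) would run the same command as above without going through a shell.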
3. Adding Background Music
In the add_background_music function, FFmpeg is used to mix the original audio of a video with a background music track.
ffmpeg_cmd = (
    f"ffmpeg -y -i {input_video_path} -i {music_path} "
    f'-filter_complex "[1:a]volume={volume}[bg]; [0:a][bg]amix=inputs=2:duration=shortest:dropout_transition=2[mixed]" '
    f'-map 0:v -map "[mixed]" -c:v copy -c:a aac -b:a 128k -shortest {output_video_path}'
)

Command Breakdown:
-filter_complex "...": Defines a complex filtergraph for audio mixing.
[1:a]volume={volume}[bg]: Takes the audio from the second input (music) and adjusts its volume, labeling the output as [bg].
[0:a][bg]amix=inputs=2:duration=shortest:dropout_transition=2[mixed]: Mixes the audio from the first input (original audio) with the [bg] stream and labels the result [mixed]. duration=shortest ends the mix when the shorter input ends, and dropout_transition=2 sets a 2-second transition for volume renormalization when an input ends.
-map 0:v: Selects the video stream from the first input.
-map "[mixed]": Selects the mixed audio stream.
-c:v copy: Copies the video stream without re-encoding, which is fast and preserves quality.
-shortest: Ends the output when the shortest stream ends.
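The filtergraph above is two chains joined by a semicolon. A minimal sketch of assembling it from its parts follows; music_mix_filter is a hypothetical name, and the volume check is an added assumption rather than behavior taken from main.py.

```python
def music_mix_filter(volume: float) -> str:
    """Return a filtergraph that lowers the music volume and mixes it with the original audio."""
    if volume < 0:
        raise ValueError("volume must be non-negative")
    chains = [
        # Second input (music): scale its volume and label the result [bg].
        f"[1:a]volume={volume:g}[bg]",
        # Mix the original audio with [bg]; the mix ends with the shorter input.
        "[0:a][bg]amix=inputs=2:duration=shortest:dropout_transition=2[mixed]",
    ]
    # Chains inside one filtergraph are separated by semicolons.
    return ";".join(chains)


print(music_mix_filter(0.2))
```

Because the graph references [0:a], the input video needs an audio stream of its own; a silent video would have to be handled separately.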
4. Adding a Watermark
In the add_watermark function, FFmpeg is used to overlay a watermark image onto the video.
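ffmpeg_cmd = (f"ffmpeg -y -i {input_video_path} -i {watermark_path} "
              f'-filter_complex "[1:v][0:v]scale2ref=w=main_w/10:h=-1[wm][base];[base][wm]overlay=40:40" '
              f"-c:v h264 -preset fast -crf 23 -c:a copy {output_video_path}")

Command Breakdown:
-filter_complex "...": Defines a complex filtergraph for video processing.
[1:v][0:v]scale2ref=w=main_w/10:h=-1[wm][base]: Scales the watermark (the filter's first input) using the video as the reference stream, and passes the video through as [base]. Per the FFmpeg documentation, main_w in scale2ref refers to the input being scaled, here the watermark, so w=main_w/10 is one tenth of the watermark's own width rather than of the video's; h=-1 lets FFmpeg derive the height from the width.
[base][wm]overlay=40:40: Overlays the scaled watermark [wm] onto the base video [base] at position (40, 40), measured from the top-left corner.
-c:a copy: Copies the audio stream without re-encoding.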
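If the intent is to size the watermark against the video, the FFmpeg scale2ref documentation gives a logo example of the form w=oh*mdar:h=ih/10, where ih is the reference video's height and mdar is the logo's own display aspect ratio. The sketch below builds a filtergraph on that pattern; watermark_filter and its parameters are hypothetical, and this is a variant for comparison, not the filter main.py uses.

```python
def watermark_filter(height_divisor: int = 10, x: int = 40, y: int = 40) -> str:
    """Return a filtergraph that scales the watermark against the video and overlays it."""
    if height_divisor <= 0:
        raise ValueError("height_divisor must be positive")
    # ih is the reference (video) height; oh*mdar derives the width from the
    # output height and the watermark's own display aspect ratio.
    scale = f"[1:v][0:v]scale2ref=w=oh*mdar:h=ih/{height_divisor}[wm][base]"
    # Place the scaled watermark x pixels right and y pixels down from the top-left corner.
    overlay = f"[base][wm]overlay={x}:{y}"
    return f"{scale};{overlay}"


print(watermark_filter())
```

The returned string would be passed as the -filter_complex argument in place of the one shown above.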
5. Clipping and Audio Extraction
In the process_clip function, FFmpeg is used to cut a segment from the original video and extract its audio.
Clipping:
cut_command = (f"ffmpeg -i {original_video_path} -ss {start_time} -t {duration} "
               f"{clip_segment_path}")
subprocess.run(cut_command, shell=True, check=True,
               capture_output=True, text=True)

Audio Extraction:
extract_cmd = f"ffmpeg -i {clip_segment_path} -vn -acodec pcm_s16le -ar 16000 -ac 1 {audio_path}"
subprocess.run(extract_cmd, shell=True,
               check=True, capture_output=True)

Command Breakdown:
-ss {start_time}: Seeks to the specified start time. Because it follows -i, it applies to the output: FFmpeg decodes and discards the input up to that point.
-t {duration}: Limits the clip to the specified duration.
-vn: Omits the video stream from the output (for audio extraction).
-acodec pcm_s16le: Encodes the audio as signed 16-bit little-endian PCM.
-ar 16000: Sets the audio sample rate to 16000 Hz.
-ac 1: Sets the number of audio channels to 1 (mono).
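Placing -ss before -i makes FFmpeg seek in the input instead of decoding everything up to the start time, which is usually faster on long sources. A minimal sketch of that variant as an argument list follows; build_cut_args is a hypothetical helper, and both the -y flag and the start/end interface are assumptions added for illustration.

```python
def build_cut_args(src: str, start: float, end: float, dst: str) -> list[str]:
    """Build a clip-cutting command that seeks in the input before decoding."""
    if start < 0 or end <= start:
        raise ValueError("require 0 <= start < end")
    duration = end - start
    return [
        "ffmpeg", "-y",               # -y is an addition; the original cut command omits it
        "-ss", f"{start:.3f}",        # before -i: input seeking
        "-i", src,
        "-t", f"{duration:.3f}",      # length of the clip, not its end time
        dst,
    ]


print(build_cut_args("talk.mp4", 12.5, 42.5, "clip.mp4"))
```

The list can be handed to subprocess.run directly, without shell=True.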
6. Audio Extraction for Transcription
In the transcribe_video_fast and transcribe_video functions, FFmpeg is used to extract the audio from the input video for transcription.
extract_cmd = f"ffmpeg -i {video_path} -vn -acodec pcm_s16le -ar 16000 -ac 1 -threads 0 {audio_path}"
subprocess.run(extract_cmd, shell=True, check=True, capture_output=True)

This command is the same as the audio extraction in process_clip, with the addition of -threads 0, which lets FFmpeg choose the number of threads automatically (typically based on the available CPU cores).
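The extraction flags translate directly into properties of the resulting file. As a small sketch, assuming audio_path points to a WAV file (the source does not show its extension), the standard-library wave module can confirm the output has the requested format before it is passed to transcription. is_transcription_ready is a hypothetical name.

```python
import wave


def is_transcription_ready(wav_path: str) -> bool:
    """Check that a WAV file is 16 kHz, mono, 16-bit PCM, as the extraction command requests."""
    with wave.open(wav_path, "rb") as wav:
        return (
            wav.getframerate() == 16000   # -ar 16000
            and wav.getnchannels() == 1   # -ac 1
            and wav.getsampwidth() == 2   # pcm_s16le: 2 bytes per sample
        )
```

A call such as is_transcription_ready(audio_path) after the subprocess.run step would catch a mismatched file early.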
7. Probing Video Information
In download_youtube_video, ffprobe (a tool included with FFmpeg) is used to get information about the downloaded video, such as its resolution.
probe_cmd = f'ffprobe -v quiet -print_format json -show_streams "{output_path.replace("%(ext)s", "*")}"'
result = subprocess.run(probe_cmd, shell=True, capture_output=True, text=True)

Command Breakdown:
-v quiet: Suppresses all log output, including error messages.
-print_format json: Sets the output format to JSON.
-show_streams: Shows information about each stream in the file.
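The JSON written to stdout contains a streams array with one object per stream. A minimal sketch of reading the resolution from it follows; video_resolution is a hypothetical helper, and the sample string is hand-written in the shape ffprobe produces, not real output.

```python
import json
from typing import Optional, Tuple


def video_resolution(probe_output: str) -> Optional[Tuple[int, int]]:
    """Return (width, height) of the first video stream in ffprobe's JSON output, or None."""
    data = json.loads(probe_output)
    for stream in data.get("streams", []):
        if stream.get("codec_type") == "video":
            return int(stream["width"]), int(stream["height"])
    return None


# Trimmed, hand-written sample of -print_format json -show_streams output.
sample = '{"streams": [{"codec_type": "audio"}, {"codec_type": "video", "width": 1920, "height": 1080}]}'
print(video_resolution(sample))  # (1920, 1080)
```

Since the probe runs with -v quiet and without check=True, a failed probe leaves result.stdout empty, so checking result.returncode before parsing avoids a JSON decode error. Note also that a * inside double quotes is not expanded by a POSIX shell, so the wildcard path may reach ffprobe literally.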