FFmpeg Usage

FFmpeg is a multimedia framework used extensively in the AI backend for video and audio manipulation. This document explains how and where FFmpeg is used in main.py.

1. Video Clip Creation

In the create_video_clip function, FFmpeg is used to combine the processed video frames (without audio) with the corresponding audio track to create the final video clip. It also applies an audio fade-out effect.

    ffmpeg_command = (f"ffmpeg -y -i {temp_video_path} -i {audio_path} "
                      f"-af \"afade=t=out:st={fade_start}:d={fade_duration}\" "
                      f"-c:v h264 -preset fast -crf 23 -c:a aac -b:a 128k "
                      f"{output_path}")
    subprocess.run(ffmpeg_command, shell=True, check=True, text=True)

Command Breakdown:

  • -y: Overwrite output file if it exists.
  • -i {temp_video_path}: Specifies the input video file (video only).
  • -i {audio_path}: Specifies the input audio file.
  • -af "afade=t=out:st={fade_start}:d={fade_duration}": Applies an audio fade-out effect.
  • -c:v h264 -preset fast -crf 23: Encodes the video using the H.264 codec with a fast preset and a Constant Rate Factor (CRF) of 23 for a good balance of quality and file size.
  • -c:a aac -b:a 128k: Encodes the audio using the AAC codec with a bitrate of 128 kbps.
  • {output_path}: Specifies the output file path.
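
The fade parameters are not defined in the snippet above. A minimal sketch of how they might be derived, assuming the clip's duration is known up front (the function name and variables below are illustrative, not taken from main.py):

    import subprocess

    def build_clip_command(temp_video_path, audio_path, output_path,
                           clip_duration, fade_duration=0.5):
        # Start the fade so that it ends exactly when the clip ends.
        fade_start = max(clip_duration - fade_duration, 0)
        return (f"ffmpeg -y -i {temp_video_path} -i {audio_path} "
                f"-af \"afade=t=out:st={fade_start}:d={fade_duration}\" "
                f"-c:v h264 -preset fast -crf 23 -c:a aac -b:a 128k "
                f"{output_path}")

    # Example: a 12-second clip whose audio fades out over the last half second.
    cmd = build_clip_command("frames.mp4", "voice.wav", "clip.mp4", clip_duration=12.0)
    subprocess.run(cmd, shell=True, check=True, text=True)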

2. Subtitle Burning

In the create_subtitles_with_ffmpeg function, FFmpeg is used to burn the generated subtitles (in .ass format) onto the video clip.

    ffmpeg_cmd = (f"ffmpeg -y -i {clip_video_path} -vf \"ass={subtitle_path}\" "
                  f"-c:v h264 -preset fast -crf 23 {output_path}")

    subprocess.run(ffmpeg_cmd, shell=True, check=True)

Command Breakdown:

  • -vf "ass={subtitle_path}": Applies a video filter that renders the subtitles from the specified .ass file.
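
Because the subtitle path is embedded in the filtergraph string, characters that are special to FFmpeg filters (spaces, colons, quotes) need care. For illustration only, here is the same burn step written with an argument list instead of a single shell string, which removes shell quoting from the picture (this is not the form used in main.py):

    import subprocess

    def burn_subtitles(clip_video_path, subtitle_path, output_path):
        # Passing a list avoids shell quoting issues for the file paths themselves;
        # characters special to FFmpeg filtergraphs inside subtitle_path would
        # still need FFmpeg-level escaping.
        cmd = [
            "ffmpeg", "-y", "-i", clip_video_path,
            "-vf", f"ass={subtitle_path}",
            "-c:v", "h264", "-preset", "fast", "-crf", "23",
            output_path,
        ]
        subprocess.run(cmd, check=True)

    burn_subtitles("clip.mp4", "subtitles.ass", "clip_with_subs.mp4")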

3. Adding Background Music

In the add_background_music function, FFmpeg is used to mix the original audio of a video with a background music track.

        ffmpeg_cmd = (
            f"ffmpeg -y -i {input_video_path} -i {music_path} "
            f'-filter_complex "[1:a]volume={volume}[bg]; [0:a][bg]amix=inputs=2:duration=shortest:dropout_transition=2[mixed]" '
            f'-map 0:v -map "[mixed]" -c:v copy -c:a aac -b:a 128k -shortest {output_video_path}'
        )

Command Breakdown:

  • -filter_complex "...": Defines a complex filtergraph for audio mixing.
    • [1:a]volume={volume}[bg]: Takes the audio from the second input (music) and adjusts its volume, labeling the output as [bg].
    • [0:a][bg]amix=inputs=2:duration=shortest:dropout_transition=2[mixed]: Mixes the audio from the first input (original audio) with the [bg] stream. The output duration matches the shortest input, and a 2-second dropout transition smooths the volume when one input ends.
  • -map 0:v: Selects the video stream from the first input.
  • -map "[mixed]": Selects the mixed audio stream.
  • -c:v copy: Copies the video stream without re-encoding, which is fast and preserves quality.
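
A minimal sketch of how this command might be wrapped, assuming volume is a float in the 0.0-1.0 range (the signature and default below are illustrative, not copied from main.py):

    import subprocess

    def add_background_music(input_video_path, music_path, output_video_path, volume=0.2):
        # Lower the music track first, then mix it under the original audio.
        ffmpeg_cmd = (
            f"ffmpeg -y -i {input_video_path} -i {music_path} "
            f'-filter_complex "[1:a]volume={volume}[bg]; '
            f'[0:a][bg]amix=inputs=2:duration=shortest:dropout_transition=2[mixed]" '
            f'-map 0:v -map "[mixed]" -c:v copy -c:a aac -b:a 128k -shortest {output_video_path}'
        )
        subprocess.run(ffmpeg_cmd, shell=True, check=True)

    add_background_music("clip_with_subs.mp4", "music.mp3", "clip_with_music.mp4", volume=0.15)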

4. Adding a Watermark

In the add_watermark function, FFmpeg is used to overlay a watermark image onto the video.

    ffmpeg_cmd = (f"ffmpeg -y -i {input_video_path} -i {watermark_path} "
                  f'-filter_complex "[1:v][0:v]scale2ref=w=main_w/10:h=-1[wm][base];[base][wm]overlay=40:40" '
                  f"-c:v h264 -preset fast -crf 23 -c:a copy {output_video_path}")

Command Breakdown:

  • -filter_complex "...": Defines a complex filtergraph for video processing.
    • [1:v][0:v]scale2ref=w=main_w/10:h=-1[wm][base]: Scales the watermark to one tenth of the main video's width, preserving its aspect ratio (h=-1), and labels the outputs [wm] (scaled watermark) and [base] (main video).
    • [base][wm]overlay=40:40: Overlays the scaled watermark [wm] onto the base video [base] at position (40, 40).
  • -c:a copy: Copies the audio stream without re-encoding.
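
The overlay position above is fixed at 40 pixels from the top-left corner. Purely for illustration (this is not what main.py does), the overlay filter also exposes the main and overlay dimensions as W/H and w/h, so the same filtergraph can pin the watermark to the bottom-right corner instead:

    import subprocess

    def add_watermark_bottom_right(input_video_path, watermark_path, output_video_path, margin=40):
        # Scale the watermark to a tenth of the main video's width, then place it
        # `margin` pixels in from the bottom-right corner.
        ffmpeg_cmd = (
            f"ffmpeg -y -i {input_video_path} -i {watermark_path} "
            f'-filter_complex "[1:v][0:v]scale2ref=w=main_w/10:h=-1[wm][base];'
            f'[base][wm]overlay=W-w-{margin}:H-h-{margin}" '
            f"-c:v h264 -preset fast -crf 23 -c:a copy {output_video_path}"
        )
        subprocess.run(ffmpeg_cmd, shell=True, check=True)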

5. Clipping and Audio Extraction

In the process_clip function, FFmpeg is used to cut a segment from the original video and extract its audio.

Clipping:

    cut_command = (f"ffmpeg -i {original_video_path} -ss {start_time} -t {duration} "
                   f"{clip_segment_path}")
    subprocess.run(cut_command, shell=True, check=True,
                   capture_output=True, text=True)

Audio Extraction:

    extract_cmd = f"ffmpeg -i {clip_segment_path} -vn -acodec pcm_s16le -ar 16000 -ac 1 {audio_path}"
    subprocess.run(extract_cmd, shell=True,
                   check=True, capture_output=True)

Command Breakdown:

  • -ss {start_time}: Seeks to the specified start time. Because -ss appears after -i here, FFmpeg decodes up to that point, which is frame-accurate but slower than seeking before the input.
  • -t {duration}: Specifies the duration of the clip.
  • -vn: Disables video recording (for audio extraction).
  • -acodec pcm_s16le: Sets the audio codec to 16-bit PCM.
  • -ar 16000: Sets the audio sample rate to 16000 Hz.
  • -ac 1: Sets the number of audio channels to 1 (mono).
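
A minimal sketch of the two steps running back to back (the paths and the seconds-based start_time/duration values are illustrative; -y is added here so re-runs overwrite existing files):

    import subprocess

    def cut_and_extract(original_video_path, start_time, duration,
                        clip_segment_path, audio_path):
        # Step 1: cut the requested segment out of the source video.
        cut_command = (f"ffmpeg -y -i {original_video_path} -ss {start_time} -t {duration} "
                       f"{clip_segment_path}")
        subprocess.run(cut_command, shell=True, check=True,
                       capture_output=True, text=True)

        # Step 2: extract mono 16 kHz PCM audio from the cut segment.
        extract_cmd = (f"ffmpeg -y -i {clip_segment_path} -vn -acodec pcm_s16le "
                       f"-ar 16000 -ac 1 {audio_path}")
        subprocess.run(extract_cmd, shell=True, check=True, capture_output=True)

    cut_and_extract("source.mp4", start_time=120, duration=45,
                    clip_segment_path="segment.mp4", audio_path="segment.wav")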

6. Audio Extraction for Transcription

In the transcribe_video_fast and transcribe_video functions, FFmpeg is used to extract the audio from the input video for transcription.

    extract_cmd = f"ffmpeg -i {video_path} -vn -acodec pcm_s16le -ar 16000 -ac 1 -threads 0 {audio_path}"
    subprocess.run(extract_cmd, shell=True, check=True, capture_output=True)

This command is similar to the audio extraction in process_clip, with the addition of -threads 0, which lets FFmpeg pick the optimal number of threads (typically all available CPU cores).
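
A minimal sketch of writing the extracted audio to a temporary file before handing it to the transcription step (the tempfile handling below is illustrative; main.py's actual temp-file management is not shown in this document):

    import os
    import subprocess
    import tempfile

    def extract_audio_for_transcription(video_path):
        # mkstemp creates the file up front, so -y is needed for FFmpeg to overwrite it.
        fd, audio_path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        extract_cmd = (f"ffmpeg -y -i {video_path} -vn -acodec pcm_s16le "
                       f"-ar 16000 -ac 1 -threads 0 {audio_path}")
        subprocess.run(extract_cmd, shell=True, check=True, capture_output=True)
        return audio_path

    audio_path = extract_audio_for_transcription("input.mp4")
    try:
        pass  # hand audio_path to the transcription model here
    finally:
        os.remove(audio_path)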

7. Probing Video Information

In download_youtube_video, ffprobe (a tool included with FFmpeg) is used to get information about the downloaded video, such as its resolution.

    # output_path contains a %(ext)s template placeholder, which is swapped for * here
    probe_cmd = f'ffprobe -v quiet -print_format json -show_streams "{output_path.replace("%(ext)s", "*")}"'
    result = subprocess.run(probe_cmd, shell=True, capture_output=True, text=True)

Command Breakdown:

  • -v quiet: Suppresses all log output.
  • -print_format json: Sets the output format to JSON.
  • -show_streams: Shows information about each stream in the video.
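
A minimal sketch of reading the resolution out of the ffprobe JSON (the file path below is illustrative):

    import json
    import subprocess

    def get_resolution(video_file):
        probe_cmd = f'ffprobe -v quiet -print_format json -show_streams "{video_file}"'
        result = subprocess.run(probe_cmd, shell=True, capture_output=True, text=True)
        info = json.loads(result.stdout)
        # Return the dimensions of the first video stream, if any.
        for stream in info["streams"]:
            if stream.get("codec_type") == "video":
                return stream["width"], stream["height"]
        return None

    print(get_resolution("downloaded_video.mp4"))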