
Conversation

@Dan-Flores Dan-Flores commented Nov 26, 2025

This PR creates a benchmark to compare VideoEncoder against FFmpeg CLI. These tools aren't one-to-one, so some assumptions are made:

For VideoEncoder, we use this simple workflow:

encoder = VideoEncoder(frames=frames, frame_rate=30)
encoder.to_file(dest=output_path, codec="h264_nvenc", extra_options={"qp": 1})

For FFmpeg CLI, we also count the time needed to write the frames from a tensor to a raw file when write_frames is set (this is the default; passing --skip-write-frames excludes it):

if write_frames:
    raw_frames = frames.permute(0, 2, 3, 1).contiguous()[:num_frames]
    with open(raw_path, "wb") as f:
        f.write(raw_frames.cpu().numpy().tobytes())

ffmpeg_cmd = [...]
subprocess.run(ffmpeg_cmd, check=True, capture_output=True)
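
The actual ffmpeg_cmd is elided above. As a rough illustration only, a GPU command taking the raw RGB24 frames could be shaped like the sketch below; the resolution, frame rate, and qp value are assumptions for this example, not the exact flags used in the PR.

# Hypothetical sketch of ffmpeg_cmd for a GPU run (assumed flags, not the PR's exact command)
ffmpeg_cmd = [
    "ffmpeg", "-y",
    "-f", "rawvideo", "-pix_fmt", "rgb24",   # raw frames written above
    "-s", "480x270", "-r", "30",             # assumed width x height and frame rate
    "-i", str(raw_path),
    "-c:v", "h264_nvenc", "-qp", "0",        # NVENC encoder; the qp choice is discussed in the review below
    str(output_path),
]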

Result Summary:

  • VideoEncoder shows better performance on both GPU and CPU.
    • When the time required to write frames to bytes is added, FFmpeg CLI is much slower.
  • On GPU, VideoEncoder shows a significant speed improvement: it is up to 3.5x faster than FFmpeg CLI over the 30-run benchmark, even without adding the time required to write frames to bytes (see the quick check after the results below).
    • NVENC utilization is higher for VideoEncoder, while median GPU memory used is the same.
  • On CPU, FFmpeg CLI has a slight edge when the time required to write frames to bytes is not counted. Otherwise, VideoEncoder is significantly faster.
    • I suspect there are optimizations we could make in VideoEncoder::encode to close the gap, but let's land this benchmark as is.
Details

All benchmarks are run using a 1280x720 video. Command to generate the video:

`ffmpeg -f lavfi -i testsrc2=duration=600:size=1280x720:rate=30 -c:v libx264 -pix_fmt yuv420p test/resources/testsrc2_10min.mp4`

Benchmarking nasa_13013.mp4, writing frames in FFmpeg

$ python benchmarks/encoders/benchmark_encoders.py

Benchmarking 390 frames from nasa_13013.mp4 over 30 runs:
Decoded 390 frames of size 270x480

VideoEncoder on GPU   med = 119.26 ms, max = 122.06 ms, fps = 3270.1
GPU memory used:      med = 1231.0 MB, max = 1231.0 MB
NVENC utilization:    med = 30.0%,     max = 38.0%

FFmpeg CLI on GPU     med = 1174.55 ms, max = 1524.59 ms, fps = 332.0
GPU memory used:      med = 1231.0 MB, max = 1231.0 MB
NVENC utilization:    med = 15.0%,     max = 22.0%

VideoEncoder on CPU   med = 408.43 ms, max = 454.66 ms, fps = 954.9

FFmpeg CLI on CPU     med = 1184.47 ms, max = 1219.28 ms, fps = 329.3

Benchmarking nasa_13013.mp4, with --skip-write-frames

$ python benchmarks/encoders/benchmark_encoders.py --skip-write-frames

Benchmarking 390 frames from nasa_13013.mp4 over 30 runs:
Decoded 390 frames of size 270x480

VideoEncoder on GPU   med = 120.21 ms, max = 122.40 ms, fps = 3244.4
GPU memory used:      med = 1231.0 MB, max = 1231.0 MB
NVENC utilization:    med = 26.0%,     max = 39.0%

FFmpeg CLI on GPU     med = 419.66 ms, max = 1189.17 ms, fps = 929.3
GPU memory used:      med = 1231.0 MB, max = 1231.0 MB
NVENC utilization:    med = 18.0%,     max = 23.0%

VideoEncoder on CPU   med = 408.86 ms, max = 449.01 ms, fps = 953.9

FFmpeg CLI on CPU     med = 383.65 ms, max = 410.91 ms, fps = 1016.5
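
As a quick sanity check on the numbers above (not part of the benchmark script): the reported fps appears to be the frame count divided by the median time, and the ~3.5x GPU speedup quoted in the summary comes from the --skip-write-frames medians.

num_frames = 390
fps_gpu = num_frames / (120.21 / 1000)   # ~3244 fps, matches "VideoEncoder on GPU" above
speedup = 419.66 / 120.21                # ~3.5x, FFmpeg CLI vs VideoEncoder median on GPU
print(f"{fps_gpu:.1f} fps, {speedup:.1f}x speedup")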

@meta-cla meta-cla bot added the CLA Signed label Nov 26, 2025
@Dan-Flores Dan-Flores force-pushed the test_gpu_benchmarking branch from e4b6d52 to 743b664 on December 2, 2025 at 14:23
@Dan-Flores Dan-Flores changed the title from "[wip] benchmark encoding" to "Benchmark encoding against ffmpeg cli" on Dec 18, 2025
@Dan-Flores Dan-Flores marked this pull request as ready for review December 18, 2025 14:46
def encode_torchcodec(frames, output_path, device="cpu"):
    encoder = VideoEncoder(frames=frames, frame_rate=30)
    if device == "cuda":
        encoder.to_file(dest=output_path, codec="h264_nvenc", extra_options={"qp": 1})
Contributor

Are we currently using qp=1 for torchcodec encoder vs qp=0 for ffmpeg cli? (line 155)

Contributor Author

Yes, and we should not be, thanks for catching this!

mollyxu commented Dec 18, 2025

Great work on the benchmarks @Dan-Flores! I liked the detailed analysis of the results. I left two clarifying questions.

@NicolasHug NicolasHug left a comment

Thanks @Dan-Flores, this looks good!

self.metrics = {
    "utilization": [s["utilization"] for s in samples],
    "memory_used": [s["memory_used"] for s in samples],
}
Contributor

On NVENCMonitor above, I think we might want to use pynvml instead, as done e.g. in P1984513849.

The main reason is that NVENCMonitor samples the utilization value every 50ms, which isn't exactly in sync with the number of iterations in the loop. That is, the returned nvenc_tensor doesn't contain the same number of values as the times tensor, so their reported values aren't averaged over the same number of experiments either.
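
A minimal sketch of what the pynvml-based approach could look like (assuming the pynvml package; this is illustrative and not the code from P1984513849). Querying once per benchmark iteration keeps the number of samples equal to the number of entries in the times tensor:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

average_over = 30  # number of runs, as in the benchmark
samples = []
for _ in range(average_over):
    # ... run one encode iteration and record its time here ...
    util, _period_us = pynvml.nvmlDeviceGetEncoderUtilization(handle)  # NVENC utilization in %
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    samples.append({"utilization": util, "memory_used": mem.used / 2**20})  # MB

pynvml.nvmlShutdown()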

Contributor Author

Thanks for the example, I'll update to use pynvml.

I see how arbitrarily selecting 50ms will not produce the same number of values as the times tensor, but I don't completely understand how pynvml.nvmlDeviceGetDecoderUtilization manages it. It seems like it is always sampling the device for usage, and when called it returns a single median/max/average over an automatically determined sampling period?

@NicolasHug NicolasHug left a comment

Thanks @Dan-Flores!

Comment on lines 164 to 166
# By default, frames will be written inside the benchmark function
if args.skip_write_frames:
    write_raw_frames(frames, str(raw_frames_path))
Contributor

Lol this makes sense but it's a bit surprising to read "if skip write frames, then write frames". Here's a suggestion below.

Suggested change
# By default, frames will be written inside the benchmark function
if args.skip_write_frames:
    write_raw_frames(frames, str(raw_frames_path))
# If skip_write_frames is True, then we don't benchmark the time it takes to write the frames.
# But we still need to write them for FFmpeg to find them!
if args.skip_write_frames:
    write_raw_frames(frames, str(raw_frames_path))

return times_tensor, {
    "utilization": torch.tensor(utilizations).float() if gpu_monitoring else None,
    "memory_used": torch.tensor(memory_usage).float() if gpu_monitoring else None,
}
Contributor

Just replying to #1074 (comment) here for visibility:

It seems like it is always sampling the device for usage, and when called it returns a single median/max/average over an automatically determined sampling period?

Yes, this is also my understanding. And I think this is also what was happening with your previous implementation with nvidia-smi!

I think there are two main variables:

  • The "query frequency", i.e. the frequency at which we call nvml/nvidia-smi. In your previous implementation, this was every 50ms. In the current implementation, it's every time we enter the for _ in range(average_over) loop.
  • The "sampling period" over which nvml/nvidia-smi average and report their results. It's a different variable from the query frequency! And, as far as I can tell, we do not have control over this one. I can't find a relevant parameter for it, and claude says it's really determined by the underlying driver (which I'm inclined to believe)
