Editly FFmpeg filter assumes input clips have audio streams, fails with video-only inputs (kling-v3) #142
Description
Problem
The editly/rendi FFmpeg composer generates audio filter chains ([0:a], [1:a], etc.) that reference audio streams from input video clips. When clips are generated by video-only models like kling-v3, these audio streams don't exist, causing FFmpeg to fail with:
Stream specifier ':a' in filtergraph description ... matches no streams.
This breaks any render that combines video clips (from kling/other video models) with <Speech> or <Captions> components, because the audio mixing filter graph assumes every input file has an audio track.
Evidence
Render job 5342a27d-da18-426b-b0c7-56c97484b199 failed at the Rendi/FFmpeg stitching stage.
The generated FFmpeg filter graph:
[0:a]atrim=0.3:1.6,asetpts=PTS-STARTPTS,volume=1,adelay=0|0[vsrc0];
[1:a]atrim=0.3:2,asetpts=PTS-STARTPTS,volume=1,adelay=1300|1300[vsrc1];
[2:a]atrim=0.5:1.8,asetpts=PTS-STARTPTS,volume=1,adelay=3000|3000[vsrc2];
[3:a]atrim=0.5:3,asetpts=PTS-STARTPTS,volume=1,adelay=4300|4300[vsrc3];
[vsrc0][vsrc1][vsrc2][vsrc3]amix=inputs=4:normalize=0[aout]
Inputs 0-3 are kling-v3 generated .mp4 files which contain video only (no audio stream). FFmpeg cannot find :a (audio) streams and fails.
Input files:
in_1: https://su.varg.ai/cache/1772555793809_707ip3.mp4 (kling-v3, video-only)
in_2: https://su.varg.ai/cache/1772555231833_1ilys3.mp4 (kling-v3, video-only)
in_3: https://su.varg.ai/cache/1772555268703_77psef.mp4 (kling-v3, video-only)
in_4: https://su.varg.ai/cache/1772555271987_o4hofg.mp4 (kling-v3, video-only)
in_5: https://storage.rendi.dev/.../varg-packshot-final-...mp4 (packshot, may have audio)
Expected behavior
The editly composer should handle video-only input files gracefully:
- Detect which inputs have audio streams and which don't
- For video-only inputs, either skip audio filters or generate a silent audio track (anullsrc)
- Only reference [N:a] for inputs that actually contain audio
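A minimal sketch of that behavior, assuming the builder already knows per clip whether an audio stream exists. `ClipAudio` and `buildAudioFilterGraph` are hypothetical names, not editly's actual types. The filter strings mirror the failing graph above, with `anullsrc` plus `atrim` standing in for the missing track:

```typescript
// Hypothetical clip descriptor; the field names are assumptions for illustration.
interface ClipAudio {
  inputIndex: number; // position of the file among the FFmpeg -i inputs
  hasAudio: boolean;  // whether probing found an audio stream in the file
  trimStart: number;  // seconds
  trimEnd: number;    // seconds
  delayMs: number;    // offset of the clip on the timeline, in milliseconds
}

function buildAudioFilterGraph(clips: ClipAudio[]): string {
  if (clips.length === 0) return "";

  const chains = clips.map((clip, k) => {
    const delay = `adelay=${clip.delayMs}|${clip.delayMs}`;
    if (clip.hasAudio) {
      // Real audio: same chain shape as in the failing graph.
      return `[${clip.inputIndex}:a]atrim=${clip.trimStart}:${clip.trimEnd},` +
        `asetpts=PTS-STARTPTS,volume=1,${delay}[vsrc${k}]`;
    }
    // Video-only input: never reference [N:a]; emit silence of the same length.
    const duration = Number((clip.trimEnd - clip.trimStart).toFixed(3));
    return `anullsrc=r=44100:cl=stereo,atrim=duration=${duration},` +
      `asetpts=PTS-STARTPTS,${delay}[vsrc${k}]`;
  });

  const labels = clips.map((_, k) => `[vsrc${k}]`).join("");
  const mix = `${labels}amix=inputs=${clips.length}:normalize=0[aout]`;
  return [...chains, mix].join(";\n");
}

// Example: one video-only kling clip followed by a clip that has audio.
const exampleGraph = buildAudioFilterGraph([
  { inputIndex: 0, hasAudio: false, trimStart: 0.5, trimEnd: 2, delayMs: 0 },
  { inputIndex: 4, hasAudio: true, trimStart: 0, trimEnd: 3, delayMs: 1500 },
]);
```

Keeping a silent chain for every clip leaves the amix input count and the [vsrcN] labels unchanged, so the rest of the graph would not need to know which inputs lacked audio. Dropping those chains and lowering the amix input count is the other option listed above.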
Reproduction
Any multi-clip render using kling-v3 (or other video-only models) with <Speech> or <Captions> components will hit this at the FFmpeg stitching stage.
Suggested fix
In the editly FFmpeg filter graph builder (src/ai-sdk/providers/editly/index.ts), before generating audio filter chains:
- Option A (robust): For each input, probe whether it has an audio stream. If not, generate a silent audio source: anullsrc=r=44100:cl=stereo[silent_N] and use [silent_N] instead of [N:a].
- Option B (simpler): Always add -f lavfi -i anullsrc=r=44100:cl=stereo as an extra input and use it as a fallback for any input without audio.
- Option C (pragmatic): Skip audio filters entirely for inputs known to be video-only (based on the generation model or by checking the file's metadata at cache time).
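The probing step in Option A could look roughly like the sketch below. It assumes the composer can obtain the JSON printed by `ffprobe -v error -show_streams -of json <file>` for each input; how that command gets run (locally, or on the Rendi side) is left open here. `FfprobeOutput`, `hasAudioStream`, and `videoOnlyInputs` are hypothetical names, and only the `streams[].codec_type` field of the ffprobe output is relied on:

```typescript
// Subset of ffprobe's JSON output that this sketch relies on.
interface FfprobeStream {
  codec_type?: string; // "video", "audio", "subtitle", ...
}
interface FfprobeOutput {
  streams?: FfprobeStream[];
}

// Parse the raw stdout of `ffprobe -v error -show_streams -of json <file>`.
function parseProbe(stdout: string): FfprobeOutput {
  return JSON.parse(stdout) as FfprobeOutput;
}

// True when at least one stream in the file is an audio stream.
function hasAudioStream(probe: FfprobeOutput): boolean {
  return (probe.streams ?? []).some((s) => s.codec_type === "audio");
}

// Indexes of inputs that need a silent fallback instead of [N:a].
function videoOnlyInputs(probes: FfprobeOutput[]): number[] {
  const result: number[] = [];
  probes.forEach((probe, index) => {
    if (!hasAudioStream(probe)) result.push(index);
  });
  return result;
}

// Example with hand-written probe results (not real ffprobe output).
const klingClip = parseProbe('{"streams":[{"codec_type":"video"}]}');
const packshot = parseProbe(
  '{"streams":[{"codec_type":"video"},{"codec_type":"audio"}]}'
);
const silentIndexes = videoOnlyInputs([klingClip, klingClip, packshot]);
```

The same check could run once at cache time and be stored alongside the file, which would cover Option C without probing on every render.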
Related
- Gateway timeout issue: https://github.com/vargHQ/gateway/issues/43 (same user session)
- The user also hit a separate voice validation error (voice='aria' not in pooled allowlist) on a subsequent attempt; that's a user-side fix (use allowed voices or BYOK)