Editly FFmpeg filter assumes input clips have audio streams, fails with video-only inputs (kling-v3) #142

@SecurityQQ

Problem

The editly/rendi FFmpeg composer generates audio filter chains ([0:a], [1:a], etc.) that reference audio streams from input video clips. When clips are generated by video-only models like kling-v3, these audio streams don't exist, causing FFmpeg to fail with:

Stream specifier ':a' in filtergraph description ... matches no streams.

This breaks any render that combines video clips (from kling/other video models) with <Speech> or <Captions> components, because the audio mixing filter graph assumes every input file has an audio track.

Evidence

Render job 5342a27d-da18-426b-b0c7-56c97484b199 failed at the Rendi/FFmpeg stitching stage.

The generated FFmpeg filter graph:

[0:a]atrim=0.3:1.6,asetpts=PTS-STARTPTS,volume=1,adelay=0|0[vsrc0];
[1:a]atrim=0.3:2,asetpts=PTS-STARTPTS,volume=1,adelay=1300|1300[vsrc1];
[2:a]atrim=0.5:1.8,asetpts=PTS-STARTPTS,volume=1,adelay=3000|3000[vsrc2];
[3:a]atrim=0.5:3,asetpts=PTS-STARTPTS,volume=1,adelay=4300|4300[vsrc3];
[vsrc0][vsrc1][vsrc2][vsrc3]amix=inputs=4:normalize=0[aout]

Inputs 0-3 are kling-v3 generated .mp4 files that contain video only (no audio stream). FFmpeg therefore cannot resolve the [0:a] through [3:a] specifiers, and the whole filter graph is rejected.

Input files:

in_1: https://su.varg.ai/cache/1772555793809_707ip3.mp4   (kling-v3, video-only)
in_2: https://su.varg.ai/cache/1772555231833_1ilys3.mp4   (kling-v3, video-only)
in_3: https://su.varg.ai/cache/1772555268703_77psef.mp4   (kling-v3, video-only)
in_4: https://su.varg.ai/cache/1772555271987_o4hofg.mp4   (kling-v3, video-only)
in_5: https://storage.rendi.dev/.../varg-packshot-final-...mp4  (packshot, may have audio)

Expected behavior

The editly composer should handle video-only input files gracefully:

  • Detect which inputs have audio streams and which don't
  • For video-only inputs, either skip audio filters or generate a silent audio track (anullsrc)
  • Only reference [N:a] for inputs that actually contain audio
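
The detection step in the first bullet could be sketched as follows. This is a minimal illustration, not the editly implementation: it assumes the stream list is obtained with ffprobe's JSON output (-show_streams -of json), and the helper names ffprobeArgs and hasAudioStream are hypothetical. How ffprobe is actually run (locally or through Rendi) is left out.

```typescript
// Subset of the `ffprobe -show_streams -of json` output that this sketch reads.
interface ProbeStream {
  codec_type?: string;
}
interface ProbeResult {
  streams?: ProbeStream[];
}

// Hypothetical helper: ffprobe arguments that list an input's streams as JSON.
function ffprobeArgs(input: string): string[] {
  return ["-v", "error", "-show_streams", "-of", "json", input];
}

// Hypothetical helper: true if the probe output lists at least one audio stream.
function hasAudioStream(probeJson: string): boolean {
  const parsed = JSON.parse(probeJson) as ProbeResult;
  return (parsed.streams ?? []).some((s) => s.codec_type === "audio");
}
```

The result can be cached alongside the clip so each input is probed only once.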

Reproduction

Any multi-clip render using kling-v3 (or other video-only models) with <Speech> or <Captions> components will hit this at the FFmpeg stitching stage.

Suggested fix

In the editly FFmpeg filter graph builder (src/ai-sdk/providers/editly/index.ts), before generating audio filter chains:

  1. Option A (robust): For each input, probe whether it has an audio stream. If not, generate a silent audio source: anullsrc=r=44100:cl=stereo[silent_N] and use [silent_N] instead of [N:a].

  2. Option B (simpler): Always add -f lavfi -i anullsrc=r=44100:cl=stereo as an extra input and use it as a fallback for any input without audio.

  3. Option C (pragmatic): Skip audio filters entirely for inputs known to be video-only (based on the generation model or by checking the file's metadata at cache time).
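
Option A might look roughly like the sketch below. It assumes each clip already carries a hasAudio flag from a prior probe; the AudioClip shape and the buildAudioGraph name are hypothetical and not taken from the editly source. The chain format mirrors the failing filter graph shown under Evidence. Because anullsrc produces an endless stream, the silent source is trimmed to the clip's duration.

```typescript
// Hypothetical description of one clip's contribution to the audio mix.
interface AudioClip {
  inputIndex: number; // FFmpeg input index, the N in [N:a]
  hasAudio: boolean;  // result of probing the input file
  trimStart: number;  // seconds
  trimEnd: number;    // seconds
  delayMs: number;    // position on the timeline, in milliseconds
  volume: number;
}

// Builds the audio part of the filter graph, substituting silence
// for inputs that have no audio stream.
function buildAudioGraph(clips: AudioClip[]): string {
  if (clips.length === 0) return ""; // amix needs at least one input

  const chains = clips.map((c, i) => {
    const tail =
      `asetpts=PTS-STARTPTS,volume=${c.volume},` +
      `adelay=${c.delayMs}|${c.delayMs}[vsrc${i}]`;

    if (c.hasAudio) {
      return `[${c.inputIndex}:a]atrim=${c.trimStart}:${c.trimEnd},${tail}`;
    }

    // Video-only input: generate silence and cut it to the clip's length.
    // Rounding avoids floating-point noise such as 1.3000000000000003.
    const duration = Number((c.trimEnd - c.trimStart).toFixed(3));
    return (
      `anullsrc=r=44100:cl=stereo[silent_${i}];` +
      `[silent_${i}]atrim=0:${duration},${tail}`
    );
  });

  const labels = clips.map((_, i) => `[vsrc${i}]`).join("");
  return `${chains.join(";")};${labels}amix=inputs=${clips.length}:normalize=0[aout]`;
}
```

With this shape, [N:a] is emitted only for inputs whose probe reported an audio stream, and the amix input count stays equal to the number of clips.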

Related

  • Gateway timeout issue: https://github.com/vargHQ/gateway/issues/43 (same user session)
  • The user also hit a separate voice validation error (voice='aria' not in pooled allowlist) on a subsequent attempt. That is a user-side fix (use allowed voices or BYOK) and is unrelated to this issue.
