
feat: Image composition/composite support (split, mirror, overlay before video) #178

@annzrva

Problem

When building video pipelines, we often need to compose multiple generated images into a single frame BEFORE passing it to a video model (e.g. kling-v3). Currently this requires a separate ffmpeg script outside the TSX pipeline, breaking the single-file workflow.

Concrete Use Case

Eye + Ceiling Fan Composite:

  1. Generate macro eye image (nano-banana-pro)
  2. Generate ceiling fan image (nano-banana-pro)
  3. Composite: fan mirrored on left+right sides, eye centered → single 1920x1080 image
  4. Animate this composite via kling-v3

Step 3 currently requires ffmpeg:

ffmpeg -i fan.png -i eye.png -filter_complex \
  "[0:v]split[f1][f2]; [f2]hflip[fm]; \
   [f1]crop=700:1080[left]; [fm]crop=700:1080[right]; \
   [1:v]crop=860:1080[eye]; \
   color=c=black:s=1920x1080[canvas]; \
   [canvas][left]overlay=0:0[c1]; \
   [c1][right]overlay=W-700:0[c2]; \
   [c2][eye]overlay=(W-w)/2:0" \
  -frames:v 1 output.png
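
The overlay offsets in that command are plain arithmetic on the canvas width `W` and the layer width `w`. Here is a minimal sketch of that arithmetic, assuming the 1920x1080 canvas named in step 3. The names `Anchor`, `offsetX`, and `overlap` are made up for illustration and are not part of any existing API:

```typescript
// Hypothetical helper names; only the arithmetic mirrors the ffmpeg expressions.
type Anchor = "left" | "center" | "right";

interface Placed {
  x: number; // left edge on the canvas, in pixels
  w: number; // layer width after cropping, in pixels
}

// Same expressions as overlay=0:0, overlay=W-700:0 and overlay=(W-w)/2:0.
function offsetX(anchor: Anchor, canvasWidth: number, layerWidth: number): number {
  switch (anchor) {
    case "left":
      return 0;
    case "right":
      return canvasWidth - layerWidth;
    case "center":
      return (canvasWidth - layerWidth) / 2;
  }
}

// Number of pixel columns shared by two placed layers (0 if they do not touch).
function overlap(a: Placed, b: Placed): number {
  return Math.max(0, Math.min(a.x + a.w, b.x + b.w) - Math.max(a.x, b.x));
}

const canvasWidth = 1920; // assumption: the target size from step 3
const leftFan: Placed = { x: offsetX("left", canvasWidth, 700), w: 700 };
const rightFan: Placed = { x: offsetX("right", canvasWidth, 700), w: 700 };
const eye: Placed = { x: offsetX("center", canvasWidth, 860), w: 860 };

console.log(leftFan.x, eye.x, rightFan.x); // 0 530 1220
console.log(overlap(leftFan, eye), overlap(eye, rightFan)); // 170 170
```

On a 1920 px canvas the three crops add up to 2260 px, so the eye covers the inner 170 px of each fan panel. A layout API would need to make that draw order explicit (later layers on top, as in the overlay chain).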

Proposed Solution

A Composite() or compose() function that takes multiple images and a layout, returning a single image that can be passed to Video():

const eyeComposite = Composite({
  width: 1920,
  height: 1080,
  layers: [
    { src: fanImage, crop: { width: 700 }, position: "left" },
    { src: fanImage, crop: { width: 700 }, position: "right", mirror: true },
    { src: eyeImage, crop: { width: 860 }, position: "center" },
  ],
  background: "black",
})

// Now use as input for video generation
const eyeClosing = Video({
  model: varg.videoModel("kling-v3"),
  prompt: { text: "eye slowly closes...", images: [eyeComposite] },
  duration: 5,
})

This is similar to how Speech() returns a result that can be used in <Captions src={speech}>: the composite returns an image that can be used anywhere an image is expected.
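
One possible way to back such a function is to translate the layer list into the same kind of ffmpeg filter graph used in the current workaround. The sketch below is only an illustration under assumptions: `CompositeSpec`, `GraphLayer`, and `buildFilterGraph` are hypothetical names, layers refer to ffmpeg inputs by index instead of by image object, and each layer is assumed to be passed as its own `-i` input (so the fan image would be listed twice) to avoid needing `split`:

```typescript
// Hypothetical types for illustration; nothing here is an existing library API.
type Position = "left" | "center" | "right";

interface GraphLayer {
  input: number;     // index of the ffmpeg -i input holding this layer
  cropWidth: number; // crop width in pixels; crop height is the canvas height
  position: Position;
  mirror?: boolean;  // flip horizontally before cropping
}

interface CompositeSpec {
  width: number;
  height: number;
  background: string; // an ffmpeg color name such as "black"
  layers: GraphLayer[];
}

// ffmpeg overlay expressions: W is the canvas width, w the overlaid layer width.
function overlayX(position: Position): string {
  switch (position) {
    case "left":
      return "0";
    case "right":
      return "W-w";
    case "center":
      return "(W-w)/2";
  }
}

function buildFilterGraph(spec: CompositeSpec): string {
  if (spec.layers.length === 0) {
    throw new Error("a composite needs at least one layer");
  }
  // Start from a solid-color canvas, then overlay each layer in order.
  const steps: string[] = [
    `color=c=${spec.background}:s=${spec.width}x${spec.height}[base0]`,
  ];
  spec.layers.forEach((layer, i) => {
    const flip = layer.mirror ? "hflip," : "";
    steps.push(`[${layer.input}:v]${flip}crop=${layer.cropWidth}:${spec.height}[l${i}]`);
    // The last overlay is left unlabeled so it becomes the graph output.
    const isLast = i === spec.layers.length - 1;
    const out = isLast ? "" : `[base${i + 1}]`;
    steps.push(`[base${i}][l${i}]overlay=${overlayX(layer.position)}:0${out}`);
  });
  return steps.join("; ");
}

const graph = buildFilterGraph({
  width: 1920,
  height: 1080,
  background: "black",
  layers: [
    { input: 0, cropWidth: 700, position: "left" },
    { input: 1, cropWidth: 700, position: "right", mirror: true },
    { input: 2, cropWidth: 860, position: "center" },
  ],
});
console.log(graph);
```

The resulting string would be passed to `-filter_complex`, with `-frames:v 1` still required because the `color` source never ends on its own.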

Alternative: Simpler API

Even a basic Split, Mirror, or Triptych helper would cover most cases:

// Place the fan on both sides of the eye, mirroring the right-hand copy
const triptych = Triptych({
  left: fanImage,
  center: eyeImage, 
  right: fanImage,
  mirrorRight: true,
})
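
If both APIs existed, the simpler helper could be thin sugar over the layered one. A rough sketch of that mapping follows. `TriptychOptions`, `CompositeLayer`, `triptychToLayers`, and the `sideWidth` / `centerWidth` defaults are all hypothetical; the defaults are simply the crop widths from the use case above:

```typescript
// Hypothetical shapes, loosely following the proposal; not an existing API.
interface CompositeLayer<Img> {
  src: Img;
  crop: { width: number };
  position: "left" | "center" | "right";
  mirror?: boolean;
}

interface TriptychOptions<Img> {
  left: Img;
  center: Img;
  right: Img;
  mirrorRight?: boolean;
  sideWidth?: number;   // assumed default: 700, as in the fan crop
  centerWidth?: number; // assumed default: 860, as in the eye crop
}

// Expand a triptych into ordered layers; the center goes last so it is drawn on top.
function triptychToLayers<Img>(opts: TriptychOptions<Img>): CompositeLayer<Img>[] {
  const { sideWidth = 700, centerWidth = 860, mirrorRight = false } = opts;
  return [
    { src: opts.left, crop: { width: sideWidth }, position: "left" },
    { src: opts.right, crop: { width: sideWidth }, position: "right", mirror: mirrorRight },
    { src: opts.center, crop: { width: centerWidth }, position: "center" },
  ];
}

// Plain strings stand in for image objects here.
const triptychLayers = triptychToLayers({
  left: "fan.png",
  center: "eye.png",
  right: "fan.png",
  mirrorRight: true,
});
console.log(triptychLayers);
```

Keeping the helper as a pure expansion means only the layered function would need to know how images are actually rendered.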

Current Workaround

A separate eye-composite.sh script runs before pipeline.tsx, generates the composite, and uploads it to S3. The resulting URL is then hardcoded in the pipeline, which breaks the reproducible single-file workflow.
