
Conversation

@ryanontheinside
Collaborator

Summary

Fix v2v streaming choppiness when using fast VAEs (LightVAE/TAE) by basing the frame consumption rate on measured input FPS rather than pipeline production throughput. This fix is a prerequisite for introducing faster VAEs, and likely for other performance enhancements as well.

Problem

LightVAE and TAE produced choppy playback during v2v streaming, with visible chunk boundaries; the slower Wan VAE worked perfectly. Symptoms:

  • Choppiness only with fast VAEs
  • Test scripts produced smooth output
  • Choppiness went away after ~45 seconds or when using more denoise steps
  • Choppiness sometimes went away upon updating the prompt
  • LightVAE and Wan VAE are virtually identical apart from processing speed; TAE is faster still

Root Cause

The frame consumption rate (how fast WebRTC sends frames to the client) was calculated from production throughput (how fast the GPU produces frames) rather than content temporal rate (how fast frames should be played).

For example, when a fast VAE produces 12 frames in 0.3s, the code calculated FPS = 40 and sent frames to the client at 40fps. But the video content should maintain its original temporal rate for correct motion; playing it faster causes a choppy/jerky appearance.

The test scripts worked because they export with a fixed FPS value, not the production rate.
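
To make the failure mode concrete, here is a toy calculation in Python. The numbers and variable names are illustrative assumptions for this example, not values from the actual code:

```python
# Illustration only -- variable names and the 15fps input rate are
# assumptions for this example, not taken from the project.

frames_produced = 12    # frames emitted by one pipeline chunk
production_time = 0.3   # seconds the GPU took to produce them

# Old behavior: consumption rate derived from production throughput.
throughput_fps = frames_produced / production_time  # 40.0 -- too fast

# The content has its own temporal rate (e.g. a 15fps input stream).
# Sending at 40fps plays each chunk in fast-forward, then stalls until
# the next chunk arrives -- visible choppiness at chunk boundaries.
content_fps = 15
print(f"sent at {throughput_fps:.0f}fps, should play at {content_fps}fps")
```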

Solution

Measure the actual input video frame rate by tracking timestamps of incoming frames, then use that rate for consumption (a minimal sketch follows the list):

  • Track timestamps of last 30 incoming frames in input_loop()
  • Calculate input FPS from frame intervals
  • Use input FPS when available (>=5 samples)
  • Fall back to existing pipeline FPS calculation otherwise (for t2v mode or during warm-up)
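
A minimal sketch of the measurement, assuming an input loop that receives frames one at a time; the class and method names here are illustrative, not the project's actual identifiers:

```python
import time
from collections import deque

class FrameRateMeter:
    """Rolling estimate of input FPS from frame arrival timestamps."""

    def __init__(self, window: int = 30, min_samples: int = 5):
        self.timestamps = deque(maxlen=window)  # last N arrival times
        self.min_samples = min_samples

    def tick(self) -> None:
        """Record the arrival of one input frame (call from input_loop)."""
        self.timestamps.append(time.monotonic())

    def fps(self) -> float | None:
        """Average FPS over the window, or None until enough samples exist."""
        if len(self.timestamps) < self.min_samples:
            return None  # warm-up, or t2v mode with no input frames
        elapsed = self.timestamps[-1] - self.timestamps[0]
        if elapsed <= 0:
            return None
        # n timestamps span n-1 frame intervals.
        return (len(self.timestamps) - 1) / elapsed
```

A caller would then use `meter.fps()` when it returns a value, and fall back to the existing pipeline FPS calculation when it returns None.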

Tested

  • No regression with Wan VAE

Signed-off-by: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
Contributor

@yondonfu yondonfu left a comment


Jotting down my understanding:

The VAE speed should only matter in that it contributes to the overall generation speed of the pipeline, e.g. a faster VAE results in faster overall generation when all other components are held constant. We currently calculate the pipeline's FPS from its overall generation speed and then use that FPS to ensure that we send out frames at a constant rate.

But the video content should maintain its original temporal rate for correct motion - playing it faster causes choppy/jerky appearance.

The root problem is that input FPS < pipeline FPS = output FPS results in a choppy/jerky appearance in the output? The default input FPS hardcoded in the frontend is 15, so if the VAE speed boost pushed the pipeline FPS above the input FPS then we could end up in this situation.

These changes address the root problem by ensuring that we cannot end up with output FPS > input FPS. We could increase the FPS used in the frontend, but that is a separate concern: regardless of the value used there, we would want logic in the backend that handles the scenario where input FPS < pipeline FPS. See my other comments about the actual conditional I think we want for determining the output FPS.

Sound right?

@ryanontheinside
Collaborator Author

Sound right?

Correct. When VAE speed increased, pipeline FPS went above input FPS, causing choppy motion because frames were being sent faster than their intended temporal rate. This would presumably be surfaced by any performance improvement, not just a faster VAE.
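
For illustration, a sketch of the resulting output-rate selection (the conditional discussed above); the function and variable names are assumptions, not the merged code:

```python
# Hedged sketch: pick the rate at which frames are sent to the client.
def output_fps(input_fps: float | None, pipeline_fps: float) -> float:
    # t2v mode or warm-up: no measured input rate yet, use pipeline rate.
    if input_fps is None:
        return pipeline_fps
    # Never send faster than the content's temporal rate, and never
    # faster than the pipeline can actually produce frames.
    return min(input_fps, pipeline_fps)
```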

…t, pipeline) for output rate

Signed-off-by: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
@ryanontheinside
Collaborator Author

Cherry-picked and tested in #221

Contributor

@yondonfu yondonfu left a comment


LGTM!

@yondonfu yondonfu merged commit e012f53 into main Dec 9, 2025
5 checks passed
@yondonfu yondonfu deleted the ryanontheinside/fix/consumption-rate branch December 9, 2025 19:26
