fix: use input frame rate for v2v consumption instead of production rate #219
## Summary

Fix v2v streaming choppiness when using fast VAEs (LightVAE/TAE) by basing frame consumption rate on measured input FPS rather than pipeline production throughput. This fix is required to introduce faster VAEs and likely other performance enhancements.

## Problem

LightVAE and TAE produced choppy playback during v2v streaming, with the boundaries between chunks visible. The slower Wan VAE worked perfectly. Symptoms:

- Choppiness only with fast VAEs
- Test scripts produced smooth output
- Choppiness went away after ~45 seconds or when using more denoise steps
- Choppiness _sometimes_ went away upon updating the prompt
- LightVAE and Wan VAE are virtually identical apart from processing speed. TAE is faster still

## Root Cause

The frame consumption rate (how fast WebRTC sends frames to the client) was calculated from **production throughput** (how fast the GPU produces frames) rather than **content temporal rate** (how fast frames should be played).

For example, when a fast VAE produces 12 frames in 0.3s, the code calculated FPS=40 and sent frames to the client at 40fps. But the video content should maintain its original temporal rate for correct motion - playing it faster causes a choppy/jerky appearance.

The test scripts worked because they export with a fixed FPS value, not the production rate.

## Solution

Measure the actual input video frame rate by tracking timestamps of incoming frames, then use that rate for consumption:

- Track timestamps of last 30 incoming frames in `input_loop()`
- Calculate input FPS from frame intervals
- Use input FPS when available (>=5 samples)
- Fall back to existing pipeline FPS calculation otherwise (for t2v mode or during warm-up)

## Tested

- [x] no regression with Wan VAE

Signed-off-by: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
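For readers who want to see the measurement concretely, here is a minimal sketch of the approach listed under Solution. It is not the code from this PR: `InputFpsTracker` and `consumption_fps` are hypothetical names, and the sketch assumes that "samples" means recorded timestamps and that callers pass a monotonic clock reading (for example from `time.monotonic()`).

```python
from collections import deque
from typing import Optional


class InputFpsTracker:
    """Estimate input FPS from arrival times of recent frames (hypothetical helper)."""

    def __init__(self, window: int = 30, min_samples: int = 5) -> None:
        # Keep only the most recent `window` timestamps; older ones drop off.
        self._timestamps = deque(maxlen=window)
        self._min_samples = min_samples

    def record(self, timestamp: float) -> None:
        """Call once per incoming frame, e.g. with time.monotonic()."""
        self._timestamps.append(timestamp)

    def fps(self) -> Optional[float]:
        """Return the measured FPS, or None while there are too few samples."""
        if len(self._timestamps) < self._min_samples:
            return None
        elapsed = self._timestamps[-1] - self._timestamps[0]
        if elapsed <= 0:
            return None
        # N timestamps span N - 1 frame intervals.
        return (len(self._timestamps) - 1) / elapsed


def consumption_fps(tracker: InputFpsTracker, pipeline_fps: float) -> float:
    """Prefer the measured input FPS; fall back to the pipeline FPS otherwise."""
    measured = tracker.fps()
    return measured if measured is not None else pipeline_fps
```

In t2v mode no frames are recorded, so `fps()` stays `None` and the pipeline FPS is used unchanged, which matches the fallback described above.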
Jotting down my understanding:
The VAE speed should only matter in that it contributes to the overall generation speed of the pipeline, e.g. a faster VAE results in faster overall generation when all other components are held constant. We currently calculate the FPS of the pipeline based on its overall generation speed and then use that FPS to ensure that we send out frames at a constant rate.

> But the video content should maintain its original temporal rate for correct motion - playing it faster causes choppy/jerky appearance.

So the root problem is that when input FPS < pipeline FPS = output FPS, the output looks choppy/jerky? The default input FPS hardcoded in the frontend is 15, so if the VAE speed boost pushed the pipeline FPS above the input FPS we could end up in this situation.

These changes address the root problem by ensuring that we cannot end up with output FPS > input FPS. We could increase the FPS used in the frontend, but that is a separate concern: regardless of the value used there, we would want logic in the backend that handles the case where input FPS < pipeline FPS. Also see my other comments about the actual conditional that I think we want for determining the output FPS.
Sound right?
Correct. When VAE speed increased, pipeline FPS went above input FPS, causing choppy motion because frames were being sent faster than their intended temporal rate. This would presumably be surfaced by any performance improvement, not specifically the VAE.
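To make the guard discussed in this thread concrete, here is a minimal sketch assuming the output rate is simply capped at the input rate. The exact conditional used in the PR is not reproduced here; `select_output_fps` is a hypothetical name, and the numbers reuse the 12 frames in 0.3s example from the description together with the frontend default of 15 FPS.

```python
from typing import Optional


def select_output_fps(input_fps: Optional[float], pipeline_fps: float) -> float:
    """Pick a send rate that never exceeds the input rate (hypothetical helper)."""
    if input_fps is None:
        # No input measurement yet (t2v mode or warm-up): use the pipeline rate.
        return pipeline_fps
    # Cap at the input rate so frames are not played faster than their
    # intended temporal rate; a slower pipeline still limits the output.
    return min(input_fps, pipeline_fps)


# A fast VAE producing 12 frames in 0.3 s gives a pipeline rate of about 40 FPS.
pipeline_fps = 12 / 0.3
output_fps = select_output_fps(15.0, pipeline_fps)  # capped at the 15 FPS input
```

Taking the minimum also covers the opposite case: when the pipeline is slower than the input, sending at the input rate would drain the output queue, so the pipeline rate is used instead.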
…t, pipeline) for output rate

Signed-off-by: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>

cherry picked and tested in #221
yondonfu left a comment
LGTM!