
Fix VAE encoder broadcast error for LTX-2.3 I2V #24

Open
nopmobiel wants to merge 1 commit into Blaizzy:main from nopmobiel:fix/vae-encoder-ltx23-i2v


@nopmobiel

Summary

  • Fixes a broadcast shape error in SpaceToDepthDownsample when converted LTX-2.3 weights are used for image-to-video (I2V) generation
  • The skip connection channel count is now derived from the actual conv output shape instead of the configured out_channels
  • Backwards-compatible with LTX-2 weights

Problem

When running I2V with prince-canuma/LTX-2.3-distilled, the VAE encoder crashes:

ValueError: [broadcast_shapes] Shapes (1,1024,1,8,12) and (1,2048,1,8,12) cannot be broadcast

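The last compress_all_res downsample block has conv weights with 128 output channels, but out_channels // multiplier evaluates to 256. Space-to-depth multiplies the channel count by 8 here, so the conv branch ends up with 1024 channels while the skip connection, sized from out_channels, produces 2048.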
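The channel arithmetic can be reproduced with plain NumPy arrays of the two shapes. This is a minimal sketch, not the MLX code path: the space-to-depth factor of 8 (a 2x2x2 stride) is inferred from the 128 to 1024 ratio rather than read from the config, the variable names are illustrative, and NumPy's error wording differs from MLX's.

```python
import numpy as np

# Assumption: space-to-depth folds a 2x2x2 block into channels (factor 8),
# inferred from 128 -> 1024 in the error above, not read from the config.
S2D_FACTOR = 2 * 2 * 2

conv_out_actual = 128      # output channels in the converted conv weights
conv_out_configured = 256  # what out_channels // multiplier evaluates to

# Conv branch after space-to-depth vs. skip branch sized from the config.
conv_branch = np.zeros((1, conv_out_actual * S2D_FACTOR, 1, 8, 12))
skip_branch = np.zeros((1, conv_out_configured * S2D_FACTOR, 1, 8, 12))
print(conv_branch.shape[1], skip_branch.shape[1])  # 1024 2048

broadcast_error = None
try:
    conv_branch + skip_branch  # the residual add in the downsample block
except ValueError as err:
    broadcast_error = err
    print("broadcast failed:", err)
```

Only the channel axis differs, so no broadcasting rule can reconcile the two branches; the fix has to make the channel counts agree.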

Text-to-video (T2V) is unaffected because it does not use the VAE encoder.

Fix

In SpaceToDepthDownsample.__call__, the conv branch is now computed first, and the skip connection's reshape dimensions are derived from its actual output shape rather than from self.out_channels. This tolerates weight/config mismatches in converted models and leaves LTX-2 behavior unchanged, since its weights already match the config.
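The ordering change can be sketched in NumPy. This is a hypothetical illustration, not the PR's MLX diff: it assumes, as in the reference LTX-Video VAE, that the skip path space-to-depths the input and then averages groups of channels down to the output width. The names `space_to_depth`, `downsample_with_skip` and `pointwise_conv` are made up for the sketch, a 1x1x1 channel mix stands in for the real causal 3D conv, causal temporal padding is omitted, and the channel counts are toy values.

```python
import numpy as np


def space_to_depth(x, stride):
    """Fold stride-sized (D, H, W) blocks into channels.

    Same layout as the einops pattern
    "b c (d p1) (h p2) (w p3) -> b (c p1 p2 p3) d h w".
    """
    b, c, d, h, w = x.shape
    p1, p2, p3 = stride
    x = x.reshape(b, c, d // p1, p1, h // p2, p2, w // p3, p3)
    x = x.transpose(0, 1, 3, 5, 7, 2, 4, 6)  # -> b, c, p1, p2, p3, d, h, w
    return x.reshape(b, c * p1 * p2 * p3, d // p1, h // p2, w // p3)


def downsample_with_skip(x, conv, stride):
    """Hypothetical downsample forward pass (no causal padding)."""
    # 1. Run the conv branch first so its real channel count is known.
    y = space_to_depth(conv(x), stride)
    out_ch = y.shape[1]  # taken from the weights, not from the config

    # 2. Skip branch: space-to-depth the input, then average groups of
    #    channels down to out_ch so both branches have the same shape.
    skip = space_to_depth(x, stride)
    b, c_skip, d, h, w = skip.shape
    if c_skip % out_ch:
        raise ValueError(f"{c_skip} skip channels not divisible by {out_ch}")
    group = c_skip // out_ch
    skip = skip.reshape(b, out_ch, group, d, h, w).mean(axis=2)
    return y + skip


def pointwise_conv(weight):
    """Stand-in for the 3D conv: a 1x1x1 channel mix with the given weight."""
    def apply(t):
        return np.einsum("oc,bcdhw->bodhw", weight, t)
    return apply


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 16, 2, 16, 24))  # (B, C, D, H, W)
stride = (2, 2, 2)

# Weights with fewer output channels than a config might expect (4 vs 8).
out_mismatch = downsample_with_skip(
    x, pointwise_conv(rng.standard_normal((4, 16))), stride)
# Weights that match the config (8 output channels), the LTX-2-like case.
out_match = downsample_with_skip(
    x, pointwise_conv(rng.standard_normal((8, 16))), stride)

print(out_mismatch.shape)  # (1, 32, 1, 8, 12)
print(out_match.shape)     # (1, 64, 1, 8, 12)
```

Because the skip width is read from `y.shape[1]`, the same code handles both weight layouts; when weights and config agree, the derived width equals the configured one, so the result is the same as before.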

Test plan

  • Tested LTX-2.3 I2V at 768x512 with prince-canuma/LTX-2.3-distilled — generates successfully
  • Tested multi-scene I2V chaining (3 scenes) — all scenes consistent
  • T2V continues to work as before

The SpaceToDepthDownsample skip connection computed channel counts from
self.out_channels, but LTX-2.3 converted weights (prince-canuma/LTX-2.3-distilled)
have conv output channels that don't match out_channels // multiplier for the
last downsample block (128 vs 256), causing a broadcast shape error:

  ValueError: Shapes (1,1024,1,8,12) and (1,2048,1,8,12) cannot be broadcast

This only affects I2V (image-to-video) since T2V doesn't use the VAE encoder.

Fix: derive the skip connection channel count from the actual conv output
shape instead of the configured out_channels. This is backwards-compatible
with LTX-2 weights where the values already match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>