A curated list of models, text encoders, and tools for the LTX-2 video generation suite.
- ComfyUI official blog post
LTX2.3-Multifunctional is a desktop-optimized version of LTX that lowers GPU requirements and simplifies usage. It integrates all features including image-to-video, text-to-video, start/end frames, lip-sync, video enhancement, and image generation into a single application.
Key Features:
- Lower GPU Requirements: Needs only 24 GB VRAM (vs. 32 GB for the standard desktop version)
- All-in-One Interface: No complex ComfyUI workflows or error-prone nodes
- Features: T2V, I2V, start/end frames, lip-sync, video enhancement, image generation, LoRA support
- Multi-Frame Insertion: Two modes for generating long videos
- Easy Setup: No third-party software required; just install LTX desktop
Downloads & Resources:
LTX-2 models are available in various formats including full weights, transformers-only, and GGUF quantizations for efficient inference.
- Lightricks/LTX-2 - Official repository.
- Lightricks/LTX-2.3 - Official repository (latest version).
- Drbaph - Quantization
Quantized to fp8_e5m2 to support older Triton and older PyTorch versions on RTX 30-series GPUs. Intended for WanGP in Pinokio.
| Ver | Name | Precision | Size | Download |
|---|---|---|---|---|
| 2 | ltx-2-19b dev | | 27.1 GB | |
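For readers unfamiliar with the fp8_e5m2 format mentioned above, here is a minimal sketch of how a single byte in the common E5M2 layout (1 sign bit, 5 exponent bits, 2 mantissa bits, exponent bias 15) maps to a real number. The function name `decode_e5m2` is made up for illustration, and this is not how the checkpoint itself was produced or loaded.

```python
import math


def decode_e5m2(byte: int) -> float:
    """Decode one FP8 E5M2 byte (hypothetical helper, for illustration only)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 2) & 0x1F  # 5 exponent bits
    mantissa = byte & 0x03         # 2 mantissa bits

    if exponent == 0x1F:
        # All-ones exponent encodes infinity (mantissa 0) or NaN
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:
        # Subnormal values: no implicit leading 1
        return sign * (mantissa / 4.0) * 2.0 ** -14
    # Normal values: implicit leading 1, bias of 15
    return sign * (1.0 + mantissa / 4.0) * 2.0 ** (exponent - 15)


print(decode_e5m2(0x3C))  # 1.0
print(decode_e5m2(0x7B))  # largest finite value, 57344.0
```

With only two mantissa bits, each power-of-two interval holds just four representable values, which is why e5m2 trades precision for range.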
| Ver | Rank | Precision | Size | Download |
|---|---|---|---|---|
| 2.3 | 384 | | 7.61 GB | |
| 2.3 | 208 | | 4.97 GB | |
| 2.3 | 159 | | 3.83 GB | |
| 2.3 | 105 | | 2.59 GB | |
| 2 | 384 | | 7.67 GB | |
| 2 | 242 | | 4.88 GB | |
| 2 | 175 | | 3.58 GB | |
| 2 | 175 | | 1.79 GB | |
Required for current two-stage pipeline implementations in this repository. Download to COMFYUI_ROOT_FOLDER/models/latent_upscale_models folder.
| Ver | Name | Size | Download |
|---|---|---|---|
| 2.3 | spatial-upscaler x2 1.0 | 996 MB | |
| 2.3 | spatial-upscaler x1.5 1.0 | 1.09 GB | |
| 2 | spatial-upscaler x2 1.0 | 1.05 GB | |
Required for current two-stage pipeline implementations in this repository. Download to COMFYUI_ROOT_FOLDER/models/latent_upscale_models folder.
| Ver | Name | Size | Download |
|---|---|---|---|
| 2.3 | temporal-upscaler x2 1.0 | 262 MB | |
| 2 | temporal-upscaler x2 1.0 | 262 MB | |
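Placing an upscaler in the folder named above can be scripted. The following is a minimal sketch that uses only the Python standard library. The function name `install_upscaler` and the example file name are hypothetical. The only path taken from this document is `models/latent_upscale_models` under the ComfyUI root.

```python
import shutil
from pathlib import Path


def install_upscaler(downloaded_file: Path, comfyui_root: Path) -> Path:
    """Copy a downloaded upscaler into COMFYUI_ROOT_FOLDER/models/latent_upscale_models."""
    target_dir = comfyui_root / "models" / "latent_upscale_models"
    # Create the folder if this ComfyUI install does not have it yet
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / downloaded_file.name
    shutil.copy2(downloaded_file, target)
    return target


# Example call (both paths are placeholders for your own setup):
# install_upscaler(Path("~/Downloads/upscaler.safetensors").expanduser(),
#                  Path("~/ComfyUI").expanduser())
```

ComfyUI typically needs a restart or a model list refresh before a newly copied file shows up in loader nodes.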
══════════════════════════════════
These models are optimized for lower memory usage. Note that in ComfyUI, these are typically loaded as transformer-only models.
QuantStack
| Model | Quant | Size | Download |
|---|---|---|---|
| ltx-2.3-22b | | 12.4 GB | dev ┊ distilled |
| ltx-2.3-22b | | 14.7 GB | dev ┊ distilled |
| ltx-2.3-22b | | 14 GB | dev ┊ distilled |
| ltx-2.3-22b | | 17.8 GB | dev ┊ distilled |
| ltx-2.3-22b | | 16.7 GB | dev ┊ distilled |
| ltx-2.3-22b | | 19.4 GB | dev ┊ distilled |
| ltx-2.3-22b | | 18.5 GB | dev ┊ distilled |
| ltx-2.3-22b | | 21 GB | dev ┊ distilled |
| ltx-2.3-22b | | 25.5 GB | dev ┊ distilled |
| Model | Quant | Size | Download |
|---|---|---|---|
| LTX-2-dev | | 8.03 GB | |
| LTX-2-dev | | 10.3 GB | |
| LTX-2-dev | | 9.57 GB | |
| LTX-2-dev | | 13.4 GB | |
| LTX-2-dev | | 12.3 GB | |
| LTX-2-dev | | 15 GB | |
| LTX-2-dev | | 14.2 GB | |
| LTX-2-dev | | 16.6 GB | |
| LTX-2-dev | | 21.1 GB | |
Unsloth
| Model | Quant | Size | Download |
|---|---|---|---|
| ltx-2.3-22b | | 42 GB | dev ┊ distilled |
| ltx-2.3-22b | | 42 GB | dev ┊ distilled |
| ltx-2.3-22b | | 8.28 GB | dev ┊ distilled |
| ltx-2.3-22b | | 10.8 GB | dev ┊ distilled |
| ltx-2.3-22b | | 9.95 GB | dev ┊ distilled |
| ltx-2.3-22b | | 12.7 GB | dev ┊ distilled |
| ltx-2.3-22b | | 13.8 GB | dev ┊ distilled |
| ltx-2.3-22b | | 14.3 GB | dev ┊ distilled |
| ltx-2.3-22b | | 13.1 GB | dev ┊ distilled |
| ltx-2.3-22b | | 15.3 GB | dev ┊ distilled |
| ltx-2.3-22b | | 16.3 GB | dev ┊ distilled |
| ltx-2.3-22b | | 16.1 GB | dev ┊ distilled |
| ltx-2.3-22b | | 15.2 GB | dev ┊ distilled |
| ltx-2.3-22b | | 17.8 GB | dev ┊ distilled |
| ltx-2.3-22b | | 22.8 GB | dev ┊ distilled |
| ltx-2.3-22b | | 8.98 GB | dev ┊ distilled |
| ltx-2.3-22b | | 11.8 GB | dev ┊ distilled |
| ltx-2.3-22b | | 10.5 GB | dev ┊ distilled |
| ltx-2.3-22b | | 15.1 GB | dev ┊ distilled |
| ltx-2.3-22b | | 13.7 GB | dev ┊ distilled |
| ltx-2.3-22b | | 17.1 GB | dev ┊ distilled |
| ltx-2.3-22b | | 15.8 GB | dev ┊ distilled |
Vantage
◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆
LTX-2 requires Gemma-3-12b variants. LTX-2.3 uses text projection layers.
Official and optimized versions for ComfyUI.
- gemma_3_12B_it_fpmixed: Experimental quant. Should be better than the fp8 scaled
- gemma_3_12B_it_fp4_mixed: 90% fp4 layers
Standard Gemma models often incorporate safety alignment that "sanitizes" or weakens specific concepts within prompt embeddings. Even when the model doesn't explicitly refuse a request, this internal filtering can dilute creative intent. For LTX-2 video generation, using a standard encoder often results in:
- Reduced Prompt Adherence: Key stylistic or descriptive terms may be ignored or weakened.
- Visual Softening: Visual intensity and fine details are often "muted" to fit generic safety profiles.
- Concept Dilution: Complex or niche creative requests are subtly altered, leading to less faithful representations of your vision.
Abliteration removes the refusal behavior introduced by this alignment, so the encoder is meant to translate your prompts into embeddings with less filtering. The aim is for LTX-2 to receive conditioning that follows the prompt as closely as possible.
Gemma-3-12b-Abliterated
Fixed versions of the abliterated Gemma-3-12b-it model by FusionCow, modified specifically for compatibility with LTX-2. The original model
| Model | Precision | Size | Download |
|---|---|---|---|
| Gemma ablit fixed | | 23.5 GB | |
| Gemma ablit fixed | | 13.8 GB | |
Gemma 3 12B IT Heretic
Models by DreamFast
| Model | Precision | Size | Download |
|---|---|---|---|
| Gemma_3_12B_it Heretic | | 23.5 GB | |
| Gemma_3_12B_it Heretic | | 12.8 GB | |
Sikaworld1990 Gemma-3-12b Abliterated
NVFP4 quantization variants by Sikaworld1990 optimized for Blackwell GPUs.
| Model | Precision | Size | Download |
|---|---|---|---|
| Gemma-3-12b QAT Abliterated FP4 | | 12.1 GB | |
| Gemma-3-12b QAT Abliterated FP4 | | 8.91 GB | |
| Gemma-3-12b HereticX Abliterated | | 15 GB | |
| Gemma-3-12b High-Fidelity Abliterated | | 14.1 GB | |
- FP4-HF: High-fidelity mixed precision calibration
- FP4-Pure: Pure FP4 quantization for maximum compression
- HereticX: Uncensored variant with maximum prompt fidelity
- High-Fidelity: Optimized for quality with better detail preservation
◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆
Separated LTX-2 checkpoint components by Kijai, with a second set by Kijai for LTX-2.3. These offer an alternative way to load the models in ComfyUI.
Note: input_scaled variants additionally have activation scaling and are set to run with fp8 matmuls on supported hardware (roughly RTX 40xx and later NVIDIA GPUs).
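To check whether a GPU falls into the "40xx and later" group from the note above, one option is to compare its CUDA compute capability against 8.9, the value reported by Ada-generation (RTX 40xx) cards. The sketch below is a plain comparison under that assumption. The function name is hypothetical, and the source does not state that ComfyUI performs this exact check.

```python
def supports_fp8_matmul(major: int, minor: int) -> bool:
    """Return True for CUDA compute capability 8.9 (Ada, RTX 40xx) or newer."""
    return (major, minor) >= (8, 9)


# With PyTorch installed, the capability tuple can be read with:
#   import torch
#   major, minor = torch.cuda.get_device_capability()
print(supports_fp8_matmul(8, 6))  # RTX 30xx (Ampere): False
print(supports_fp8_matmul(8, 9))  # RTX 40xx (Ada): True
```

On cards that fail this check, the input_scaled files would not get the fp8 matmul speed-up the note describes, so the non-scaled variants are the more natural choice there.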
| Ver | Component | Precision | Size | Download |
|---|---|---|---|---|
| 2.3 | Video VAE | | 1.45 GB | |
| 2.3 | Audio VAE | | 365 MB | |
| 2 | Video VAE | | 2.45 GB | |
| 2 | Audio VAE | | 218 MB | |
| Ver | Name | Precision | Size | Download |
|---|---|---|---|---|
| 2.3 | Embeddings Connectors dev | | 2.31 GB | |
| 2.3 | Embeddings Connectors distilled | | 2.31 GB | |
| 2 | Connector dev | | 2.86 GB | |
| 2 | Connector distilled | | 2.86 GB | |
◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆
- LTX-2.3-IC-LoRA-Colorizer by DoctorDiffusion (331 MB) - Colorize black and white videos
- JUST-DUB-IT
- Best-Face-Swap-Video
- Image-to-Video Adapter LoRA
  - Original by MachineDelusions
  - siraxe variant - Stripped audio layers + rank64 compressed (2.62 GB, 655 MB rank64 bf16)
- Lightricks LTX-2.3
  - Union Control - Unified IC-LoRA combining Canny + Depth + Pose control signals for multi-signal video generation conditioning
  - Motion Track Control - Guides object motion using sparse point trajectories via colored spline overlays on reference videos
- Lightricks LTX-2
  - Canny Control - Edge detection control for structural guidance
  - Depth Control - Depth map conditioning for 3D spatial control
  - Detailer - Enhances fine details and textures in generated videos
  - Pose Control - Human pose estimation control for motion guidance
- LTX-2-19b-LoRA-SPROUT
- Hydraulic press
- Cakeify
- Big Anime Breasts
- Clay Stop Motion
- Eat
- POP! Inflatable Animation - Comically inflate and pop cartoon/anime characters into confetti and fabric scraps (I2V focused)
- Outfit Switch
- Handheld run
- Atomic Explosion
- Squish
- IC luminance map
- Yoshiaki Kawajiri Retro Anime - LoRA trained on Yoshiaki Kawajiri's distinctive retro anime art style
- DonaldTrump
- WHATUSEE
- Squish – One Hand Only
- Black Venom
- HERO CAM
- Animatediff V1
- PUSH TO GLASS
- Object POV
- Group Photo
- EarthZoomOut
- Lightricks
- Wan2.1 VAE Adapter
  - Latent space adapter for converting between LTX-2 and Wan2.1 VAE representations
  - latent_adapter_final.pt (447 MB)
ID-LoRA is a method that enables identity-preserving audio-video generation in a single model. It jointly generates a subject's appearance and voice, letting a text prompt, a reference image, and a short audio clip govern both modalities together. Built on top of LTX-2.3 (22B), it is the first method to personalize visual appearance and voice within a single generative pass.
Unlike cascaded pipelines that treat audio and video separately, ID-LoRA operates in a unified latent space where a single text prompt can simultaneously dictate the scene's visual content, environmental acoustics, and speaking style—while preserving the subject's vocal identity and visual likeness.
Key Features:
- Text prompt controls the scene and content
- Reference image preserves the subject's visual likeness
- Short audio clip preserves the subject's vocal identity
- Single unified generation pass for both appearance and voice
Available LoRAs for LTX-2.3:
| LoRA | LoRA Rank | Size | Download |
|---|---|---|---|
| ID-LoRA-TalkVid-3K | 128 | 1.1 GB | |
| ID-LoRA-CelebVHQ-3K | 128 | 1.1 GB |
Resources:
◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆
Lightricks official workflows:
- Text to Video Full
- Text to Video Distilled
- Image to Video Full
- Image to Video Distilled
- ICLoRA
- Video to Video
- Video to Video Detailer
ComfyUI official workflows:
- Text-to-video
- Text-to-video Distilled (faster, 8 steps)
- Image-to-video
- Image-to-video Distilled (faster, 8 steps)
- Depth control
- Canny control
- Pose control
RuneXX LTX-2.3 Workflows:
Text-to-Video (T2V):
| Workflow |
|---|
| T2V Basic |
| T2V Single Pass |
Image-to-Video / Text-to-Video (I2V/T2V):
Long Video:
| Workflow |
|---|
| I2V T2V Long Video Custom Audio |
| I2V T2V Long Video Custom Audio Loop |
| I2V T2V Long Video Custom Audio Singlepass Loop |
First-Last Frame Video (FL2V):
| Workflow |
|---|
| FL2V Custom Audio |
| FL2V First Last Frame Injection |
First-Middle-Last Frame Video (FML2V):
| Workflow |
|---|
| FML2V First Middle Last Frame Guider |
| FML2V First Middle Last Frame Injection |
| FML2V Guider Custom Audio |
Video-to-Video (V2V):
| Workflow |
|---|
| V2V Extend Any Video |
| V2V Foley Add Sound To Any Video |
| V2V Just Talk Add Lipsynced Voice To Any Video |
| V2V ReTake Recreate Any Section Of Any Video |
