
Commit 9b6fd11

Musisoul and gushiqiao authored
[Feat] entrypoint like diffusers (#475)
### Single GPU

```bash
python examples/simple_launch.py
```

```python
# examples/simple_launch.py
from lightx2v import LightGenerator

generator = LightGenerator(
    model_path="/path/to/Wan2.1-T2V-1.3B",
    model_cls="wan2.1",
    task="t2v",
)

video_path = generator.generate(
    prompt="Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.",
    negative_prompt="镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走",
    seed=42,
    save_result_path="output.mp4",
)
```

### Multi-GPU

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node=8 examples/multi_launch.py
```

---------

Co-authored-by: gushiqiao <975033167@qq.com>
1 parent d996a81 commit 9b6fd11

27 files changed: +2,290 −15 lines
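The commit message above launches `examples/multi_launch.py` under torchrun but does not show its contents. Below is a minimal hypothetical sketch, assuming the multi-GPU script mirrors `simple_launch.py` and that `LightGenerator` picks up the distributed environment torchrun sets; the file body is an assumption, not the committed code.

```python
# Hypothetical sketch of examples/multi_launch.py -- not the committed file.
# Assumes LightGenerator detects the torchrun environment (RANK, WORLD_SIZE)
# and shards inference across the GPUs exposed by CUDA_VISIBLE_DEVICES.
from lightx2v import LightGenerator

generator = LightGenerator(
    model_path="/path/to/Wan2.1-T2V-1.3B",  # same checkpoint as simple_launch.py
    model_cls="wan2.1",
    task="t2v",
)

video_path = generator.generate(
    prompt="Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.",
    seed=42,
    save_result_path="output.mp4",
)
```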

README.md

100644 → 100755 (file mode changed)
Lines changed: 55 additions & 3 deletions

````diff
@@ -20,12 +20,66 @@
 ## :fire: Latest News

-- **November 21, 2025:** 🚀 We support the [HunyuanVideo-1.5](https://huggingface.co/tencent/HunyuanVideo-1.5) video generation model since Day 0. With the same number of GPUs, LightX2V can achieve a speed improvement of over 2 times and supports deployment on GPUs with lower memory (such as the 24GB RTX 4090). It also supports CFG/Ulysses parallelism, efficient offloading, TeaCache/MagCache technologies, and more. We will soon update our models on our [HuggingFace page](https://huggingface.co/lightx2v), including quantization, step distillation, VAE distillation, and other related models. Refer to [this](https://github.com/ModelTC/LightX2V/tree/main/scripts/hunyuan_video_15) for usage tutorials.
+- **November 21, 2025:** 🚀 We support the [HunyuanVideo-1.5](https://huggingface.co/tencent/HunyuanVideo-1.5) video generation model since Day 0. With the same number of GPUs, LightX2V can achieve a speed improvement of over 2 times and supports deployment on GPUs with lower memory (such as the 24GB RTX 4090). It also supports CFG/Ulysses parallelism, efficient offloading, TeaCache/MagCache technologies, and more. It also supports deployment on domestic chips such as Muxi and Cambricon. Quantized models and lightweight VAE models are now available: [Hy1.5-Quantized-Models](https://huggingface.co/lightx2v/Hy1.5-Quantized-Models) for quantized inference, and [LightTAE for HunyuanVideo-1.5](https://huggingface.co/lightx2v/Autoencoders/blob/main/lighttaehy1_5.safetensors) for fast VAE decoding. We will soon update more models on our [HuggingFace page](https://huggingface.co/lightx2v), including step distillation, VAE distillation, and other related models. Refer to [this](https://github.com/ModelTC/LightX2V/tree/main/scripts/hunyuan_video_15) for usage tutorials, or check out the [examples directory](https://github.com/ModelTC/LightX2V/tree/main/examples) for code examples.

 ## 💡 Quick Start

 For comprehensive usage instructions, please refer to our documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/) | [中文文档](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/)**

+### Installation from Git
+```bash
+pip install -v git+https://github.com/ModelTC/LightX2V.git
+```
+
+### Building from Source
+```bash
+git clone https://github.com/ModelTC/LightX2V.git
+cd LightX2V
+uv pip install -v . # pip install -v .
+```
+
+### (Optional) Install Attention/Quantize Operators
+For attention operators installation, please refer to our documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/getting_started/quickstart.html#step-4-install-attention-operators) | [中文文档](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html#id9)**
+
+### Quick Start
+```python
+# examples/hunyuan_video/hunyuan_t2v.py
+from lightx2v import LightX2VPipeline
+
+pipe = LightX2VPipeline(
+    model_path="/path/to/ckpts/hunyuanvideo-1.5/",
+    model_cls="hunyuan_video_1.5",
+    transformer_model_name="720p_t2v",
+    task="t2v",
+)
+
+pipe.create_generator(
+    attn_mode="sage_attn2",
+    infer_steps=50,
+    num_frames=121,
+    guidance_scale=6.0,
+    sample_shift=9.0,
+    aspect_ratio="16:9",
+    fps=24,
+)
+
+seed = 123
+prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
+negative_prompt = ""
+save_result_path = "/path/to/save_results/output.mp4"
+
+pipe.generate(
+    seed=seed,
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    save_result_path=save_result_path,
+)
+```
+
+> 💡 **More Examples**: For more usage examples including quantization, offloading, caching, and other advanced configurations, please refer to the [examples directory](https://github.com/ModelTC/LightX2V/tree/main/examples).
+
 ## 🤖 Supported Model Ecosystem

@@ -37,15 +91,13 @@ For comprehensive usage instructions, please refer to our documentation: **[Engl
 - [Qwen-Image-Edit-2509](https://huggingface.co/Qwen/Qwen-Image-Edit-2509)

 ### Quantized and Distilled Models/LoRAs (**🚀 Recommended: 4-step inference**)
-- [Hy1.5-Quantized-Models](https://huggingface.co/lightx2v/Hy1.5-Quantized-Models)
 - [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
 - [Wan2.2-Distill-Models](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
 - [Wan2.1-Distill-Loras](https://huggingface.co/lightx2v/Wan2.1-Distill-Loras)
 - [Wan2.2-Distill-Loras](https://huggingface.co/lightx2v/Wan2.2-Distill-Loras)

 ### Lightweight Autoencoder Models (**🚀 Recommended: fast inference & low memory usage**)
 - [Autoencoders](https://huggingface.co/lightx2v/Autoencoders)
-
 🔔 Follow our [HuggingFace page](https://huggingface.co/lightx2v) for the latest model releases from our team.

 ### Autoregressive Models
````
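The new diffusers-style entrypoint above separates model construction (`LightX2VPipeline`), generator configuration (`create_generator`), and sampling (`generate`). A hedged sketch of reusing one pipeline for several prompts follows; whether `generate` can be called repeatedly on the same pipeline is an assumption not confirmed by this diff, and the second prompt is purely illustrative.

```python
# Hypothetical batch usage of the new entrypoint -- reusing one pipeline
# across prompts is an assumption, not confirmed by this commit.
from lightx2v import LightX2VPipeline

pipe = LightX2VPipeline(
    model_path="/path/to/ckpts/hunyuanvideo-1.5/",
    model_cls="hunyuan_video_1.5",
    transformer_model_name="720p_t2v",
    task="t2v",
)

pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=50,
    num_frames=121,
    guidance_scale=6.0,
    sample_shift=9.0,
    aspect_ratio="16:9",
    fps=24,
)

prompts = [
    "Two anthropomorphic cats in comfy boxing gear fight on a spotlighted stage.",
    "A koala bear plays an acoustic guitar under soft studio lights.",  # illustrative
]

for i, prompt in enumerate(prompts):
    pipe.generate(
        seed=123 + i,  # vary the seed per clip
        prompt=prompt,
        negative_prompt="",
        save_result_path=f"/path/to/save_results/output_{i}.mp4",
    )
```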

README_zh.md

100644 → 100755 (file mode changed)
Lines changed: 54 additions & 1 deletion

````diff
@@ -20,13 +20,66 @@
 ## :fire: Latest News

-- **November 21, 2025:** 🚀 We supported the [HunyuanVideo-1.5](https://huggingface.co/tencent/HunyuanVideo-1.5) video generation model from Day 0. With the same number of GPUs, LightX2V delivers a speedup of over 2x and supports deployment on lower-VRAM GPUs (such as the 24GB RTX 4090). It supports CFG/Ulysses parallelism, efficient offloading, TeaCache/MagCache, and other techniques, as well as deployment on domestic chips such as Muxi and Cambricon. We will soon publish quantized, step-distilled, VAE-distilled, and other related models on our [HuggingFace page](https://huggingface.co/lightx2v). See [here](https://github.com/ModelTC/LightX2V/tree/main/scripts/hunyuan_video_15) for usage tutorials.
+- **November 21, 2025:** 🚀 We supported the [HunyuanVideo-1.5](https://huggingface.co/tencent/HunyuanVideo-1.5) video generation model from Day 0. With the same number of GPUs, LightX2V delivers a speedup of over 2x and supports deployment on lower-VRAM GPUs (such as the 24GB RTX 4090). It supports CFG/Ulysses parallelism, efficient offloading, TeaCache/MagCache, and other techniques, as well as deployment on domestic chips such as Muxi and Cambricon. Quantized models and lightweight VAE models are now available: [Hy1.5-Quantized-Models](https://huggingface.co/lightx2v/Hy1.5-Quantized-Models) for quantized inference and [LightTAE for HunyuanVideo-1.5](https://huggingface.co/lightx2v/Autoencoders/blob/main/lighttaehy1_5.safetensors) for fast VAE decoding. We will soon publish more models on our [HuggingFace page](https://huggingface.co/lightx2v), including step-distilled, VAE-distilled, and other related models. See [here](https://github.com/ModelTC/LightX2V/tree/main/scripts/hunyuan_video_15) for usage tutorials, or check the [examples directory](https://github.com/ModelTC/LightX2V/tree/main/examples) for code examples.


 ## 💡 Quick Start

 For detailed usage instructions, please refer to our documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/) | [Chinese Docs](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/)**

+### Installation from Git
+```bash
+pip install -v git+https://github.com/ModelTC/LightX2V.git
+```
+
+### Building from Source
+```bash
+git clone https://github.com/ModelTC/LightX2V.git
+cd LightX2V
+uv pip install -v . # pip install -v .
+```
+
+### (Optional) Install Attention/Quantization Operators
+For attention operator installation, please refer to our documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/getting_started/quickstart.html#step-4-install-attention-operators) | [Chinese Docs](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html#id9)**
+
+### Quick Start
+```python
+# examples/hunyuan_video/hunyuan_t2v.py
+from lightx2v import LightX2VPipeline
+
+pipe = LightX2VPipeline(
+    model_path="/path/to/ckpts/hunyuanvideo-1.5/",
+    model_cls="hunyuan_video_1.5",
+    transformer_model_name="720p_t2v",
+    task="t2v",
+)
+
+pipe.create_generator(
+    attn_mode="sage_attn2",
+    infer_steps=50,
+    num_frames=121,
+    guidance_scale=6.0,
+    sample_shift=9.0,
+    aspect_ratio="16:9",
+    fps=24,
+)
+
+seed = 123
+prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
+negative_prompt = ""
+save_result_path = "/path/to/save_results/output.mp4"
+
+pipe.generate(
+    seed=seed,
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    save_result_path=save_result_path,
+)
+```
+
+> 💡 **More Examples**: For more use cases, including advanced configurations such as quantization, offloading, and caching, please refer to the [examples directory](https://github.com/ModelTC/LightX2V/tree/main/examples).
+
 ## 🤖 Supported Model Ecosystem

 ### Official Open-Source Models
````

docs/EN/source/getting_started/quickstart.md

100644 → 100755 (file mode changed)
Lines changed: 89 additions & 1 deletion

````diff
@@ -102,13 +102,46 @@ git clone https://github.com/thu-ml/SageAttention.git
 cd SageAttention && CUDA_ARCHITECTURES="8.0,8.6,8.9,9.0,12.0" EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 pip install -v -e .
 ```

-**Option D: Q8 Kernels**
+#### Step 4: Install Quantization Operators (Optional)
+
+Quantization operators are used to support model quantization, which can significantly reduce memory usage and accelerate inference. Choose the appropriate quantization operator based on your needs:
+
+**Option A: VLLM Kernels (Recommended)**
+Suitable for various quantization schemes, supports FP8 and other quantization formats.
+
+```bash
+pip install vllm
+```
+
+Or install from source for the latest features:
+
+```bash
+git clone https://github.com/vllm-project/vllm.git
+cd vllm
+uv pip install -e .
+```
+
+**Option B: SGL Kernels**
+Suitable for the SGL quantization scheme, requires torch == 2.8.0.
+
+```bash
+pip install sgl-kernel --upgrade
+```
+
+**Option C: Q8 Kernels**
+Suitable for Ada architecture GPUs (such as RTX 4090, L40S, etc.).
+
 ```bash
 git clone https://github.com/KONAKONA666/q8_kernels.git
 cd q8_kernels && git submodule init && git submodule update
 python setup.py install
 ```

+> 💡 **Note**:
+> - You can skip this step if you don't need quantization functionality
+> - Quantized models can be downloaded from [LightX2V HuggingFace](https://huggingface.co/lightx2v)
+> - For more quantization information, please refer to the [Quantization Documentation](method_tutorials/quantization.html)
+
 #### Step 5: Verify Installation

 ```python
@@ -215,8 +248,27 @@ cd LightX2V

 # Install Windows-specific dependencies
 pip install -r requirements_win.txt
+pip install -v -e .
+```
+
+#### Step 7: Install Quantization Operators (Optional)
+
+Quantization operators are used to support model quantization, which can significantly reduce memory usage and accelerate inference.
+
+**Install VLLM (Recommended):**
+
+Download the corresponding wheel package from [vllm-windows releases](https://github.com/SystemPanic/vllm-windows/releases) and install it.
+
+```cmd
+# Install vLLM (please adjust according to the actual filename)
+pip install vllm-0.9.1+cu124-cp312-cp312-win_amd64.whl
 ```

+> 💡 **Note**:
+> - You can skip this step if you don't need quantization functionality
+> - Quantized models can be downloaded from [LightX2V HuggingFace](https://huggingface.co/lightx2v)
+> - For more quantization information, please refer to the [Quantization Documentation](method_tutorials/quantization.html)
+
 ## 🎯 Inference Usage

 ### 📥 Model Preparation
@@ -249,6 +301,42 @@ bash scripts/wan/run_wan_t2v.sh
 scripts\win\run_wan_t2v.bat
 ```

+#### Python Script Launch
+
+```python
+from lightx2v import LightX2VPipeline
+
+pipe = LightX2VPipeline(
+    model_path="/path/to/Wan2.1-T2V-14B",
+    model_cls="wan2.1",
+    task="t2v",
+)
+
+pipe.create_generator(
+    attn_mode="sage_attn2",
+    infer_steps=50,
+    height=480,  # 720
+    width=832,  # 1280
+    num_frames=81,
+    guidance_scale=5.0,
+    sample_shift=5.0,
+)
+
+seed = 42
+prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
+negative_prompt = "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
+save_result_path = "/path/to/save_results/output.mp4"
+
+pipe.generate(
+    seed=seed,
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    save_result_path=save_result_path,
+)
+```
+
+> 💡 **More Examples**: For more usage examples including quantization, offloading, caching, and other advanced configurations, please refer to the [examples directory](https://github.com/ModelTC/LightX2V/tree/main/examples).
+
 ## 📞 Get Help

 If you encounter problems during installation or usage, please:
````
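The new quantization steps above are optional on both the Linux and Windows paths. A quick way to check which of the optional kernels actually ended up importable is a small probe script; the module names `vllm`, `sgl_kernel`, and `q8_kernels` are assumptions inferred from the package names above, not confirmed by this diff.

```python
# Probe which optional quantization kernels are importable in the current env.
# Module names are assumptions inferred from the package names in the docs.
import importlib

for module in ("vllm", "sgl_kernel", "q8_kernels"):
    try:
        importlib.import_module(module)
        print(f"{module}: available")
    except ImportError:
        print(f"{module}: not installed")
```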

docs/ZH_CN/source/getting_started/quickstart.md

100644 → 100755 (file mode changed)
Lines changed: 93 additions & 2 deletions

````diff
@@ -83,7 +83,6 @@ conda activate lightx2v
 pip install -v -e .
 ```

-
 #### Step 4: Install Attention Operators

 **Option A: Flash Attention 2**
@@ -103,13 +102,46 @@ git clone https://github.com/thu-ml/SageAttention.git
 cd SageAttention && CUDA_ARCHITECTURES="8.0,8.6,8.9,9.0,12.0" EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 pip install -v -e .
 ```

-**Option D: Q8 Kernels**
+#### Step 4: Install Quantization Operators (Optional)
+
+Quantization operators support model quantization, which can significantly reduce VRAM usage and accelerate inference. Choose the appropriate quantization operator for your needs:
+
+**Option A: VLLM Kernels (Recommended)**
+Suitable for various quantization schemes, supports FP8 and other quantization formats.
+
+```bash
+pip install vllm
+```
+
+Or install from source for the latest features:
+
+```bash
+git clone https://github.com/vllm-project/vllm.git
+cd vllm
+uv pip install -e .
+```
+
+**Option B: SGL Kernels**
+Suitable for the SGL quantization scheme, requires torch == 2.8.0.
+
+```bash
+pip install sgl-kernel --upgrade
+```
+
+**Option C: Q8 Kernels**
+Suitable for Ada architecture GPUs (such as RTX 4090, L40S, etc.).
+
 ```bash
 git clone https://github.com/KONAKONA666/q8_kernels.git
 cd q8_kernels && git submodule init && git submodule update
 python setup.py install
 ```

+> 💡 **Tip**:
+> - You can skip this step if you don't need quantization functionality
+> - Quantized models can be downloaded from [LightX2V HuggingFace](https://huggingface.co/lightx2v)
+> - For more quantization information, please refer to the [Quantization Documentation](method_tutorials/quantization.html)
+
 #### Step 5: Verify Installation
 ```python
 import lightx2v
@@ -215,6 +247,31 @@ cd LightX2V

 # Install Windows-specific dependencies
 pip install -r requirements_win.txt
+pip install -v -e .
+```
+
+#### Step 7: Install Quantization Operators (Optional)
+
+Quantization operators support model quantization, which can significantly reduce VRAM usage and accelerate inference.
+
+**Install VLLM (Recommended):**
+
+Download the corresponding wheel package from [vllm-windows releases](https://github.com/SystemPanic/vllm-windows/releases) and install it.
+
+```cmd
+# Install vLLM (please adjust according to the actual filename)
+pip install vllm-0.9.1+cu124-cp312-cp312-win_amd64.whl
+```
+
+> 💡 **Tip**:
+> - You can skip this step if you don't need quantization functionality
+> - Quantized models can be downloaded from [LightX2V HuggingFace](https://huggingface.co/lightx2v)
+> - For more quantization information, please refer to the [Quantization Documentation](method_tutorials/quantization.html)
+
+#### Step 8: Verify Installation
+```python
+import lightx2v
+print(f"LightX2V version: {lightx2v.__version__}")
 ```

 ## 🎯 Inference Usage
@@ -248,6 +305,40 @@ bash scripts/wan/run_wan_t2v.sh
 # Use the Windows batch script
 scripts\win\run_wan_t2v.bat
 ```
+#### Python Script Launch
+
+```python
+from lightx2v import LightX2VPipeline
+
+pipe = LightX2VPipeline(
+    model_path="/path/to/Wan2.1-T2V-14B",
+    model_cls="wan2.1",
+    task="t2v",
+)
+
+pipe.create_generator(
+    attn_mode="sage_attn2",
+    infer_steps=50,
+    height=480,  # 720
+    width=832,  # 1280
+    num_frames=81,
+    guidance_scale=5.0,
+    sample_shift=5.0,
+)
+
+seed = 42
+prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
+negative_prompt = "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
+save_result_path = "/path/to/save_results/output.mp4"
+
+pipe.generate(
+    seed=seed,
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    save_result_path=save_result_path,
+)
+```
+

 ## 📞 Get Help
````
