Merged
27 changes: 20 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
@@ -113,13 +113,16 @@ Open http://localhost:7860 (Gradio) or http://localhost:8001 (API).

### 💡 Which Model Should I Choose?

| Your GPU VRAM | Recommended LM Model | Backend | Notes |
|---------------|---------------------|---------|-------|
| **≤6GB** | None (DiT only) | — | LM disabled by default; INT8 quantization + full CPU offload |
| **6-8GB** | `acestep-5Hz-lm-0.6B` | `pt` | Lightweight LM with PyTorch backend |
| **8-16GB** | `acestep-5Hz-lm-0.6B` / `1.7B` | `vllm` | 0.6B for 8-12GB, 1.7B for 12-16GB |
| **16-24GB** | `acestep-5Hz-lm-1.7B` | `vllm` | 4B available on 20GB+; no offload needed on 20GB+ |
| **≥24GB** | `acestep-5Hz-lm-4B` | `vllm` | Best quality, all models fit without offload |
| Your GPU VRAM | Recommended DiT | Recommended LM Model | Backend | Notes |
|---------------|----------------|---------------------|---------|-------|
| **≤6GB** | 2B turbo | None (DiT only) | — | LM disabled by default; INT8 quantization + full CPU offload |
| **6-8GB** | 2B turbo | `acestep-5Hz-lm-0.6B` | `pt` | Lightweight LM with PyTorch backend |
| **8-16GB** | 2B turbo/sft | `acestep-5Hz-lm-0.6B` / `1.7B` | `vllm` | 0.6B for 8-12GB, 1.7B for 12-16GB |
| **16-20GB** | 2B sft or XL turbo | `acestep-5Hz-lm-1.7B` | `vllm` | XL requires CPU offload below 20GB |
| **20-24GB** | XL turbo/sft | `acestep-5Hz-lm-1.7B` | `vllm` | XL fits without offload; 4B LM available |
| **≥24GB** | XL sft (or xl-base for extract/lego/complete) | `acestep-5Hz-lm-4B` | `vllm` | Best quality, all models fit without offload |

> **XL (4B) models** (`acestep-v15-xl-*`) offer higher audio quality with ~9GB VRAM for weights (vs ~4.7GB for 2B). They require ≥12GB VRAM (with offload + quantization) or ≥20GB (without offload). All LM models are fully compatible with XL.

The UI automatically selects the best configuration for your GPU. All settings (LM model, backend, offloading, quantization) are tier-aware and pre-configured.
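The tier-aware selection described above can be sketched as a plain VRAM lookup. This is an illustrative sketch only — the real logic lives in `acestep/gpu_config.py`, and the function name and dictionary keys below are hypothetical:

```python
def recommend_config(vram_gb: float) -> dict:
    """Map detected VRAM to the recommended DiT / LM / backend combo
    from the table above (boundaries treated as inclusive upper bounds)."""
    if vram_gb <= 6:
        return {"dit": "2B turbo", "lm": None, "backend": None}
    if vram_gb <= 8:
        return {"dit": "2B turbo", "lm": "acestep-5Hz-lm-0.6B", "backend": "pt"}
    if vram_gb <= 12:
        return {"dit": "2B turbo/sft", "lm": "acestep-5Hz-lm-0.6B", "backend": "vllm"}
    if vram_gb <= 16:
        return {"dit": "2B turbo/sft", "lm": "acestep-5Hz-lm-1.7B", "backend": "vllm"}
    if vram_gb <= 20:
        return {"dit": "2B sft or XL turbo", "lm": "acestep-5Hz-lm-1.7B", "backend": "vllm"}
    if vram_gb <= 24:
        return {"dit": "XL turbo/sft", "lm": "acestep-5Hz-lm-1.7B", "backend": "vllm"}
    return {"dit": "XL sft", "lm": "acestep-5Hz-lm-4B", "backend": "vllm"}
```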

@@ -244,6 +247,16 @@ See also the **LoRA Training** tab in Gradio UI for one-click training, or [Grad
| `acestep-v15-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [Link](https://huggingface.co/ACE-Step/Ace-Step1.5) |
| `acestep-v15-turbo-rl` | ✅ | ✅ | ✅ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | To be released |

### XL (4B) DiT Models

> XL models use a larger 4B-parameter DiT decoder (~9GB bf16) for higher audio quality. They require ≥12GB VRAM (with offload + quantization) or ≥20GB (without offload). All LM models are fully compatible.

| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
|-----------|:------------:|:---:|:--:|:---:|:----:|:-----------:|:----------:|:-----:|:-------:|:-------:|:----:|:--------:|:-------:|:---------:|:---------------:|--------------|
| `acestep-v15-xl-base` | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | High | High | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-xl-base) |
| `acestep-v15-xl-sft` | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-xl-sft) |
| `acestep-v15-xl-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [Link](https://huggingface.co/ACE-Step/acestep-v15-xl-turbo) |
Comment on lines +256 to +258
Contributor


⚠️ Potential issue | 🟡 Minor

Replace generic [Link] text with descriptive labels (MD059).

At lines 256-258, the link text is non-descriptive and will keep markdownlint warnings active.

Suggested doc-only patch
-| `acestep-v15-xl-base` | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | High | High | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-xl-base) |
-| `acestep-v15-xl-sft` | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-xl-sft) |
-| `acestep-v15-xl-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [Link](https://huggingface.co/ACE-Step/acestep-v15-xl-turbo) |
+| `acestep-v15-xl-base` | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | High | High | Easy | [acestep-v15-xl-base on Hugging Face](https://huggingface.co/ACE-Step/acestep-v15-xl-base) |
+| `acestep-v15-xl-sft` | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Easy | [acestep-v15-xl-sft on Hugging Face](https://huggingface.co/ACE-Step/acestep-v15-xl-sft) |
+| `acestep-v15-xl-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [acestep-v15-xl-turbo on Hugging Face](https://huggingface.co/ACE-Step/acestep-v15-xl-turbo) |


### LM Models

| LM Model | Pretrain from | Pre-Training | SFT | RL | CoT metas | Query rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |
26 changes: 14 additions & 12 deletions docs/en/GPU_COMPATIBILITY.md
@@ -4,16 +4,18 @@ ACE-Step 1.5 automatically adapts to your GPU's available VRAM, adjusting genera

## GPU Tier Configuration

| VRAM | Tier | LM Models | Recommended LM | Backend | Max Duration (LM / No LM) | Max Batch (LM / No LM) | Offload | Quantization |
|------|------|-----------|-----------------|---------|----------------------------|-------------------------|---------|--------------|
| ≤4GB | Tier 1 | None | — | pt | 4 min / 6 min | 1 / 1 | CPU + DiT | INT8 |
| 4-6GB | Tier 2 | None | — | pt | 8 min / 10 min | 1 / 1 | CPU + DiT | INT8 |
| 6-8GB | Tier 3 | 0.6B | 0.6B | pt | 8 min / 10 min | 2 / 2 | CPU + DiT | INT8 |
| 8-12GB | Tier 4 | 0.6B | 0.6B | vllm | 8 min / 10 min | 2 / 4 | CPU + DiT | INT8 |
| 12-16GB | Tier 5 | 0.6B, 1.7B | 1.7B | vllm | 8 min / 10 min | 4 / 4 | CPU | INT8 |
| 16-20GB | Tier 6a | 0.6B, 1.7B | 1.7B | vllm | 8 min / 10 min | 4 / 8 | CPU | INT8 |
| 20-24GB | Tier 6b | 0.6B, 1.7B, 4B | 1.7B | vllm | 8 min / 8 min | 8 / 8 | None | None |
| ≥24GB | Unlimited | All (0.6B, 1.7B, 4B) | 4B | vllm | 10 min / 10 min | 8 / 8 | None | None |
| VRAM | Tier | XL (4B) DiT | LM Models | Recommended LM | Backend | Max Duration (LM / No LM) | Max Batch (LM / No LM) | Offload | Quantization |
|------|------|:-----------:|-----------|-----------------|---------|----------------------------|-------------------------|---------|--------------|
| ≤4GB | Tier 1 | ❌ | None | — | pt | 4 min / 6 min | 1 / 1 | CPU + DiT | INT8 |
| 4-6GB | Tier 2 | ❌ | None | — | pt | 8 min / 10 min | 1 / 1 | CPU + DiT | INT8 |
| 6-8GB | Tier 3 | ❌ | 0.6B | 0.6B | pt | 8 min / 10 min | 2 / 2 | CPU + DiT | INT8 |
| 8-12GB | Tier 4 | ❌ | 0.6B | 0.6B | vllm | 8 min / 10 min | 2 / 4 | CPU + DiT | INT8 |
| 12-16GB | Tier 5 | ⚠️ | 0.6B, 1.7B | 1.7B | vllm | 8 min / 10 min | 4 / 4 | CPU | INT8 |
| 16-20GB | Tier 6a | ✅ (offload) | 0.6B, 1.7B | 1.7B | vllm | 8 min / 10 min | 4 / 8 | CPU | INT8 |
| 20-24GB | Tier 6b | ✅ | 0.6B, 1.7B, 4B | 1.7B | vllm | 8 min / 8 min | 8 / 8 | None | None |
| ≥24GB | Unlimited | ✅ | All (0.6B, 1.7B, 4B) | 4B | vllm | 10 min / 10 min | 8 / 8 | None | None |

> **XL (4B) DiT column**: ❌ = not supported, ⚠️ = marginal (offload + quantization required, reduced batch; works on 12-16GB with aggressive offload), ✅ (offload) = supported with CPU offload, ✅ = fully supported. XL models use ~9GB VRAM for weights (vs ~4.7GB for 2B). All LM models are compatible with XL.

### Column Descriptions

@@ -75,8 +77,8 @@ The detection happens at startup in `acestep/gpu_config.py` (`is_legacy_cuda_gpu
1. **Very Low VRAM (≤6GB)**: Use DiT-only mode without LM initialization. INT8 quantization and full CPU offload are mandatory. VAE decode may fall back to CPU automatically.
2. **Low VRAM (6-8GB)**: The 0.6B LM model can be used with `pt` backend. Keep offload enabled.
3. **Medium VRAM (8-16GB)**: Use the 0.6B or 1.7B LM model. `vllm` backend works well on Tier 4+.
4. **High VRAM (16-24GB)**: Enable larger LM models (1.7B recommended). Quantization becomes optional on 20GB+.
5. **Very High VRAM (≥24GB)**: All models fit without offloading or quantization. Use 4B LM for best quality.
4. **High VRAM (16-24GB)**: Enable larger LM models (1.7B recommended). Quantization becomes optional on 20GB+. XL (4B) DiT models are supported — with offload on 16GB, without offload on 20GB+.
5. **Very High VRAM (≥24GB)**: All models fit without offloading or quantization. Use XL DiT + 4B LM for best quality.
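The XL-support column in the tier table reduces to simple VRAM thresholds. A minimal sketch, assuming the ~9GB bf16 weight figure quoted above (the function is illustrative, not the actual `acestep/gpu_config.py` API):

```python
def xl_dit_support(vram_gb: float) -> str:
    """Summarize XL (4B) DiT support per the tier table above."""
    if vram_gb < 12:
        return "not supported"           # ❌ — ~9GB bf16 weights plus overhead won't fit
    if vram_gb < 16:
        return "marginal"                # ⚠️ — aggressive CPU offload + INT8 required
    if vram_gb < 20:
        return "supported with offload"  # ✅ (offload)
    return "fully supported"             # ✅ — no offload needed
```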

## Debug Mode: Simulating Different GPU Configurations

25 changes: 17 additions & 8 deletions docs/en/INSTALL.md
@@ -650,9 +650,14 @@ python -m acestep.model_downloader --all # Download all models
# Main model (vae, Qwen3-Embedding-0.6B, acestep-v15-turbo, acestep-5Hz-lm-1.7B)
huggingface-cli download ACE-Step/Ace-Step1.5 --local-dir ./checkpoints

# Optional models
# Optional LM models
huggingface-cli download ACE-Step/acestep-5Hz-lm-0.6B --local-dir ./checkpoints/acestep-5Hz-lm-0.6B
huggingface-cli download ACE-Step/acestep-5Hz-lm-4B --local-dir ./checkpoints/acestep-5Hz-lm-4B

# XL (4B) DiT models - requires ≥12GB VRAM (with offload)
huggingface-cli download ACE-Step/acestep-v15-xl-base --local-dir ./checkpoints/acestep-v15-xl-base
huggingface-cli download ACE-Step/acestep-v15-xl-sft --local-dir ./checkpoints/acestep-v15-xl-sft
huggingface-cli download ACE-Step/acestep-v15-xl-turbo --local-dir ./checkpoints/acestep-v15-xl-turbo
```
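After downloading, a quick sanity check can confirm each checkpoint directory landed where the commands above put it (a hypothetical helper, not part of the project's tooling — adjust `CKPT_DIR` to your layout):

```shell
CKPT_DIR="${CKPT_DIR:-./checkpoints}"

# Print "present" or "missing" for a single checkpoint directory
check_model() { [ -d "$1" ] && echo "present" || echo "missing"; }

for m in acestep-v15-xl-base acestep-v15-xl-sft acestep-v15-xl-turbo; do
  printf '%s: %s\n' "$m" "$(check_model "$CKPT_DIR/$m")"
done
```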

### Available Models
@@ -667,20 +672,24 @@ huggingface-cli download ACE-Step/acestep-5Hz-lm-4B --local-dir ./checkpoints/ac
| acestep-v15-turbo-shift1 | Turbo DiT with shift1 | [Link](https://huggingface.co/ACE-Step/acestep-v15-turbo-shift1) |
| acestep-v15-turbo-shift3 | Turbo DiT with shift3 | [Link](https://huggingface.co/ACE-Step/acestep-v15-turbo-shift3) |
| acestep-v15-turbo-continuous | Turbo DiT with continuous shift (1-5) | [Link](https://huggingface.co/ACE-Step/acestep-v15-turbo-continuous) |
| **acestep-v15-xl-base** | XL (4B) Base DiT — higher quality, ≥12GB VRAM | [Link](https://huggingface.co/ACE-Step/acestep-v15-xl-base) |
| **acestep-v15-xl-sft** | XL (4B) SFT DiT — higher quality, ≥12GB VRAM | [Link](https://huggingface.co/ACE-Step/acestep-v15-xl-sft) |
| **acestep-v15-xl-turbo** | XL (4B) Turbo DiT — higher quality, ≥12GB VRAM | [Link](https://huggingface.co/ACE-Step/acestep-v15-xl-turbo) |

---

## 💡 Which Model Should I Choose?

ACE-Step automatically adapts to your GPU's VRAM. The UI pre-configures all settings (LM model, backend, offloading, quantization) based on your detected GPU tier:

| Your GPU VRAM | Recommended LM Model | Backend | Notes |
|---------------|---------------------|---------|-------|
| **≤6GB** | None (DiT only) | — | LM disabled by default; INT8 quantization + full CPU offload |
| **6-8GB** | `acestep-5Hz-lm-0.6B` | `pt` | Lightweight LM with PyTorch backend |
| **8-16GB** | `0.6B` / `1.7B` | `vllm` | 0.6B for 8-12GB, 1.7B for 12-16GB |
| **16-24GB** | `acestep-5Hz-lm-1.7B` | `vllm` | 4B available on 20GB+; no offload on 20GB+ |
| **≥24GB** | `acestep-5Hz-lm-4B` | `vllm` | Best quality, all models fit without offload |
| Your GPU VRAM | Recommended DiT | Recommended LM Model | Backend | Notes |
|---------------|----------------|---------------------|---------|-------|
| **≤6GB** | 2B turbo | None (DiT only) | — | LM disabled; INT8 quantization + full CPU offload |
| **6-8GB** | 2B turbo | `acestep-5Hz-lm-0.6B` | `pt` | Lightweight LM with PyTorch backend |
| **8-16GB** | 2B turbo/sft | `0.6B` / `1.7B` | `vllm` | 0.6B for 8-12GB, 1.7B for 12-16GB |
| **16-20GB** | 2B sft or XL turbo | `acestep-5Hz-lm-1.7B` | `vllm` | XL requires CPU offload below 20GB |
| **20-24GB** | XL turbo/sft | `acestep-5Hz-lm-1.7B` | `vllm` | XL fits without offload; 4B LM available |
| **≥24GB** | XL sft (or xl-base for extract/lego/complete) | `acestep-5Hz-lm-4B` | `vllm` | Best quality, all models fit without offload |

> 📖 For detailed GPU compatibility information (tier table, duration limits, batch sizes, adaptive UI defaults, memory optimization), see [GPU Compatibility Guide](GPU_COMPATIBILITY.md).

17 changes: 16 additions & 1 deletion docs/en/Tutorial.md
Expand Up @@ -188,7 +188,7 @@ Based on your hardware:

With a planning scheme, you still need to choose an executor. DiT is the core of ACE-Step 1.5—it handles various tasks and decides how to interpret LM-generated codes.

We've open-sourced **4 Turbo models**, **1 SFT model**, and **1 Base model**.
We've open-sourced **4 Turbo models**, **1 SFT model**, and **1 Base model** — plus their **XL (4B)** counterparts for higher audio quality.

#### Turbo Series (Recommended for Daily Use)

@@ -250,13 +250,28 @@ This greatly expands **customization and playability**—train a model unique to

> For the detailed LoRA training guide, see the [LoRA Training Tutorial](./LoRA_Training_Tutorial.md). You can also use the "LoRA Training" tab in Gradio UI for one-click training.

#### XL (4B) Models

XL models use a larger 4B-parameter DiT decoder for higher audio quality. They come in the same three variants (base, sft, turbo) and behave identically — just with better generation quality. **All LM models (0.6B / 1.7B / 4B) are fully compatible with XL.**

**Requirements:** XL models need ~9GB VRAM for weights (vs ~4.7GB for 2B). Minimum 12GB VRAM with offload + quantization, 20GB+ recommended.

| XL Model (full name) | Steps | CFG | VRAM | Notes |
|----------------------|:-----:|:---:|:----:|-------|
| `acestep-v15-xl-turbo` | 8 | ❌ | ≥12GB | Fast + high quality, best daily driver on 20GB+ GPUs |
| `acestep-v15-xl-sft` | 50 | ✅ | ≥12GB | Highest quality, tunable CFG |
| `acestep-v15-xl-base` | 50 | ✅ | ≥12GB | All tasks including extract/lego/complete |
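The VRAM requirements above follow from simple parameter arithmetic (bf16 = 2 bytes per parameter). A back-of-envelope sketch, using only the weight figures quoted in this guide and deliberately ignoring activations, VAE, and framework overhead:

```python
XL_DIT_WEIGHTS_GB = 9.0  # ~9GB bf16, per this guide (2B DiT is ~4.7GB)

def lm_weights_gb(params_billions: float) -> float:
    """bf16 weight footprint: 2 bytes per parameter, so 1B params ≈ 2GB."""
    return params_billions * 2

def rough_fit(vram_gb: float, lm_params_b: float = 0.0, offload_dit: bool = False) -> bool:
    """Very rough check: do XL DiT weights (+ optional LM) fit in VRAM?
    Real behaviour is decided by the tier logic in acestep/gpu_config.py."""
    need = 0.0 if offload_dit else XL_DIT_WEIGHTS_GB
    need += lm_weights_gb(lm_params_b)
    return need <= vram_gb
```

This is why 20GB+ is comfortable (9GB DiT + 3.4GB for the 1.7B LM leaves headroom), while 12-16GB only works with offload and quantization.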

#### DiT Selection Summary

| Model | Steps | CFG | Speed | Exclusive Tasks | Recommended Scenarios |
|-------|:-----:|:---:|:-----:|-----------------|----------------------|
| `turbo` (default) | 8 | ❌ | ⚡⚡⚡ | — | Daily use, rapid iteration |
| `sft` | 50 | ✅ | ⚡ | — | Pursuing details, like tuning |
| `base` | 50 | ✅ | ⚡ | extract, lego, complete | Special tasks, large-scale fine-tuning |
| **`acestep-v15-xl-turbo`** | 8 | ❌ | ⚡⚡ | — | Best daily driver on 20GB+ GPUs |
| **`acestep-v15-xl-sft`** | 50 | ✅ | ⚡ | — | Highest quality, ≥12GB VRAM |
| **`acestep-v15-xl-base`** | 50 | ✅ | ⚡ | extract, lego, complete | All tasks with higher quality, ≥12GB VRAM |

### Combination Strategies

22 changes: 12 additions & 10 deletions docs/ja/GPU_COMPATIBILITY.md
@@ -4,16 +4,18 @@ ACE-Step 1.5 は GPU の VRAM に自動的に適応し、生成時間の制限

## GPU ティア構成

| VRAM | ティア | LM モデル | 推奨 LM | バックエンド | 最大時間 (LM有 / LM無) | 最大バッチ (LM有 / LM無) | オフロード | 量子化 |
|------|--------|-----------|---------|-------------|------------------------|--------------------------|------------|--------|
| ≤4GB | Tier 1 | なし | — | pt | 4分 / 6分 | 1 / 1 | CPU + DiT | INT8 |
| 4-6GB | Tier 2 | なし | — | pt | 8分 / 10分 | 1 / 1 | CPU + DiT | INT8 |
| 6-8GB | Tier 3 | 0.6B | 0.6B | pt | 8分 / 10分 | 2 / 2 | CPU + DiT | INT8 |
| 8-12GB | Tier 4 | 0.6B | 0.6B | vllm | 8分 / 10分 | 2 / 4 | CPU + DiT | INT8 |
| 12-16GB | Tier 5 | 0.6B, 1.7B | 1.7B | vllm | 8分 / 10分 | 4 / 4 | CPU | INT8 |
| 16-20GB | Tier 6a | 0.6B, 1.7B | 1.7B | vllm | 8分 / 10分 | 4 / 8 | CPU | INT8 |
| 20-24GB | Tier 6b | 0.6B, 1.7B, 4B | 1.7B | vllm | 8分 / 8分 | 8 / 8 | なし | なし |
| ≥24GB | 無制限 | 全モデル (0.6B, 1.7B, 4B) | 4B | vllm | 10分 / 10分 | 8 / 8 | なし | なし |
| VRAM | ティア | XL (4B) DiT | LM モデル | 推奨 LM | バックエンド | 最大時間 (LM有 / LM無) | 最大バッチ (LM有 / LM無) | オフロード | 量子化 |
|------|--------|:-----------:|-----------|---------|-------------|------------------------|--------------------------|------------|--------|
| ≤4GB | Tier 1 | ❌ | なし | — | pt | 4分 / 6分 | 1 / 1 | CPU + DiT | INT8 |
| 4-6GB | Tier 2 | ❌ | なし | — | pt | 8分 / 10分 | 1 / 1 | CPU + DiT | INT8 |
| 6-8GB | Tier 3 | ❌ | 0.6B | 0.6B | pt | 8分 / 10分 | 2 / 2 | CPU + DiT | INT8 |
| 8-12GB | Tier 4 | ❌ | 0.6B | 0.6B | vllm | 8分 / 10分 | 2 / 4 | CPU + DiT | INT8 |
| 12-16GB | Tier 5 | ⚠️ | 0.6B, 1.7B | 1.7B | vllm | 8分 / 10分 | 4 / 4 | CPU | INT8 |
| 16-20GB | Tier 6a | ✅ (オフロード) | 0.6B, 1.7B | 1.7B | vllm | 8分 / 10分 | 4 / 8 | CPU | INT8 |
| 20-24GB | Tier 6b | ✅ | 0.6B, 1.7B, 4B | 1.7B | vllm | 8分 / 8分 | 8 / 8 | なし | なし |
| ≥24GB | 無制限 | ✅ | 全モデル (0.6B, 1.7B, 4B) | 4B | vllm | 10分 / 10分 | 8 / 8 | なし | なし |

> **XL (4B) DiT 列**: ❌ = 非対応, ⚠️ = 限定的(オフロード + 量子化が必要、12-16GBでは積極的なオフロードで動作可能), ✅ (オフロード) = CPUオフロードで対応, ✅ = 完全対応。XLモデルの重みは約9GB(bf16)、2Bは約4.7GB。すべてのLMモデルがXLと互換性があります。

### 列の説明

15 changes: 8 additions & 7 deletions docs/ja/INSTALL.md
@@ -531,13 +531,14 @@ huggingface-cli download ACE-Step/acestep-5Hz-lm-4B --local-dir ./checkpoints/ac

ACE-Step は GPU の VRAM に自動適応します。UI は検出された GPU ティアに基づいてすべての設定(LM モデル、バックエンド、オフロード、量子化)を事前構成します:

| GPU VRAM | 推奨 LM モデル | バックエンド | 備考 |
|----------|---------------|-------------|------|
| **≤6GB** | なし(DiTのみ) | — | LM はデフォルトで無効;INT8 量子化 + 完全 CPU オフロード |
| **6-8GB** | `acestep-5Hz-lm-0.6B` | `pt` | 軽量 LM、PyTorch バックエンド |
| **8-16GB** | `0.6B` / `1.7B` | `vllm` | 8-12GB は 0.6B、12-16GB は 1.7B |
| **16-24GB** | `acestep-5Hz-lm-1.7B` | `vllm` | 20GB+ で 4B 利用可能;20GB+ でオフロード不要 |
| **≥24GB** | `acestep-5Hz-lm-4B` | `vllm` | 最高品質、すべてのモデルがオフロードなしで動作 |
| GPU VRAM | 推奨 DiT | 推奨 LM モデル | バックエンド | 備考 |
|----------|---------|---------------|-------------|------|
| **≤6GB** | 2B turbo | なし(DiTのみ) | — | LM はデフォルトで無効;INT8 量子化 + 完全 CPU オフロード |
| **6-8GB** | 2B turbo | `acestep-5Hz-lm-0.6B` | `pt` | 軽量 LM、PyTorch バックエンド |
| **8-16GB** | 2B turbo/sft | `0.6B` / `1.7B` | `vllm` | 8-12GB は 0.6B、12-16GB は 1.7B |
| **16-20GB** | 2B sft または XL turbo | `acestep-5Hz-lm-1.7B` | `vllm` | XL は 20GB 未満で CPU オフロードが必要 |
| **20-24GB** | XL turbo/sft | `acestep-5Hz-lm-1.7B` | `vllm` | XL はオフロード不要;4B LM 利用可能 |
| **≥24GB** | XL sft(extract/lego/complete には xl-base) | `acestep-5Hz-lm-4B` | `vllm` | 最高品質、すべてのモデルがオフロードなしで動作 |

> 📖 GPU 互換性の詳細(ティアテーブル、時間制限、バッチサイズ、アダプティブ UI デフォルト、メモリ最適化)は [GPU 互換性ガイド](GPU_COMPATIBILITY.md) を参照してください。

15 changes: 15 additions & 0 deletions docs/ja/Tutorial.md
@@ -250,13 +250,28 @@ Baseは**タスクの集大成**で、SFTとTurboを超える3つの独占タス

> LoRA訓練の詳細ガイドについては、[LoRA トレーニングチュートリアル](./LoRA_Training_Tutorial.md)を参照してください。Gradio UIの「LoRA Training」タブからワンクリックで訓練することもできます。

#### XL (4B) モデル

XLモデルは、より大きな4BパラメータのDiTデコーダを使用し、生成品質が向上します。標準モデルと同じ3つのバリアント(base、sft、turbo)で、動作は完全に同じです。**すべてのLMモデル(0.6B / 1.7B / 4B)がXLと完全互換です。**

**必要条件:** XLモデルの重みは約9GB(bf16)、2Bは約4.7GB。最小12GB VRAM(オフロード + 量子化が必要)、20GB+推奨。

| XLモデル(フルネーム) | ステップ | CFG | VRAM | 備考 |
|----------------------|:-------:|:---:|:----:|------|
| `acestep-v15-xl-turbo` | 8 | ❌ | ≥12GB | 高速 + 高品質、20GB+ GPUの最適な日常選択 |
| `acestep-v15-xl-sft` | 50 | ✅ | ≥12GB | 最高品質、CFG調整可能 |
| `acestep-v15-xl-base` | 50 | ✅ | ≥12GB | extract/lego/completeを含む全タスク対応 |

#### DiT選択のまとめ

| モデル | ステップ | CFG | 速度 | 独占タスク | 推奨シナリオ |
|--------|:-------:|:---:|:----:|----------|------------|
| `turbo`(デフォルト) | 8 | ❌ | ⚡⚡⚡ | — | 日常使用、迅速な反復 |
| `sft` | 50 | ✅ | ⚡ | — | 詳細を追求、調整が好き |
| `base` | 50 | ✅ | ⚡ | extract, lego, complete | 特殊タスク、大規模微調整 |
| **`acestep-v15-xl-turbo`** | 8 | ❌ | ⚡⚡ | — | 20GB+ GPUの最適な日常選択 |
| **`acestep-v15-xl-sft`** | 50 | ✅ | ⚡ | — | 最高品質、≥12GB VRAM |
| **`acestep-v15-xl-base`** | 50 | ✅ | ⚡ | extract, lego, complete | より高品質な全タスク対応、≥12GB VRAM |

### 組み合わせ戦略
