ComfyUI-TD-Qwen3TTS

English | 中文

ComfyUI nodes for Qwen3-TTS, supporting high-quality text-to-speech generation, voice design, and voice cloning.

Original Project: QwenLM/Qwen3-TTS

English

Features

Qwen3-TTS Integration: Seamlessly use Qwen3-TTS models within ComfyUI.
Voice Design: Generate speech with specific voice characteristics using natural language prompts.
Voice Cloning: Generate custom voices from reference audio or a speaker ID.
Multi-Role Dialog: Generate complex conversations with multiple characters, supporting both preset and custom voices.
Batch Generation: Efficiently create and manage multiple custom voices with preview capabilities.
Flexible Configuration: Support for bf16/fp16/fp32 precision and multiple attention implementations (sdpa, flash_attention_2, eager).

Installation

Clone this repository into your ComfyUI custom_nodes directory:

cd ComfyUI/custom_nodes
git clone https://github.com/AICoderTudou/ComfyUI-TD-Qwen3TTS.git

Install dependencies:
```
cd ComfyUI-TD-Qwen3TTS
pip install -r requirements.txt
```
Note: flash-attn is optional but recommended for better performance on NVIDIA GPUs. If installation fails on Windows, the plugin will default to sdpa (Scaled Dot Product Attention).
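The fallback described in the note can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the function name `pick_attention` is hypothetical, and the only assumption is that flash_attention_2 requires the `flash_attn` package to be importable.

```python
import importlib.util

VALID_ATTENTION = {"sdpa", "flash_attention_2", "eager"}

def pick_attention(requested: str = "flash_attention_2") -> str:
    """Return the attention implementation to use, falling back to sdpa."""
    if requested not in VALID_ATTENTION:
        raise ValueError(f"unknown attention implementation: {requested!r}")
    # flash_attention_2 only works when the optional flash_attn package is installed.
    if requested == "flash_attention_2" and importlib.util.find_spec("flash_attn") is None:
        return "sdpa"
    return requested

if __name__ == "__main__":
    print(pick_attention())  # "flash_attention_2" if flash_attn is installed, else "sdpa"
```

Checking with `find_spec` avoids importing the package just to test for its presence.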
Download Models:
- Official Download: Hugging Face Collection
- Alternative Download (Quark Drive): https://pan.quark.cn/s/010e3ca25022
- Download Qwen3-TTS models (e.g., Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice).
- Place them in: ComfyUI/models/Qwen3-TTS-Models/.
- The directory structure should look like:
```
ComfyUI/models/Qwen3-TTS-Models/
└── Qwen3-TTS-12Hz-1.7B-CustomVoice/
    ├── config.json
    ├── model.safetensors
    └── ...
```
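To confirm the layout before starting ComfyUI, a small check like the one below may help. It is a sketch under the assumption that a usable model folder is one that contains a config.json, as in the tree above; the function name `find_models` is hypothetical.

```python
from pathlib import Path

def find_models(models_root: str) -> list[str]:
    """List subfolders of models_root that contain a config.json."""
    root = Path(models_root)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and (p / "config.json").is_file()
    )

if __name__ == "__main__":
    # Path taken from the README; adjust it to your ComfyUI install location.
    print(find_models("ComfyUI/models/Qwen3-TTS-Models"))
```

An empty result suggests the model folder is missing or nested one level too deep.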

Usage

Load Model: Use the TD Qwen3 TTS Model Loader node. Select your model, precision, and attention implementation (sdpa is recommended if flash-attn is not installed).
Voice Design: Use TD Qwen3 TTS Voice Design to generate speech from text with a specific voice description (e.g., "A young female voice, cheerful tone").
Custom Voice: Use TD Qwen3 TTS Custom Voice for specific speakers or voice cloning tasks.
Multi-Role Dialog: Use TD Qwen3 TTS Multi Dialog to generate conversations between multiple characters.
- Supports mixing preset voices and custom voices.
- Connect speaker_list to dynamically inject custom characters.
- Consistent Voices: Uses deterministic seeding to ensure the same character always sounds the same.
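The README does not say how the deterministic seed is derived. One common approach, shown here purely as an assumption, is to hash the character name into a stable integer; the function `seed_for_speaker` and its `base_seed` parameter are hypothetical. Python's built-in `hash()` is avoided because it is salted per process for strings.

```python
import hashlib

def seed_for_speaker(name: str, base_seed: int = 0) -> int:
    """Derive a stable 32-bit seed from a speaker name and a base seed."""
    digest = hashlib.sha256(f"{base_seed}:{name}".encode("utf-8")).digest()
    # Keep 32 bits so the value fits the seed range most RNGs accept.
    return int.from_bytes(digest[:4], "big")

if __name__ == "__main__":
    # The same name yields the same seed on every run and every machine.
    print(seed_for_speaker("Vivian"), seed_for_speaker("Vivian"))
```

Seeding the generator with this value before each line of a given character would keep that character's sampling reproducible across runs.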
Batch Speaker Generation: Use TD Qwen3 TTS Batch Generate Speaker to create multiple custom voices at once.
- Visual Manager: Click "Manage Speakers" to add/edit roles and preview generated audio directly in the UI.
- Dynamic Input: Accepts a JSON string or a Python list string (e.g., [{'name': 'Role', 'instruct': '...'}]) on the speakers_config input.
- Connects seamlessly to the Multi Dialog node.
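Accepting both JSON and Python list strings for speakers_config could work roughly as below. This is a sketch, not the node's actual parser: the function name is hypothetical, and the only keys assumed are `name` and `instruct` from the example above.

```python
import ast
import json

def parse_speakers_config(text: str) -> list:
    """Parse a JSON string or a Python list string into a list of speaker dicts."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted Python literals are not valid JSON; literal_eval
        # parses them without executing arbitrary code.
        data = ast.literal_eval(text)
    if not isinstance(data, list) or not all(isinstance(d, dict) for d in data):
        raise ValueError("speakers_config must be a list of dicts")
    for entry in data:
        if "name" not in entry:
            raise ValueError("each speaker entry needs a 'name'")
    return data

if __name__ == "__main__":
    print(parse_speakers_config("[{'name': 'Role', 'instruct': 'calm narrator'}]"))
```

Trying JSON first keeps strict inputs fast, while `ast.literal_eval` covers single-quoted strings without resorting to `eval`.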
Define Speaker: Use TD Qwen3 TTS Define Speaker to create a single custom speaker from an audio reference (Voice Cloning).
JSON Parsing: Use TD Parse Json to parse JSON strings or Python lists/dicts and extract specific values by key or index. Useful for handling dynamic configurations.
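Extracting a value by key or index can be illustrated with a short helper. The `extract` function and its variadic `path` argument are hypothetical and only show the idea; the node's real inputs may differ.

```python
import ast
import json

def extract(text: str, *path):
    """Parse text, then walk it with dict keys (str) and list indexes (int)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = ast.literal_eval(text)  # Python list/dict literal
    for step in path:
        data = data[step]
    return data

if __name__ == "__main__":
    config = "[{'name': 'Role', 'instruct': 'warm, slow-paced'}]"
    print(extract(config, 0, "name"))  # Role
```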

License

This project is licensed under the Apache 2.0 License.

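中文 (Chinese)

Features

Qwen3-TTS Integration: Seamlessly use Qwen3-TTS models within ComfyUI.
Voice Design: Generate speech with specific characteristics using natural language prompts.
Voice Cloning: Generate custom voices from reference audio or a speaker ID.
Multi-Role Dialog: Generate complex conversations with multiple characters, mixing preset and custom voices.
Batch Generation: Efficiently create and manage multiple custom voices, with real-time preview.
Flexible Configuration: Choose bf16/fp16/fp32 precision and one of several attention implementations (sdpa, flash_attention_2, eager).

Original Project: QwenLM/Qwen3-TTS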

Installation

Clone this repository into your ComfyUI custom_nodes directory:

cd ComfyUI/custom_nodes
git clone https://github.com/AICoderTudou/ComfyUI-TD-Qwen3TTS.git

Install dependencies:
```
cd ComfyUI-TD-Qwen3TTS
pip install -r requirements.txt
```
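Note: flash-attn is optional but recommended for better performance on NVIDIA GPUs. If installation fails on Windows, the plugin will default to sdpa (Scaled Dot Product Attention); no extra steps are required.
Download Models:
- Official Download: Hugging Face Collection
- Alternative Download (Quark Drive): https://pan.quark.cn/s/010e3ca25022
- Download Qwen3-TTS models (e.g., Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice).
- Place the model folder in: ComfyUI/models/Qwen3-TTS-Models/.
- The directory structure should look like: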
```
ComfyUI/models/Qwen3-TTS-Models/
└── Qwen3-TTS-12Hz-1.7B-CustomVoice/
    ├── config.json
    ├── model.safetensors
    └── ...
```

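Usage

Load Model: Use the TD Qwen3 TTS Model Loader node. Select your model, precision, and attention implementation (sdpa is recommended if flash-attn is not installed).
Voice Design: Use the TD Qwen3 TTS Voice Design node to generate speech from a prompt (e.g., "A young female voice, cheerful tone").
Custom Voice: Use the TD Qwen3 TTS Custom Voice node for specific speakers or voice cloning tasks.
Multi-Role Dialog: Use the TD Qwen3 TTS Multi Dialog node to generate conversations between multiple characters.
- Supports mixing preset voices and custom voices.
- Connect speaker_list to dynamically inject custom characters.
- Consistent Voices: Uses a deterministic seed to ensure the same character always sounds the same.
Batch Speaker Generation: Use the TD Qwen3 TTS Batch Generate Speaker node to create multiple custom voices at once.
- Visual Manager: Click "Manage Speakers" to add/edit roles and preview generated audio directly in the UI.
- Dynamic Input: Accepts a JSON string or a Python list string (e.g., [{'name': 'Role', 'instruct': '...'}]) on the speakers_config input.
- Connects seamlessly to the Multi Dialog node.
Define Speaker: Use the TD Qwen3 TTS Define Speaker node to create a single custom speaker from reference audio (Voice Cloning).
JSON Parsing: Use the TD Parse Json node to parse JSON strings or Python lists/dicts and extract specific values by key or index. Useful for handling dynamic configuration data.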

Last updated: 2026-01-27

License

This project is licensed under the Apache 2.0 License.

FAQ

1. What should I do about the error "Model type 'base' does not support preset speaker 'Vivian'..."?

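Cause: You are using a Base model (e.g., Qwen3-TTS-12Hz-1.7B-Base). Base models are mainly intended for voice cloning and include no preset speakers (such as Vivian or Serena). Preset speakers exist only in the CustomVoice models.
Solution:
- Option A (recommended): If you want to use preset speakers, select a CustomVoice model in the loader.
- Option B (voice cloning): If you must use a Base model, you need to provide reference audio for every character name you use (including "Vivian"). Use the TD Qwen3 TTS Define Speaker node to upload an audio clip, set name to the matching character name (e.g., "Vivian"), then connect it to the dialog node's speaker_list.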
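The rule behind this error can be pictured with the sketch below. It is an illustration only: `resolve_speaker`, the `"custom_voice"` label, and the preset set (which lists just the two names mentioned in this FAQ) are assumptions, and the raised message mirrors only the beginning of the error quoted in the question.

```python
# Assumption: only the two preset names mentioned in this FAQ are listed here.
PRESET_SPEAKERS = {"Vivian", "Serena"}

def resolve_speaker(model_type: str, speaker: str, custom_speakers: set) -> str:
    """Return "custom" or "preset", or raise if the speaker cannot be resolved."""
    # A custom speaker supplied through speaker_list works on any model type.
    if speaker in custom_speakers:
        return "custom"
    # Presets exist only in CustomVoice models ("custom_voice" is a hypothetical label).
    if model_type == "custom_voice" and speaker in PRESET_SPEAKERS:
        return "preset"
    raise ValueError(
        f"Model type {model_type!r} does not support preset speaker {speaker!r}"
    )

if __name__ == "__main__":
    # Option B: defining "Vivian" as a custom speaker makes the Base model usable.
    print(resolve_speaker("base", "Vivian", {"Vivian"}))  # custom
```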

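2. The log shows "code_predictor_config is None"

Cause: This is a normal informational (INFO) message, not an error. It means that while loading the model, some non-essential settings were found missing from the config file and were initialized with the correct defaults automatically.
Impact: None. The model initializes with default parameters (hidden_size=1024, etc.), so you can ignore this log and continue.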
