LEMAS‑TTS is a multilingual zero‑shot text‑to‑speech system supporting 10 languages:
- Chinese
- English
- Spanish
- Russian
- French
- German
- Italian
- Portuguese
- Indonesian
- Vietnamese
```bash
git clone https://github.com/LEMAS-Project/LEMAS-TTS.git
cd ./LEMAS-TTS

# create a dedicated environment
conda create -n lemas-tts python=3.10
conda activate lemas-tts
```

You can install the system dependencies via `apt` or Anaconda:

```bash
sudo apt-get update
sudo apt-get install -y ffmpeg
```

or

```bash
conda install -c conda-forge ffmpeg
```

Then install the Python requirements:

```bash
pip install -r requirements.txt
# or, if you package it locally:
# pip install -e .
```

Install PyTorch and Torchaudio according to your device (CUDA / ROCm / CPU / MPS), following the official PyTorch instructions.
Download the pretrained models from https://huggingface.co/LEMAS-Project/LEMAS-TTS, then place the `pretrained_models/` folder at the repository root, next to the `lemas_tts/` package; the code locates the repo root by looking for this folder.
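The root-lookup behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code, and the helper name is hypothetical:

```python
from pathlib import Path

def find_repo_root(start: Path) -> Path:
    """Walk upward from `start` until a directory containing
    `pretrained_models/` is found (hypothetical sketch of the
    root-lookup behavior described above)."""
    for candidate in [start, *start.parents]:
        if (candidate / "pretrained_models").is_dir():
            return candidate
    raise FileNotFoundError(f"pretrained_models/ not found above {start}")
```

If the folder is misplaced, lookups like this fail, so keep `pretrained_models/` directly under the repo root.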
All commands below assume:
```bash
cd ./LEMAS-TTS
export PYTHONPATH="$PWD:${PYTHONPATH}"
```

You can try the model via our Hugging Face Space: https://huggingface.co/spaces/LEMAS-Project/LEMAS-TTS
Locally, you can run the Gradio web app with:
```bash
python lemas_tts/scripts/inference_gradio.py
```

You can customize the host, port, and sharing:

```bash
python lemas_tts/scripts/inference_gradio.py --host 0.0.0.0 --port 7860 --share
```

For simple TTS (text only, without reference audio), use:

- Python entry: `lemas_tts.scripts.tts_multilingual`
- Shell helper: `lemas_tts/scripts/tts_multilingual.sh`
Example:
```bash
cd ./LEMAS-TTS
bash lemas_tts/scripts/tts_multilingual.sh
```

The shell script demonstrates how to:

- Select `multilingual_grl` or `multilingual_prosody`
- Point to `pretrained_models/ckpts/...` and `pretrained_models/data/...`
- Choose the frontend type (currently only `phone` is supported)
- Configure sampling parameters: NFE steps, CFG strength, Sway sampling, speed, etc.
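A direct module invocation with those sampling parameters might look like the sketch below. The flag names here are assumptions (modeled on common diffusion/flow-matching TTS CLIs); check `tts_multilingual.sh` for the actual interface:

```shell
# Hypothetical invocation; flag names are assumptions, not verified
# against the script -- consult the bash helper for the real ones.
python -m lemas_tts.scripts.tts_multilingual \
    --model multilingual_prosody \
    --nfe_step 32 \
    --cfg_strength 2.0 \
    --sway_sampling_coef -1.0 \
    --speed 1.0
```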
Alternatively, you can call the Python module directly, following the examples in the bash scripts.
You can enable UVR5 denoising of the reference audio via `--denoise`.
For editing a region of an utterance given word‑level alignment JSONs, use:
- Python entry: `lemas_tts.scripts.speech_edit_multilingual`
- Shell helper: `lemas_tts/scripts/speech_edit_multilingual.sh`

The Python script expects:

- `--wav_dir`: directory with input `*.wav` files
- `--align_dir`: directory with Azure‑style alignment JSONs
- `--save_dir`: directory for edited outputs
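The exact alignment schema is not shown here. Assuming an Azure-style layout with a top-level `"Words"` list whose entries carry `"Offset"` and `"Duration"` in 100-ns ticks (an assumption for illustration, not the script's verified input format), extracting the time span of a word region to edit could look like:

```python
import json

def region_span(align_json: str, start_word: int, end_word: int):
    """Return the (start, end) time span in seconds covered by words
    start_word..end_word (inclusive). Assumes an Azure-style layout:
    a top-level "Words" list whose entries carry "Offset" and
    "Duration" in 100-ns ticks -- an assumption for illustration."""
    with open(align_json) as f:
        words = json.load(f)["Words"]
    ticks_per_second = 10_000_000  # Azure timestamps use 100-ns ticks
    start = words[start_word]["Offset"] / ticks_per_second
    end = (words[end_word]["Offset"] + words[end_word]["Duration"]) / ticks_per_second
    return start, end
```

Each `*.wav` in `--wav_dir` is expected to have a matching alignment JSON in `--align_dir`; outputs go to `--save_dir`.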
Example:
```bash
cd ./LEMAS-TTS
bash lemas_tts/scripts/speech_edit_multilingual.sh
```

The script supports both prosody‑enabled and non‑prosody variants; see the inline comments in `speech_edit_multilingual.sh` for a prosody example.
This project builds heavily on the following open‑source works:
- F5‑TTS – core model architecture and many components of the inference pipeline.
- UVR5 – music source separation / vocal denoising, used here as an optional pre‑processing step.
If you use LEMAS‑TTS in your work, please also consider citing and acknowledging these upstream projects.
```bibtex
@article{zhao2026lemas,
  title={LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models},
  author={Zhao, Zhiyuan and Lin, Lijian and Zhu, Ye and Xie, Kai and Liu, Yunfei and Li, Yu},
  journal={arXiv preprint arXiv:2601.04233},
  year={2026}
}
```
This repository is released under the CC‑BY‑NC‑4.0 license.
See https://creativecommons.org/licenses/by-nc/4.0/ for more details.