Natq is an open-source Arabic Text-to-Speech (TTS) system that uses deep learning models to convert Arabic text into natural-sounding speech.
We designed and trained multiple architectures on public datasets with a focus on Arabic language intricacies, including diacritization and phoneme modeling.
Graduation Project – Faculty of Computing and Artificial Intelligence, Helwan University
Department of Artificial Intelligence
Supervised by: Dr. Yasser Hifny & Dr. Ahmed Hesham
- Create a modular and efficient Arabic TTS system using public datasets.
- Handle Arabic-specific challenges such as diacritics and phoneme mapping.
- Compare performance of different spectrogram generators and vocoders.
- Support live speech synthesis via a web app (React + FastAPI).
The system pipeline is composed of:
- Text Preprocessing
- Diacritization: CATT Transformer
- Grapheme-to-Phoneme (G2P): Nawar Halabi’s Phonetiser
- Spectrogram Generation: FastPitch, FastSpeech2, Mixer-TTS, or Spark-TTS
- Vocoder: HiFi-GAN for waveform synthesis
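The stages above can be sketched as a simple chain of functions. This is a minimal illustration only: the function names, the toy phoneme table, and the frame/sample sizes are hypothetical placeholders, not the project's actual API or models.

```python
# Minimal sketch of the Natq pipeline stages. Every name and value here
# is an illustrative stand-in, not the real CATT / G2P / TTS models.

def diacritize(text: str) -> str:
    """Stand-in for the CATT diacritization model (identity here)."""
    return text

def graphemes_to_phonemes(text: str) -> list[str]:
    """Stand-in for the G2P step: map each character to a phoneme symbol."""
    toy_table = {"b": "b", "a": "a", "s": "s", "m": "m", " ": "sil"}
    return [toy_table.get(ch, "unk") for ch in text]

def phonemes_to_spectrogram(phonemes: list[str]) -> list[list[float]]:
    """Stand-in for FastPitch/FastSpeech2/Mixer-TTS: one dummy frame per phoneme."""
    return [[0.0] * 80 for _ in phonemes]  # 80 mel bins is a common choice

def vocode(spectrogram: list[list[float]]) -> list[float]:
    """Stand-in for HiFi-GAN: upsample each frame to 256 waveform samples."""
    return [0.0 for _frame in spectrogram for _ in range(256)]

def synthesize(text: str) -> list[float]:
    return vocode(phonemes_to_spectrogram(graphemes_to_phonemes(diacritize(text))))

audio = synthesize("basm")
print(len(audio))  # 4 phonemes * 256 samples per frame = 1024
```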
| Dataset | Hours | Diacritized | Accent | Notes |
|---|---|---|---|---|
| Arabic_Speech_Corpus | 4.1 | ✅ | Levantine | Open-source |
| ClArTTS | 12 | ✅ | Classical MSA | Based on a LibriVox audiobook |
- FastPitch – Parallel and pitch-controllable TTS
- FastSpeech2 – Variance-aware and efficient
- Mixer-TTS – MLP-Mixer based parallel synthesis
- Spark-TTS – End-to-end LLM-based TTS with zero-shot speaker cloning
- HiFi-GAN – Fast, high-fidelity waveform generation
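The spectrogram generators above predict a time–frequency representation that the vocoder turns back into a waveform. As a rough illustration of that intermediate representation, a magnitude spectrogram can be computed with a plain windowed FFT (numpy only; the frame and hop sizes are arbitrary example values, not the project's training configuration):

```python
import numpy as np

def magnitude_spectrogram(wave, n_fft=512, hop=128):
    """Frame the signal, apply a Hann window, and take |FFT| per frame."""
    frames = []
    for start in range(0, len(wave) - n_fft + 1, hop):
        frame = wave[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.stack(frames)  # shape: (num_frames, n_fft // 2 + 1)

# One second of a 440 Hz tone at 16 kHz: energy concentrates in one bin.
sr = 16000
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = int(spec.mean(axis=0).argmax())
print(peak_bin)  # expected near 440 * n_fft / sr ≈ 14
```

Real TTS systems apply a mel filterbank and log compression on top of this, but the frame/FFT structure is the same.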
- Spectrogram quality
- Inference time
- Model size
- Mean Opinion Score (MOS) from human listeners
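MOS is the mean of 1–5 listener ratings, usually reported with a confidence interval; a minimal computation using only the standard library (the ratings below are made-up example numbers, not the project's results):

```python
import statistics

def mos(ratings):
    """Mean Opinion Score with an approximate 95% confidence interval."""
    mean = statistics.mean(ratings)
    # 1.96 * standard error: a normal approximation, fine for larger samples.
    ci = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
    return mean, ci

# Hypothetical ratings for one synthesized sample (1 = bad, 5 = excellent).
score, ci = mos([4, 5, 3, 4, 4, 5, 4, 3])
print(f"MOS = {score:.2f} ± {ci:.2f}")  # MOS = 4.00 ± 0.52
```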
| Model | Dataset | Sample |
|---|---|---|
| FastPitch | ClArTTS | |
| Mixer-TTS | ASC | |
| Spark-TTS | ClArTTS | |
| VITS_facebook | — | |
| T5_MBZUAI | — | |
| FastSpeech2 | — | |
d:/coding/Natq/
├── .gitignore
├── FastPitch/
│ ├── .vscode/
│   │   └── settings.json
│ ├── arabic_phoneme_tokenizer.py
│ ├── catt/
│ ├── custom_arabic_to_phones.py
│ └── ...
├── FastSpeech2/
│ ├── arabic_phoneme_tokenizer.py
│ ├── catt/
│ └── ...
├── Media/
│ ├── audio_samples/
│ │ ├── FastPitch-TTS_MOS/
│ │ │ ├── bism_fp.mp4
│ │ │ ├── bism_fp.wav
│ │ │ └── ...
│ │ ├── Mixer-TTS_MOS/
│ │ │ ├── bism_mix.mp4
│ │ │ ├── bism_mix.wav
│ │ │ └── ...
│ │ └── Spark-TTS_MOS/
│ │ ├── bism_spark.mp4
│ │ ├── bism_spark.wav
│ │ └── ...
│ └── images/
│ ├── Demo.jpg
│ └── Fully_Logo_White.png
├── Natq-Frontend/
│ ├── .gitignore
│ ├── eslint.config.js
│ └── ...
├── README.md
└── Spark/
├── catt/
├── download_model_files.ipynb
└── ...
We built a simple TTS web interface with a FastAPI backend and a React frontend.
🧪 Test it locally:

    # Backend (FastAPI)
    cd backend
    uvicorn main:app --reload

    # Frontend (React)
    cd ../app
    npm install
    npm start




