eid-osama/Natq

Natq Logo

🗣️ Natq – Arabic Text-to-Speech (TTS) System

📌 Overview

Natq is an open-source Arabic Text-to-Speech (TTS) system that uses deep learning models to convert Arabic text into natural-sounding speech.
We designed and trained multiple architectures on public datasets with a focus on Arabic language intricacies, including diacritization and phoneme modeling.


Graduation Project – Faculty of Computing and Artificial Intelligence, Helwan University
Department of Artificial Intelligence
Supervised by: Dr. Yasser Hifny & Dr. Ahmed Hesham

🎯 Objectives

  • Create a modular and efficient Arabic TTS system using public datasets.
  • Handle Arabic-specific challenges such as diacritics and phoneme mapping.
  • Compare performance of different spectrogram generators and vocoders.
  • Support live speech synthesis via a web app (React + FastAPI).
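To make the diacritics challenge concrete, here is a minimal sketch (not the project's code) of detecting and stripping Arabic diacritics with Python's standard library. The function names `strip_diacritics` and `is_diacritized` are hypothetical, and the sketch assumes that treating every Unicode combining mark as a diacritic is acceptable for the text at hand.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (fatha, damma, kasra, shadda, sukun, tanwin, ...).

    Assumption: input is normalized to NFC first, so precomposed letters
    such as alef-with-hamza stay intact.
    """
    normalized = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in normalized if not unicodedata.combining(ch))


def is_diacritized(text: str) -> bool:
    """Return True if the text carries at least one combining mark."""
    return any(unicodedata.combining(ch) for ch in text)


sample = "سَلَامٌ"
print(is_diacritized(sample))                     # has marks
print(is_diacritized(strip_diacritics(sample)))   # marks removed
```

A check like this could decide whether a sentence still needs to go through the diacritization stage before phoneme conversion.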

🧠 Architecture

The system pipeline is composed of:

  1. Text Preprocessing
  2. Diacritization: CATT Transformer
  3. Grapheme-to-Phoneme (G2P): Nawar Halabi’s Phonemizer
  4. Spectrogram Generation: FastPitch, FastSpeech2, Mixer-TTS, or Spark-TTS
  5. Vocoder: HiFi-GAN for waveform synthesis
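The five stages above can be sketched as a chain of callables. This is an illustrative outline only: the class `TTSPipeline`, its field names, and the stub stages are hypothetical and do not reflect the repository's actual modules. The 80 mel bins and hop length of 256 in the stubs are placeholder values, not the project's configuration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TTSPipeline:
    """Hypothetical container wiring the five pipeline stages together."""
    preprocess: Callable[[str], str]                          # 1. text preprocessing
    diacritize: Callable[[str], str]                          # 2. e.g. a CATT wrapper
    to_phonemes: Callable[[str], List[str]]                   # 3. G2P
    to_spectrogram: Callable[[List[str]], List[List[float]]]  # 4. acoustic model
    vocode: Callable[[List[List[float]]], List[float]]        # 5. e.g. a HiFi-GAN wrapper

    def synthesize(self, text: str) -> List[float]:
        clean = self.preprocess(text)
        voweled = self.diacritize(clean)
        phonemes = self.to_phonemes(voweled)
        mel = self.to_spectrogram(phonemes)
        return self.vocode(mel)


# Stub stages so the sketch runs without any model weights.
demo = TTSPipeline(
    preprocess=lambda t: " ".join(t.split()),                 # collapse whitespace
    diacritize=lambda t: t,                                   # identity placeholder
    to_phonemes=lambda t: [ch for ch in t if not ch.isspace()],
    to_spectrogram=lambda p: [[0.0] * 80 for _ in p],         # one fake frame per phoneme
    vocode=lambda mel: [0.0] * (len(mel) * 256),              # fake hop length of 256
)

audio = demo.synthesize("  ab  c ")
print(len(audio))
```

Keeping each stage behind a plain callable makes it straightforward to swap one spectrogram generator for another while leaving the rest of the chain untouched.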

Pipeline Diagram


📚 Datasets

| Dataset | Hours | Diacritized | Accent | Notes |
|---|---|---|---|---|
| Arabic_Speech_Corpus | 4.1 | | Levantine | Open-source |
| ClArTTS | 12 | | Classical MSA | Based on LibriVox audiobook |

🛠 Models Used

🔹 Spectrogram Generators

  • FastPitch – Parallel and pitch-controllable TTS
  • FastSpeech2 – Variance-aware and efficient
  • Mixer-TTS – MLP-Mixer based parallel synthesis
  • Spark-TTS – End-to-end LLM-based TTS with zero-shot speaker cloning

🔹 Vocoder

  • HiFi-GAN – Fast, high-fidelity waveform generation

Spectrogram Comparison


🧪 Evaluation

Objective

  • Spectrogram quality
  • Inference time
  • Model size
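One common way to report inference time is the real-time factor (RTF): synthesis time divided by the duration of the generated audio. The README does not say how inference time was measured, so the sketch below is an assumption, and the helper names are hypothetical.

```python
import time
from typing import Callable, Sequence


def mean_inference_seconds(
    synthesize: Callable[[str], Sequence[float]], text: str, runs: int = 5
) -> float:
    """Average wall-clock time of `synthesize(text)` over `runs` calls."""
    synthesize(text)  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        synthesize(text)
    return (time.perf_counter() - start) / runs


def real_time_factor(inference_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = synthesis time / audio duration; below 1.0 is faster than real time."""
    return inference_seconds / (num_samples / sample_rate)
```

For example, producing one second of audio in half a second gives an RTF of 0.5.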

Subjective

  • Mean Opinion Score (MOS) from human listeners
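A MOS is usually reported as the mean of listener ratings together with a confidence interval. The sketch below assumes ratings on a 1 to 5 scale and uses a normal approximation for the 95% interval. The function name `mos_with_ci` is hypothetical, and the ratings in the usage line are made-up illustration values, not results from this project.

```python
import math
import statistics
from typing import Sequence, Tuple


def mos_with_ci(ratings: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (mean rating, half-width of the confidence interval).

    Uses the sample standard deviation and a normal approximation
    (z = 1.96 for roughly 95% coverage). Needs at least two ratings.
    """
    mean = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Illustrative ratings only.
mos, ci = mos_with_ci([4, 5, 3, 4, 4])
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

With few listeners, a t-distribution would give a wider and more honest interval than the normal approximation used here.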

MOS Table


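🔈 Audio Samples

Sample sentence:

وَالسَّلَامُ عَلَى أَشْرَفِ الْأَنْبِيَاءِ وَالْمُرْسَلِينَ سَيِّدِنَا مُحَمَّدٍ

(English: "And peace be upon the most noble of the prophets and messengers, our master Muhammad.")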

| Model | Dataset | Sample |
|---|---|---|
| FastPitch | ClArTTS | ▶️ Listen |
| Mixer-TTS | ASC | ▶️ Listen |
| Spark-TTS | ClArTTS | ▶️ Listen |
| VITS_facebook | | ▶️ Listen |
| T5_MBZUAI | | ▶️ Listen |
| FastSpeech2 | | ▶️ Listen |

👥 Contributors


📂 Project Structure

d:/coding/Natq/
├── .gitignore
├── FastPitch/
│   ├── .vscode/
│   │   └── sttings.json
│   ├── arabic_phoneme_tokenizer.py
│   ├── catt/
│   ├── custom_arabic_to_phones.py
│   └── ...
├── FastSpeech2/
│   ├── arabic_phoneme_tokenizer.py
│   ├── catt/
│   └── ...
├── Media/
│   ├── audio_samples/
│   │   ├── FastPitch-TTS_MOS/
│   │   │   ├── bism_fp.mp4
│   │   │   ├── bism_fp.wav
│   │   │   └── ...
│   │   ├── Mixer-TTS_MOS/
│   │   │   ├── bism_mix.mp4
│   │   │   ├── bism_mix.wav
│   │   │   └── ...
│   │   └── Spark-TTS_MOS/
│   │       ├── bism_spark.mp4
│   │       ├── bism_spark.wav
│   │       └── ...
│   └── images/
│       ├── Demo.jpg
│       └── Fully_Logo_White.png
├── Natq-Frontend/
│   ├── .gitignore
│   ├── eslint.config.js
│   └── ...
├── README.md
└── Spark/
    ├── catt/
    ├── download_model_files.ipynb
    └── ...

🌐 Web Application

We built a simple TTS web interface using FastAPI for the backend and React for the frontend.

🧪 Test it locally:

cd backend
uvicorn main:app --reload

Then, in another terminal:

cd ../app
npm install
npm start
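Once the backend is running, it can also be called without the React frontend. The README does not document the API, so the `/synthesize` path, the `text` and `model` JSON fields, and the helper name `build_tts_request` below are all assumptions for illustration. Only the address is grounded: `127.0.0.1:8000` is uvicorn's default host and port.

```python
import json
import urllib.request

# Hypothetical endpoint and payload: check the backend's routes before use.
BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def build_tts_request(text: str, model: str = "fastpitch") -> urllib.request.Request:
    """Build a JSON POST request for an assumed /synthesize endpoint."""
    payload = json.dumps({"text": text, "model": model}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/synthesize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_tts_request("مرحبا")

# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     audio_bytes = response.read()
```

FastAPI serves interactive documentation at `/docs` by default, which is the quickest way to find the real route names and request schema.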
