Advanced speech-to-text transcription service with web interface
Azpidatzi is a modern, containerized audio transcription platform that converts speech to text with high accuracy. Built with FastAPI and Astro, it supports multiple audio formats, YouTube integration, and provides word-level timestamps with optimized subtitle generation.
Upload an audio file or download a video from YouTube.
Download or edit the transcript directly in the browser.
More screenshots here.
- Multi-format Support: Upload WAV, MP3, M4A, FLAC, and other audio formats
- YouTube Video Download: Direct transcription from YouTube video URLs
- Advanced Transcription: WhisperX-powered with word-level alignment
- Multi-language: Support for custom Whisper models and alignment models
- GPU Acceleration: Automatic GPU detection and utilization
- Voice Activity Detection: Silero VAD for improved accuracy
- Speaker Diarization: Identify and label speakers with WhisperX
- Subtitle Generation: Multiple SRT formats (optimized, basic, unaligned)
- Modern Web Interface: Clean, responsive UI built with Astro and TailwindCSS
- Transcript Editor: Built-in web editor for reviewing and correcting transcripts
- Docker and Docker Compose v2
- (Optional) NVIDIA drivers and NVIDIA Container Toolkit for GPU acceleration. Recommended for better performance and larger models.
# Clone the repository
git clone <repository-url>
cd azpidatzi
# Start all services
docker compose up --build
# Access the application
# Frontend: http://localhost:4321
# Backend API: http://localhost:8000
# API Documentation: http://localhost:8000/docs
Note: Docker Compose is the supported and recommended way to run Azpidatzi. All development, testing, and deployment should be done using Docker containers.
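Once the stack is up, you can confirm that both services respond with a quick check against the ports listed above (a convenience sketch only; opening the URLs in a browser works just as well):
# smoke_check.py - verify that the frontend and backend answer on their default ports
import urllib.request
for name, url in [("frontend", "http://localhost:4321"), ("backend docs", "http://localhost:8000/docs")]:
    with urllib.request.urlopen(url) as resp:
        print(f"{name}: HTTP {resp.status}")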
A comprehensive Makefile is provided for common operations:
# Start all services
make up
# View logs
make logs
# Run tests
make test
# Stop all services
make down
# Show all available commands
make help
Available Commands:
- make build - Build all Docker images
- make up - Start all services
- make down - Stop all services
- make logs - Show logs from all services
- make logs-backend - Show backend logs only
- make logs-frontend - Show frontend logs only
- make test - Run backend tests
- make test-verbose - Run tests with verbose output
- make rebuild - Stop, rebuild and restart services
- make clean - Remove all containers and images
- FastAPI: Modern, fast web framework
- WhisperX: Advanced speech recognition with alignment (see the pipeline sketch below)
- FFmpeg: Audio processing and format conversion
- GPU Support: CUDA, MPS, and CPU fallback
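For context, WhisperX produces word-level timestamps in two stages: a Whisper transcription pass followed by a forced-alignment pass. The sketch below shows that pipeline in isolation; the model names and options are illustrative, not necessarily what Azpidatzi uses internally:
import whisperx
device = "cuda"  # or "cpu" / "mps" depending on what is available
audio = whisperx.load_audio("sample.wav")
# Stage 1: transcription with a Whisper model
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)
# Stage 2: word-level alignment with a language-specific alignment model
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)
print(aligned["word_segments"][:5])  # per-word start/end timestamps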
- Astro: Static site generator for optimal performance
- TailwindCSS: Utility-first CSS framework
- Vanilla JavaScript: No framework dependencies
- API Documentation: Complete REST API reference
- Frontend Documentation: Frontend development guide
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
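The interactive docs are the authoritative reference for route names and parameters. As a rough illustration of how a client could talk to the backend from Python, here is a hedged sketch using the requests library; the upload path below is hypothetical, so check the Swagger UI or backend/API.md for the real endpoints:
import requests
# NOTE: "/files" is a placeholder route; see http://localhost:8000/docs for the actual API
with open("sample.mp3", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/files",
        files={"file": ("sample.mp3", f, "audio/mpeg")},
    )
resp.raise_for_status()
print(resp.json())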
All testing should be done using Docker Compose to ensure consistency:
# Run all tests
make test
# Run tests with verbose output
make test-verbose
# Run specific test
docker compose run --rm backend pytest -q -k test_name
# Run with output capture disabled
docker compose run --rm backend pytest -q -s
The recommended approach is to use Docker Compose for all development tasks:
# Start development environment
docker compose up --build
# Rebuild after code changes
docker compose build
# View logs during development
docker compose logs -f backend
docker compose logs -f frontend
# Run tests
docker compose run --rm backend pytest -q
Note: Local development requires additional setup and is not the supported environment.
For backend local development (requires Python 3.8+ and FFmpeg):
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
For frontend local development (requires Node.js 18+):
cd frontend
npm install
npm run dev
Important: The frontend is configured to use http://backend:8000 in Docker and http://localhost:8000 for local development. Always ensure the backend is running when developing locally.
The application automatically detects and uses available GPUs:
# For NVIDIA GPUs
docker compose -f docker-compose.gpu.yml up --build
# Or with explicit GPU configuration
docker compose up --build --gpus all
- HUGGING_FACE_HUB_TOKEN: Required for Silero VAD functionality
- CUDA_VISIBLE_DEVICES: Control GPU usage (optional)
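As a general pattern (a sketch, not Azpidatzi's exact code), device selection with PyTorch and the Hugging Face token lookup typically look like this; CUDA_VISIBLE_DEVICES is honored automatically by torch.cuda:
import os
import torch
# Pick the best available device: CUDA GPU, Apple Silicon (MPS), or CPU fallback
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
# Token for gated Hugging Face models used for VAD / diarization
hf_token = os.environ.get("HUGGING_FACE_HUB_TOKEN")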
azpidatzi/
├── backend/                  # FastAPI backend service
│   ├── app/                  # Application code
│   │   ├── main.py           # FastAPI application
│   │   ├── routers/          # API endpoints
│   │   └── services/         # Business logic
│   ├── tests/                # Test suite
│   ├── data/                 # Runtime data storage
│   ├── API.md                # API documentation
│   └── requirements.txt      # Python dependencies
├── frontend/                 # Astro frontend
│   ├── src/                  # Source code
│   └── README.md             # Frontend documentation
├── docker-compose.yml        # Docker Compose configuration
├── docker-compose.gpu.yml    # GPU-enabled configuration
├── Makefile                  # Development commands
└── README.md                 # This file
- FastAPI: Modern, fast web framework for building APIs
- WhisperX: Advanced speech recognition with alignment capabilities
- FFmpeg: Audio processing and format conversion
- Pydantic: Data validation and serialization (see the sketch below)
- Uvicorn: ASGI server for production deployment
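To illustrate the Pydantic piece, a response model for a word-aligned segment could look like the sketch below; the class and field names are hypothetical, not taken from Azpidatzi's actual schema:
from typing import List
from pydantic import BaseModel

class Word(BaseModel):
    word: str
    start: float  # seconds
    end: float

class Segment(BaseModel):
    text: str
    start: float
    end: float
    words: List[Word]  # word-level timestamps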
Files, transcripts, and subtitles are stored under backend/data/:
- data/files/ - Uploaded audio files
- data/transcripts/ - Generated transcripts (JSON)
- data/subtitles/ - Generated subtitle files (SRT)
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality (see the example after this list)
- Run the test suite
- Submit a pull request
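For step 4, a minimal new backend test could look like the following sketch (the file name is hypothetical; the existing suite under backend/tests/ is the best reference for conventions):
# backend/tests/test_smoke.py (hypothetical file name)
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_docs_are_served():
    # The interactive API docs are exposed at /docs
    response = client.get("/docs")
    assert response.status_code == 200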
This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
See the LICENSE file for the full text.
For questions, issues, or contributions:
- Check the API Documentation for technical details
- Review this README for setup and development guidance
- Open an issue on the repository for bugs or feature requests
- WhisperX for the speech recognition model
- yt-dlp for the YouTube video downloader
- wscribe-editor for the subtitle editor, which has been forked and modified to support loading media and subtitle files via URL parameters, among other improvements
- FFmpeg for the audio processing


