Tknika/azpidatzi

Azpidatzi

Azpidatzi Logo

Advanced speech-to-text transcription service with web interface

Azpidatzi is a modern, containerized audio transcription platform that converts speech to text with high accuracy. Built with FastAPI and Astro, it supports multiple audio formats, YouTube integration, and provides word-level timestamps with optimized subtitle generation.

Screenshots

Upload an audio file or download a video from YouTube.

Azpidatzi Screenshot

Download or edit the transcript directly in the browser.

Azpidatzi Editor Screenshot

More screenshots here.

✨ Features

  • Multi-format Support: Upload WAV, MP3, M4A, FLAC, and other audio formats
  • YouTube Video Download: Direct transcription from YouTube video URLs
  • Advanced Transcription: WhisperX-powered with word-level alignment
  • Multi-language: Support for custom Whisper models and alignment models
  • GPU Acceleration: Automatic GPU detection and utilization
  • Voice Activity Detection: Silero VAD for improved accuracy
  • Speaker Diarization: Per-speaker labels via WhisperX diarization
  • Subtitle Generation: Multiple SRT formats (optimized, basic, unaligned)
  • Modern Web Interface: Clean, responsive UI built with Astro and TailwindCSS
  • Transcript Editor: Transcription web editor included
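
As a rough illustration of how word-level timestamps can become SRT subtitles, the sketch below groups words into cues and formats them. It is not Azpidatzi's actual implementation: the word dictionary shape (`word`, `start`, `end`), the function names, and the 42-character cue limit are assumptions made for this example.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_chars: int = 42) -> str:
    """Group word-level timestamps into SRT cues of at most max_chars characters.

    Each word is assumed to look like {"word": str, "start": float, "end": float}.
    """
    cues: List[List[Dict]] = []
    current: List[Dict] = []
    for word in words:
        candidate = " ".join(w["word"] for w in current + [word])
        # Start a new cue when adding this word would exceed the limit.
        if current and len(candidate) > max_chars:
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = srt_timestamp(cue[0]["start"])
        end = srt_timestamp(cue[-1]["end"])
        text = " ".join(w["word"] for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

Each cue takes its start time from its first word and its end time from its last word, which is why word-level alignment matters for subtitle timing.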

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose v2
  • (Optional) NVIDIA drivers and NVIDIA Container Toolkit for GPU acceleration. Recommended for better performance and larger models.

Run with Docker Compose (Recommended)

# Clone the repository
git clone <repository-url>
cd azpidatzi

# Start all services
docker compose up --build

# Access the application
# Frontend: http://localhost:4321
# Backend API: http://localhost:8000
# API Documentation: http://localhost:8000/docs

Note: Docker Compose is the supported and recommended way to run Azpidatzi. All development, testing, and deployment should be done using Docker containers.

Development Commands

A comprehensive Makefile is provided for common operations:

# Start all services
make up

# View logs
make logs

# Run tests
make test

# Stop all services
make down

# Show all available commands
make help

Available Commands:

  • make build - Build all Docker images
  • make up - Start all services
  • make down - Stop all services
  • make logs - Show logs from all services
  • make logs-backend - Show backend logs only
  • make logs-frontend - Show frontend logs only
  • make test - Run backend tests
  • make test-verbose - Run tests with verbose output
  • make rebuild - Stop, rebuild and restart services
  • make clean - Remove all containers and images

🏗️ Architecture

Backend (FastAPI)

  • FastAPI: Modern, fast web framework
  • WhisperX: Advanced speech recognition with alignment
  • FFmpeg: Audio processing and format conversion
  • GPU Support: CUDA, MPS, and CPU fallback
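
The CUDA, MPS, and CPU fallback order can be sketched as follows. This is an illustration only, assuming PyTorch is the library used for detection; `pick_device` and `detect_device` are hypothetical names, not functions from the Azpidatzi backend.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name: CUDA first, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available accelerators, falling back to CPU."""
    try:
        import torch  # assumption: PyTorch is installed in the backend image
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        torch.cuda.is_available(),
        bool(mps is not None and mps.is_available()),
    )


if __name__ == "__main__":
    print(f"Selected device: {detect_device()}")
```

Keeping the priority logic in a pure function makes the fallback order easy to test without any GPU present.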

Frontend (Astro)

  • Astro: Static site generator for optimal performance
  • TailwindCSS: Utility-first CSS framework
  • Vanilla JavaScript: No framework dependencies

📚 Documentation

Interactive API Documentation

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc
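
Swagger UI and ReDoc are both rendered from the OpenAPI schema, which FastAPI serves at `/openapi.json` by default. The sketch below lists the routes found in that schema; it assumes the backend is running on `http://localhost:8000` and that the default schema URL has not been changed. The helper names are hypothetical.

```python
import json
from typing import List
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema from a running backend."""
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)


def list_routes(schema: dict) -> List[str]:
    """Return 'METHOD /path' strings for every operation in the schema."""
    routes = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            # Skip non-operation keys such as "parameters" or "summary".
            if method in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return routes


if __name__ == "__main__":
    for route in list_routes(fetch_openapi()):
        print(route)
```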

🧪 Testing

All testing should be done using Docker Compose to ensure consistency:

# Run all tests
make test

# Run tests with verbose output
make test-verbose

# Run specific test
docker compose run --rm backend pytest -q -k test_name

# Run with output capture disabled
docker compose run --rm backend pytest -q -s

🔧 Development

Recommended: Docker Development

The recommended approach is to use Docker Compose for all development tasks:

# Start development environment
docker compose up --build

# Rebuild after code changes
docker compose build

# View logs during development
docker compose logs -f backend
docker compose logs -f frontend

# Run tests
docker compose run --rm backend pytest -q

Alternative: Local Development

Note: Local development requires additional setup and is not the supported environment.

For backend local development (requires Python 3.8+ and FFmpeg):

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

For frontend local development (requires Node.js 18+):

cd frontend
npm install
npm run dev

Important: The frontend is configured to use http://backend:8000 in Docker and http://localhost:8000 for local development. Always ensure the backend is running when developing locally.

🐳 Docker Configuration

GPU Support

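The application detects available GPUs automatically. To expose NVIDIA GPUs to the containers, start the GPU-enabled Compose file:

# For NVIDIA GPUs
docker compose -f docker-compose.gpu.yml up --build

# GPU access is declared in docker-compose.gpu.yml.
# `docker compose up` does not accept a `--gpus` flag; that option belongs to `docker run`.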

Environment Variables

  • HUGGING_FACE_HUB_TOKEN: Required for Silero VAD functionality
  • CUDA_VISIBLE_DEVICES: Control GPU usage (optional)
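
A small pre-flight check for these two variables might look like the sketch below. The helper names are hypothetical and the backend may read the variables differently. The `CUDA_VISIBLE_DEVICES` semantics shown are the standard CUDA ones: unset means all GPUs are visible, and an empty value hides every GPU.

```python
import os
from typing import List, Mapping, Optional


def has_hf_token(env: Mapping[str, str] = os.environ) -> bool:
    """True when HUGGING_FACE_HUB_TOKEN is set to a non-empty value."""
    return bool(env.get("HUGGING_FACE_HUB_TOKEN", "").strip())


def visible_gpus(env: Mapping[str, str] = os.environ) -> Optional[List[str]]:
    """Interpret CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (no restriction),
    an empty list when it is set but empty (all GPUs hidden),
    or the list of device identifiers otherwise.
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [device.strip() for device in value.split(",") if device.strip()]


if __name__ == "__main__":
    print("Hugging Face token set:", has_hf_token())
    gpus = visible_gpus()
    print("GPU restriction:", "none" if gpus is None else gpus)
```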

📁 Project Structure

azpidatzi/
├── backend/                 # FastAPI backend service
│   ├── app/                # Application code
│   │   ├── main.py         # FastAPI application
│   │   ├── routers/        # API endpoints
│   │   └── services/       # Business logic
│   ├── tests/              # Test suite
│   ├── data/               # Runtime data storage
│   ├── API.md              # API documentation
│   └── requirements.txt    # Python dependencies
├── frontend/               # Astro frontend
│   ├── src/                # Source code
│   └── README.md           # Frontend documentation
├── docker-compose.yml      # Docker Compose configuration
├── docker-compose.gpu.yml  # GPU-enabled configuration
├── Makefile               # Development commands
└── README.md              # This file

Backend Architecture

  • FastAPI: Modern, fast web framework for building APIs
  • WhisperX: Advanced speech recognition with alignment capabilities
  • FFmpeg: Audio processing and format conversion
  • Pydantic: Data validation and serialization
  • Uvicorn: ASGI server for production deployment

Storage

Files, transcripts, and subtitles are stored under backend/data/:

  • data/files/ - Uploaded audio files
  • data/transcripts/ - Generated transcripts (JSON)
  • data/subtitles/ - Generated subtitle files (SRT)
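
To make the layout concrete, the sketch below maps an uploaded file name to the three storage locations. The directory names come from the list above; the naming scheme (reusing the upload's stem with `.json` and `.srt` extensions) and the `artifact_paths` helper are assumptions for illustration, since the README does not specify how files are named.

```python
from pathlib import Path
from typing import Dict

DATA_ROOT = Path("backend/data")


def artifact_paths(filename: str, data_root: Path = DATA_ROOT) -> Dict[str, Path]:
    """Return where an upload and its derived files would live.

    Only the final path component of `filename` is used, so a value such as
    "../../etc/passwd" cannot escape the data directory.
    """
    safe_name = Path(filename).name
    stem = Path(safe_name).stem
    return {
        "upload": data_root / "files" / safe_name,
        "transcript": data_root / "transcripts" / f"{stem}.json",
        "subtitles": data_root / "subtitles" / f"{stem}.srt",
    }


if __name__ == "__main__":
    for kind, path in artifact_paths("interview.mp3").items():
        print(f"{kind}: {path}")
```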

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite
  6. Submit a pull request

📄 License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for the full text.

🆘 Support

For questions, issues, or contributions:

  • Check the API Documentation for technical details
  • Review this README for setup and development guidance
  • Open an issue on the repository for bugs or feature requests

📝 Acknowledgments

  • WhisperX for the speech recognition model
  • yt-dlp for the YouTube video downloader
  • wscribe-editor for the subtitle editor, forked and modified to load media and subtitle files via URL parameters, among other improvements
  • FFmpeg for the audio processing
