Whisper API

A FastAPI-based web service that provides speech-to-text transcription and translation capabilities using OpenAI's Whisper model. This service offers RESTful endpoints for converting audio files into text and translating non-English audio to English text.


Features

  • Audio Transcription: Convert speech in audio files to text in the original language
  • Audio Translation: Translate non-English audio to English text
  • Multi-format Support: Accepts a range of common audio and video formats
  • OpenAI-Compatible API: RESTful endpoints compatible with OpenAI's audio API
  • Docker Support: Containerized deployment for easy scaling
  • Robust Error Handling: Comprehensive input validation and error responses

Supported File Formats

  • Audio: .mp3, .wav, .mpga, .webm, .m4a
  • Video: .mp4, .mpeg
  • Size Limit: Configurable file size limits for optimal performance
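
How the format and size checks are enforced depends on the server's configuration; the following is only a minimal sketch of what such validation could look like in a FastAPI endpoint (the extension set, the 25 MB cap, and the route body below are illustrative, not the project's actual settings):

from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()

# Illustrative values only; the real service reads its own configuration.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".mpga", ".webm", ".m4a", ".mp4", ".mpeg"}
MAX_FILE_SIZE = 25 * 1024 * 1024  # hypothetical 25 MB cap

@app.post("/v1/audio/transcriptions")
async def transcribe(file: UploadFile = File(...)):
    suffix = "." + file.filename.rsplit(".", 1)[-1].lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise HTTPException(status_code=400, detail=f"Unsupported file type: {suffix}")
    contents = await file.read()
    if len(contents) > MAX_FILE_SIZE:
        raise HTTPException(status_code=413, detail="File exceeds the configured size limit")
    ...  # pass the validated audio bytes on to Whisper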

Prerequisites

  • Python 3.11 or higher
  • FFmpeg (for audio processing)
  • Docker (optional, for containerized deployment)

Quick Start

Local Installation

  1. Clone the repository:

git clone https://github.com/duytechie/whisper-api-server.git
cd whisper-api-server

  2. Install dependencies:

# Install CPU-optimized PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install Whisper and FastAPI
pip install -U openai-whisper
pip install "fastapi[standard]"

  3. Install FFmpeg:

Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg

macOS:

brew install ffmpeg

  4. Run the application:

python main.py

The application will be available at http://localhost:8000

Docker Deployment

# Build the Docker image
docker build -t whisper-api .

# Run the container
docker run -d --name whisper-service -p 8000:8000 whisper-api

API Usage

Interactive Documentation

Access the automatically generated API documentation at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

API Endpoints

Health Check

GET /
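
A quick reachability check from Python (the exact payload the root endpoint returns isn't documented here, so this simply prints whatever comes back):

import requests

resp = requests.get("http://localhost:8000/")
print(resp.status_code, resp.text)  # 200 means the service is up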

Transcription

Convert audio to text in the original language:

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@example.mp3" \
  -F "model=small"

Translation

Translate audio to English text:

curl -X POST http://localhost:8000/v1/audio/translations \
  -F "file=@spanish_audio.mp3" \
  -F "model=small"

Available Whisper Models

Model    Parameters  Required VRAM  Relative Speed
tiny     39 M        ~1 GB          ~32x
base     74 M        ~1 GB          ~16x
small    244 M       ~2 GB          ~6x
medium   769 M       ~5 GB          ~2x
large    1550 M      ~10 GB         1x
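
The model form field in the requests above presumably selects one of these checkpoints. For reference, this is how the openai-whisper package loads and runs a model directly, independent of the API server; it can help when choosing a model that fits your hardware:

import whisper

# Larger models are slower but generally more accurate; "small" is a common middle ground.
model = whisper.load_model("small")
result = model.transcribe("example.mp3")
print(result["text"])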

Development

Local Development with Auto-reload

uvicorn whisper_api:app --reload --host 0.0.0.0 --port 8000

Technology Stack

  • FastAPI: Modern, fast web framework for building APIs
  • OpenAI Whisper: Automatic speech recognition system
  • PyTorch: Machine learning framework (CPU-optimized)
  • Uvicorn: ASGI server for FastAPI applications
  • Docker: Containerization platform
  • FFmpeg: Audio/video processing library
