Whisper API

A FastAPI-based web service that provides speech-to-text transcription and translation capabilities using OpenAI's Whisper model. This service offers RESTful endpoints for converting audio files into text and translating non-English audio to English text.

Features

Audio Transcription: Convert speech in audio files to text in the original language
Audio Translation: Translate non-English audio to English text
Multi-format Support: Support for various audio and video formats
OpenAI-Compatible API: RESTful endpoints compatible with OpenAI's audio API
Docker Support: Containerized deployment for easy scaling
Robust Error Handling: Comprehensive input validation and error responses

Supported File Formats

Audio: .mp3, .wav, .mpga, .webm, .m4a
Video: .mp4, .mpeg
Size Limit: Configurable file size limits for optimal performance
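
As a rough illustration of the format and size rules above, a client could pre-check a file before uploading it. This is a minimal sketch, not the service's own validation code: the function name `check_upload` and the 25 MB default are assumptions made for the example, since the actual limit is configurable on the server.

```python
from pathlib import Path

# Extensions listed in this README; the service's real validation may differ.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".mpga", ".webm", ".m4a", ".mp4", ".mpeg"}

# Hypothetical limit for illustration only; the real limit is set server-side.
MAX_SIZE_BYTES = 25 * 1024 * 1024


def check_upload(filename: str, size_bytes: int, max_size: int = MAX_SIZE_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes > max_size:
        problems.append(f"file too large: {size_bytes} > {max_size} bytes")
    return problems
```

Checking on the client side only saves a round trip; the server remains the authority on what it accepts.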

Prerequisites

Python 3.11 or higher
FFmpeg (for audio processing)
Docker (optional, for containerized deployment)

Quick Start

Local Installation

Clone the repository:

git clone https://github.com/duytechie/whisper-api-server.git
cd whisper-api-server

Install dependencies:

# Install CPU-optimized PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install Whisper and FastAPI
pip install -U openai-whisper
pip install "fastapi[standard]"

Install FFmpeg:

Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg

macOS:

brew install ffmpeg

Run the application:

python main.py

The application will be available at http://localhost:8000

Docker Deployment

# Build the Docker image
docker build -t whisper-api .

# Run the container
docker run -d --name whisper-service -p 8000:8000 whisper-api

API Usage

Interactive Documentation

Access the automatically generated API documentation at:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

API Endpoints

Health Check

GET /

Transcription

Convert audio to text in the original language:

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@example.mp3" \
  -F "model=small"
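
The same request can be assembled from Python using only the standard library. The sketch below builds the multipart/form-data body by hand with the `file` and `model` fields from the curl command; the helper name `build_transcription_request` is hypothetical, and the response format is not specified here, so the example stops at constructing the request.

```python
import urllib.request
import uuid


def build_transcription_request(
    filename: str,
    audio: bytes,
    model: str = "small",
    url: str = "http://localhost:8000/v1/audio/transcriptions",
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST with `file` and `model` fields."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        audio,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# To send it against a running server:
#   with open("example.mp3", "rb") as f:
#       req = build_transcription_request("example.mp3", f.read())
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

Passing `url="http://localhost:8000/v1/audio/translations"` targets the translation endpoint with the same fields.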

Translation

Translate audio to English text:

curl -X POST http://localhost:8000/v1/audio/translations \
  -F "file=@spanish_audio.mp3" \
  -F "model=small"

Available Whisper Models

Model	Parameters	Required VRAM	Relative Speed
tiny	39 M	~1 GB	~32x
base	74 M	~1 GB	~16x
small	244 M	~2 GB	~6x
medium	769 M	~5 GB	~2x
large	1550 M	~10 GB	1x

Development

Local Development with Auto-reload

uvicorn whisper_api:app --reload --host 0.0.0.0 --port 8000

Technology Stack

FastAPI: Modern, fast web framework for building APIs
OpenAI Whisper: Automatic speech recognition system
PyTorch: Machine learning framework (CPU-optimized)
Uvicorn: ASGI server for FastAPI applications
Docker: Containerization platform
FFmpeg: Audio/video processing library
