A FastAPI-based web service that provides speech-to-text transcription and translation capabilities using OpenAI's Whisper model. This service offers RESTful endpoints for converting audio files into text and translating non-English audio to English text.
- Audio Transcription: Convert speech in audio files to text in the original language
- Audio Translation: Translate non-English audio to English text
- Multi-format Support: Accepts common audio and video file formats
- OpenAI-Compatible API: RESTful endpoints compatible with OpenAI's audio API
- Docker Support: Containerized deployment for easy scaling
- Robust Error Handling: Comprehensive input validation and error responses

Supported formats:

- Audio: `.mp3`, `.wav`, `.mpga`, `.webm`, `.m4a`
- Video: `.mp4`, `.mpeg`
- Size Limit: Configurable maximum file size per upload
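
The format and size rules above can be enforced before any audio reaches the model. The following is a minimal sketch of such a check, not the project's actual validation code. The `validate_upload` helper, the `UploadError` type, and the 25 MB `MAX_UPLOAD_BYTES` value are hypothetical, since the configured limit is not stated here.

```python
from pathlib import PurePath

# Extensions listed under "Supported formats".
AUDIO_EXTENSIONS = {".mp3", ".wav", ".mpga", ".webm", ".m4a"}
VIDEO_EXTENSIONS = {".mp4", ".mpeg"}
ALLOWED_EXTENSIONS = AUDIO_EXTENSIONS | VIDEO_EXTENSIONS

# Hypothetical default: the real limit is configurable and not documented here.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


class UploadError(ValueError):
    """Raised when an upload fails validation (hypothetical error type)."""


def validate_upload(filename: str, size_bytes: int, max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    """Return the lower-cased extension, or raise UploadError."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        allowed = ", ".join(sorted(ALLOWED_EXTENSIONS))
        raise UploadError(f"unsupported format {suffix or '(none)'}; expected one of: {allowed}")
    if size_bytes <= 0:
        raise UploadError("empty file")
    if size_bytes > max_bytes:
        raise UploadError(f"file is {size_bytes} bytes; limit is {max_bytes} bytes")
    return suffix
```

In a FastAPI handler, an `UploadError` would typically be turned into an `HTTPException` (for example 400 for a bad format, 413 for an oversized file) so the client gets a clear error response.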

Requirements:

- Python 3.11 or higher
- FFmpeg (for audio processing)
- Docker (optional, for containerized deployment)
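
Before installing anything else, the two mandatory requirements can be checked with a short standard-library script. This is an illustrative sketch, not part of the repository: `python_ok` and `missing_prerequisites` are hypothetical names, and the script only checks the interpreter version and whether an `ffmpeg` executable is on `PATH` (Docker is optional, so it is not checked).

```python
from __future__ import annotations

import shutil
import sys

MIN_PYTHON = (3, 11)  # from the requirements list above


def python_ok(version: tuple[int, int] | None = None,
              minimum: tuple[int, int] = MIN_PYTHON) -> bool:
    """True if `version` (default: the running interpreter) meets `minimum`."""
    current = tuple(version) if version is not None else tuple(sys.version_info[:2])
    return current >= minimum


def missing_prerequisites() -> list[str]:
    """Return a list of problems; an empty list means the machine is ready."""
    problems = []
    if not python_ok():
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg executable not found on PATH")
    return problems
```

For example, `print(missing_prerequisites() or "all prerequisites found")` prints either the list of problems or a confirmation.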

Installation:

- Clone the repository:

  ```bash
  git clone https://github.com/duytechie/whisper-api-server.git
  cd whisper-api-server
  ```

- Install dependencies:

  ```bash
  # Install the CPU-only PyTorch build
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
  # Install Whisper and FastAPI
  pip install -U openai-whisper
  pip install "fastapi[standard]"
  ```

- Install FFmpeg:

  Ubuntu/Debian:

  ```bash
  sudo apt update && sudo apt install ffmpeg
  ```

  macOS:

  ```bash
  brew install ffmpeg
  ```

- Run the application:

  ```bash
  python main.py
  ```

The application will be available at http://localhost:8000.
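
The server needs a moment to start (and possibly to load a Whisper model on CPU), and `docker run -d` returns before the container is ready. A minimal standard-library polling sketch follows. The `wait_until_ready` name, the use of `/docs` as the probe URL, and the timeout values are assumptions, not documented behaviour of this service.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str = "http://localhost:8000/docs",
                     timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # not up yet: connection refused, reset, or too slow to answer
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A script that sends audio right after startup could call `wait_until_ready()` first and abort if it returns `False`.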

Alternatively, run the service with Docker:

```bash
# Build the Docker image
docker build -t whisper-api .
# Run the container
docker run -d --name whisper-service -p 8000:8000 whisper-api
```

Access the automatically generated API documentation at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
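
Besides these two HTML pages, FastAPI serves the underlying OpenAPI schema as JSON at `/openapi.json` by default (unless the app disables it). The sketch below fetches that schema and lists each route with its HTTP method; `fetch_schema` and `list_endpoints` are illustrative helpers, not part of the project.

```python
from __future__ import annotations

import json
import urllib.request

# Keys under an OpenAPI path item that denote operations (others, like
# "parameters" or "summary", are skipped).
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI document FastAPI generates for the app."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_endpoints(schema: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI schema."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:
                endpoints.append(f"{method.upper()} {path}")
    return sorted(endpoints)
```

Usage: `for line in list_endpoints(fetch_schema()): print(line)` against a running server.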
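
Available endpoints: `GET /`, `POST /v1/audio/transcriptions`, and `POST /v1/audio/translations`.

Convert audio to text in the original language:

```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@example.mp3" \
  -F "model=small"
```

Translate audio to English text:

```bash
curl -X POST http://localhost:8000/v1/audio/translations \
  -F "file=@spanish_audio.mp3" \
  -F "model=small"
```

Available Whisper models (figures are approximate; relative speed is compared with `large`):

| Model | Parameters | Approx. Memory | Relative Speed |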
|---|---|---|---|
| tiny | 39 M | ~1 GB | ~32x |
| base | 74 M | ~1 GB | ~16x |
| small | 244 M | ~2 GB | ~6x |
| medium | 769 M | ~5 GB | ~2x |
| large | 1550 M | ~10 GB | 1x |
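
The curl calls above can also be made from Python. This sketch builds the same `multipart/form-data` request (a `file` part plus a `model` field) using only the standard library. It assumes the server replies with a JSON body, which this README does not specify; `build_multipart` and `post_audio` are hypothetical helper names.

```python
from __future__ import annotations

import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict[str, str], file_field: str, filename: str,
                    data: bytes, boundary: str | None = None) -> tuple[str, bytes]:
    """Encode text fields plus one file as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    chunks.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'.encode()
        + data
        + b"\r\n"
    )
    chunks.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


def post_audio(path: str, endpoint: str = "transcriptions", model: str = "small",
               base_url: str = "http://localhost:8000") -> dict:
    """POST a file to /v1/audio/<endpoint>, like the curl examples, and parse JSON."""
    audio = Path(path)
    content_type, body = build_multipart({"model": model}, "file", audio.name, audio.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/v1/audio/{endpoint}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Generous timeout: CPU transcription of long files can take minutes.
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)
```

`post_audio("example.mp3")` mirrors the first curl call, and `post_audio("spanish_audio.mp3", endpoint="translations")` mirrors the second. Smaller models in the table trade accuracy for speed, so `model="tiny"` or `"base"` may be preferable on a modest CPU.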

To run with auto-reload during development:

```bash
uvicorn whisper_api:app --reload --host 0.0.0.0 --port 8000
```

Built with:

- FastAPI: Modern, fast web framework for building APIs
- OpenAI Whisper: Automatic speech recognition system
- PyTorch: Machine learning framework (CPU-only build)
- Uvicorn: ASGI server for FastAPI applications
- Docker: Containerization platform
- FFmpeg: Audio/video processing tool that Whisper uses to decode input files
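
In this stack, FastAPI receives the upload and Whisper turns it into text. Because loading a Whisper model is slow, a server like this would usually load each model once and reuse it across requests. The sketch below illustrates that idea with the loader injected, so it runs without Whisper installed; in a real deployment the loader would be `whisper.load_model`, and `task` is the Whisper decoding option that selects `"transcribe"` or `"translate"`. `ModelCache`, `handle_request`, and `ENDPOINT_TASKS` are hypothetical, not taken from this repository.

```python
from __future__ import annotations

import threading
from typing import Any, Callable

# Maps the two API routes to Whisper's decoding task names.
ENDPOINT_TASKS = {"transcriptions": "transcribe", "translations": "translate"}
# Names from the model table; the real server may accept others.
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


class ModelCache:
    """Load each model at most once and reuse it across requests."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader  # e.g. whisper.load_model in a real server
        self._models: dict[str, Any] = {}
        # One lock for simplicity: concurrent requests never load a model twice,
        # at the cost of serializing loads of different models.
        self._lock = threading.Lock()

    def get(self, name: str) -> Any:
        if name not in KNOWN_MODELS:
            raise ValueError(f"unknown model {name!r}; choose from {', '.join(KNOWN_MODELS)}")
        with self._lock:
            if name not in self._models:
                self._models[name] = self._loader(name)
            return self._models[name]


def handle_request(cache: ModelCache, endpoint: str, audio_path: str,
                   model: str = "small") -> str:
    """Pick the task for the route, reuse the cached model, return the text."""
    task = ENDPOINT_TASKS[endpoint]
    result = cache.get(model).transcribe(audio_path, task=task)
    return result["text"]
```

Whisper's `transcribe()` returns a dict whose `"text"` key holds the full transcript, which is why the handler returns `result["text"]`.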