Fast and accurate audio transcription CLI powered by OpenAI Whisper Large-v3 and faster-whisper.
Features:
- 🎯 Transcribe audio files with high accuracy
- 🎙️ Record from microphone and transcribe in one command
- 🗂️ Batch process entire folders
- 🧱 Optional paragraph formatting (`--format`) for better readability
- 🌐 Auto-translate transcriptions to any language (powered by OpenAI GPT)
- ⚡ 4-8x faster than standard Whisper implementations
- 🌍 99+ languages supported with auto-detection
```bash
# Transcribe a single file
whisp audio.mp3                          # → audio.txt

# Record and transcribe
whisp record                             # → ~/Records/recording_TIMESTAMP.txt

# Batch process folder
whisp ./recordings/                      # → recordings.txt

# With specific model and language
whisp audio.mp3 --model large --language de

# With translation to target language
whisp audio.mp3 --translate ru           # → audio.txt + audio_ru.txt

# With paragraph formatting (no text edits, only paragraph breaks)
whisp audio.mp3 --format                 # → audio.txt + audio_formatted.txt
```

💡 Note: If you installed with `pip install -r requirements.txt` instead of `pip install -e .`, use `python whisp.py` instead of `whisp`.
- 🎯 High accuracy transcription with Whisper Large-v3
- ⚡ 4-8x faster than standard Whisper (using CTranslate2)
- 🚀 GPU (CUDA) support for accelerated processing
- 💾 Lower memory usage with int8 quantization on CPU
- 🌍 Automatic language detection or manual language specification
- 📝 Preview of transcription results
- 🔄 Multiple model options (large, large-v2, turbo, medium, small, base)
- 🎤 Voice activity detection (VAD) to skip silence
- 🎙️ Microphone recording mode - record and transcribe
- 🗂️ Batch mode - process entire folders
- 💾 M4A compression - save recordings 10x smaller with minimal quality loss
- 🧱 Optional paragraph formatting via LLM (`--format`)
- 🌐 Auto-translation - translate transcriptions to any language using the OpenAI GPT API
- MP3
- WAV
- M4A
- FLAC
- OGG
- Other formats supported by ffmpeg
- Python 3.11, 3.12, or 3.13 (⚠️ Python 3.14 not supported yet due to dependencies)
- ffmpeg (for audio processing and format conversion)
macOS:

```bash
brew install python@3.11
```

Ubuntu/Debian:

```bash
sudo apt update
sudo apt install python3.11 python3.11-venv
```

Windows: Download Python 3.11 from python.org and install.
macOS:

```bash
brew install ffmpeg
```

Ubuntu/Debian:

```bash
sudo apt update
sudo apt install ffmpeg
```

Windows: Download from ffmpeg.org and add to PATH.
```bash
cd whisp
```

macOS/Linux:

```bash
python3.11 -m venv venv
```

Windows:

```bash
python -m venv venv
```

💡 Tip: Make sure you're using Python 3.11-3.13. Check with `python3.11 --version`.
macOS/Linux:

```bash
source venv/bin/activate
```

Windows:

```bash
venv\Scripts\activate
```

Option A: Install as editable package (recommended)

```bash
pip install --upgrade pip
pip install -e .
```

Option B: Install from requirements.txt (use `python whisp.py`)

```bash
pip install --upgrade pip
pip install -r requirements.txt
```
⚠️ Note: The first run will take some time as the Whisper model (~3GB for large) will be downloaded. The model is cached locally for future use. Downloads can be interrupted and resumed.
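Downloaded models land in the standard Hugging Face Hub cache by default (an assumption: whisp passes no custom `download_root` to faster-whisper; the `HF_HOME`/`HF_HUB_CACHE` environment variables override this location):

```python
from pathlib import Path

# Default Hugging Face Hub cache directory, where faster-whisper stores
# downloaded models unless a custom download_root or HF_HOME is set.
cache_dir = Path.home() / ".cache" / "huggingface" / "hub"
print(cache_dir)
```

Deleting this directory frees the disk space but forces a re-download on the next run.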
After installing with Option A (`pip install -e .`), you can use the `whisp` command from anywhere:

```bash
whisp audio.mp3       # Instead of: python whisp.py audio.mp3
whisp record          # Instead of: python whisp.py record
whisp ./recordings/   # Instead of: python whisp.py ./recordings/
```

If you used Option B, use `python whisp.py` instead.
To run whisp from any directory without manually activating the virtual environment every time, you can add an alias to your shell configuration (e.g., ~/.zshrc).
Run this command from the project root directory:
```bash
echo "alias whisp=\"$(pwd)/venv/bin/whisp\"" >> ~/.zshrc
source ~/.zshrc
```

Basic usage:
```bash
whisp input.mp3 output.txt
whisp input.mp3          # Auto-generates input.txt
```

With model and language specification:

```bash
whisp audio.wav transcript.txt --model large --language en
whisp audio.wav --model turbo   # Auto-generates audio.txt
```

Example commands:
```bash
# German lecture with maximum accuracy
whisp lecture.mp3 --model large --language de

# Fast podcast transcription
whisp podcast.m4a --model turbo --language en

# Auto-detect language with medium model
whisp interview.mp3 --model medium
```

Available models:
| Model | Size | Accuracy | Speed | Recommendation |
|---|---|---|---|---|
| turbo | ~800MB | Good | 8x faster | ✅ Default, fast multilingual |
| large | ~3GB | Best | Slow | For academic content (latest v3) |
| large-v2 | ~3GB | Best | Slightly faster | Previous large version |
| medium | ~1.5GB | Good | 2-3x faster | Balance of speed and quality |
| small | ~466MB | Basic | Fast | For simple tasks |
| base | ~145MB | Basic | Very fast | Minimal accuracy |
Record audio from your microphone and transcribe it automatically.
Basic recording:
```bash
whisp record output.txt
whisp record   # Auto-saves to save_dir/recording_TIMESTAMP.txt
```

Recording with specific model and language:

```bash
whisp record transcript.txt --model turbo --language de
whisp record --model turbo --language de   # Auto-generates filename
```

How it works:
- Shows a list of available microphones (or uses `default_device` from config)
- You select a device with arrow keys ↑↓ and Enter
- Press Enter to start recording
- Press Ctrl+D to stop recording (prevents accidental stops)
- Audio is automatically transcribed using the selected model
- Both audio and transcription are saved to `save_dir` with matching timestamps
- Files are named like `recording_20251208_195410.m4a` and `recording_20251208_195410.txt`
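The timestamp suffix follows Python's `strftime` pattern `%Y%m%d_%H%M%S`; a minimal sketch (the exact format string used by whisp.py is an assumption):

```python
from datetime import datetime

# Assumed pattern: recording_YYYYMMDD_HHMMSS, matching names like
# recording_20251208_195410.m4a and its .txt sibling.
stem = f"recording_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
print(stem)  # e.g. recording_20251208_195410 (depends on current time)
```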
Permissions on macOS:
- First run will ask for microphone permission
- If denied: System Settings → Privacy & Security → Microphone → Terminal
Process entire folders of audio files. All files are processed in natural sort order (1, 2, 10 instead of 1, 10, 2) and combined into a single output file.
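Natural sort order can be illustrated with a small Python sketch (whisp's actual implementation may differ):

```python
import re

def natural_key(name: str):
    # Split into digit and non-digit runs so "10" compares as the number 10,
    # not character-by-character after "1".
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

files = ["lecture_10.mp3", "lecture_2.mp3", "lecture_1.mp3"]
print(sorted(files, key=natural_key))
# → ['lecture_1.mp3', 'lecture_2.mp3', 'lecture_10.mp3']
```

Plain lexicographic sorting would instead yield 1, 10, 2, scrambling multi-part recordings in the combined transcript.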
Basic batch processing:
```bash
whisp ./lectures/ combined_transcript.txt --language de --model turbo
whisp ./recordings/   # Auto-generates recordings.txt
```

Supported formats: `.mp3`, `.wav`, `.m4a`, `.flac`, `.ogg`, `.wma`, `.aac`, `.opus`
Automatically translate transcriptions to any target language using OpenAI GPT API.
Pipeline order when options are enabled:
- Transcription
- Optional formatting (`--format`) → saves `<base>_formatted.txt`
- Optional translation (`--translate`) → always translates from the original transcription file
Basic translation:
```bash
whisp audio.mp3 --translate ru                 # → audio.txt + audio_ru.txt
whisp audio.mp3 --translate en --language de   # German audio → English translation
```

With recording mode:

```bash
whisp record --translate ru
# Creates: recording_20251209_072149.m4a + recording_20251209_072149.txt + recording_20251209_072149_ru.txt
```

With batch mode:

```bash
whisp ./lectures/ --language de --translate ru
# Creates: lectures.txt + lectures_ru.txt (all files combined)
```

Create an additional file with semantic paragraph breaks for readability.
```bash
whisp audio.mp3 --format
# Creates: audio.txt + audio_formatted.txt

whisp record --format --translate ru
# Creates: recording_*.txt + recording_*_formatted.txt + recording_*_ru.txt
```

Rules:
- Formatting never replaces the original transcript file
- The formatted file is saved exactly as the model returns it
Setup:
- Get an OpenAI API key at https://platform.openai.com/api-keys
- Add it to `config.yaml`:

```yaml
translation:
  openai_api_key: "sk-..."
  model: "gpt-5-mini"  # Recommended: fast, excellent quality
```

Or set an environment variable:

```bash
export OPENAI_API_KEY="sk-..."
```
Features:
- High-quality contextual translation with GPT
- Automatic paragraph organization for readability
- Low cost: ~$0.05 per 1.5-hour lecture (gpt-5-mini)
- Preserves technical terms, names, and numbers
- Customizable translation prompt in config.yaml
- Supports all languages (en, ru, de, es, fr, ja, zh, etc.)
Cost estimate (gpt-5-mini):
- Short audio (5 min): ~$0.003
- Medium audio (30 min): ~$0.01
- Long lecture (1.5 hours): ~$0.05
💡 Tip: Translation is optional. If `--translate` is not specified, only the transcription is saved.

💡 Tip: Formatting is optional. Use `--format` to get an extra `<base>_formatted.txt` file.
```bash
# Show help
whisp --help

# Show version
whisp --version
```

The application can be configured via a `config.yaml` file. All settings have sensible defaults.
Configuration Loading Order (Priority High to Low):
- `~/.whisp/config.yaml` in user home directory (User global)
- `config.yaml` in application directory (Default)
- `default_language`: Auto-detect if empty, or specify (e.g., "en", "de", "ru")
- `beam_size`: Search beam size (default: 5); higher is more accurate but slower
- `vad_filter`: Voice activity detection to skip silence (default: true)
- `min_silence_duration_ms`: Minimum silence duration for VAD (default: 500 ms)
- `sample_rate`: Recording quality (default: 16000 Hz)
- `channels`: Audio channels (default: 1, mono)
- `default_device`: Pre-select a microphone by name, or use `-1` to show the menu
- `save_dir`: Directory for saved recordings
- `keep_recording`: Keep the audio file after transcription (default: false)
- `compress_format`: `"m4a"` (10x smaller) or `"wav"` (default: m4a)
- `show_level_meter`: Show real-time audio level (default: true)
- `default`: Model to use if not specified (default: "turbo")
- `compute_type_cpu`: Quantization for CPU (default: "int8")
- `compute_type_gpu`: Precision for GPU (default: "float16")
- `preview_length`: Characters to show in the preview (default: 200)
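The preview controlled by `preview_length` amounts to a simple truncation; a hypothetical helper, not whisp's actual code:

```python
def preview(text: str, preview_length: int = 200) -> str:
    # Show the first preview_length characters, appending an ellipsis
    # only when the text was actually truncated.
    if len(text) <= preview_length:
        return text
    return text[:preview_length] + "..."

print(preview("short transcript"))  # printed in full, no ellipsis
```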
- `openai_api_key`: Your OpenAI API key (get one at platform.openai.com/api-keys)
- `model`: GPT model for translation and `--format` paragraph formatting (default: "gpt-5-mini")
  - Options: gpt-5-mini (recommended), gpt-5-nano, gpt-4o-mini
- `temperature`: Creativity for translation (default: 1.0 for gpt-5-mini, 0.3 otherwise); `--format` tries `temperature=0.0` for deterministic output and falls back to the model default if the model rejects it
- `system_prompt`: The AI translator's role/persona
- `user_prompt`: Specific translation instructions
- `format_system_prompt`: System prompt for the `--format` paragraphing step
- `format_user_prompt`: User prompt template for `--format` (supports `{text}`)
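Putting the keys above together, a complete `config.yaml` might look like this (the section names and nesting are an assumption based on the key descriptions; check the `config.yaml` shipped with whisp for the authoritative layout):

```yaml
# Hypothetical layout; verify against the config.yaml shipped with whisp.
transcription:
  default_language: ""        # empty = auto-detect
  beam_size: 5
  vad_filter: true
  min_silence_duration_ms: 500

recording:
  sample_rate: 16000
  channels: 1
  default_device: -1          # -1 = show device menu
  save_dir: "~/Records"
  keep_recording: false
  compress_format: "m4a"
  show_level_meter: true

model:
  default: "turbo"
  compute_type_cpu: "int8"
  compute_type_gpu: "float16"

translation:
  openai_api_key: "sk-..."
  model: "gpt-5-mini"
  temperature: 1.0
```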
- turbo - ✅ Best for most use cases: fast (8x) with good accuracy
- large - Maximum accuracy for academic/technical content (latest v3)
- large-v2 - Previous version, slightly faster than v3
- medium - Good balance for any language
- small - Fast transcription with acceptable quality
- base - Quick tests only
Whisper supports 99+ languages. Most popular:
- `en` - English
- `ru` - Russian
- `es` - Spanish
- `fr` - French
- `de` - German
- `it` - Italian
- `ja` - Japanese
- `ko` - Korean
- `zh` - Chinese
Full list: Whisper Language Support
If you see errors about onnxruntime or dependency conflicts:
```bash
# Check your Python version
python --version
# Should be 3.11.x, 3.12.x, or 3.13.x
# If you have Python 3.14, you need to use Python 3.11-3.13
```

Solution: Recreate your venv with Python 3.11:

```bash
# Remove old venv
rm -rf venv

# Create new venv with Python 3.11
python3.11 -m venv venv
source venv/bin/activate   # or venv\Scripts\activate on Windows

# Reinstall dependencies
pip install --upgrade pip
pip install -r requirements.txt
```

Make sure ffmpeg is installed and available in PATH:
```bash
ffmpeg -version
```

If not installed:

- macOS: `brew install ffmpeg`
- Ubuntu: `sudo apt install ffmpeg`
- Windows: Download from ffmpeg.org
This usually happens with Python 3.14+. Make sure you're using Python 3.11-3.13:
```bash
python --version   # Should show 3.11.x, 3.12.x, or 3.13.x
```

If transcription is slower than expected:
- Check that `faster-whisper` is properly installed: `python -c "from faster_whisper import WhisperModel; print('OK')"`
- Verify you're using CPU int8 quantization (check console output)
- Try a smaller model (turbo or medium) for faster processing
- Ensure VAD filter is enabled in config.yaml (skips silence)
To use GPU acceleration, you may need to install CUDA-enabled dependencies. Check:

```bash
python -c "import torch; print(torch.cuda.is_available())"
```

If your model download was interrupted, simply run the script again. Downloads are resumable; only missing files will be downloaded.
```
whisp/
├── whisp.py           # Main script
├── pyproject.toml     # Project metadata and install config
├── requirements.txt   # Python dependencies
├── config.yaml        # Configuration file
├── LICENSE            # MIT License
├── README.md          # Documentation
└── venv/              # Virtual environment
```
This project is licensed under the MIT License - see the LICENSE file for details.
This project uses the Whisper model from OpenAI. See Whisper License for details.
Questions and suggestions are welcome! Create issues or pull requests.
- faster-whisper GitHub - The optimized implementation we use
- CTranslate2 - Fast inference engine for Transformer models
- Whisper Large-v3 on HuggingFace
- OpenAI Whisper GitHub - Original Whisper repository
- Whisper Model Card - Technical details and benchmarks
Made with love in Germany 🇩🇪