
🎙️ Whisp

Fast and accurate audio transcription CLI powered by OpenAI Whisper Large-v3 and faster-whisper.

Features:

  • 🎯 Transcribe audio files with high accuracy
  • 🎙️ Record from microphone and transcribe in one command
  • 🗂️ Batch process entire folders
  • 🧱 Optional paragraph formatting (--format) for better readability
  • 🌐 Auto-translate transcriptions to any language (powered by OpenAI GPT)
  • ⚡ 4-8x faster than standard Whisper implementations
  • 🌍 99+ languages supported with auto-detection

🚀 Quick Start

# Transcribe a single file
whisp audio.mp3                          # → audio.txt

# Record and transcribe
whisp record                             # → ~/Records/recording_TIMESTAMP.txt

# Batch process folder
whisp ./recordings/                      # → recordings.txt

# With specific model and language
whisp audio.mp3 --model large --language de

# With translation to target language
whisp audio.mp3 --translate ru             # → audio.txt + audio_ru.txt

# With paragraph formatting (no text edits, only paragraph breaks)
whisp audio.mp3 --format                   # → audio.txt + audio_formatted.txt

💡 Note: If you installed with pip install -r requirements.txt instead of pip install -e ., use python whisp.py instead of whisp

✨ Full Feature List

  • 🎯 High accuracy transcription with Whisper Large-v3
  • ⚡ 4-8x faster than standard Whisper (using CTranslate2)
  • 🚀 GPU (CUDA) support for accelerated processing
  • 💾 Lower memory usage with int8 quantization on CPU
  • 🌍 Automatic language detection or manual language specification
  • 📝 Preview of transcription results
  • 🔄 Multiple model options (large, large-v2, turbo, medium, small, base)
  • 🎤 Voice activity detection (VAD) to skip silence
  • 🎙️ Microphone recording mode - record and transcribe
  • 🗂️ Batch mode - process entire folders
  • 💾 M4A compression - save recordings 10x smaller with minimal quality loss
  • 🧱 Optional paragraph formatting via LLM (--format)
  • 🌐 Auto-translation - translate transcriptions to any language using OpenAI GPT API

🎵 Supported Audio Formats

  • MP3
  • WAV
  • M4A
  • FLAC
  • OGG
  • Other formats supported by ffmpeg

📋 Requirements

  • Python 3.11, 3.12, or 3.13 (⚠️ Python 3.14 not supported yet due to dependencies)
  • ffmpeg (for audio processing and format conversion)

Installing Python 3.11

macOS:

brew install python@3.11

Ubuntu/Debian:

sudo apt update
sudo apt install python3.11 python3.11-venv

Windows: Download Python 3.11 from python.org and install.

Installing ffmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download from ffmpeg.org and add to PATH.

🚀 Installation

1. Clone the repository or copy the files

cd whisp

2. Create a virtual environment with Python 3.11

macOS/Linux:

python3.11 -m venv venv

Windows:

python -m venv venv

💡 Tip: Make sure you're using Python 3.11-3.13. Check with python3.11 --version

3. Activate the virtual environment

macOS/Linux:

source venv/bin/activate

Windows:

venv\Scripts\activate

4. Install the package

Option A: Install as editable package (recommended)

pip install --upgrade pip
pip install -e .

Option B: Install from requirements.txt (use python whisp.py)

pip install --upgrade pip
pip install -r requirements.txt

⚠️ Note: The first run will take some time as the Whisper model (~3GB for large) will be downloaded. The model is cached locally for future use. Downloads can be interrupted and resumed.

5. Using the whisp command

After installing with Option A (pip install -e .), you can use the whisp command from anywhere:

whisp audio.mp3              # Instead of: python whisp.py audio.mp3
whisp record                 # Instead of: python whisp.py record
whisp ./recordings/          # Instead of: python whisp.py ./recordings/

If you used Option B, use python whisp.py instead.

Creating a permanent alias (Optional)

To run whisp from any directory without manually activating the virtual environment every time, you can add an alias to your shell configuration (e.g., ~/.zshrc).

Run this command from the project root directory:

echo "alias whisp=\"$(pwd)/venv/bin/whisp\"" >> ~/.zshrc
source ~/.zshrc

💻 Usage

Single File Transcription

Basic usage:

whisp input.mp3 output.txt
whisp input.mp3                    # Auto-generates input.txt

With model and language specification:

whisp audio.wav transcript.txt --model large --language en
whisp audio.wav --model turbo      # Auto-generates audio.txt

Example commands:

# German lecture with maximum accuracy
whisp lecture.mp3 --model large --language de

# Fast podcast transcription
whisp podcast.m4a --model turbo --language en

# Auto-detect language with medium model
whisp interview.mp3 --model medium

Model Selection

Available models:

| Model    | Size   | Accuracy | Speed           | Recommendation                   |
|----------|--------|----------|-----------------|----------------------------------|
| turbo    | ~800MB | Good     | 8x faster       | ✅ Default, fast multilingual    |
| large    | ~3GB   | Best     | Slow            | For academic content (latest v3) |
| large-v2 | ~3GB   | Best     | Slightly faster | Previous large version           |
| medium   | ~1.5GB | Good     | 2-3x faster     | Balance of speed and quality     |
| small    | ~466MB | Basic    | Fast            | For simple tasks                 |
| base     | ~145MB | Basic    | Very fast       | Minimal accuracy                 |

Recording Mode (Microphone Input)

Record audio from your microphone and transcribe it automatically.

Basic recording:

whisp record output.txt
whisp record                          # Auto-saves to save_dir/recording_TIMESTAMP.txt

Recording with specific model and language:

whisp record transcript.txt --model turbo --language de
whisp record --model turbo --language de  # Auto-generates filename

How it works:

  1. Shows list of available microphones (or uses default_device from config)
  2. You select a device with arrow keys ↑↓ and Enter
  3. Press Enter to start recording
  4. Press Ctrl+D to stop recording (prevents accidental stops)
  5. Audio is automatically transcribed using selected model
  6. Both audio and transcription saved to save_dir with matching timestamps
  7. Files named like: recording_20251208_195410.m4a and recording_20251208_195410.txt
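As a sketch of how the matching filenames in step 7 can be derived from a single timestamp (a minimal illustration of the naming scheme, not the tool's actual implementation; the save_dir default follows the documented ~/Records location):

```python
from datetime import datetime
from pathlib import Path

def recording_paths(save_dir="~/Records"):
    # One timestamp is used for both files so the audio and transcript always match
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    base = Path(save_dir).expanduser() / f"recording_{stamp}"
    return base.with_suffix(".m4a"), base.with_suffix(".txt")

audio, text = recording_paths()
print(audio.name, text.name)  # e.g. recording_20251208_195410.m4a recording_20251208_195410.txt
```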

Permissions on macOS:

  • First run will ask for microphone permission
  • If denied: System Settings → Privacy & Security → Microphone → Terminal

Batch Mode (Directory Input)

Process entire folders of audio files. All files are processed in natural sort order (1, 2, 10 instead of 1, 10, 2) and combined into a single output file.

Basic batch processing:

whisp ./lectures/ combined_transcript.txt --language de --model turbo
whisp ./recordings/                    # Auto-generates recordings.txt

Supported formats: .mp3, .wav, .m4a, .flac, .ogg, .wma, .aac, .opus
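The natural sort order described above can be sketched with a key that splits filenames into digit and non-digit runs (a minimal illustration of the idea, not whisp's actual code):

```python
import re

def natural_key(name):
    # Digit runs compare as integers, so "2" sorts before "10"
    return [int(p) if p.isdigit() else p.lower() for p in re.split(r"(\d+)", name)]

files = ["lecture_10.mp3", "lecture_2.mp3", "lecture_1.mp3"]
print(sorted(files, key=natural_key))
# ['lecture_1.mp3', 'lecture_2.mp3', 'lecture_10.mp3']
```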

Translation Mode

Automatically translate transcriptions to any target language using OpenAI GPT API.

Pipeline order when options are enabled:

  1. Transcription
  2. Optional formatting (--format) → saves <base>_formatted.txt
  3. Optional translation (--translate) → always translates from original transcription file
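The output filenames produced by the pipeline above can be sketched as follows (an illustration of the documented naming, not the actual implementation):

```python
from pathlib import Path

def output_paths(audio, fmt=False, translate=None):
    # The base transcript is always written; extra files get documented suffixes
    base = Path(audio).with_suffix("")
    paths = [base.with_suffix(".txt")]
    if fmt:
        paths.append(base.parent / f"{base.name}_formatted.txt")
    if translate:
        paths.append(base.parent / f"{base.name}_{translate}.txt")
    return [str(p) for p in paths]

print(output_paths("audio.mp3", fmt=True, translate="ru"))
# ['audio.txt', 'audio_formatted.txt', 'audio_ru.txt']
```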

Basic translation:

whisp audio.mp3 --translate ru              # → audio.txt + audio_ru.txt
whisp audio.mp3 --translate en --language de  # German audio → English translation

With recording mode:

whisp record --translate ru
# Creates: recording_20251209_072149.m4a + recording_20251209_072149.txt + recording_20251209_072149_ru.txt

With batch mode:

whisp ./lectures/ --language de --translate ru
# Creates: lectures.txt + lectures_ru.txt (all files combined)

Paragraph Formatting Mode (--format)

Create an additional file with semantic paragraph breaks for readability.

whisp audio.mp3 --format
# Creates: audio.txt + audio_formatted.txt

whisp record --format --translate ru
# Creates: recording_*.txt + recording_*_formatted.txt + recording_*_ru.txt

Rules:

  • Formatting never replaces the original transcript file
  • Formatted file is saved as returned by the model

Setup:

  1. Get OpenAI API key at https://platform.openai.com/api-keys
  2. Add to config.yaml:
       translation:
         openai_api_key: "sk-..."
         model: "gpt-5-mini"  # Recommended: fast, excellent quality
     Or set the environment variable instead: export OPENAI_API_KEY="sk-..."

Features:

  • High-quality contextual translation with GPT
  • Automatic paragraph organization for readability
  • Low cost: ~$0.05 per 1.5-hour lecture (gpt-5-mini)
  • Preserves technical terms, names, and numbers
  • Customizable translation prompt in config.yaml
  • Supports all languages (en, ru, de, es, fr, ja, zh, etc.)

Cost estimate (gpt-5-mini):

  • Short audio (5 min): ~$0.003
  • Medium audio (30 min): ~$0.01
  • Long lecture (1.5 hours): ~$0.05

💡 Tip: Translation is optional. If --translate is not specified, only transcription is saved.

💡 Tip: Formatting is optional. Use --format to get an extra <base>_formatted.txt file.

Command Help

# Show help
whisp --help

# Show version
whisp --version

⚙️ Configuration

The application can be configured via a config.yaml file. All settings have sensible defaults.

Configuration Loading Order (Priority High to Low):

  1. ~/.whisp/config.yaml in user home directory (User global)
  2. config.yaml in application directory (Default)
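Conceptually, the user-global file overrides the bundled defaults key by key. A minimal sketch of that merge (a hypothetical helper for illustration, not the tool's actual loader):

```python
def merge_configs(default, user):
    """User-global settings (higher priority) override application defaults."""
    merged = dict(default)
    merged.update({k: v for k, v in user.items() if v is not None})
    return merged

default = {"model": "turbo", "beam_size": 5}   # config.yaml in the app directory
user = {"model": "large"}                      # ~/.whisp/config.yaml
print(merge_configs(default, user))
# {'model': 'large', 'beam_size': 5}
```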

Transcription Settings

  • default_language: Auto-detect if empty, or specify (e.g., "en", "de", "ru")
  • beam_size: Search beam size (default: 5) - higher = more accurate but slower
  • vad_filter: Voice activity detection to skip silence (default: true)
  • min_silence_duration_ms: Minimum silence duration for VAD (default: 500ms)

Recording Settings

  • sample_rate: Recording quality (default: 16000 Hz)
  • channels: Audio channels (default: 1 - mono)
  • default_device: Pre-select a microphone by name, or use -1 to show the selection menu
  • save_dir: Directory for saved recordings
  • keep_recording: Keep audio file after transcription (default: false)
  • compress_format: "m4a" (10x smaller) or "wav" (default: m4a)
  • show_level_meter: Show real-time audio level (default: true)

Model Settings

  • default: Model to use if not specified (default: "turbo")
  • compute_type_cpu: Quantization for CPU (default: "int8")
  • compute_type_gpu: Precision for GPU (default: "float16")

Output Settings

  • preview_length: Characters to show in preview (default: 200)

Translation Settings

  • openai_api_key: Your OpenAI API key (get at platform.openai.com/api-keys)
  • model: GPT model for translation and --format paragraph formatting (default: "gpt-5-mini")
    • Options: gpt-5-mini (recommended), gpt-5-nano, gpt-4o-mini
  • temperature: Creativity for translation (default: 1.0 for gpt-5-mini, 0.3 otherwise)
  • --format tries temperature=0.0 for deterministic output; if the model rejects it, the model's default is used
  • system_prompt: AI translator's role/persona
  • user_prompt: Specific translation instructions
  • format_system_prompt: System prompt for --format paragraphing step
  • format_user_prompt: User prompt template for --format (supports {text})
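Putting the settings above together, a translation section of config.yaml might look like this (key names as documented above; the prompt strings are illustrative placeholders, not the shipped defaults):

```yaml
translation:
  openai_api_key: "sk-..."   # or set OPENAI_API_KEY in the environment
  model: "gpt-5-mini"
  temperature: 1.0
  system_prompt: "You are a professional translator."            # illustrative
  user_prompt: "Translate the following text."                   # illustrative
  format_system_prompt: "Split text into semantic paragraphs."   # illustrative
  format_user_prompt: "Add paragraph breaks to: {text}"          # illustrative
```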

🎯 Model Selection Guide

Recommended Models

  • turbo - ✅ Best for most use cases: fast (8x) with good accuracy
  • large - Maximum accuracy for academic/technical content (latest v3)
  • large-v2 - Previous version, slightly faster than v3
  • medium - Good balance for any language
  • small - Fast transcription with acceptable quality
  • base - Quick tests only

Supported Languages

Whisper supports 99+ languages. Most popular:

  • en - English
  • ru - Russian
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • ja - Japanese
  • ko - Korean
  • zh - Chinese

Full list: Whisper Language Support

🐛 Troubleshooting

Python version compatibility error

If you see errors about onnxruntime or dependency conflicts:

# Check your Python version
python --version

# Should be 3.11.x, 3.12.x, or 3.13.x
# If you have Python 3.14, you need to use Python 3.11-3.13

Solution: Recreate your venv with Python 3.11:

# Remove old venv
rm -rf venv

# Create new venv with Python 3.11
python3.11 -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Reinstall dependencies
pip install --upgrade pip
pip install -r requirements.txt

"ffmpeg not found" error

Make sure ffmpeg is installed and available in PATH:

ffmpeg -version

If not installed:

  • macOS: brew install ffmpeg
  • Ubuntu: sudo apt install ffmpeg
  • Windows: Download from ffmpeg.org

"Cannot install faster-whisper" error

This usually happens with Python 3.14+. Make sure you're using Python 3.11-3.13:

python --version  # Should show 3.11.x, 3.12.x, or 3.13.x

Slow performance

If transcription is slower than expected:

  1. Check that faster-whisper is properly installed:
     python -c "from faster_whisper import WhisperModel; print('OK')"
  2. Verify you're using CPU int8 quantization (check console output)
  3. Try a smaller model (turbo or medium) for faster processing
  4. Ensure VAD filter is enabled in config.yaml (skips silence)

GPU not detected

To use GPU acceleration, you may need to install CUDA-enabled dependencies. Check:

python -c "import torch; print(torch.cuda.is_available())"

Interrupted model download

If your model download was interrupted, simply run the script again. Downloads are resumable — only missing files will be downloaded.

📦 Project Structure

whisp/
├── whisp.py               # Main script
├── pyproject.toml         # Project metadata and install config
├── requirements.txt       # Python dependencies
├── config.yaml            # Configuration file
├── LICENSE                # MIT License
├── README.md              # Documentation
└── venv/                  # Virtual environment

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

This project uses the Whisper model from OpenAI. See Whisper License for details.

🤝 Contributing

Questions and suggestions are welcome! Create issues or pull requests.


Made with love in Germany 🇩🇪
