Fast parallel video-to-text transcription powered by OpenAI Whisper.
✓ Fast Parallel Processing - Split videos into chunks for 2-4x faster transcription
✓ Real-Time Progress - See actual transcription progress for all parallel processes
✓ Multiple Languages - English, Hindi, or auto-detect
✓ Multiple Models - Choose from 5 Whisper models (speed vs accuracy)
✓ Many Formats - Supports mp4, mov, avi, mkv, mp3, wav, m4a, flac, and more
✓ YouTube Subtitle Extraction - Instantly extract existing captions from YouTube videos
✓ Isolated Environment - All dependencies in .venv/ folder
✓ Clean Uninstall - Remove everything by deleting .venv/ folder
Local video files:

```bash
# 1. Install FFmpeg (one-time)
brew install ffmpeg

# 2. Set up Python environment
python3 setup.py

# 3. Place videos in input/ folder

# 4. Transcribe
python3 transcribe.py --default
```

YouTube videos:

```bash
# 1-2. Same setup as above

# 3. Run interactive mode
python3 transcribe.py

# 4. Select "YouTube URL" option and paste link
```

Done! Find your transcripts in the output/ folder as .txt files.
- macOS (or Linux/Windows with modifications)
- Python 3.9+
- FFmpeg
- 4-8 GB RAM recommended
```bash
brew install ffmpeg
```

Verify installation:

```bash
ffmpeg -version
```

Then set up the Python environment:

```bash
python3 setup.py
```

This creates:

- .venv/ folder with all Python packages isolated from your system
- input/ folder where you'll place your videos
- output/ folder where transcripts will be saved
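For a rough idea of what that setup step involves, here is a minimal sketch using Python's built-in venv module. It is not the project's actual setup.py; the installed package names below are assumptions.

```python
# Minimal sketch of an isolated setup (NOT the project's actual setup.py):
# create .venv/, the input/ and output/ working folders, and install
# dependencies into the venv. Package names here are assumptions.
import os
import subprocess
import venv

venv.create(".venv", with_pip=True)        # isolated Python environment
for folder in ("input", "output"):
    os.makedirs(folder, exist_ok=True)     # working folders

pip = os.path.join(".venv", "bin", "pip")  # .venv\Scripts\pip.exe on Windows
subprocess.check_call([pip, "install", "openai-whisper", "tqdm"])
```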
```bash
# Quick start with defaults (English, base model, 2 threads)
python3 transcribe.py --default

# Custom settings
python3 transcribe.py --threads 4 --model 4 --lang hi
```

```bash
# Interactive mode with YouTube support
python3 transcribe.py
# Then select:
# Option 2: YouTube URL
# Enter language and paste video URL
# Get instant transcript if subtitles available
```

When to use YouTube mode:
- Video has existing subtitles (instant results)
- Want to avoid processing time
When to use local files:
- No subtitles available
- Need higher accuracy than auto-generated captions
| Option | Description | Example |
|---|---|---|
| `--default` | Use defaults (English, base model, 2 threads) | `--default` |
| `--threads N` | Number of parallel chunks (1-8) | `--threads 4` |
| `--model M` | Whisper model (1-5) | `--model 3` |
| `--lang LANG` | Language (en, hi, auto) | `--lang auto` |
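For illustration, the flags above could be wired up with argparse roughly as shown below. This is a sketch, not the actual transcribe.py code; the defaults simply mirror the README's description of `--default`.

```python
# Hedged sketch of a CLI matching the options table (not transcribe.py itself).
import argparse

parser = argparse.ArgumentParser(description="Parallel Whisper transcription")
parser.add_argument("--default", action="store_true",
                    help="Use defaults: English, base model, 2 threads")
parser.add_argument("--threads", type=int, default=2, choices=range(1, 9),
                    help="Number of parallel chunks (1-8)")
parser.add_argument("--model", type=int, default=2, choices=range(1, 6),
                    help="Whisper model (1=tiny ... 5=large)")
parser.add_argument("--lang", default="en", choices=["en", "hi", "auto"],
                    help="Transcription language")

args = parser.parse_args()
print(args)
```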
| Option | Language | Behavior |
|---|---|---|
| `en` | English | Default - forces English |
| `hi` | Hindi | Forces Hindi |
| `auto` | Auto-detect | Whisper detects the language automatically |
| # | Model | Size | Speed | Accuracy | Use Case |
|---|---|---|---|---|---|
| 1 | tiny | 40 MB | Fastest | Low | Quick tests |
| 2 | base | 140 MB | Fast | Good | Default - balanced |
| 3 | small | 470 MB | Medium | High | Better accuracy |
| 4 | medium | 1.4 GB | Slow | Very High | High quality needed |
| 5 | large | 2.9 GB | Slowest | Best | Maximum accuracy |
Recommendation: Start with base model. Upgrade to medium if accuracy is insufficient.
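To make the trade-off concrete, here is a hedged sketch of selecting and loading one of these models with the openai-whisper Python API. The file path is a placeholder, and transcribe.py's internals may differ.

```python
# Sketch: map the menu numbers above to Whisper model names, load one, and
# transcribe a single file. Illustrative only; "input/sample.mp3" is a placeholder.
import whisper

MODELS = {1: "tiny", 2: "base", 3: "small", 4: "medium", 5: "large"}

model = whisper.load_model(MODELS[2])   # "base": the balanced default
result = model.transcribe("input/sample.mp3", language="en")
print(result["text"])
```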
- Place videos in `input/` folder
- Run `python3 transcribe.py --default`
- Watch coordinated progress bars for each chunk
- Find transcripts in `output/` folder
- Run `python3 transcribe.py` (interactive mode)
- Select "YouTube URL" option
- Choose language and paste URL
- Get instant transcript in `output/` folder
While transcribing, you'll see real-time progress for each chunk:
```
Starting parallel transcription of 2 chunks...
Chunk 1: 75%|███████████████████| [01:30<00:30]
Chunk 2: 68%|█████████████████  | [01:25<00:35]
```
Each chunk shows percentage, time elapsed, and time remaining.
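Bars like these can be drawn with tqdm's `position` argument, which keeps several bars updating in place. The sketch below is illustrative only and is driven by a dummy loop rather than real transcription progress.

```python
# Illustrative multi-bar progress display with tqdm (not the project's code).
import time
from tqdm import tqdm

bars = [tqdm(total=100, desc=f"Chunk {i + 1}", position=i) for i in range(2)]
for _ in range(100):
    for bar in bars:
        bar.update(1)      # in the real tool, updates would track transcription
    time.sleep(0.02)
for bar in bars:
    bar.close()
```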
- Video: mp4, mov, avi, mkv, flv, wmv
- Audio: mp3, wav, m4a, flac, ogg, aac
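A simple way to gather such files from input/ is to filter by extension. The helper below is hypothetical and not necessarily how transcribe.py does it.

```python
# Hypothetical helper: list supported media files in the input/ folder.
from pathlib import Path

SUPPORTED = {".mp4", ".mov", ".avi", ".mkv", ".flv", ".wmv",
             ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac"}

def find_media(folder: str = "input") -> list[Path]:
    """Return files whose extension is one of the supported formats."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in SUPPORTED)

print(find_media())
```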
Extract YouTube subtitles directly through interactive mode:
```bash
python3 transcribe.py
# Select: 2) YouTube URL
# Choose language, paste URL
# Done - instant extraction!
```

Direct extraction without interactive prompts:

```bash
python3 extensions/youtube-subtitles/youtube_subs.py <URL> --lang en
```

Subtitles unavailable? The tool will show an error with instructions to download the video and use Whisper transcription instead.
For full documentation, see docs/youtube-subtitles.md.
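For reference, here is a hedged sketch of subtitle-only extraction using yt-dlp's Python API. It is not necessarily how youtube_subs.py works; the URL and output path are placeholders.

```python
# Hedged sketch: fetch existing YouTube captions with yt-dlp, skipping the
# video download. Option values and the output path are illustrative.
import yt_dlp

def download_subtitles(url: str, lang: str = "en") -> None:
    opts = {
        "skip_download": True,       # subtitles only, no video
        "writesubtitles": True,      # uploader-provided captions
        "writeautomaticsub": True,   # fall back to auto-generated captions
        "subtitleslangs": [lang],
        "subtitlesformat": "vtt",
        "outtmpl": "output/%(title)s.%(ext)s",
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download([url])

download_subtitles("https://www.youtube.com/watch?v=VIDEO_ID")
```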
Parallel Processing:
- 2 threads: Good for 8 GB RAM
- 4 threads: Good for 16 GB RAM
- 8 threads: Requires 32 GB+ RAM
Model Selection:
- base model: ~6 min for 30-min video
- medium model: ~15 min for 30-min video
First Run:
- Whisper downloads models on first use
- Subsequent runs use cached models (much faster)
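To make the parallel idea concrete, here is a hedged sketch of the split, transcribe-in-parallel, and join pattern described above, using ffmpeg and a process pool. It is not the project's actual implementation; the paths, chunk count, and duration are illustrative.

```python
# Hedged sketch of split -> transcribe in parallel -> join (illustrative only).
# Each worker process loads its own model copy, which is why more threads
# need more RAM, as noted above.
import subprocess
from concurrent.futures import ProcessPoolExecutor

import whisper

def split_audio(src: str, n_chunks: int, duration_s: float) -> list[str]:
    """Cut the audio of src into n_chunks equal pieces with ffmpeg."""
    chunk_len = duration_s / n_chunks
    paths = []
    for i in range(n_chunks):
        out = f"chunk_{i}.wav"
        subprocess.run(
            ["ffmpeg", "-y", "-i", src, "-ss", str(i * chunk_len),
             "-t", str(chunk_len), out],
            check=True, capture_output=True)
        paths.append(out)
    return paths

def transcribe_chunk(path: str) -> str:
    model = whisper.load_model("base")   # loaded once per worker process
    return model.transcribe(path)["text"]

if __name__ == "__main__":
    chunks = split_audio("input/video.mp4", n_chunks=2, duration_s=1800)
    with ProcessPoolExecutor(max_workers=2) as pool:
        texts = pool.map(transcribe_chunk, chunks)
    print(" ".join(texts))
```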
| Issue | Solution |
|---|---|
| "No video files found" | Place videos in input/ folder |
| "ModuleNotFoundError: whisper" | Run python3 setup.py |
| "ffmpeg: command not found" | Run brew install ffmpeg |
| Slow first run | Whisper is downloading model (happens once) |
| Out of memory | Use fewer threads or smaller model |
| Virtual env issues | rm -rf .venv && python3 setup.py |
```bash
rm -rf .venv
```

That's it! All Python dependencies removed.

To also remove FFmpeg (optional):

```bash
brew uninstall ffmpeg
```

Q: Can I transcribe languages other than English and Hindi?
A: Yes! Use --lang auto and Whisper will auto-detect the language. For best results with a specific language, you can modify the code (see docs/architecture.md).
Q: Can I get timestamps with the transcription?
A: Not yet. Output is plain text.
Q: How much faster is parallel processing?
A: Typically 2-3x faster with 4 chunks on modern CPUs.
Q: Will this interfere with my other Python projects?
A: No. All packages are isolated in .venv/ and don't affect system Python.
For detailed architecture and implementation:
- Use Cases & Applications: docs/use-cases.md - Real-world applications and AI-powered workflows
- YouTube Subtitle Extraction: docs/youtube-subtitles.md - Extract existing captions from YouTube videos
- Parallel Processing: docs/parallel-processing.md - How parallel video processing works
- Technical Details: docs/architecture.md - System design and implementation
- AI Assistant Guide: CLAUDE.md - Quick reference for Claude Code
Contributions welcome! See docs/architecture.md for technical details.
This project is licensed under the MIT License - see the LICENSE file for details.
Built with:
- OpenAI Whisper - Speech recognition
- FFmpeg - Video processing
- PyTorch - Machine learning framework
Need Help? Check docs/architecture.md for technical details.
Made with ❤️ by Vibe Coding!