Video transcription and translation CLI for language learners.
Transcribe with Whisper (local or cloud API), translate with LLMs, play with dual subtitles — all in one pipeline.
- Whisper transcription with word-level timestamps — local (stable-ts, MLX/CUDA/CPU) or cloud API (Groq, OpenAI via LiteLLM)
- Smart subtitle segmentation — spaCy POS tagging fixes dangling articles, prepositions, and Romance clitics (l', d', qu') across 26 languages
- Subtitle download — optionally grabs existing subtitles from YouTube/etc. via yt-dlp (human-made preferred, --subs to enable), skipping Whisper when they are available
- LLM translation — any language pair via Ollama (local) or cloud LLMs (Groq, OpenAI, Claude, etc.)
- Vocabulary analysis — CEFR difficulty estimation (A1–C2), rare word extraction with context and translations
- Dual playback — original + translation subtitles in mpv, or browser-based web player (pgw serve)
- Batch processing — multiple files, glob patterns, URL lists, with continue-on-error handling
- Export — VTT, SRT, ASS, plain text, bilingual VTT, side-by-side PDF/EPUB
- Shared cache — deduplicates downloads, audio extraction, and transcriptions across workspaces
# macOS
brew install uv ffmpeg mpv
brew install pango # required for PDF export (WeasyPrint)
brew install --cask ollama # optional
# Ubuntu/Debian
sudo apt install ffmpeg mpv libpango-1.0-0 libpangoft2-1.0-0
curl -fsSL https://astral.sh/uv/install.sh | sh
curl -fsSL https://ollama.com/install.sh | sh # optional
macOS PDF export: If PDF export fails with cannot load library 'libgobject-2.0-0', add this to your ~/.zshrc:
export DYLD_FALLBACK_LIBRARY_PATH=/opt/homebrew/lib
This lets the uv-managed Python find Homebrew's native libraries.
git clone https://github.com/RizhongLin/PolyglotWhisperer.git
cd PolyglotWhisperer
uv sync --all-extras
# Pull a local LLM for translation (optional)
ollama pull qwen3:8b
spaCy language models are downloaded automatically on first use.
Install only what you need
uv sync --extra transcribe # Local Whisper (stable-ts, MLX)
uv sync --extra download # URL downloading (yt-dlp)
uv sync --extra llm # LLM translation (LiteLLM, Ollama)
uv sync --extra nlp # spaCy NLP (POS tagging, lemmatizer)
uv sync --extra vocab # Vocabulary analysis (wordfreq + spaCy)
uv sync --extra export # PDF/EPUB export (WeasyPrint, ebooklib)
cp .env.example .env # edit and add your keys
LiteLLM routes to any provider via model prefix — set the matching API key in .env. See .env.example for supported providers.
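As a sketch, a minimal .env might contain entries like these (the key names GROQ_API_KEY, OPENAI_API_KEY, and ANTHROPIC_API_KEY follow LiteLLM's standard environment variable conventions; the values here are placeholders):

```shell
# Placeholder values; key names follow LiteLLM's env var conventions
GROQ_API_KEY=gsk_your_key_here
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
```

Only the key matching the provider prefix of your configured model is needed.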
# Full pipeline: download → transcribe → translate → play
pgw run "https://example.com/video" --translate en --no-play
# Refine transcription with LLM before translating
pgw run "https://example.com/video" --refine --translate en --no-play
# Cloud API transcription + translation (no local GPU needed)
pgw run "https://example.com/video" --backend api --llm-backend api --translate en --no-play
# Reuse existing subtitles from video page (skip Whisper if available)
pgw run "https://example.com/video" --subs --translate en --no-play
# Batch processing
pgw run *.mp4 --translate en --no-play
pgw run urls.txt --backend api --translate en --no-play
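The urls.txt batch file is assumed here to be a plain list with one URL per line (the exact format is not spelled out above):

```text
https://example.com/video-1
https://example.com/video-2
```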
# Transcribe only
pgw transcribe video.mp4 -l fr
pgw transcribe *.mp4 --backend api -l fr
# Translate existing subtitles
pgw translate subtitles.fr.vtt --to en
# Vocabulary analysis
pgw vocab pgw_workspace/my-video/20260217_164802/
# Playback
pgw play pgw_workspace/my-video/20260217_164802/
pgw serve pgw_workspace/my-video/20260217_164802/ # web player
Config layers (lowest to highest priority): packaged defaults → ~/.config/pgw/config.toml → ./pgw.toml → .env + env vars → CLI flags.
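For instance, since CLI flags sit at the top of the priority chain, a flag overrides any file-based setting for a single run:

```shell
# pgw.toml sets [whisper] backend = "local"; the flag takes priority for this run
pgw transcribe video.mp4 -l fr --backend api
```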
# pgw.toml
[whisper]
backend = "api" # "local" or "api"
api_model = "groq/whisper-large-v3-turbo" # provider/model via LiteLLM
language = "fr"
[llm]
backend = "api" # "local" or "api"
local_model = "ollama_chat/qwen3:8b" # Ollama for local backend
api_model = "openrouter/openai/gpt-oss-120b" # any LiteLLM provider/model
target_language = "en"
Environment variables use the PGW_ prefix: PGW_WHISPER__BACKEND=api, PGW_LLM__BACKEND=api, PGW_LLM__API_MODEL=<provider/model>.
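The double underscore appears to map onto TOML nesting (PGW_LLM__API_MODEL sets api_model under [llm]), so the same settings can be applied inline for a one-off run:

```shell
# One-off override: environment variables beat pgw.toml but lose to CLI flags
PGW_WHISPER__BACKEND=api PGW_LLM__BACKEND=api pgw run video.mp4 --translate en --no-play
```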
pgw_workspace/
├── .cache/ # Shared cache (cross-workspace)
│ ├── audio/ # Extracted audio
│ ├── compressed/ # API-compressed MP3s
│ ├── transcriptions/ # Whisper results (local + API)
│ └── downloads/ # yt-dlp downloads + subtitles
└── my-video/
└── 20260217_164802/
├── video.mp4 # Symlinked from source
├── audio.wav # Symlinked from cache
├── transcription.fr.vtt # Original subtitles (from Whisper or downloaded)
├── transcription.fr.txt # Plain text
├── translation.en.vtt # Translated subtitles
├── translation.en.txt # Translation plain text
├── bilingual.fr-en.vtt # Dual-language VTT
├── parallel.fr-en.pdf # Side-by-side PDF
├── parallel.fr-en.epub # Side-by-side EPUB
├── vocabulary.fr.json # CEFR analysis + rare words
└── metadata.json
| Backend | Technology | Pros | Limits |
|---|---|---|---|
| Local (default) | stable-ts | Best quality, word-level timestamps, custom regrouping | Requires GPU / model downloads |
| Cloud API | LiteLLM | Fast, cheap, no GPU, auto-compresses large files | API key required |
# Local
pgw transcribe audio.wav -l fr # large-v3-turbo on MLX
pgw transcribe audio.wav -l fr --whisper-model medium # smaller model
# Cloud API (any LiteLLM-supported provider)
pgw transcribe audio.wav --backend api -l fr
pgw transcribe audio.wav --backend api --whisper-model openai/whisper-1 -l fr
Each processed video gets a vocabulary profile: CEFR level estimation via wordfreq, top 30 rare words with context and translation, spaCy lemmatization to group inflected forms.
pgw vocab pgw_workspace/my-video/20260217_164802/ --top 50
Video/Audio/URL
→ Download (yt-dlp, cached) + fetch existing subtitles
→ Extract Audio (ffmpeg, cached)
→ Use downloaded subtitles OR Transcribe (Whisper + spaCy segmentation)
→ Refine transcription (LLM, optional — fixes ASR errors, punctuation)
→ Translate (LLM, optional — sentence-boundary chunking with overlap)
→ Export (VTT/TXT/bilingual VTT/PDF/EPUB) + Vocabulary Analysis
→ Play (mpv or web player)
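The CEFR estimation step in the vocabulary analysis can be sketched as a frequency-to-band mapping. This is a minimal illustration with invented thresholds, not pgw's actual cutoffs; in the real pipeline the Zipf scores come from the wordfreq package (zipf_frequency(word, lang)).

```python
# Illustrative sketch of bucketing word frequency into CEFR bands.
# Thresholds are invented for illustration and are NOT pgw's actual cutoffs;
# in pgw, Zipf scores come from the wordfreq package.

def zipf_to_cefr(zipf: float) -> str:
    """Map a Zipf frequency (roughly 0-8, higher = more common) to a CEFR band."""
    bands = [(5.5, "A1"), (5.0, "A2"), (4.5, "B1"), (4.0, "B2"), (3.5, "C1")]
    for threshold, level in bands:
        if zipf >= threshold:
            return level
    return "C2"  # very rare words land in the hardest band

def is_rare(zipf: float, cutoff: float = 3.5) -> bool:
    """Words below the cutoff would be surfaced as 'rare words' with context."""
    return zipf < cutoff
```

A document-level CEFR estimate could then aggregate per-word bands, for example by taking a high percentile of word difficulty rather than the mean.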
| Component | Technology |
|---|---|
| Transcription | stable-ts (MLX/CUDA/CPU) |
| Cloud APIs | LiteLLM (Groq, OpenAI, Ollama, Claude) |
| NLP | spaCy (26 language codes) + wordfreq |
| Export | WeasyPrint (PDF) + ebooklib (EPUB) |
| Subtitles | pysubs2 |
| Download | yt-dlp |
| Playback | mpv |
| CLI | Typer + Rich |
Whisper supports 100 languages — run pgw languages for the full list. spaCy POS tagging and clitic handling cover 26 language codes (including Norwegian no/nn aliases).
Common language codes
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| fr | French | zh | Chinese | pl | Polish |
| en | English | ja | Japanese | sv | Swedish |
| de | German | ko | Korean | da | Danish |
| es | Spanish | ar | Arabic | fi | Finnish |
| it | Italian | ru | Russian | uk | Ukrainian |
| pt | Portuguese | hi | Hindi | vi | Vietnamese |
| nl | Dutch | tr | Turkish | | |
- Whisper transcription (local + cloud API) with word-level timestamps
- LLM translation + dual subtitle playback
- spaCy subtitle segmentation + Romance clitic handling (26 language codes)
- Audio cache, batch processing, vocabulary analysis, parallel text export
- Streaming pipeline event system
- Subtitle download from video pages, web player, content-addressable cache
- Hosted demo (Gradio on Hugging Face Spaces)
- Speaker diarization
- Anki card generation from subtitle pairs
