BBC-Esq/Elegant-Transcriber

4x faster than the fastest Whisper implementation AND more accurate.

  • Batch transcribe multiple files in a directory and, optionally, all sub-directories
  • Optional timestamps with configurable segment intervals
  • Works on GPU (CUDA) and CPU, on Windows or Linux
  • Supported file types: AAC, AMR, ASF, AVI, FLAC, M4A, MKV, MP3, MP4, WAV, WEBM, WMA
  • Link to article on Medium
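The batch-discovery step above can be sketched with the standard library; `find_audio_files` is a hypothetical helper for illustration, not the project's actual API:

```python
from pathlib import Path

# Extensions from the supported-file-types list above.
SUPPORTED = {".aac", ".amr", ".asf", ".avi", ".flac", ".m4a",
             ".mkv", ".mp3", ".mp4", ".wav", ".webm", ".wma"}

def find_audio_files(root: str, recurse: bool = False) -> list[Path]:
    """Collect supported media files in a directory, optionally recursing."""
    root_path = Path(root)
    candidates = root_path.rglob("*") if recurse else root_path.glob("*")
    return sorted(p for p in candidates if p.suffix.lower() in SUPPORTED)
```

The case-insensitive suffix check also skips sub-directories themselves (their suffix is empty), so only media files reach the transcription queue.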

Installation

1. Windows (installer)

Download and run Elegant_Transcriber_Setup.exe (right-click and run as administrator)

2. Windows (from source)

Download the latest release, extract it, navigate to the directory containing main.py, and run:

python -m venv .
.\Scripts\activate
python install.py
python main.py

3. Linux (from source)

Download the latest release, extract it, navigate to the directory containing main.py, and run:

python3 -m venv .
source bin/activate
python install.py
python main.py

Benchmarks (GPU)

| Library | Model | Batch | Chunk | VRAM Usage | Time | Real-Time Factor | Quality Ranking |
|---|---|---|---|---|---|---|---|
| Elegant Transcriber (NeMo) | Parakeet TDT 0.6B v2 | 1 | 90s | ~3.3 GB | 14.9s | 580x | #8 |
| Transformers | Whisper Large v3 | 32 | Default | ~12.4 GB | 52.2s | 166x | #32 |
| WhisperS2T Reborn (Ctranslate2) | Whisper Large v3 | 32 | Default | ~13.4 GB | 66.9s | 129x | #32 |
| Faster-Whisper (Ctranslate2) | Whisper Large v3 | 32 | Default | ~12.5 GB | 75.9s | 114x | #32 |
| WhisperX (Ctranslate2) | Whisper Large v3 | 32 | Default | ~12.8 GB | 71.8s | 120x | #32 |
| Transformers | Granite 4.0 1B Speech | 12 | 30s | ~6.3 GB | 97.7s | 88x | #1 |
| Elegant Transcriber (NeMo) | Canary-Qwen-2.5b | 1 | 40s | ~11.1 GB | 639.8s | 13.5x | #2 |

  • All models were run in bfloat16.
  • VRAM measurements include model weights and inference overhead, with background usage subtracted.
  • Parameters were tuned for maximum throughput, reaching ~90% CUDA core usage on an RTX 4090.
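The Chunk column is the fixed segment length the audio is cut into before inference; a minimal sketch of that slicing (`chunk_spans` is an illustrative helper, not code from this repo):

```python
def chunk_spans(total_seconds: float, chunk_seconds: float) -> list[tuple[float, float]]:
    """Split a recording into consecutive (start, end) spans of chunk_seconds.

    The final span is shorter when the audio length isn't an exact multiple.
    """
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans

# A 200 s file with 90 s chunks yields spans (0, 90), (90, 180), (180, 200).
```

Longer chunks mean fewer model invocations but more memory per invocation, which is why the Canary-Qwen row pairs a smaller 40 s chunk with its larger VRAM footprint.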

Benchmarks (CPU)

  • CPU tests use a shorter, ~13 minute private audio file to keep runtimes manageable.
| Library | Model | Batch | Chunk | RAM Usage | Time | Real-Time Factor | Quality Ranking |
|---|---|---|---|---|---|---|---|
| Elegant Transcriber | Parakeet TDT 0.6B v2 | 1 | 90s | ~5.6 GB | 29.0s | 26.8x | #8 |
| Faster-Whisper (Ctranslate2) | Whisper Large v3 | 1 | Default | ~6.5 GB | 211.8s | 3.67x | #32 |
| WhisperS2T Reborn (Ctranslate2) | Whisper Large v3 | 1 | Default | ~6.6 GB | 257.9s | 3.02x | #32 |
| Transformers | Whisper Large v3 | 1 | Default | ~6.6 GB | 311.1s | 2.50x | #32 |
| Elegant Transcriber (NeMo) | Canary-Qwen-2.5b | 1 | 40s | ~11.1 GB | 370.1s | 2.1x | #2 |
| WhisperX (Ctranslate2) | Whisper Large v3 | 1 | Default | ~7.3 GB | 396.4s | 1.96x | #32 |

  • All models were loaded in float32 for CPU compatibility.
  • 20 threads were used on an Intel Core i9-13900K, resulting in ~90% CPU usage.
  • I couldn't get Granite Speech to run on CPU.
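The Real-Time Factor column is presumably audio duration divided by wall-clock time; taking the ~13-minute test file as roughly 780 seconds (an assumption, since the file is private), the Parakeet CPU row checks out:

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock compute."""
    return audio_seconds / wall_seconds

# ~780 s of audio in 29.0 s of compute is about 26.9x, in line with the
# 26.8x reported for Parakeet TDT 0.6B v2 above.
print(round(real_time_factor(780.0, 29.0), 1))  # → 26.9
```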

Special Thanks

  • Nvidia for the Parakeet models, which are, IMHO, hands down the best balance of accuracy and compute time for most people.
  • IBM for the Granite Speech models, which, as of March 2026, rank #1 on the ASR leaderboard for accuracy. I'll include them in a later release.
  • OpenAI for the older Whisper models, which set the gold standard for so many years.

About

Extremely fast and accurate audio transcriber surpassing Whisper. Fast on GPU or CPU.
