sound2transcript

Capture macOS system audio from video lectures and transcribe locally using whisper.cpp. No cloud. No subscription. Supports English and Brazilian Portuguese via auto-detection.

What it does

  • Routes system audio through the BlackHole 2ch virtual audio driver
  • Records sessions as 16 kHz mono WAV via ffmpeg
  • Transcribes with whisper-cli (default model: large-v3-turbo, with Metal GPU acceleration on Apple Silicon)
  • Outputs .txt (required), optionally .srt and .vtt
  • Garbage-collects old recordings on a schedule via launchd
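
The record-then-transcribe flow above can be sketched as two commands. This is a minimal illustration, not the actual contents of stream-transcribe: the function names record_cmd and transcribe_cmd are hypothetical, and the exact flags the script passes are an assumption. The functions only print the commands so they can be inspected before running anything.

```shell
# Sketch only: prints the ffmpeg and whisper-cli invocations instead of running them.
# Function names are hypothetical; the real script may use different flags.

# Recording: capture the loopback audio device as 16 kHz, mono, 16-bit PCM WAV.
record_cmd() {
  local device="$1" wav="$2"
  printf '%s\n' "ffmpeg -f avfoundation -i ':${device}' -ac 1 -ar 16000 -c:a pcm_s16le '${wav}'"
}

# Transcription: auto-detect the language and write <out_base>.txt.
transcribe_cmd() {
  local model="$1" wav="$2" out_base="$3" threads="$4"
  printf '%s\n' "whisper-cli -m '${model}' -f '${wav}' -l auto -t ${threads} -otxt -of '${out_base}'"
}

# Example:
#   record_cmd "BlackHole 2ch" session.wav
#   transcribe_cmd model.bin session.wav session 4
```

Adding -osrt or -ovtt to the whisper-cli command would request subtitle output alongside the .txt file.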

Install

Option A: Homebrew (recommended)

brew tap jmcoimbra/tap
brew install sound2transcript

This installs stream-transcribe and sound2transcript-gc into your PATH, and pulls in ffmpeg and whisper-cpp as dependencies automatically.

After installing, complete the one-time setup:

# 1. Install the virtual audio driver
brew install --cask blackhole-2ch

# 2. Download a Whisper model (ggml-medium.bin, about 1.5 GB)
#    If MODEL_PATH in your config points to a different file, update it to match
curl -L --progress-bar \
  -o "$(brew --prefix)/var/sound2transcript/models/ggml-medium.bin" \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin"

# 3. Configure audio routing (required)
open docs/SETUP.md  # or see "Audio routing" below

Option B: From source

git clone https://github.com/jmcoimbra/sound2transcript.git
cd sound2transcript
make install
make download-model

Requires Homebrew and macOS 13+.

Update

Homebrew

brew update
brew upgrade sound2transcript

That's it. Homebrew handles fetching the new version and replacing the binaries.

Your transcripts, recordings, model, and config are untouched: they live in the data directory, not in the Homebrew prefix.

From source

cd sound2transcript
git pull
make install

This re-runs the install, copying updated scripts over the existing ones. Your data and config are preserved.

Uninstall

Homebrew

brew uninstall sound2transcript
brew untap jmcoimbra/tap  # optional: remove the tap

From source

make uninstall

Both methods leave your data at ~/sound2transcript/ intact. Remove it manually if you no longer need it:

rm -rf ~/sound2transcript

Audio routing

See docs/SETUP.md for the required one-time setup that routes system audio through BlackHole. Complete it before first use.

Choosing a model

See docs/MODELS.md for model comparison, architecture-specific install instructions (Apple Silicon vs Intel), and thread tuning.

Apple Silicon users: Make sure you're using ARM Homebrew (/opt/homebrew/bin/brew), not Intel Homebrew (/usr/local/bin/brew). Intel Homebrew runs under Rosetta 2 and produces binaries with no Metal GPU access, so transcription will be 10-20x slower.
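
A quick way to check which Homebrew is active is to compare the CPU architecture with the brew path. The helper below is a hypothetical sketch (brew_arch_status is not part of this project); it takes both values as arguments so the logic is easy to verify.

```shell
# Hypothetical helper: warn when an arm64 machine is not using /opt/homebrew/bin/brew.
brew_arch_status() {
  local arch="$1" brew_path="$2"
  if [ "$arch" = "arm64" ] && [ "$brew_path" != "/opt/homebrew/bin/brew" ]; then
    echo "warn: arm64 machine but brew is at ${brew_path:-<not found>}"
    return 1
  fi
  echo "ok"
}

# On a real machine:
#   brew_arch_status "$(uname -m)" "$(command -v brew)"
```

Note that uname -m reports x86_64 when the shell itself runs under Rosetta 2, so run the check from a native terminal session.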

Use

Start recording:

stream-transcribe

Press Ctrl+C to stop. Transcription runs automatically. Output goes to ~/sound2transcript/transcripts/.

Keep the WAV file after transcription:

stream-transcribe --keep

To always keep recordings, set KEEP_RECORDINGS="1" in your config.
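
The precedence between the flag and the config value could be implemented along these lines. This is a sketch under the assumption that --keep simply overrides KEEP_RECORDINGS; should_keep_recording is a hypothetical name, not a function from the project.

```shell
# Hypothetical sketch: --keep on the command line wins; otherwise use KEEP_RECORDINGS.
should_keep_recording() {
  local keep="${KEEP_RECORDINGS:-0}" arg
  for arg in "$@"; do
    if [ "$arg" = "--keep" ]; then
      keep=1
    fi
  done
  # Succeeds (exit 0) when the WAV should be kept.
  [ "$keep" = "1" ]
}

# Example:
#   if should_keep_recording "$@"; then echo "keeping WAV"; fi
```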

Check version:

stream-transcribe --version

Schedule garbage collection (optional)

make install-launchd

The job runs daily at 03:30, removing old WAV files and enforcing disk caps.
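
The age-based part of that cleanup can be approximated with find. This is an illustrative sketch only: gc_recordings is a hypothetical function, not the sound2transcript-gc implementation, and it does not cover the disk-cap check.

```shell
# Hypothetical sketch: delete *.wav files older than roughly N days.
# find's -mtime works in whole 24-hour periods, so the cutoff is approximate.
gc_recordings() {
  local dir="$1" days="$2"
  # Print each removed path, then delete it; other file types are left alone.
  find "$dir" -type f -name '*.wav' -mtime +"$days" -print -delete
}

# Example, using the default retention of 3 days:
#   gc_recordings "$HOME/sound2transcript/recordings" 3
```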

Directory layout

~/sound2transcript/
├── models/         # whisper model files
├── recordings/     # intermediate WAV files (deleted after transcription unless --keep)
├── transcripts/    # output .txt / .srt / .vtt
├── logs/           # session and gc logs
└── config/         # config.env
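
If the data directory is ever removed, the layout above can be recreated by hand. The installer presumably does this already; init_data_dir below is a hypothetical helper shown only to make the structure concrete.

```shell
# Hypothetical helper: create the five subdirectories under a base directory.
init_data_dir() {
  local base="${1:-$HOME/sound2transcript}" sub
  for sub in models recordings transcripts logs config; do
    mkdir -p "$base/$sub"
  done
}

# Example:
#   init_data_dir            # uses ~/sound2transcript
#   init_data_dir /tmp/s2t   # or any other base directory
```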

Configuration

All settings are in ~/sound2transcript/config/config.env:

| Variable | Default | Description |
| --- | --- | --- |
| BLACKHOLE_DEVICE_NAME | BlackHole 2ch | Audio loopback device name |
| MODEL_PATH | ~/sound2transcript/models/ggml-large-v3-turbo-q5_0.bin | Whisper model path (see Choosing a model) |
| LANG | auto | Language: auto, en, or pt |
| OUTPUT_TXT | 1 | Generate .txt output |
| OUTPUT_SRT | 1 | Generate .srt subtitles |
| OUTPUT_VTT | 0 | Generate .vtt subtitles |
| RECORDINGS_RETENTION_DAYS | 3 | Days to keep WAV files |
| TRANSCRIPTS_RETENTION_DAYS | 90 | Days to keep transcripts (0 = forever) |
| RECORDINGS_MAX_GB | 10 | Maximum disk usage for recordings, in GB |
| WHISPER_THREADS | 4 | CPU threads for transcription |
| SILENCE_THRESHOLD_DB | -50 | Volume threshold (dB) below which a recording is flagged as silent |
| KEEP_RECORDINGS | 0 | Keep WAV after transcription (1 = keep, 0 = delete). Override with --keep |
| LOG_LEVEL | info | Log verbosity: info, warn, or error |
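
As an illustration of how these settings might be consumed, the sketch below sources the file and fills in defaults for a few variables. It assumes config.env is a plain shell file of KEY=VALUE lines, which the README does not state explicitly; load_config is a hypothetical name.

```shell
# Hypothetical sketch: source config.env if present, then apply defaults from the table.
load_config() {
  local file="${1:-$HOME/sound2transcript/config/config.env}"
  if [ -f "$file" ]; then
    # shellcheck disable=SC1090
    . "$file"
  fi
  # ":=" assigns the default only when the variable is unset or empty.
  : "${WHISPER_THREADS:=4}"
  : "${OUTPUT_TXT:=1}"
  : "${OUTPUT_SRT:=1}"
  : "${OUTPUT_VTT:=0}"
  : "${RECORDINGS_RETENTION_DAYS:=3}"
  : "${KEEP_RECORDINGS:=0}"
}

# A config.env that raises the thread count and enables .vtt output would contain:
#   WHISPER_THREADS=8
#   OUTPUT_VTT="1"
```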

Development

make lint       # shellcheck + shfmt
make test       # bats-core tests
make check      # lint + test

Releasing a new version

  1. Bump the version in VERSION
  2. Commit: git commit -am "Bump version to X.Y.Z"
  3. Tag and push: make release
  4. Create the GitHub release: gh release create vX.Y.Z
  5. Update the SHA in the homebrew-tap formula
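
For step 5, the checksum can be computed locally from the release tarball. The helper below is a sketch: sha256_file is a hypothetical name, and the assumption that the formula references GitHub's tag archive should be checked against the tap.

```shell
# Hypothetical helper: print the SHA-256 of a file, using whichever tool is available.
sha256_file() {
  local file="$1"
  if command -v shasum >/dev/null 2>&1; then
    shasum -a 256 "$file" | awk '{print $1}'
  else
    sha256sum "$file" | awk '{print $1}'
  fi
}

# Example (assumes the formula uses the tag archive; X.Y.Z is a placeholder):
#   curl -fsSL -o release.tar.gz \
#     "https://github.com/jmcoimbra/sound2transcript/archive/refs/tags/vX.Y.Z.tar.gz"
#   sha256_file release.tar.gz
```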

License

MIT
