Skip to content

juancho-cl/TranscripService

Repository files navigation

TranscripService

Local video transcription using OpenAI Whisper — no API key, no cloud, no cost.

Python License Whisper


Features

  • Transcribe MP4 and other video formats to plain text, fully offline
  • Real timestamps sourced from Whisper's native speech segments ([MM:SS - MM:SS])
  • Live progress indicator during transcription
  • Batch processing: automatically transcribe all MP4 files in a folder
  • Dual output per video: raw transcript + formatted transcript with timestamps
  • Skip logic: batch script skips files already transcribed

How It Works

Video file (.mp4)
      │
      ▼
 ffprobe (FFmpeg)
  └─ extracts audio duration for progress estimation
      │
      ▼
 OpenAI Whisper (local, 'small' model)
  └─ speech-to-text with native segment timestamps
      │
      ▼
 Two output files:
  ├─ <name>_transcription.txt   (raw continuous text)
  └─ <name>_formatted.txt       (timestamped segments)

Whisper runs entirely on your machine — no audio is sent to any external service. The model (~461 MB) is downloaded automatically on first run.


Prerequisites

  • Python 3.8+
  • FFmpeg installed and available in your system PATH
    • Windows: ffmpeg.org/download or winget install ffmpeg
    • macOS: brew install ffmpeg
    • Linux: sudo apt install ffmpeg
  • ~500 MB free disk space (Whisper small model, downloaded once on first run)

Installation

git clone https://github.com/juancho-cl/TranscripService.git
cd TranscripService
pip install -r requirements.txt

Usage

Single file — basic

python transcribe.py "meeting_recording.mp4"

Output: meeting_recording_transcription.txt


Single file — with progress bar

python transcribe_with_progress.py "meeting_recording.mp4"

Output:

  • meeting_recording_transcription.txt
  • meeting_recording_formatted.txt

Single file — with real timestamps (recommended)

python transcribe_with_segments.py "meeting_recording.mp4"

Output:

  • meeting_recording_transcription.txt
  • meeting_recording_formatted.txt

Batch processing (Windows)

Place Script.bat and all your .mp4 files in the same folder, then run:

Script.bat

The script will:

  1. Process every .mp4 in the folder
  2. Skip files already transcribed
  3. Save outputs to output/
  4. Save logs to logs/

Script Comparison

Script Output files Timestamps Best for
transcribe.py 1 (raw text) None Quick transcription, minimal setup
transcribe_with_progress.py 2 (raw + formatted) Simulated Visual feedback during long files
transcribe_with_segments.py 2 (raw + formatted) Real (Whisper native) Accurate timestamped transcripts

Sample Output

[00:02 - 00:07]
So this is the current server and this is a shared folder.

[00:09 - 00:15]
Right now you can see there are two different folders here.

[00:15 - 00:22]
What I'm going to show you is how to map this as a network drive on your computer.

[00:22 - 00:28]
Once it's mapped, it will appear just like any other drive in File Explorer.

License

MIT — see LICENSE.

About

Local video transcription using OpenAI Whisper — real timestamps, progress tracking, and batch automation. No API key required.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors