# VideoTranscriberPro

VideoTranscriberPro is a lightweight web application that allows users to upload video files and receive automatic transcriptions using OpenAI's Whisper model. With a sleek frontend, Whisper-powered backend, and support for multiple file formats, it's the perfect tool for fast, high-quality video transcription.

## ✨ Features
- 📥 Upload multiple videos at once (drag & drop interface)
- 🧠 Automatic transcription with OpenAI Whisper
- 🕒 Output with and without timestamps
- 📄 Downloadable SRT subtitle support
- 🔒 Local file handling for privacy and performance
- 🎨 Clean, responsive frontend UI (HTML/CSS/JS)
- 🧪 Optional English translation support (when configured)
## 📁 Project Structure

```text
VideoTranscriberPro/
├── app.py              # Flask backend
├── requirements.txt    # Python dependencies
├── templates/
│   └── index.html      # Frontend UI
├── static/
│   ├── style.css       # Styles
│   └── script.js       # Frontend logic
├── uploads/            # Temporary video uploads
└── output/             # Transcription results
```
## 🔧 Prerequisites

- Python 3.8+
- FFmpeg (must be installed and added to PATH)
  - Windows: download a build from ffmpeg.org and add its `bin` folder to PATH
  - macOS: `brew install ffmpeg`
  - Ubuntu: `sudo apt-get install ffmpeg`
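Since Whisper shells out to FFmpeg for audio extraction, it is worth checking that both prerequisites are actually on PATH before running the app. A small preflight sketch (not part of the app itself):

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is reachable on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found:", shutil.which("ffmpeg"))
    else:
        print("FFmpeg not found on PATH - install it before running app.py")
```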
## 🚀 Installation

```bash
# Clone the repository
git clone https://github.com/mobius29er/VideoTranscriberPro.git
cd VideoTranscriberPro

# (Recommended) Create a virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (macOS/Linux)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the Flask app
python app.py
```

Then open your browser to http://localhost:5000
## 🧪 Usage

1. Open the web app
2. Drag & drop or select one or more videos
3. Click **Start Transcription**
4. Wait for processing (the progress bar shows status)
5. Download the results (with or without timestamps, or as SRT); every generated file (plain, timestamped, `.srt`, translations, etc.) is also written automatically to the `output/` folder
## 📄 Output Files

For each uploaded video, you'll get:

- ✅ `video_transcript.txt` - raw transcript (no timestamps)
- ✅ `video_with_timestamps.txt` - readable transcript with `[HH:MM:SS - HH:MM:SS]` markers
- ✅ `video.srt` - subtitle file
- ✅ (Optional) English translation `.txt` and `.srt` if translation is enabled
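SRT files use the standard `HH:MM:SS,mmm` timestamp syntax. A minimal sketch of how Whisper-style segments (dicts with `start`/`end` seconds and `text`, per Whisper's documented output) could be serialized; the helper names are illustrative, not taken from `app.py`:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render a list of Whisper-style segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```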
## 🧠 Whisper Model Options

Whisper supports several models:

| Model  | Size     | Speed     | Accuracy  |
|--------|----------|-----------|-----------|
| tiny   | ~39 MB   | Very Fast | Low       |
| base   | ~74 MB   | Fast      | Moderate  |
| small  | ~244 MB  | Medium    | Good      |
| medium | ~769 MB  | Slower    | Very Good |
| large  | ~1550 MB | Slowest   | Best      |

To change the model, edit `app.py`:

```python
model = whisper.load_model("medium")  # or "small", "large", etc.
```
## 🛠 Configuration

- `UPLOAD_FOLDER` and `OUTPUT_FOLDER`: can be changed in `app.py`
- `MAX_CONTENT_LENGTH`: controls max upload size (default: 500 MB)
- `ALLOWED_EXTENSIONS`: adjust to accept more or fewer video types
- Translations: can be enabled if you add a translation module or flag
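A hypothetical sketch of how these knobs might be wired up in `app.py` (only the setting names come from this README; the concrete values and the `allowed_file` helper are illustrative):

```python
from pathlib import Path

# Folders for incoming videos and finished transcripts (names from the README)
UPLOAD_FOLDER = Path("uploads")
OUTPUT_FOLDER = Path("output")

# Reject uploads larger than 500 MB (the documented default)
MAX_CONTENT_LENGTH = 500 * 1024 * 1024

# Illustrative whitelist - adjust to accept more or fewer video types
ALLOWED_EXTENSIONS = {"mp4", "mkv", "mov", "avi", "webm"}


def allowed_file(filename: str) -> bool:
    """Accept only whitelisted video extensions (case-insensitive)."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS
```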
## 🧾 Example Output (with timestamps)

```text
[00:00:00 - 00:00:05] Welcome to our demo on AI-powered transcription.
[00:00:05 - 00:00:10] In this video, we'll explore how Whisper works.
```
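The bracketed markers can be derived from Whisper's per-segment start/end seconds; a minimal sketch (helper names are mine, not taken from `app.py`):

```python
def hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS, matching the example output above."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def timestamped_line(start: float, end: float, text: str) -> str:
    """Build one '[HH:MM:SS - HH:MM:SS] text' transcript line."""
    return f"[{hms(start)} - {hms(end)}] {text.strip()}"
```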
## 💻 Tech Stack

- Flask - lightweight Python web framework
- Whisper - OpenAI speech recognition model
- JavaScript/CSS - frontend interactivity and styling
- FFmpeg - video-to-audio conversion tool
## 📦 Requirements

```text
--extra-index-url https://download.pytorch.org/whl/nightly/cu128
Flask==3.0.0
openai-whisper
torch==2.9.0.dev20250716+cu128
torchvision==0.24.0.dev20250717+cu128
torchaudio==2.8.0.dev20250717+cu128
werkzeug==3.0.1
numpy<2
```

Install with:

```bash
pip install -r requirements.txt
```
## 💡 Acknowledgements

Special thanks to PyVideoTrans for their excellent guide on enabling PyTorch Blackwell (RTX 5xxx series) support via CUDA 12.8 nightly builds.

- CUDA 12.8 + nightly Torch for RTX 5xxx (Blackwell) support
- Based on guidance from: https://pyvideotrans.com/en/blog/5090s
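A quick way to confirm the nightly CUDA build is actually in use is to ask PyTorch directly. A small diagnostic sketch (not part of the app; it degrades gracefully when torch is absent):

```python
def cuda_status() -> str:
    """Report whether PyTorch sees a CUDA device (e.g. an RTX 5xxx card)."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)} (torch {torch.__version__})"
    return f"CPU only (torch {torch.__version__})"


if __name__ == "__main__":
    print(cuda_status())
```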
## 📝 Notes

- The first run will auto-download the Whisper model
- FFmpeg must be correctly installed for audio extraction to work
- Your machine must support the chosen Whisper model (larger models require more memory)
## 🛣 Future Features (Planned)

- ✅ Language detection & translation toggle
- ✅ SRT subtitle preview in browser
- ✅ GPU support via PyTorch CUDA if available
- ⏳ User authentication (multi-user support)
- ⏳ Cloud deployment template (Render, Vercel, Heroku)
## 🤝 Contributing

Contributions are welcome!
Please open an issue or submit a PR with any enhancements, fixes, or ideas.
## 📜 License

This project is licensed under the MIT License.
## 👤 Author

Jeremy Foxx

💖 Love this tool? Help support ongoing development: Sponsor me on GitHub