
🎬 VideoTranscriberPro

VideoTranscriberPro is a lightweight web application that allows users to upload video files and receive automatic transcriptions using OpenAI's Whisper model. With a sleek frontend, Whisper-powered backend, and support for multiple file formats, it's the perfect tool for fast, high-quality video transcription.



🚀 Features

  • 🎥 Upload multiple videos at once (drag & drop interface)
  • 🧠 Automatic transcription with OpenAI Whisper
  • 📝 Output with and without timestamps
  • 🌐 Downloadable SRT subtitle support
  • 📁 Local file handling for privacy and performance
  • 🎨 Clean, responsive frontend UI (HTML/CSS/JS)
  • 🧪 Optional English translation support (when configured)

🗂 Project Structure

VideoTranscriberPro/
├── app.py              # Flask backend
├── requirements.txt    # Python dependencies
├── templates/
│   └── index.html      # Frontend UI
├── static/
│   ├── style.css       # Styles
│   └── script.js       # Frontend logic
├── uploads/            # Temporary video uploads
└── output/             # Transcription results



βš™οΈ Installation & Setup

1. Prerequisites

  • Python 3.8+
  • FFmpeg (must be installed and added to PATH)
    • Windows: download FFmpeg from ffmpeg.org and add its bin folder to PATH
    • macOS: brew install ffmpeg
    • Ubuntu: sudo apt-get install ffmpeg
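Before launching the app, you can verify that FFmpeg is actually reachable from Python. A minimal sketch using only the standard library (the helper names here are illustrative, not part of the project):

```python
import shutil
import subprocess
from typing import Optional

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None

def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if not installed."""
    if not ffmpeg_available():
        return None
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0] if out.stdout else None
```

If `ffmpeg_available()` returns False, audio extraction (and therefore transcription) will fail, so it is worth checking before the first upload.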

2. Installation Steps

# Clone the repository
git clone https://github.com/mobius29er/VideoTranscriberPro.git
cd VideoTranscriberPro

# (Recommended) Create a virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (macOS/Linux)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the Flask app
python app.py
Open your browser to http://localhost:5000

🧪 Usage

  • Open the web app
  • Drag & drop or select one or more videos
  • Click Start Transcription
  • Wait for processing (progress bar shows status)
  • Download the results (with or without timestamps, or as SRT); the same files are also written automatically to the output/ folder

📄 Output Files

For each uploaded video, you'll get:

  • ✅ video_transcript.txt – raw transcript (no timestamps)
  • ✅ video_with_timestamps.txt – readable transcript with [HH:MM:SS - HH:MM:SS] markers
  • ✅ video.srt – subtitle file
  • ✅ (Optional) English translation .txt and .srt if translation is enabled
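The SRT file uses the standard HH:MM:SS,mmm time-code format. A minimal sketch of how Whisper-style segments (dicts with start, end, and text keys, as returned by model.transcribe) could be rendered to SRT — the helper names are illustrative, not the project's actual code:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a duration in seconds as the SRT time code HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render Whisper-style segments as a numbered SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```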

🧠 Whisper Model Options

Whisper supports several models:

Model    Size      Speed      Accuracy
tiny     ~39 MB    Very Fast  Low
base     ~74 MB    Fast       Moderate
small    ~244 MB   Medium     Good
medium   ~769 MB   Slower     Very Good
large    ~1550 MB  Slowest    Best

To change the model, edit app.py:

model = whisper.load_model("medium")  # or "small", "large", etc.
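The size/accuracy trade-off in the table can also be applied programmatically. A hypothetical helper (not part of app.py) that picks the most accurate model fitting a download budget, using the approximate sizes from the table above:

```python
# Approximate on-disk sizes (MB) from the table above.
MODEL_SIZES_MB = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}

def pick_model(max_download_mb: int) -> str:
    """Return the most accurate Whisper model whose download fits the budget,
    falling back to "tiny" if nothing fits."""
    order = ["tiny", "base", "small", "medium", "large"]
    fitting = [m for m in order if MODEL_SIZES_MB[m] <= max_download_mb]
    return fitting[-1] if fitting else "tiny"
```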
🛠 Configuration

  • UPLOAD_FOLDER and OUTPUT_FOLDER: can be changed in app.py
  • MAX_CONTENT_LENGTH: controls the maximum upload size (default: 500 MB)
  • ALLOWED_EXTENSIONS: adjust to accept more or fewer video types
  • Translations: can be enabled if you add a translation module or flag
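A sketch of what these settings might look like in app.py — the extension list here is an illustrative assumption, not the project's actual defaults (apart from the 500 MB cap mentioned above):

```python
# Configuration values as described above; names mirror those listed for app.py.
UPLOAD_FOLDER = "uploads"
OUTPUT_FOLDER = "output"
MAX_CONTENT_LENGTH = 500 * 1024 * 1024  # 500 MB upload cap
ALLOWED_EXTENSIONS = {"mp4", "mov", "avi", "mkv", "webm"}  # assumed set

def allowed_file(filename: str) -> bool:
    """Accept a file only if it has one of the allowed video extensions."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS
```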

🧾 Example Output (with timestamps)

[00:00:00 - 00:00:05] Welcome to our demo on AI-powered transcription.
[00:00:05 - 00:00:10] In this video, we'll explore how Whisper works.
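A small sketch of how segments could be rendered into this bracketed format (helper names are illustrative, not the project's actual code):

```python
def hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS (no milliseconds)."""
    s = int(seconds)
    return f"{s // 3600:02d}:{(s % 3600) // 60:02d}:{s % 60:02d}"

def with_timestamps(segments) -> str:
    """Render segments in the [HH:MM:SS - HH:MM:SS] style shown above."""
    return "\n".join(
        f"[{hms(seg['start'])} - {hms(seg['end'])}] {seg['text'].strip()}"
        for seg in segments
    )
```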

💻 Tech Stack

  • Flask – lightweight Python web framework
  • Whisper – OpenAI speech recognition model
  • JavaScript/CSS – frontend interactivity and styling
  • FFmpeg – video to audio conversion tool
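FFmpeg's role in the stack is to turn the uploaded video into audio Whisper can consume. A hedged sketch of the kind of command involved — 16 kHz mono WAV is the format Whisper resamples to internally; this helper only builds the command list, it does not run it:

```python
def ffmpeg_extract_cmd(video: str, wav_out: str) -> list:
    """Build an ffmpeg command that extracts 16 kHz mono WAV audio from a video."""
    return [
        "ffmpeg", "-y",   # overwrite the output file if it already exists
        "-i", video,      # input video
        "-vn",            # drop the video stream
        "-ac", "1",       # mono
        "-ar", "16000",   # 16 kHz sample rate
        wav_out,
    ]
```

The list could then be passed to subprocess.run before handing the WAV file to Whisper.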

📦 Requirements

requirements.txt:

--extra-index-url https://download.pytorch.org/whl/nightly/cu128

Flask==3.0.0
openai-whisper
torch==2.9.0.dev20250716+cu128
torchvision==0.24.0.dev20250717+cu128
torchaudio==2.8.0.dev20250717+cu128
werkzeug==3.0.1
numpy<2

Install everything with pip install -r requirements.txt (see Installation above).

💡 Acknowledgements

Special thanks to PyVideoTrans for their excellent guide on enabling PyTorch Blackwell (RTX 5xxx series) support via CUDA 12.8 Nightly.

📌 Notes

  • First-time run will auto-download the Whisper model
  • FFmpeg must be correctly installed for audio extraction to work
  • Your machine must support the chosen Whisper model (larger models require more memory)

📣 Future Features (Planned)

  • ✅ Language detection & translation toggle
  • ✅ SRT subtitle preview in browser
  • ✅ GPU support via PyTorch CUDA if available
  • ⏳ User authentication (multi-user support)
  • ⏳ Cloud deployment template (Render, Vercel, Heroku)

🤝 Contributing

Contributions are welcome!

Please open an issue or submit a PR with any enhancements, fixes, or ideas.

📜 License

This project is licensed under the MIT License.

👤 Author

Jeremy Foxx

🎉 Love this tool? Help support ongoing development: Sponsor me on GitHub