A Flask web application that converts audio and video files to text using OpenAI's Whisper model.
- Upload multiple media files at once
- Drag and drop file upload interface
- Real-time processing progress tracking
- Support for various audio/video formats (MP3, MP4, WAV, M4A, etc.)
- Download transcribed text as .txt files
- Copy transcribed text to the clipboard
- Responsive web interface
Setup and usage:
- Install the Python dependencies:
pip install -r requirements.txt
- Install FFmpeg (required by Whisper):
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# CentOS/RHEL (FFmpeg is not in the default repositories; enable EPEL and RPM Fusion first)
sudo yum install ffmpeg
# macOS
brew install ffmpeg
- Run the Flask application:
python app.py
- Open your web browser and go to http://localhost:5000
- Upload one or more media files using the drag-and-drop interface
- Wait for the conversion to complete, then view or download the transcribed text
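Whisper decodes media by calling the `ffmpeg` executable, so a missing FFmpeg install typically surfaces only when the first upload is processed. The sketch below checks for FFmpeg up front using only the standard library. The helper names `ffmpeg_available()` and `ffmpeg_version()` are hypothetical and are not part of app.py.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is missing."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    # Hypothetical pre-flight check to run before `python app.py`.
    version = ffmpeg_version()
    print(version if version else "FFmpeg not found on PATH; install it before running app.py")
```

Running a check like this at startup turns a confusing mid-job failure into an immediate, readable message.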
Supported formats:
- Audio: MP3, M4A, WAV, MPGA
- Video: MP4, MPEG, WebM, MOV, AVI, FLV, MKV
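The supported-formats list above maps naturally onto an extension allow-list checked before a file is accepted. The sketch below is one way to do that. The constant names and the `allowed_file()` helper are hypothetical, and app.py may validate uploads differently.

```python
import os

# Extensions copied from the README's supported-formats list (hypothetical constant names).
AUDIO_EXTENSIONS = {"mp3", "m4a", "wav", "mpga"}
VIDEO_EXTENSIONS = {"mp4", "mpeg", "webm", "mov", "avi", "flv", "mkv"}
ALLOWED_EXTENSIONS = AUDIO_EXTENSIONS | VIDEO_EXTENSIONS


def allowed_file(filename: str) -> bool:
    """Return True if the filename ends in a supported extension (case-insensitive)."""
    # os.path.splitext treats a leading dot as part of the name, so ".mp3" has no extension.
    _, ext = os.path.splitext(filename)
    return ext.lower().lstrip(".") in ALLOWED_EXTENSIONS
```

For example, `allowed_file("Interview.MP3")` is accepted while `allowed_file("notes.txt")` is rejected. Checking the extension is only a first filter. FFmpeg still has the final say on whether the contents can be decoded.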
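You can change the Whisper model by editing the model name in the load_whisper_model() function in app.py. Model options:
- tiny.en - Fastest, English only
- base.en - Better accuracy, English only
- small.en - Good balance of speed and accuracy, English only
- medium.en - Higher accuracy, English only
- large - Best accuracy, supports multiple languages
Each .en model also has a multilingual counterpart without the suffix (tiny, base, small, medium).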
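Because loading a Whisper model is slow and memory-hungry, a common pattern is to load it once per process and reuse it for every job. The sketch below shows that pattern with `functools.lru_cache` and rejects unknown names with a clear error. It is not the app's actual `load_whisper_model()`. `DEFAULT_MODEL`, `validate_model_name()`, and `transcribe_file()` are hypothetical. The calls `whisper.load_model(name)` and `model.transcribe(path)` (returning a dict with a `"text"` key) come from the openai-whisper package.

```python
from functools import lru_cache

# Model names listed in this README; openai-whisper also accepts multilingual names.
SUPPORTED_MODELS = ("tiny.en", "base.en", "small.en", "medium.en", "large")
DEFAULT_MODEL = "base.en"  # hypothetical default; app.py may use a different one


def validate_model_name(name: str) -> str:
    """Reject model names outside SUPPORTED_MODELS with a readable error."""
    if name not in SUPPORTED_MODELS:
        choices = ", ".join(SUPPORTED_MODELS)
        raise ValueError(f"unknown Whisper model {name!r}; choose one of: {choices}")
    return name


@lru_cache(maxsize=None)
def _load_cached(name: str):
    # Imported lazily so this module can be imported without openai-whisper installed.
    import whisper
    return whisper.load_model(name)


def load_whisper_model(name: str = DEFAULT_MODEL):
    """Load the named model once per process and reuse it for later jobs."""
    return _load_cached(validate_model_name(name))


def transcribe_file(path: str, model_name: str = DEFAULT_MODEL) -> str:
    """Transcribe one media file and return its plain text."""
    result = load_whisper_model(model_name).transcribe(path)
    return result["text"].strip()
```

The cache key is the validated name, so the default-argument call and an explicit `load_whisper_model("base.en")` share one loaded model instead of loading two copies.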
API endpoints:
- GET / - Main upload page
- POST /upload - Upload files for processing
- GET /status/<job_id> - Check processing status
- GET /result/<job_id> - View transcription result
- GET /download/<job_id> - Download transcription as a text file
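The status and download endpoints can also be used from a script. The sketch below polls `GET /status/<job_id>` and then fetches `GET /download/<job_id>` using only the standard library. The response format is not documented here, so the sketch assumes `/status` returns JSON with a `"status"` field that ends as `"completed"` or `"failed"`. Those values and all function names are hypothetical, so adjust them to match app.py. Uploading through `POST /upload` is left out because the form field name is not documented.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

BASE_URL = "http://localhost:5000"
# Assumption: these are the final values of the "status" field returned by /status/<job_id>.
TERMINAL_STATES = {"completed", "failed"}


def fetch_status(job_id: str, base_url: str = BASE_URL) -> Dict:
    """GET /status/<job_id> and decode the (assumed) JSON body."""
    with urllib.request.urlopen(f"{base_url}/status/{job_id}", timeout=10) as resp:
        return json.load(resp)


def wait_for_job(job_id: str,
                 fetch: Callable[[str], Dict] = fetch_status,
                 interval: float = 2.0,
                 timeout: float = 3600.0) -> Dict:
    """Poll until the job reaches a terminal state, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout} s")
        time.sleep(interval)


def download_text(job_id: str, base_url: str = BASE_URL) -> str:
    """GET /download/<job_id> and return the transcription as text."""
    with urllib.request.urlopen(f"{base_url}/download/{job_id}", timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Typical use is `wait_for_job(job_id)` followed by `download_text(job_id)` once the status is `"completed"`. The `fetch` parameter exists so the polling logic can be tested without a running server.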
Notes:
- Maximum file size: 500 MB
- Files are processed asynchronously in the background
- Uploaded files are automatically deleted after processing
- Transcribed text files are saved temporarily for download
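In Flask, a size cap like this is usually set with `app.config["MAX_CONTENT_LENGTH"] = 500 * 1024 * 1024`, and larger requests are rejected with HTTP 413. The other three notes describe a background-job lifecycle. The sketch below shows one way to implement it with a worker thread per job and an in-memory job table. It writes the transcript to a `.txt` file and deletes the upload whether or not transcription succeeds. It is an illustration, not the app's code. `start_job()`, `get_job()`, and the status values are hypothetical, and `transcribe` is any callable that maps a file path to text, such as a wrapper around Whisper.

```python
import os
import threading
import uuid
from typing import Callable, Dict, Optional

# Hypothetical in-memory job table shared by request handlers and worker threads.
_jobs: Dict[str, Dict] = {}
_jobs_lock = threading.Lock()


def _update_job(job_id: str, **fields) -> None:
    with _jobs_lock:
        _jobs.setdefault(job_id, {}).update(fields)


def get_job(job_id: str) -> Optional[Dict]:
    """Return a snapshot of a job's state, or None if the id is unknown."""
    with _jobs_lock:
        job = _jobs.get(job_id)
        return dict(job) if job is not None else None


def _run_job(job_id: str, upload_path: str, output_dir: str,
             transcribe: Callable[[str], str]) -> None:
    _update_job(job_id, status="processing")
    try:
        text = transcribe(upload_path)
        out_path = os.path.join(output_dir, f"{job_id}.txt")
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(text)
        outcome = {"status": "completed", "output_path": out_path}
    except Exception as exc:  # record the error instead of letting the thread die silently
        outcome = {"status": "failed", "error": str(exc)}
    finally:
        # Delete the uploaded media whether or not transcription succeeded.
        try:
            os.remove(upload_path)
        except FileNotFoundError:
            pass
    # Publish the final state only after cleanup has run.
    _update_job(job_id, **outcome)


def start_job(upload_path: str, output_dir: str,
              transcribe: Callable[[str], str]) -> str:
    """Register a job, start a background worker, and return the job id immediately."""
    job_id = uuid.uuid4().hex
    _update_job(job_id, status="queued")
    worker = threading.Thread(target=_run_job,
                              args=(job_id, upload_path, output_dir, transcribe),
                              daemon=True)
    worker.start()
    return job_id
```

An upload handler would save each file to disk, for example with Werkzeug's `FileStorage.save()`, call `start_job()`, and return the id so the status endpoint can report on it. One thread per job is the simplest design. Because Whisper is compute-heavy, a single worker draining a queue may behave better when many files arrive at once.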