A web-based user interface for Whisper ASR Webservice that combines speech recognition with LLM post-processing.
This means everything is hosted locally - nothing leaves your machine.
- Click Record
- Speak into the mic
- Click Stop
- Wait 2-3 seconds
- Paste the pre-formatted, polished text anywhere
This means: Mic -> Whisper -> Raw Text -> LLM -> Formatted Text -> Clipboard.
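The pipeline above can be sketched in a few lines - a minimal example, assuming the ports used by this setup (9100 for the ASR service, 11434 for Ollama) and the qwen2.5 model; the endpoint shapes follow the public APIs of whisper-asr-webservice (`/asr`) and Ollama (`/api/generate`):

```javascript
// Build the Ollama request body for the post-processing step.
// Model name and prompt wording are illustrative, not the app's actual prompt.
function ollamaBody(rawText) {
  return {
    model: "qwen2.5",
    prompt: `Fix punctuation and formatting, keep the wording:\n${rawText}`,
    stream: false,
  };
}

// End-to-end sketch: audio blob in, polished text out.
// Works in the browser or Node 18+ (both provide fetch and FormData).
async function transcribeAndPolish(audioBlob) {
  // 1. Raw transcript from the Whisper ASR webservice.
  const form = new FormData();
  form.append("audio_file", audioBlob, "clip.wav");
  const asr = await fetch("http://localhost:9100/asr?output=txt", {
    method: "POST",
    body: form,
  });
  const rawText = await asr.text();

  // 2. Polished text from the local LLM via Ollama.
  const llm = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify(ollamaBody(rawText)),
  });
  return (await llm.json()).response;
}
```

The web UI does the same dance, then drops the result onto your clipboard.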
- 🎤 Browser-based audio recording
- 🔄 Real-time transcription using Whisper ASR
- ✨ Automatic text formatting and correction using Ollama
- 📋 One-click copy to clipboard (or actually no-click, it's automatic)
- 🎯 Support for retry on failed transcriptions
- 💫 Clean, responsive UI with status indicators
- Docker and Docker Compose
- NVIDIA GPU with CUDA support (recommended)
- Ollama running locally with Qwen2.5 model (you can use any model - just tweak the HTML)
- Clone this repository:

```shell
git clone https://github.com/HumanFace-Tech/whisper-asr-with-ui.git
cd whisper-asr-with-ui
```

- Start the services:

```shell
docker-compose up -d
```

- Access the web interface: open http://localhost:9180 in your browser
This project consists of three main components:
- Frontend: Simple HTML/JS interface for audio recording and display (written mostly by AI)
- Whisper ASR Service: Based on ahmetoner/whisper-asr-webservice
- Ollama Integration: local LLM for text post-processing ;)
```yaml
whisper-asr-webservice:
  environment:
    - ASR_MODEL=base.en # Choose model size
    - ASR_ENGINE=faster_whisper
```

- Web Interface: `9180`
- ASR Service: `9100`
- Ollama Service: `11434` (default)
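As a rough sketch, the port mappings above would look something like this in the compose file - service names and container-side ports here are assumptions, so check the actual `docker-compose.yml` in the repo:

```yaml
# Sketch only - service names are illustrative.
services:
  web:                 # Nginx serving index.html
    ports:
      - "9180:80"
  whisper-asr-webservice:
    ports:
      - "9100:9000"    # whisper-asr-webservice listens on 9000 by default
  # Ollama runs on the host at its default port, 11434
```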
To modify the web interface:
- Edit `index.html` in the root directory (you can also change the model there, or the system prompt)
- Changes are reflected immediately through the Nginx server
- Everything was intentionally kept in one big HTML file, for simplicity
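As an illustration, the model and system prompt typically sit next to the Ollama call; this is a hypothetical sketch (constant names and structure are assumptions, not the actual contents of `index.html`), using Ollama's `/api/generate` fields:

```javascript
// Hypothetical names - the real index.html may organize this differently.
const OLLAMA_MODEL = "qwen2.5"; // swap in any model you have pulled
const SYSTEM_PROMPT =
  "Clean up this transcript: fix punctuation, keep the wording.";

// Build the JSON body for Ollama's /api/generate endpoint.
function buildOllamaRequest(rawTranscript) {
  return {
    model: OLLAMA_MODEL,
    system: SYSTEM_PROMPT, // tweak the formatting instructions here
    prompt: rawTranscript,
    stream: false,
  };
}

// In the browser, the UI would send it roughly like:
// fetch("http://localhost:11434/api/generate", {
//   method: "POST",
//   body: JSON.stringify(buildOllamaRequest(text)),
// }).then(r => r.json()).then(d => navigator.clipboard.writeText(d.response));
```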
- HumanFace Tech - Author
- Sergiu Nagailic (aka Nikro) - Author
- Whisper ASR Webservice by Ahmet Öner
- Ollama for local LLM processing
- Material Icons for UI elements
This project is licensed under the MIT License - see the LICENSE file for details.
