
HumanFace-Tech/whisper-asr-with-ui

 
 


Whisper ASR Box UI (with simple Mic-UI and LLM-based formatting)

A web-based user interface for Whisper ASR Webservice that combines speech recognition with LLM post-processing.

Both the speech recognition and the LLM formatting run on your own machine, so everything is locally hosted.

How it works:

  1. Click Record
  2. Speak into the microphone
  3. Click Stop
  4. Wait 2-3 seconds
  5. Paste the formatted, polished text anywhere.

In other words: Mic -> Whisper -> Raw Text -> LLM -> Formatted Text -> Clipboard.
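
The same flow can be sketched outside the browser. This is only an illustration, not the code in index.html: it assumes the ASR service accepts a multipart upload in an `audio_file` field on `/asr` and returns JSON with a `text` key, and that Ollama's `/api/generate` endpoint is used for formatting. The model tag `qwen2.5`, the system prompt, and all function names are hypothetical. The clipboard step is left out.

```python
import json
import urllib.request
import uuid

# Ports come from this README; the paths and query string are assumptions.
ASR_URL = "http://localhost:9100/asr?task=transcribe&output=json"
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5"  # assumed model tag; use whatever you pulled into Ollama
SYSTEM_PROMPT = "Fix punctuation and formatting. Return only the corrected text."  # hypothetical


def build_asr_request(audio: bytes, filename: str = "recording.webm") -> urllib.request.Request:
    """Build a multipart/form-data POST that carries the audio as `audio_file`."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        ASR_URL,
        data=head + audio + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_llm_request(raw_text: str) -> urllib.request.Request:
    """Build a non-streaming generate request that asks the LLM to clean up the text."""
    payload = {"model": MODEL, "system": SYSTEM_PROMPT, "prompt": raw_text, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(request, opener=urllib.request.urlopen) -> dict:
    """Send the request and decode the JSON body of the reply."""
    with opener(request) as response:
        return json.loads(response.read())


def run_pipeline(audio: bytes, opener=urllib.request.urlopen) -> str:
    """Audio bytes -> raw transcript (Whisper) -> formatted text (LLM)."""
    raw_text = post_json(build_asr_request(audio), opener)["text"]
    formatted = post_json(build_llm_request(raw_text), opener)["response"]
    return formatted.strip()
```

With both services running, `run_pipeline(open("recording.webm", "rb").read())` should return the formatted text. The `opener` parameter exists only so the two HTTP calls can be swapped out when testing.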

Features

  • 🎤 Browser-based audio recording
  • 🔄 Real-time transcription using Whisper ASR
  • ✨ Automatic text formatting and correction using Ollama
  • 📋 Automatic copy to clipboard (no click needed)
  • 🎯 Retry support for failed transcriptions
  • 💫 Clean, responsive UI with status indicators

Preview

Preview of Whisper ASR Box UI

Prerequisites

  • Docker and Docker Compose
  • NVIDIA GPU with CUDA support (recommended)
  • Ollama running locally with the Qwen2.5 model (any model works; just change it in index.html)
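
To confirm the last prerequisite before starting, you can ask Ollama which models it has installed. The sketch below assumes Ollama's `/api/tags` endpoint on the default port and the model tag `qwen2.5`; the helper names are made up for this example.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama default port


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by /api/tags."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """True if an installed model is exactly `wanted` or a tagged variant like `wanted:latest`."""
    return any(
        name == wanted or name.startswith(wanted + ":")
        for name in installed_models(tags_payload)
    )


def fetch_tags(url: str = OLLAMA_TAGS_URL) -> dict:
    """Query the local Ollama server for its installed models."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read())
```

For example, `has_model(fetch_tags(), "qwen2.5")` tells you whether the model still needs to be pulled.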

Quick Start

  1. Clone this repository:
     git clone https://github.com/HumanFace-Tech/whisper-asr-with-ui.git
     cd whisper-asr-with-ui
  2. Start the services:
     docker-compose up -d
  3. Open http://localhost:9180 in your browser to access the web interface.

Architecture

This project consists of three main components:

  1. Frontend: Simple HTML/JS interface for audio recording and display (written mostly by AI)
  2. Whisper ASR Service: Based on ahmetoner/whisper-asr-webservice
  3. Ollama Integration: Local LLM for text post-processing

Configuration

Docker Compose Environment Variables

whisper-asr-webservice:
  environment:
    - ASR_MODEL=base.en          # Whisper model size (base.en is English-only)
    - ASR_ENGINE=faster_whisper  # Inference backend

Port Configuration

  • Web Interface: 9180
  • ASR Service: 9100
  • Ollama Service: 11434 (default)
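
If something does not respond, a quick way to see which service is down is to probe these ports. This is a small sketch, assuming all three services listen on localhost; the helper names are made up for this example.

```python
import socket

# Ports as listed above.
SERVICES = {"Web Interface": 9180, "ASR Service": 9100, "Ollama Service": 11434}


def is_port_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES, host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(port, host) for name, port in services.items()}
```

Calling `check_services()` returns a dictionary such as `{"Web Interface": True, ...}`. An open port only shows that something is listening there, not that the service behind it is healthy.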

Development

To modify the web interface:

  1. Edit index.html in the root directory (the model and the system prompt are set there too)
  2. Changes are picked up immediately by the Nginx server
  3. The interface is intentionally a single large HTML file, for simplicity

Credits

License

This project is licensed under the MIT License - see the LICENSE file for details.
