Whisper ASR Box UI (with simple Mic-UI and LLM-based formatting)

A web-based user interface for Whisper ASR Webservice that combines speech recognition with LLM post-processing.

This means - everything is locally hosted.

How it works:

Click Record
Speak stuff into the MIC
Click Stop
Wait 2-3 seconds
Paste anywhere pre-formatted and polished text.

This means: Mic -> Whisper -> Raw Text -> LLM -> Formatted Text -> Clipboard.

Features

🎤 Browser-based audio recording
🔄 Real-time transcription using Whisper ASR
✨ Automatic text formatting and correction using Ollama
📋 One-click copy to clipboard (or actually no-click, it's automatic)
🎯 Support for retry on failed transcriptions
💫 Clean, responsive UI with status indicators

Preview

Prerequisites

Docker and Docker Compose
NVIDIA GPU with CUDA support (recommended)
Ollama running locally with Qwen2.5 model (you can use any model - just tweak the HTML)

Quick Start

Clone this repository:

git clone https://github.com/HumanFace-Tech/whisper-asr-with-ui.git
cd whisper-asr-with-ui

Start the services:

docker-compose up -d

Access the web interface:

Open http://localhost:9180 in your browser

Architecture

This project consists of three main components:

Frontend: Simple HTML/JS interface for audio recording and display (written mostly by AI)
Whisper ASR Service: Based on ahmetoner/whisper-asr-webservice
Ollama Integration: Local LLM for local text post-processing ;)

Configuration

Docker Compose Environment Variables

whisper-asr-webservice:
  environment:
    - ASR_MODEL=base.en      # Choose model size
    - ASR_ENGINE=faster_whisper

Port Configuration

Web Interface: 9180
ASR Service: 9100
Ollama Service: 11434 (default)

Development

To modify the web interface:

Edit index.html in the root directory (you can also change the model there, or the system prompt)
The changes will be reflected immediately through the Nginx server
Original file was meant to be 1 big HTML file - on purpose, for simplicity

Credits

HumanFace Tech - Author
Sergiu Nagailic - aka - Nikro - Author
Whisper ASR Webservice by Ahmet Öner
Ollama for local LLM processing
Material Icons for UI elements

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 312 Commits
.github		.github
app		app
docs		docs
.dockerignore		.dockerignore
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
Dockerfile.gpu		Dockerfile.gpu
LICENCE		LICENCE
README.md		README.md
docker-compose.gpu.yml		docker-compose.gpu.yml
docker-compose.yml		docker-compose.yml
index.html		index.html
mkdocs.yml		mkdocs.yml
nginx-default.conf		nginx-default.conf
openapi.json		openapi.json
poetry.lock		poetry.lock
preview.png		preview.png
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Whisper ASR Box UI (with simple Mic-UI and LLM-based formatting)

How it works:

Features

Preview

Prerequisites

Quick Start

Architecture

Configuration

Docker Compose Environment Variables

Port Configuration

Development

Credits

License

About

Uh oh!

Releases

Packages

Languages

License

HumanFace-Tech/whisper-asr-with-ui

Folders and files

Latest commit

History

Repository files navigation

Whisper ASR Box UI (with simple Mic-UI and LLM-based formatting)

How it works:

Features

Preview

Prerequisites

Quick Start

Architecture

Configuration

Docker Compose Environment Variables

Port Configuration

Development

Credits

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages