Skip to content

MCERQUA/OpenVoiceUI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

514 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OpenVoiceUI Banner

OpenVoiceUI

The open-source voice AI that actually does work.

npm version MIT License GitHub Stars Website

Install, open localhost:5001, say "build me a dashboard", and watch it render live.


Watch the demo -- see voice-to-canvas in action


Install

Prerequisite: Docker must be installed and running for all install methods.

Pinokio (one-click)

Download Pinokio if you don't have it, then search "OpenVoiceUI" in the app store and click Install.

npm

npx openvoiceui setup     # interactive wizard — walks you through API keys + builds Docker images
npx openvoiceui start     # starts everything

Docker

git clone https://github.com/MCERQUA/OpenVoiceUI.git
cd OpenVoiceUI
cp .env.example .env        # edit with your API keys
docker compose up

Open localhost:5001 and start talking.


What is OpenVoiceUI?

OpenVoiceUI is a hands-free, AI-controlled computer. You talk — it builds. Live web apps, dashboards, games, full websites — rendered in real time while you watch. No mouse, no keyboard, no typing prompts into a chat box.

It runs on OpenClaw and works with any LLM. The AI agent can build and display apps mid-conversation, switch between projects with a voice command, generate music on the fly, delegate work to parallel sub-agents, and remember everything across sessions. It uses any Claude Code or OpenClaw skill — and the community can build and share more through the plugin system.

Self-hosted. Your hardware, your data. MIT licensed, forever free.

Core Features

  • Hands-Free AI Computer — Talk and watch it work. The AI builds apps, switches between projects, runs tasks, and displays results on a live visual canvas — all without touching a mouse or keyboard.
  • Live Canvas — AI renders real HTML pages mid-conversation: dashboards, tools, galleries, reports, full web apps. Not text responses — real interactive pages you can use.
  • AI Music Generation — Generate songs on the fly with your voice using Suno. Full music player with playlist management built in.
  • Custom Animated Interface — Choose from animated face modes (eye-face avatar, reactive halo-smoke orb) or install community-built faces through plugins. Build your own — the face system is fully extensible.
  • Sub-Agents — Delegate multiple tasks to parallel AI workers simultaneously and get results back.
  • Long-Term Memory — Optional context engine plugin curates knowledge every turn. Persists across sessions in human-readable markdown.
  • Desktop OS Interface — Themed desktop environment with window management (Windows XP, macOS, Ubuntu, Win95, Win 3.1).
  • Admin Dashboard — Mobile-responsive. Agent profiles, provider config, workspace file browser, plugin management, system health. Everything editable live.
  • Self-Hosted — Your hardware, your data. No vendor lock-in, no monthly fees.

And More

  • Image generation (FLUX.1, Stable Diffusion 3.5)
  • Video creation (Remotion Studio)
  • Voice cloning (Qwen3-TTS via fal.ai)
  • Cron jobs for scheduled automation
  • File explorer with drag-and-drop
  • Agent profiles — switch personas, voices, and LLM providers from the admin panel

Install Details

Option 1: Pinokio (one-click)

  1. Install Pinokio if you don't have it
  2. Search "OpenVoiceUI" in the Pinokio app store
  3. Click Install, then Start

Pinokio handles Docker, dependencies, and configuration automatically.

Option 2: npm

Requires Node.js 20+, Python 3.10+, and Docker.

npx openvoiceui setup     # interactive wizard — configures LLM, TTS, API keys, builds Docker images
npx openvoiceui start     # starts OpenClaw gateway + Supertonic TTS + voice UI

The setup wizard walks you through choosing an LLM provider, TTS provider, and entering API keys. Configuration is saved to .env and openclaw-data/.

npx openvoiceui stop      # stop all services
npx openvoiceui status    # check what's running
npx openvoiceui logs      # tail service logs

Option 3: Docker

Requires Docker and Docker Compose.

git clone https://github.com/MCERQUA/OpenVoiceUI.git
cd OpenVoiceUI
cp .env.example .env

Edit .env with your API keys (at minimum: an LLM provider key and optionally a TTS key). Then:

docker compose up -d

This starts three containers:

Container Port Purpose
openclaw 18791 LLM gateway — routes to your chosen LLM provider
supertonic (internal) Free local TTS — no API key needed
openvoiceui 5001 Voice UI + Canvas + Admin dashboard

Open http://localhost:5001 to use the voice interface, or http://localhost:5001/admin for the admin dashboard.

To stop: docker compose down

Option 4: VPS / Production

For running on an Ubuntu server with nginx and systemd:

git clone https://github.com/MCERQUA/OpenVoiceUI.git
cd OpenVoiceUI
cp .env.example .env               # edit with your API keys
sudo bash deploy/setup-sudo.sh     # creates dirs, installs systemd service
bash deploy/setup-nginx.sh         # generates nginx config (edit domain)

See deploy/ for the full production setup including SSL, nginx reverse proxy, and systemd service files.


Configuration

All configuration is in .env. Copy .env.example to .env and fill in your values.

Required:

  • An LLM provider API key (OpenAI, Anthropic, Groq, Z.AI, or any OpenClaw-compatible provider)
  • CLAWDBOT_AUTH_TOKEN — set during npx openvoiceui setup or in OpenClaw's setup wizard

Optional but recommended:

  • GROQ_API_KEY — enables Groq Orpheus TTS (fast, high quality, free tier)
  • SUNO_API_KEY — enables AI music generation
  • CLERK_PUBLISHABLE_KEY — enables login/auth (for multi-user or public deployments)

See .env.example for all available options with descriptions.


Works With Any Provider

LLM

Provider Status
OpenClaw Gateway Built-in — routes to OpenAI, Anthropic, Groq, Z.AI, and more
Z.AI (GLM-5-turbo) Built-in
Groq (Llama, Qwen) Via OpenClaw
Google Gemini Via OpenClaw
MiniMax Via OpenClaw
Ollama (local) Via adapter
Any LLM Drop-in gateway plugin

Text-to-Speech

Provider Status
Supertonic (local) Free, ships with Docker setup
Groq Orpheus Fast cloud TTS, free tier
Resemble AI Premium cloned voices
Qwen3-TTS (fal.ai) Voice cloning
Hume EVI Emotion-aware
ElevenLabs High quality, many voices

Speech-to-Text

Provider Status
Web Speech API Free, browser-native (default)
Deepgram Streaming, accurate
Groq Whisper Fast cloud transcription

Admin Dashboard

Access at localhost:5001/admin. Mobile-responsive.

  • Profiles — View and activate agent personas
  • Agent Editor — Edit name, voice, LLM provider, system prompt, features, and agent workspace files. 4 tabs: Profile, System Prompt, Features, Agent Files
  • Plugins — Install and manage face packs, gateways, and extensions
  • Canvas Pages — Toggle public/private, lock pages, delete with archive
  • Workspace Files — Browse and edit agent workspace. Audio playback, image preview built in.
  • Music (Suno) — View all generated songs, play inline, archive tracks
  • Provider Config — Select LLM, TTS, STT providers. Saves to active profile.
  • Health and Stats — CPU, RAM, disk, gateway status, session reset
  • Connector Tests — 12 automated endpoint diagnostics

Use Cases

Small Business — AI receptionist, appointment scheduler, report builder. Talk to your AI and get a live dashboard of today's leads, reviews, and tasks.

Digital Agencies — Deploy custom AI assistants per client. Multi-tenant ready. Each client gets their own voice-powered workspace.

Developers — Fork it, extend it, deploy it anywhere. MIT licensed. Build custom plugins, gateway adapters, and canvas pages on top of a voice-first platform.


How It's Different

OpenVoiceUI Typical Voice AI
Source Open source (MIT) Closed source
Canvas UI Live HTML rendering Text/audio only
Skills Any Claude Code or OpenClaw skill API endpoints
Music AI music generation (Suno) None
Memory Plugin-based long-term context Session only
Admin Full dashboard, mobile-ready Config files
Plugins Community face packs, pages, workflows None
Hosting Self-hosted, your data Vendor cloud only
Pricing Free forever Per-minute billing

Tech Stack

Layer Technology
Backend Python / Flask
Frontend Vanilla JS (ES modules, no framework)
Canvas Fullscreen iframe + SSE
STT Web Speech API, Deepgram, Groq Whisper
TTS Supertonic, Groq Orpheus, Resemble, Qwen3-TTS
LLM Any provider via OpenClaw gateway
Memory Context engine plugin (markdown knowledge base)
Auth Clerk (optional)
Deploy npm, Docker, Pinokio, VPS/systemd

Plugins

OpenVoiceUI has a plugin system for community-built extensions. Plugins can include animated face packs, canvas pages, workflow dashboards, gateway adapters, or any combination.

Plugin Type Description
BHB Animated Characters Face Pack Animated BigHead Billionaires character avatars with lip-sync, mood expressions, and show lore. By BHaleyart
Hermes Agent Gateway Self-improving AI agent with auto-generated skills, deep memory search, and autonomous tasks. Adds OpenClaw+Hermes hybrid and Hermes-only modes
SEO Platform Canvas Page Full SEO dashboard powered by DataForSEO — keyword research, rank tracking, backlink analysis, site audits, AI visibility, and local SEO
Twenty CRM Canvas Page Connect to a Twenty CRM instance for contact, company, deal, and task management with embedded CRM view and setup wizard
ByteRover Long-Term Memory Context Engine Persistent long-term memory that curates conversation knowledge every turn into a human-readable markdown knowledge base

Build your own. Face packs, canvas pages, workflow dashboards, gateway adapters (template), or STT/TTS adapters (template). See the plugins repo for submission guidelines.


Documentation

Contributing

We welcome contributions — especially plugins. Build a face pack, a canvas page, a workflow dashboard, or a full extension and submit it to the plugins repo. See CONTRIBUTING.md for code contribution guidelines and openvoiceui.com for full documentation.

License

MIT


Website  ·  GitHub  ·  npm  ·  Plugins