# Local TTS for AI coding agents — speak responses aloud
Adds voice output to any AI coding agent (Claude Code, Kiro CLI, Gemini CLI, OpenCode, Crush, Amp). Uses kokoro-onnx (82M params) for fast, natural-sounding speech with ~1.5s latency.
All agents integrate via an MCP server that exposes a speak tool. The Kokoro model loads once and stays warm in memory, eliminating cold-start latency on subsequent calls.
## Why voice?

- An auditory channel helps with focus and processing (especially for neurodivergent users)
- Hearing responses spoken reduces "wall of text" overwhelm
- Natural voice quality: not robotic or fatiguing
## Install

```shell
git clone <your-repo>/speaker.git
cd speaker
./scripts/install.sh
```

This installs the `speak-mcp` MCP server and configures any detected AI tools.
## Usage

In any agent session:

| Platform | Enable | Disable |
|---|---|---|
| Claude Code | `/speak-start` | `/speak-stop` |
| Kiro CLI | `@speak-start` | `@speak-stop` |
| Gemini CLI | `@speak-start` | `@speak-stop` |
| OpenCode | `@speak-start` | `@speak-stop` |
| Amp | `@speak-start` | `@speak-stop` |

Voice is off by default. When enabled, the agent calls the `speak` MCP tool with its full response text (excluding code blocks).
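The "excluding code blocks" step can be sketched with a small text filter. This helper (`strip_code_blocks`) is illustrative only and not part of the project; it assumes responses use fenced Markdown code blocks:

```python
import re

def strip_code_blocks(markdown: str) -> str:
    """Remove code so it is not read aloud, keeping the surrounding prose."""
    # Drop fenced ```...``` blocks (non-greedy, spanning lines)
    text = re.sub(r"```.*?```", "", markdown, flags=re.DOTALL)
    # Drop inline `code` spans entirely
    text = re.sub(r"`[^`]*`", "", text)
    # Collapse leftover blank lines
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```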
## MCP server

The `speak-mcp` entry point runs a FastMCP server exposing three tools: `speak`, `list_voices`, and `speaker_status`. The Kokoro model stays warm in memory: the first call loads the model (~2 s); subsequent calls add only ~200 ms of overhead.
| Field | Value |
|---|---|
| Name | speak |
| Parameters | text: str, voice: str = "am_michael", speed: float = 1.0 |
| Returns | str — confirmation or error message |
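The warm-model behaviour can be sketched as a lazy singleton around the engine. `DummyEngine`, `get_engine`, and this `speak` function are all illustrative stand-ins, not the project's actual code:

```python
import time

class DummyEngine:
    """Stand-in for the real kokoro-onnx engine (illustrative only)."""
    def __init__(self):
        time.sleep(0.01)  # simulate the one-time model load
    def synthesize(self, text: str, voice: str, speed: float) -> bytes:
        return text.encode()  # a real engine would return PCM samples

_engine = None

def get_engine() -> DummyEngine:
    """Pay the load cost once, then reuse the warm instance."""
    global _engine
    if _engine is None:
        _engine = DummyEngine()
    return _engine

def speak(text: str, voice: str = "am_michael", speed: float = 1.0) -> str:
    audio = get_engine().synthesize(text, voice, speed)
    return f"Spoke {len(audio)} bytes via {voice}"
```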
## Manual agent setup

Add to your agent's MCP config:

```json
{
  "mcpServers": {
    "speaker": {
      "command": "speak-mcp",
      "args": []
    }
  }
}
```

Then add to the agent's prompt:

```
The user can toggle voice with @speak-start and @speak-stop.
When enabled, call the speak tool with your full response text.
Exclude code blocks from spoken text.
```
See docs/agent-install.md for platform-specific configs.
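Over stdio, the agent invokes the tool with a standard MCP `tools/call` JSON-RPC request; the payload below is an illustrative example of what such a call to `speak` looks like, not captured traffic from the project:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "speak",
    "arguments": {
      "text": "Build finished with no errors.",
      "voice": "am_michael",
      "speed": 1.0
    }
  }
}
```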
## Voices

kokoro-onnx voices follow the pattern `{accent}{gender}_{name}`:

| Voice | Description |
|---|---|
| am_michael | American male (default) — clear, natural |
| af_heart | American female — warm tone |
| af_bella | American female — bright |
| am_adam | American male — deeper |
| bf_emma | British female |

Pass `speed` as a tool parameter (0.5 = slow, 2.0 = fast, default 1.0).
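The `{accent}{gender}_{name}` pattern can be decoded mechanically. This small helper (`parse_voice`, illustrative, not part of the project) maps the prefix letters used in the table above:

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}

def parse_voice(voice_id: str) -> tuple[str, str, str]:
    """Split a kokoro-onnx voice id like 'am_michael' into its parts."""
    prefix, _, name = voice_id.partition("_")
    return ACCENTS[prefix[0]], GENDERS[prefix[1]], name
```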
## How it works

- Agent calls the `speak` MCP tool with response text
- MCP server (`speak-mcp`) synthesizes audio via kokoro-onnx
- Audio is resampled from 24 kHz to 48 kHz and played via sounddevice
- Model stays warm in memory for low-latency subsequent calls
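Since 48 kHz is exactly twice 24 kHz, the resampling step amounts to a 2× upsample. A minimal linear-interpolation sketch (the real server presumably uses a DSP library; this is illustrative only):

```python
def upsample_2x(samples: list[float]) -> list[float]:
    """Double the sample rate (24 kHz -> 48 kHz) by linear interpolation."""
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        if i + 1 < len(samples):
            out.append((s + samples[i + 1]) / 2)  # midpoint between neighbours
        else:
            out.append(s)  # repeat the final sample so len(out) == 2 * len(samples)
    return out
```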
```
Agent (Claude/Kiro/Gemini/...)
        |
        | MCP protocol (stdio)
        v
speak-mcp (FastMCP server)
        |
        | SpeakerEngine (in-process)
        v
kokoro-onnx -> sounddevice -> audio out
```
## Requirements

- Python 3.10+
- uv for installation
- macOS or Linux (kokoro-onnx runs on CPU via ONNX Runtime)
## License

MIT