NetDevAutomate/Agent-Speaker

Speaker

Local TTS for AI coding agents — speak responses aloud

Adds voice output to any AI coding agent (Claude Code, Kiro CLI, Gemini CLI, OpenCode, Crush, Amp). Uses kokoro-onnx (82M params) for fast, natural-sounding speech with ~1.5s latency.

All agents integrate via an MCP server that exposes a speak tool. The Kokoro model loads once and stays warm in memory, eliminating cold-start latency on subsequent calls.

Why?

  • Auditory channel helps with focus and processing (especially for neurodivergent users)
  • Hearing responses spoken reduces "wall of text" overwhelm
  • Natural voice quality — not robotic, won't put you off

Quick Start

git clone <your-repo>/speaker.git
cd speaker
./scripts/install.sh

This installs the speak-mcp MCP server and configures any detected AI tools.

Usage

In any agent session:

Platform      Enable         Disable
Claude Code   /speak-start   /speak-stop
Kiro CLI      @speak-start   @speak-stop
Gemini CLI    @speak-start   @speak-stop
OpenCode      @speak-start   @speak-stop
Amp           @speak-start   @speak-stop

Voice is off by default. When enabled, the agent calls the speak MCP tool with its full response (excluding code blocks).

MCP Server

The speak-mcp entry point runs a FastMCP server exposing three tools: speak, list_voices, and speaker_status. The Kokoro model stays warm in memory — first call loads the model (~2s), subsequent calls have ~200ms overhead.
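The warm-model pattern the server relies on can be sketched as a lazy-loading holder: the expensive load happens on the first call, and every later call reuses the cached model. This is an illustrative sketch, not the project's actual code; class and attribute names are assumptions.

```python
class SpeakerEngine:
    """Sketch of the warm-model pattern: the heavy Kokoro model is
    loaded once, on first use, and reused for every later call."""

    _model = None     # cached model; survives between tool invocations
    load_count = 0    # illustration only: how many times the model was loaded

    @classmethod
    def _ensure_model(cls):
        if cls._model is None:
            # Stand-in for the real ~2s kokoro-onnx model load.
            cls.load_count += 1
            cls._model = object()
        return cls._model

    @classmethod
    def speak(cls, text: str, voice: str = "am_michael", speed: float = 1.0) -> str:
        cls._ensure_model()   # first call loads; later calls reuse
        # The real server would synthesize and play audio here.
        return f"spoke {len(text)} chars with {voice} at {speed}x"
```

Because the MCP server process stays alive for the whole agent session, the cached model is what reduces per-call overhead after the first request.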

Tool Schema

Field        Value
Name         speak
Parameters   text: str, voice: str = "am_michael", speed: float = 1.0
Returns      str — confirmation or error message
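A stub with the same signature as the schema above helps make the contract concrete. The validation rules (non-empty text, speed bounds) are plausible assumptions, not documented behaviour:

```python
def speak(text: str, voice: str = "am_michael", speed: float = 1.0) -> str:
    """Illustrative stub matching the speak tool schema.
    Returns a confirmation string on success, an error string otherwise."""
    if not text.strip():
        return "error: empty text"
    if not 0.5 <= speed <= 2.0:
        return f"error: speed {speed} outside [0.5, 2.0]"
    # The real tool synthesizes and plays audio; this just confirms.
    return f"queued {len(text)} chars for voice {voice}"
```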

Adding to Any Agent

Add to your agent's MCP config:

{
  "mcpServers": {
    "speaker": {
      "command": "speak-mcp",
      "args": []
    }
  }
}

Then add to the agent's prompt:

The user can toggle voice with @speak-start and @speak-stop.
When enabled, call the speak tool with your full response text.
Exclude code blocks from spoken text.
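The "exclude code blocks" instruction could be implemented agent-side with a simple filter. A regex-based sketch (the actual filtering is left to the agent's prompt, so this is only one plausible approach):

```python
import re

def strip_code_blocks(text: str) -> str:
    """Remove fenced code blocks and inline code spans before speaking."""
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)  # fenced blocks
    text = re.sub(r"`[^`\n]+`", "", text)                   # inline code spans
    return text.strip()
```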

See docs/agent-install.md for platform-specific configs.

Voices

kokoro-onnx voices follow the pattern {accent}{gender}_{name}:

Voice        Description
am_michael   American male (default) — clear, natural
af_heart     American female — warm tone
af_bella     American female — bright
am_adam      American male — deeper
bf_emma      British female
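The {accent}{gender}_{name} pattern can be decoded mechanically. A small sketch, with the accent/gender codes inferred from the table above (only American and British appear there; other codes may exist upstream):

```python
ACCENTS = {"a": "American", "b": "British"}   # inferred from the voices table
GENDERS = {"f": "female", "m": "male"}

def parse_voice(voice: str) -> dict:
    """Decode a kokoro-onnx voice id like 'am_michael' into its parts."""
    prefix, _, name = voice.partition("_")
    return {
        "accent": ACCENTS.get(prefix[0], prefix[0]),
        "gender": GENDERS.get(prefix[1], prefix[1]),
        "name": name,
    }
```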

Speed

Pass speed as a tool parameter (0.5 = slow, 2.0 = fast, default 1.0).

How It Works

  1. Agent calls the speak MCP tool with response text
  2. MCP server (speak-mcp) synthesizes audio via kokoro-onnx
  3. Audio is resampled from 24 kHz to 48 kHz and played via sounddevice
  4. Model stays warm in memory for low-latency subsequent calls
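Step 3's resampling is an exact 2x upsample (24 kHz to 48 kHz). The real pipeline presumably uses a proper resampler; a minimal linear-interpolation sketch shows the idea:

```python
def upsample_2x(samples: list[float]) -> list[float]:
    """Double the sample rate by inserting the midpoint between
    each pair of neighbouring samples (naive linear interpolation)."""
    if not samples:
        return []
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) / 2.0)   # midpoint between neighbours
    out.append(samples[-1])
    out.append(samples[-1])         # repeat last sample to keep exactly 2x length
    return out
```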

Architecture

Agent (Claude/Kiro/Gemini/...)
  |
  | MCP protocol (stdio)
  v
speak-mcp (FastMCP server)
  |
  | SpeakerEngine (in-process)
  v
kokoro-onnx -> sounddevice -> audio out
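Over the stdio transport, the agent invokes the tool with a standard MCP tools/call request. For illustration, such a message might look like this (the text and id values are hypothetical):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "speak",
    "arguments": { "text": "Build succeeded.", "voice": "am_michael", "speed": 1.0 }
  }
}
```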

Requirements

  • Python 3.10+
  • uv for installation
  • macOS or Linux (kokoro-onnx runs on CPU via ONNX Runtime)

License

MIT

Generated Artefacts

🔍 Explore this project — AI-generated overviews via Google NotebookLM

  • 🎧 Listen to the Audio Overview: two AI hosts discuss the project (great for commutes)
  • 📊 Browse the Slide Deck: presentation-ready project overview

Generated by notebooklm-repo-artefacts

About

A small MCP tool for agents (Kiro CLI, Claude Code and others) to speak responses aloud, using a wrapper around kokoro-onnx (https://github.com/thewh1teagle/kokoro-onnx), an 82M-parameter ONNX TTS model.
