Working examples, integrations, and templates for the Speechmatics SDKs.
Comprehensive collection of code examples demonstrating real-world applications, third-party integrations, and best practices.
Examples • Integrations • Use Cases • Copy-Paste Ready
Browse Examples • Quick Start • Contributing • Portal • Documentation
Speechmatics is a leading Automatic Speech Recognition (ASR) platform providing highly accurate speech-to-text (STT) and text-to-speech (TTS) APIs. Whether you're building real-time voice assistants, conversational voice AI agents, transcription services, or call center tools, Speechmatics provides the foundation for accurate, scalable speech AI.
Flexible Deployment: Cloud SaaS, on-premises, air-gapped environments, or on-device edge deployment.
Advanced Features: Domain-specific models, custom dictionaries, speaker diarization, speaker identification, speaker focus for multi-speaker scenarios, and more.
- What is Speechmatics?
- Quick Start
- Theory
- Example Categories
- Migration Guides
- Finding Examples
- Example Structure
- Contributing
- Support & Resources
1. Get your API key from portal.speechmatics.com
2. Install the SDK for your use case:
```bash
# Choose the package for your use case:
# Batch transcription
pip install speechmatics-batch
# Real-time streaming
pip install speechmatics-rt
# Voice agents
pip install speechmatics-voice
# Text-to-speech
pip install speechmatics-tts
```
Package Details: what's included in each package
speechmatics-batch - Async batch transcription API
- Upload audio files for processing
- Get transcripts with highly accurate timestamps, speakers, entities
- Supports all audio intelligence features
speechmatics-rt - Real-time WebSocket streaming
- Stream audio for live transcription
- Ultra-low latency
- Partial and final transcripts
speechmatics-voice - Voice agent SDK
- Build conversational AI applications
- Speaker diarization and turn detection
- Optional ML-based smart turn detection: `pip install "speechmatics-voice[smart]"` (quote the extra so shells such as zsh don't expand the brackets)
speechmatics-tts - Text-to-speech
- Convert text to natural-sounding speech
- Multiple voices
- Streaming and batch modes
SDK Documentation | API Reference
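Real-time streaming means sending audio in small, fixed-duration chunks instead of as one file. The sketch below shows only the chunking arithmetic, using Python's standard `wave` module; `iter_pcm_chunks` and `make_silent_wav` are hypothetical helpers, not part of `speechmatics-rt`, and the 100 ms chunk size is an illustrative assumption. Sending each chunk over the WebSocket connection is the SDK's job and is not shown.

```python
import io
import wave
from typing import Iterator


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Build an in-memory mono 16-bit PCM WAV file of silence (test input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def iter_pcm_chunks(wav_bytes: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield the raw PCM data of a WAV file in chunks of about chunk_ms milliseconds.

    Hypothetical helper: a real-time client would send each chunk over its
    WebSocket connection; that step is SDK-specific and not shown here.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # Frames per chunk = sample rate x chunk duration (one frame = one sample per channel).
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:  # readframes returns b"" at end of file
                break
            yield chunk


if __name__ == "__main__":
    chunks = list(iter_pcm_chunks(make_silent_wav(1.0), chunk_ms=100))
    print(len(chunks), len(chunks[0]))  # 10 3200
```

For 16 kHz, mono, 16-bit audio, a 100 ms chunk is 1,600 frames, or 3,200 bytes; the final chunk is shorter when the audio length is not a whole number of chunks.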
```bash
# Clone the repository
git clone https://github.com/speechmatics/speechmatics-academy.git
cd speechmatics-academy
# Navigate to an example
cd basics/01-hello-world/python
# Set up a virtual environment
python -m venv venv
# Activate the virtual environment (Windows)
venv\Scripts\activate
# On macOS/Linux: source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp ../.env.example .env
# On Windows (cmd): copy ..\.env.example .env
# Edit .env and add your SPEECHMATICS_API_KEY
# Run the example
python main.py
```
Caution
Never hardcode API keys in your source code. Always use environment variables (.env files) or secure secret management systems. Never commit .env to version control - only .env.example with placeholder values.
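One way to follow this rule in a Python example is to read the key from the environment at startup and fail fast if it is missing. The sketch below is a minimal, standard-library stand-in for a loader such as python-dotenv; `load_env_file` and `get_api_key` are hypothetical helper names, and the parser handles only simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path
from typing import Union


def load_env_file(path: Union[str, Path] = ".env") -> None:
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Skips blank lines and # comments, removes one pair of matching surrounding
    quotes, and never overrides a variable that is already set.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return  # a missing .env is fine if the variables are set elsewhere
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        os.environ.setdefault(key, value)


def get_api_key(name: str = "SPEECHMATICS_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env or export it in your shell.")
    return key
```

Call `load_env_file()` once at startup, then `get_api_key()` wherever the key is needed; log that a key was loaded, never the key itself.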
Use degit to copy individual examples:
```bash
# Install degit
npm install -g degit
# Copy an example
degit speechmatics/speechmatics-academy/basics/01-hello-world my-project
cd my-project
```
New to speech recognition? Start here to understand the core concepts before diving into code.
| Topic | Description |
|---|---|
| Introduction to ASR | How automatic speech recognition converts audio to text using acoustic and language models |
| Introduction to LLMs | Understanding large language models and their role in voice AI applications |
| Prompt Engineering | Crafting effective prompts for voice agents and conversational AI |
| Choosing the Right Model | Comparing model types, capabilities, and when to use each |
Note
Theory guides are coming soon. In the meantime, check out the "How It Works" sections in each example.
Fundamental examples for getting started with the Speechmatics SDK.
| Example | Description | Packages | Difficulty |
|---|---|---|---|
| Hello World | The absolute simplest transcription example | Batch | Beginner |
| Batch vs Real-time | Learn the difference between API modes | Batch, RT | Beginner |
| Configuration Guide | Common configuration options | Batch | Beginner |
| Text-to-Speech | Convert text to natural-sounding speech | TTS | Beginner |
| Channel Diarization | Multi-channel transcription with speaker attribution | Voice, RT | Beginner |
| Audio Intelligence | Extract insights with sentiment, topics, and summaries | Batch | Intermediate |
| Multilingual & Translation | Transcribe 50+ languages and translate | RT | Intermediate |
| Basic Turn Detection | Silence-based turn detection with the Real-Time SDK | RT | Intermediate |
| Intelligent Turn Detection | Smart turn detection with Voice SDK presets | Voice | Intermediate |
| Speaker ID & Speaker Focus | Extract speaker IDs and control which speakers drive the conversation | Voice | Intermediate |
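The Basic Turn Detection example relies on silence to decide when a speaker has finished. The idea can be illustrated offline on word timestamps: a new turn starts whenever the gap between one word's end and the next word's start exceeds a threshold. This is a sketch under assumptions: the `Word` shape is hypothetical (not the Real-Time SDK's message format), the 0.8 s threshold is arbitrary, and a live system makes the same decision incrementally as words arrive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def split_turns(words: List[Word], max_silence: float = 0.8) -> List[List[Word]]:
    """Group time-ordered words into turns.

    A word joins the current turn if the silence since the previous word's
    end is at most max_silence seconds; otherwise it starts a new turn.
    """
    turns: List[List[Word]] = []
    for word in words:
        if turns and word.start - turns[-1][-1].end <= max_silence:
            turns[-1].append(word)
        else:
            turns.append([word])
    return turns
```

With words ending at 0.9 s and the next one starting at 2.0 s, the 1.1 s gap exceeds the default threshold, so the words split into two turns; raising `max_silence` to 2.0 keeps them in one.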
Third-party framework and service integrations.
| Integration | Example | Features | Languages |
|---|---|---|---|
| LiveKit | Simple Voice Assistant | WebRTC, VAD, diarization, focus speakers, passive filtering, LLM, TTS | Python |
| LiveKit | Telephony with Twilio | Phone calls via SIP, LiveKit Agents, Krisp noise cancellation, LLM, TTS | Python |
| Pipecat AI | Simple Voice Bot | Local audio, VAD, diarization, focus speakers, passive filtering, LLM, TTS, interruptions | Python |
| Pipecat AI | Simple Voice Bot (Web) | Browser-based WebRTC, VAD, diarization, focus speakers, passive filtering, LLM, TTS | Python |
| Twilio | Outbound Dialer | REST API, outbound calls, Media Streams, Speechmatics STT, ElevenLabs TTS | Python |
| VAPI | Voice Assistant | Voice AI platform, Speechmatics STT, diarization, custom vocabulary, LLM, TTS | Python |
| Vercel AI SDK | Coming Soon | Vercel AI SDK integration | TypeScript |
Example applications for specific industries.
| Industry | Example | Features |
|---|---|---|
| Healthcare | Medical Transcription | Real-time, custom medical vocabulary, HIPAA compliance |
| Media | Video Captioning | SRT generation, timestamp sync, batch processing |
| Contact Center | Call Analytics | Channel diarization, sentiment analysis, topic detection, summarization |
| Business | AI Receptionist | LiveKit voice agent, Twilio SIP, Google Calendar booking, function calling |
| Entertainment | Santa Voice Agent | LiveKit, ElevenLabs TTS, custom vocabulary, Twilio SIP telephony |
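The Video Captioning use case lists SRT generation and timestamp sync. The core of that step is formatting times as `HH:MM:SS,mmm` and numbering the cues. The sketch assumes transcript segments have already been reduced to hypothetical `(start, end, text)` tuples in seconds; it illustrates the format and is not the example's actual implementation.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Hello world."), (1.5, 3.25, "Welcome to Speechmatics.")]))
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the last digit.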
Switching from another speech-to-text provider? Our migration guides help you transition smoothly with feature mappings, code comparisons, and practical examples.
| From | Guide | Features Covered | Status |
|---|---|---|---|
| Deepgram | Migration Guide | Batch, Streaming, Diarization, Custom Vocabulary | Available |
| AssemblyAI | Migration Guide | Transcription, Audio Intelligence, Real-time | Coming Soon |
| Google Cloud Speech | Migration Guide | Batch, Streaming, Multi-language | Coming Soon |
| AWS Transcribe | Migration Guide | Batch Jobs, Streaming, Custom Vocabulary | Coming Soon |
| Azure Speech | Migration Guide | REST API, WebSocket, Pronunciation | Coming Soon |
Note
Each migration guide includes:
- Feature Mapping - Direct comparison of equivalent features
- Code Comparison - Side-by-side before/after examples
- Migration Checklist - Step-by-step migration process
- Advantages - Benefits of switching to Speechmatics
- Working Examples - Complete runnable code
Find examples for the SDK package you installed:
| Package | Description | Examples |
|---|---|---|
| speechmatics-batch | Async transcription of audio files | Hello World, Batch vs Real-time, Configuration Guide, Audio Intelligence, Multilingual & Translation, Video Captioning, Call Analytics |
| speechmatics-rt | Real-time transcription | Batch vs Real-time, Configuration Guide, Multilingual & Translation, Basic Turn Detection, Channel Diarization, Medical Transcription |
| speechmatics-voice | Voice agent with conversation management | Intelligent Turn Detection, Speaker ID & Speaker Focus, Twilio Outbound Dialer |
| speechmatics-tts | Text-to-speech synthesis | Text-to-Speech |
| Feature | Examples |
|---|---|
| Batch Transcription | Hello World, Batch vs Real-time, Configuration Guide, Audio Intelligence, Video Captioning, Call Analytics |
| Real-time | Batch vs Real-time, Configuration Guide, Basic Turn Detection, LiveKit Voice Assistant, Medical Transcription |
| Turn Detection | Basic Turn Detection, Intelligent Turn Detection |
| Voice Agents | Intelligent Turn Detection, Speaker ID & Speaker Focus, LiveKit Voice Assistant, Pipecat Voice Bot, Pipecat Voice Bot (Web), Twilio Outbound Dialer, VAPI Voice Assistant, AI Receptionist, Santa Voice Agent |
| Speaker Diarization | Configuration Guide, Speaker ID & Speaker Focus, Channel Diarization, LiveKit Voice Assistant, Call Analytics |
| Speaker Identification | Speaker ID & Speaker Focus |
| Sentiment Analysis | Audio Intelligence, Call Analytics |
| Topic Detection | Audio Intelligence, Call Analytics |
| Summarization | Audio Intelligence, Call Analytics |
| Translation | Multilingual & Translation |
| Text-to-Speech | Text-to-Speech |
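Several of the diarization examples indexed above present transcripts as per-speaker utterances. A minimal sketch of that grouping step, assuming a hypothetical list of `(speaker, word)` pairs with labels such as `S1` and `S2`, merges consecutive words from the same speaker with `itertools.groupby`. Punctuation attachment and timestamps are deliberately left out.

```python
from itertools import groupby
from typing import List, Tuple


def group_by_speaker(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge consecutive (speaker, word) pairs into (speaker, utterance) pairs.

    groupby only groups adjacent items, so a speaker who talks again later
    starts a new utterance, which preserves the order of the conversation.
    """
    return [
        (speaker, " ".join(word for _, word in run))
        for speaker, run in groupby(words, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    pairs = [("S1", "Hi"), ("S1", "there"), ("S2", "Hello"), ("S1", "Bye")]
    for speaker, text in group_by_speaker(pairs):
        print(f"{speaker}: {text}")
```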
| Integration | Examples | Documentation | Status |
|---|---|---|---|
| LiveKit | Simple Voice Assistant, Telephony with Twilio, AI Receptionist, Santa Voice Agent | LiveKit Docs | Available |
| Pipecat AI | Simple Voice Bot, Simple Voice Bot (Web) | Pipecat Docs | Available |
| Twilio | Outbound Dialer, Telephony with Twilio, AI Receptionist, Santa Voice Agent | Twilio Media Streams | Available |
| VAPI | Voice Assistant | docs.vapi.ai | Available |
| Language | Examples | Status |
|---|---|---|
| Python | Hello World, Batch vs Real-time, Configuration Guide, Audio Intelligence, Multilingual & Translation, Text-to-Speech, Basic Turn Detection, Intelligent Turn Detection, Speaker ID & Speaker Focus, Channel Diarization, LiveKit Voice Assistant, LiveKit Telephony, Pipecat Voice Bot, Pipecat Voice Bot (Web), Twilio Outbound Dialer, VAPI Voice Assistant, Medical Transcription, Video Captioning, Call Analytics, AI Receptionist, Santa Voice Agent | Available |
| TypeScript | - | Coming Soon |
| C# | - | Coming Soon |
Every example follows a consistent structure:
```
example-name/
├── python/
│   ├── main.py              # Primary Python implementation
│   ├── requirements.txt     # Python dependencies
│   └── .gitignore           # Ignore venv/, __pycache__/, .env
├── assets/                  # Sample files, images, etc.
│   ├── sample.wav           # Sample audio (if needed)
│   └── agent.md             # Agent prompt (for voice agents)
├── .env.example             # Environment variables template
└── README.md                # Main documentation (REQUIRED)
```
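Before opening a pull request, a short script can check that a new example follows this layout. The sketch below is hypothetical (it is not a tool shipped with this repository); it treats `README.md` as required, as the layout states, and assumes the Python entry point, requirements file, and `.env.example` are also expected.

```python
import sys
from pathlib import Path
from typing import List, Union

# README.md is marked REQUIRED in the layout; the other entries are an
# assumption about what a complete Python example is expected to contain.
EXPECTED_PATHS = [
    "README.md",
    ".env.example",
    "python/main.py",
    "python/requirements.txt",
]


def missing_paths(example_dir: Union[str, Path]) -> List[str]:
    """Return the expected files that do not exist under example_dir."""
    root = Path(example_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).is_file()]


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_example.py basics/01-hello-world
    for example in sys.argv[1:]:
        missing = missing_paths(example)
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(f"{example}: {status}")
```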
Note
Each example includes:
- What You'll Learn - Key concepts covered
- Prerequisites - Required setup
- Quick Start - Step-by-step instructions
- How It Works - Step-by-step explanation
- Key Features - Demonstrated capabilities
- Expected Output - Sample results
- Next Steps - Related examples
- Troubleshooting - Common issues
- Resources - Relevant documentation
We welcome contributions! There are many ways to help:
- Add New Examples - Share your implementations
- Improve Existing Examples - Fix bugs, add features
- Add Language Support - Port examples to other languages
- Fix Documentation - Improve README files
- Report Issues - Help us improve quality
- Choose category (basics/integrations/use-cases)
- Follow structure (see EXAMPLE_TEMPLATE.md)
- Add metadata to docs/index.yaml
- Write README using the template
- Test thoroughly
- Submit PR with clear description
See CONTRIBUTING.md for detailed guidelines.
Note
All examples must meet these standards:
- Clean, readable, well-commented Python code
- Follows SDK best practices
- Includes proper error handling
- No hardcoded secrets
- Complete documentation
- Tested end-to-end
- Metadata in index.yaml
- GitHub Issues: Report bugs or request examples
- GitHub Community Discussions: Ask questions, share projects
- Email Support: devrel@speechmatics.com
- SDK Repository: speechmatics-python-sdk
- API Documentation: docs.speechmatics.com
- Developer Portal: portal.speechmatics.com
- Blog: speechmatics.com/blog
- Example Template - Template for new examples
- Contributing Guide - How to contribute
This project is licensed under the MIT License - see the LICENSE file for details.
- SDK: github.com/speechmatics/speechmatics-python-sdk
- Docs: docs.speechmatics.com
- Portal: portal.speechmatics.com


