Speechmatics Academy

Working examples, integrations, and templates for the Speechmatics SDKs.

Comprehensive collection of code examples demonstrating real-world applications, third-party integrations, and best practices.

Examples • Integrations • Use Cases • Copy-Paste Ready

Browse Examples • Quick Start • Contributing • Portal • Documentation


What is Speechmatics?

Speechmatics is a leading Automatic Speech Recognition (ASR) platform providing highly accurate speech-to-text (STT) and text-to-speech (TTS) APIs. Whether you're building real-time voice assistants, conversational voice AI agents, transcription services, or call center tools, Speechmatics provides the foundation for accurate, scalable speech AI.

Flexible Deployment: Cloud SaaS, on-premises, air-gapped environments, or on-device edge deployment.

Advanced Features: Domain-specific models, custom dictionaries, speaker diarization, speaker identification, and speaker focus for multi-speaker scenarios, among others.


⚑ Quick Start

Prerequisites

1. Get your API key at portal.speechmatics.com

2. Install the SDK for your use case:

# Choose the package for your use case:

# Batch transcription
pip install speechmatics-batch

# Real-time streaming
pip install speechmatics-rt

# Voice agents
pip install speechmatics-voice

# Text-to-speech
pip install speechmatics-tts

📦 Package Details

speechmatics-batch - Async batch transcription API

  • Upload audio files for processing
  • Get transcripts with highly accurate timestamps, speakers, and entities
  • Supports all audio intelligence features

speechmatics-rt - Real-time WebSocket streaming

  • Stream audio for live transcription
  • Ultra-low latency
  • Partial and final transcripts

speechmatics-voice - Voice agent SDK

  • Build conversational AI applications
  • Speaker diarization and turn detection
  • Optional ML-based smart turn: pip install speechmatics-voice[smart]

speechmatics-tts - Text-to-speech

  • Convert text to natural-sounding speech
  • Multiple voices
  • Streaming and batch modes

SDK Documentation | API Reference

Option 1: Clone and Run

# Clone the repository
git clone https://github.com/speechmatics/speechmatics-academy.git
cd speechmatics-academy

# Navigate to an example
cd basics/01-hello-world/python

# Setup virtual environment
python -m venv venv

# Activate virtual environment (Windows)
venv\Scripts\activate
# On Mac/Linux: source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp ../.env.example .env
# On Windows (cmd): copy ..\.env.example .env
# Edit .env and add your SPEECHMATICS_API_KEY

# Run the example
python main.py

Caution

Never hardcode API keys in your source code. Always use environment variables (.env files) or secure secret management systems. Never commit .env to version control - only .env.example with placeholder values.
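As a minimal sketch of the advice above, the snippet below reads SPEECHMATICS_API_KEY from the process environment and falls back to a local .env file. The helper names parse_env_text and load_api_key are hypothetical and not part of any Speechmatics SDK. The parser only handles plain KEY=VALUE lines, so a dedicated library such as python-dotenv is likely a better fit for real projects.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_api_key(env_file: str = ".env", name: str = "SPEECHMATICS_API_KEY") -> str:
    """Return the key from the environment, else from env_file (hypothetical helper)."""
    key = os.environ.get(name)
    if not key:
        path = Path(env_file)
        if path.is_file():
            key = parse_env_text(path.read_text(encoding="utf-8")).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to {env_file}")
    return key
```

Calling load_api_key() once at startup and passing the result to the SDK client keeps the secret out of source files, and the RuntimeError makes a missing key fail early with a readable message.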

Option 2: Direct Copy

Use degit to copy individual examples:

# Install degit
npm install -g degit

# Copy an example
degit speechmatics/speechmatics-academy/basics/01-hello-world my-project
cd my-project

📖 Theory

New to speech recognition? Start here to understand the core concepts before diving into code.

| Topic | Description |
|---|---|
| Introduction to ASR | How automatic speech recognition converts audio to text using acoustic and language models |
| Introduction to LLMs | Understanding large language models and their role in voice AI applications |
| Prompt Engineering | Crafting effective prompts for voice agents and conversational AI |
| Choosing the Right Model | Comparing model types, capabilities, and when to use each |

Note

Theory guides are coming soon. In the meantime, check out the "How It Works" sections in each example.


📚 Example Categories

Fundamentals

Fundamental examples for getting started with the Speechmatics SDK.

| Example | Description | Packages | Difficulty |
|---|---|---|---|
| Hello World | The absolute simplest transcription example | Batch | Beginner |
| Batch vs Real-time | Learn the difference between API modes | Batch, RT | Beginner |
| Configuration Guide | Common configuration options | Batch | Beginner |
| Text-to-Speech | Convert text to natural-sounding speech | TTS | Beginner |
| Channel Diarization | Multi-channel transcription with speaker attribution | Voice, RT | Beginner |
| Audio Intelligence | Extract insights with sentiment, topics, and summaries | Batch | Intermediate |
| Multilingual & Translation | Transcribe 50+ languages and translate | RT | Intermediate |
| Basic Turn Detection | Silence-based turn detection with Real-Time SDK | RT | Intermediate |
| Intelligent Turn Detection | Smart turn detection with Voice SDK presets | Voice | Intermediate |
| Speaker ID & Speaker Focus | Extract speaker IDs and control which speakers drive conversation | Voice | Intermediate |

Browse all basics examples


Integrations

Third-party framework and service integrations.

| Integration | Example | Features | Languages |
|---|---|---|---|
| LiveKit | Simple Voice Assistant | WebRTC, VAD, diarization, focus speakers, passive filtering, LLM, TTS | Python |
| LiveKit | Telephony with Twilio | Phone calls via SIP, LiveKit Agents, Krisp noise cancellation, LLM, TTS | Python |
| Pipecat | Simple Voice Bot | Local audio, VAD, diarization, focus speakers, passive filtering, LLM, TTS, interruptions | Python |
| Pipecat | Simple Voice Bot (Web) | Browser-based WebRTC, VAD, diarization, focus speakers, passive filtering, LLM, TTS | Python |
| Twilio | Outbound Dialer | REST API, outbound calls, Media Streams, Speechmatics STT, ElevenLabs TTS | Python |
| VAPI | Voice Assistant | Voice AI platform, Speechmatics STT, diarization, custom vocabulary, LLM, TTS | Python |
| Vercel AI | Coming Soon | Vercel AI SDK integration | TypeScript |

Browse all integrations


Use Cases

Example applications for specific industries.

| Industry | Example | Features |
|---|---|---|
| Healthcare | Medical Transcription | Real-time, custom medical vocabulary, HIPAA compliance |
| Media | Video Captioning | SRT generation, timestamp sync, batch processing |
| Contact Center | Call Analytics | Channel diarization, sentiment analysis, topic detection, summarization |
| Business | AI Receptionist | LiveKit voice agent, Twilio SIP, Google Calendar booking, function calling |
| Entertainment | Santa Voice Agent | LiveKit, ElevenLabs TTS, custom vocabulary, Twilio SIP telephony |

Browse all use cases


🔄 Migration Guides

Switching from another speech-to-text provider? Our migration guides help you transition smoothly with feature mappings, code comparisons, and practical examples.

| From | Guide | Features Covered | Status |
|---|---|---|---|
| Deepgram | Migration Guide | Batch, Streaming, Diarization, Custom Vocabulary | Available |
| AssemblyAI | Migration Guide | Transcription, Audio Intelligence, Real-time | Coming Soon |
| Google Cloud Speech | Migration Guide | Batch, Streaming, Multi-language | Coming Soon |
| AWS Transcribe | Migration Guide | Batch Jobs, Streaming, Custom Vocabulary | Coming Soon |
| Azure Speech | Migration Guide | REST API, WebSocket, Pronunciation | Coming Soon |

Note

Each migration guide includes:

  • Feature Mapping - Direct comparison of equivalent features
  • Code Comparison - Side-by-side before/after examples
  • Migration Checklist - Step-by-step migration process
  • Advantages - Benefits of switching to Speechmatics
  • Working Examples - Complete runnable code

Browse all migration guides


πŸ” Finding Examples

Look up examples by package, feature, integration, language, or difficulty:

By Package

| Package | Description | Examples |
|---|---|---|
| speechmatics-batch | Async transcription of audio files | Hello World, Batch vs Real-time, Configuration Guide, Audio Intelligence, Multilingual & Translation, Video Captioning, Call Analytics |
| speechmatics-rt | Real-time transcription | Batch vs Real-time, Configuration Guide, Multilingual & Translation, Basic Turn Detection, Channel Diarization, Medical Transcription |
| speechmatics-voice | Voice agent with conversation management | Intelligent Turn Detection, Speaker ID & Speaker Focus, Twilio Outbound Dialer |
| speechmatics-tts | Text-to-speech synthesis | Text-to-Speech |

By Feature

| Feature | Examples |
|---|---|
| Batch Transcription | Hello World, Batch vs Real-time, Configuration Guide, Audio Intelligence, Video Captioning, Call Analytics |
| Real-time | Batch vs Real-time, Configuration Guide, Basic Turn Detection, LiveKit Voice Assistant, Medical Transcription |
| Turn Detection | Basic Turn Detection, Intelligent Turn Detection |
| Voice Agents | Intelligent Turn Detection, Speaker ID & Speaker Focus, LiveKit Voice Assistant, Pipecat Voice Bot, Pipecat Voice Bot (Web), Twilio Outbound Dialer, VAPI Voice Assistant, AI Receptionist, Santa Voice Agent |
| Speaker Diarization | Configuration Guide, Speaker ID & Speaker Focus, Channel Diarization, LiveKit Voice Assistant, Call Analytics |
| Speaker Identification | Speaker ID & Speaker Focus |
| Sentiment Analysis | Audio Intelligence, Call Analytics |
| Topic Detection | Audio Intelligence, Call Analytics |
| Summarization | Audio Intelligence, Call Analytics |
| Translation | Multilingual & Translation |
| Text-to-Speech | Text-to-Speech |

By Integration

| Integration | Examples | Documentation | Status |
|---|---|---|---|
| LiveKit | Simple Voice Assistant, Telephony with Twilio, AI Receptionist, Santa Voice Agent | LiveKit Docs | Available |
| Pipecat AI | Simple Voice Bot, Simple Voice Bot (Web) | Pipecat Docs | Available |
| Twilio | Outbound Dialer, Telephony with Twilio, AI Receptionist, Santa Voice Agent | Twilio Media Streams | Available |
| VAPI | Voice Assistant | docs.vapi.ai | Available |

By Language

| Language | Examples | Status |
|---|---|---|
| Python | Hello World, Batch vs Real-time, Configuration Guide, Audio Intelligence, Multilingual & Translation, Text-to-Speech, Basic Turn Detection, Intelligent Turn Detection, Speaker ID & Speaker Focus, Channel Diarization, LiveKit Voice Assistant, LiveKit Telephony, Pipecat Voice Bot, Pipecat Voice Bot (Web), Twilio Outbound Dialer, VAPI Voice Assistant, Medical Transcription, Video Captioning, Call Analytics, AI Receptionist, Santa Voice Agent | Available |
| TypeScript | - | Coming Soon |
| C# | - | Coming Soon |

By Difficulty

| Difficulty | Examples |
|---|---|
| Beginner | Hello World, Batch vs Real-time, Configuration Guide, Text-to-Speech, Channel Diarization, VAPI Voice Assistant, Video Captioning, Call Analytics |
| Intermediate | Audio Intelligence, Multilingual & Translation, Basic Turn Detection, Intelligent Turn Detection, Speaker ID & Speaker Focus, LiveKit Voice Assistant, Pipecat Voice Bot, Pipecat Voice Bot (Web), Medical Transcription |
| Advanced | LiveKit Telephony, Twilio Outbound Dialer, AI Receptionist, Santa Voice Agent |

πŸ“ Example Structure

Every example follows a consistent structure:

example-name/
├── python/
│   ├── main.py             # Primary Python implementation
│   ├── requirements.txt    # Python dependencies
│   └── .gitignore          # Ignore venv/, __pycache__/, .env
├── assets/                 # Sample files, images, etc.
│   ├── sample.wav          # Sample audio (if needed)
│   └── agent.md            # Agent prompt (for voice agents)
├── .env.example            # Environment variables template
└── README.md               # Main documentation (REQUIRED)
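
When preparing a new example, the layout above can be checked with a short script. This is only a sketch: the function name missing_files and the choice of which files to treat as expected are assumptions drawn from the tree, not an official validation tool from the repository. The assets/ entries are left out because the tree marks them as optional.

```python
from pathlib import Path

# Files taken from the tree above; assets/ is omitted because it is optional.
EXPECTED_FILES = [
    "README.md",
    ".env.example",
    "python/main.py",
    "python/requirements.txt",
]


def missing_files(example_dir: str) -> list[str]:
    """Return the expected relative paths that are absent under example_dir."""
    root = Path(example_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

For instance, missing_files("basics/01-hello-world") returns an empty list when every expected file is present, and otherwise lists the relative paths that still need to be added.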

Note

Each example includes:

  1. What You'll Learn - Key concepts covered
  2. Prerequisites - Required setup
  3. Quick Start - Step-by-step instructions
  4. How It Works - Step-by-step explanation
  5. Key Features - Demonstrated capabilities
  6. Expected Output - Sample results
  7. Next Steps - Related examples
  8. Troubleshooting - Common issues
  9. Resources - Relevant documentation

🤝 Contributing

We welcome contributions! There are many ways to help:

Ways to Contribute

  1. Add New Examples - Share your implementations
  2. Improve Existing Examples - Fix bugs, add features
  3. Add Language Support - Port examples to other languages
  4. Fix Documentation - Improve README files
  5. Report Issues - Help us improve quality

Adding a New Example

  1. Choose category (basics/integrations/use-cases)
  2. Follow structure (see EXAMPLE_TEMPLATE.md)
  3. Add metadata to docs/index.yaml
  4. Write README using the template
  5. Test thoroughly
  6. Submit PR with clear description

See CONTRIBUTING.md for detailed guidelines.

Quality Standards

Note

All examples must meet these standards:

  • Clean, readable, well-commented Python code
  • Follows SDK best practices
  • Includes proper error handling
  • No hardcoded secrets
  • Complete documentation
  • Tested end-to-end
  • Metadata in index.yaml

🆘 Support & Resources

Getting Help

Resources

Documentation


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🔗 Links


Built with ❤️ by the Speechmatics Community

Twitter • LinkedIn • YouTube