1 change: 0 additions & 1 deletion .gitattributes

This file was deleted.

38 changes: 19 additions & 19 deletions .github/RELEASE.md
@@ -8,8 +8,8 @@ The Speechmatics Python SDK repository contains two separate packages:

- `speechmatics-rt` - Real-Time API Client
- `speechmatics-batch` - Batch API Client
- `speechmatics-flow` - Flow API Client
- `speechmatics-voice` - Voice Agent API Client
- `speechmatics-tts` - TTS API Client

Each package is released independently with its own versioning and release workflow.

@@ -91,55 +91,55 @@ To release a new version of the Batch SDK:
- Update GitHub release notes
- Announce the release

### 3. Flow SDK Release
### 3. Voice Agent SDK Release

To release a new version of the Flow SDK:
To release a new version of the Voice Agent SDK:

1. **Create a Release Tag**

```bash
git tag flow/v1.0.0
git push origin flow/v1.0.0
git tag voice/v1.0.0
git push origin voice/v1.0.0
```

2. **Automated Workflow**
The `release-flow.yaml` workflow will automatically:
The `release-voice.yaml` workflow will automatically:

- Extract version from tag (e.g., `flow/v1.0.0` → `1.0.0`)
- Extract version from tag (e.g., `voice/v1.0.0` → `1.0.0`)
- Run comprehensive tests across Python versions
- Update version in `sdk/flow/speechmatics/flow/__init__.py`
- Update version in `sdk/voice/speechmatics/voice/__init__.py`
- Build the package
- Publish to PyPI

3. **Manual Steps After Release**
- Verify the package is available on PyPI
- Test installation: `pip install speechmatics-flow==1.0.0`
- Test installation: `pip install speechmatics-voice==1.0.0`
- Update GitHub release notes
- Announce the release

### 4. Voice Agent SDK Release
### 4. TTS SDK Release

To release a new version of the Voice Agent SDK:
To release a new version of the TTS SDK:

1. **Create a Release Tag**

```bash
git tag voice/v1.0.0
git push origin voice/v1.0.0
git tag tts/v1.0.0
git push origin tts/v1.0.0
```

2. **Automated Workflow**
The `release-voice.yaml` workflow will automatically:
The `release-tts.yaml` workflow will automatically:

- Extract version from tag (e.g., `voice/v1.0.0` → `1.0.0`)
- Extract version from tag (e.g., `tts/v1.0.0` → `1.0.0`)
- Run comprehensive tests across Python versions
- Update version in `sdk/voice/speechmatics/voice/__init__.py`
- Update version in `sdk/tts/speechmatics/tts/__init__.py`
- Build the package
- Publish to PyPI

3. **Manual Steps After Release**
- Verify the package is available on PyPI
- Test installation: `pip install speechmatics-voice==1.0.0`
- Test installation: `pip install speechmatics-tts==1.0.0`
- Update GitHub release notes
- Announce the release
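
The "Extract version from tag" step above can be pictured with a short sketch. It is only an illustration under assumptions: the helper name `parse_release_tag` and the regular expression are hypothetical and are not taken from the release workflows, which may well do this in shell. It also assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release suffix.

```python
import re

# Hypothetical pattern: package prefixes follow the tag formats listed in this
# document (rt, batch, voice, tts); the real workflows may differ.
_TAG_PATTERN = re.compile(
    r"^(?P<package>rt|batch|voice|tts)/v(?P<version>\d+\.\d+\.\d+)$"
)


def parse_release_tag(tag: str) -> tuple[str, str]:
    """Split a tag such as 'tts/v1.0.0' into (package, version)."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Unrecognised release tag: {tag!r}")
    return match.group("package"), match.group("version")
```

For example, `parse_release_tag("voice/v1.0.0")` yields the package name `voice` and the version `1.0.0`, while a tag without a package prefix raises `ValueError`.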

@@ -162,8 +162,8 @@ Both packages follow semantic versioning (SemVer):

- RT SDK: `rt/v{version}` (e.g., `rt/v1.0.0`)
- Batch SDK: `batch/v{version}` (e.g., `batch/v1.0.0`)
- Flow SDK: `flow/v{version}` (e.g., `flow/v1.0.0`)
- Voice Agent SDK: `voice/v{version}` (e.g., `voice/v1.0.0`)
- TTS SDK: `tts/v{version}` (e.g., `tts/v1.0.0`)

## Environment Setup

@@ -173,8 +173,8 @@ Both packages are published to PyPI using GitHub Actions with OpenID Connect (OI

- RT SDK: Uses `pypi-rt` environment
- Batch SDK: Uses `pypi-batch` environment
- Flow SDK: Uses `pypi-flow` environment
- Voice Agent SDK: Uses `pypi-voice` environment
- TTS SDK: Uses `pypi-tts` environment

### Required Secrets

1 change: 1 addition & 0 deletions .gitignore
@@ -153,6 +153,7 @@ cython_debug/

# Ruff stuff:
.ruff_cache/
**/output.wav

# PyPI configuration file
.pypirc
23 changes: 5 additions & 18 deletions README.md
@@ -1,6 +1,7 @@
# Speechmatics Python SDK

[![License](https://img.shields.io/badge/license-MIT-yellow.svg)](https://github.com/speechmatics/speechmatics-python-sdk/blob/master/LICENSE)
[![PythonSupport](https://img.shields.io/badge/Python-3.9%2B-green)](https://www.python.org/)

A collection of Python clients for Speechmatics APIs packaged as separate installable packages. These packages replace the old [speechmatics-python](https://pypi.org/project/speechmatics-python) package, which will be deprecated soon.

@@ -10,31 +11,23 @@ Each client targets a specific Speechmatics API (e.g. real-time, batch transcrip

This repository contains the following packages:

### (Beta) Real-Time Client (`speechmatics-rt`)
### Real-Time Client (`speechmatics-rt`)

A Python client for Speechmatics Real-Time API.

```bash
pip install speechmatics-rt
```

### (Beta) Batch Client (`speechmatics-batch`)
### Batch Client (`speechmatics-batch`)

An async Python client for Speechmatics Batch API.

```bash
pip install speechmatics-batch
```

### (Beta) Flow Client (`speechmatics-flow`)

An async Python client for Speechmatics Flow API.

```bash
pip install speechmatics-flow
```

### (Beta) Voice Agent Client (`speechmatics-voice`)
### Voice Agent Client (`speechmatics-voice`)

A Voice Agent Python client for Speechmatics Real-Time API.

@@ -46,7 +39,7 @@ pip install speechmatics-voice
pip install speechmatics-voice[smart]
```

### (Beta) TTS Client (`speechmatics-tts`)
### TTS Client (`speechmatics-tts`)

An async Python client for Speechmatics TTS API.

@@ -69,10 +62,6 @@ speechmatics-python-sdk/
│ │ ├── pyproject.toml
│ │ └── README.md
│ │
│ ├── flow/
│ │ ├── pyproject.toml
│ │ └── README.md
│ │
│ ├── voice/
│ │ ├── pyproject.toml
│ │ └── README.md
@@ -84,7 +73,6 @@ speechmatics-python-sdk/
├── tests/
│ ├── batch/
│ ├── rt/
│ ├── flow/
│ ├── voice/
│ └── tts/
@@ -126,7 +114,6 @@ Each package can be installed separately:
```bash
pip install speechmatics-rt
pip install speechmatics-batch
pip install speechmatics-flow
pip install speechmatics-voice[smart]
pip install speechmatics-tts
```
43 changes: 43 additions & 0 deletions examples/tts/tts_autoplay/README.md
@@ -0,0 +1,43 @@
# Speechmatics TTS Async Streaming API Client

This example shows how to use the Speechmatics TTS API to generate audio from text and play it automatically with sounddevice through the system's default audio output device.
You must have an audio output device configured on your system for this example to work.
## How it Works

There are two main components in this example: an audio generator and an audio player. They run concurrently as asyncio tasks, orchestrated by the main() function, to generate and play audio in real time.
### audio_generator()

This producer function connects to the Speechmatics TTS API using the AsyncClient. It calls client.generate() with your text, the voice you want to use, and the output format (RAW_PCM_16000 in this example).
The code iterates over the audio data as it is streamed in chunks (iter_chunked) and accumulates it in a bytearray buffer.
The while len(buffer) >= 2 loop reads each 2-byte audio sample from the front of the buffer, decodes it as a little-endian signed 16-bit integer, and puts it into the audio_queue.
The processed 2-byte sample is then removed from the front of the buffer.
END_OF_STREAM is a sentinel value that signals the end of the audio stream, when there is no more audio data to process.
If an error occurs during audio generation, END_OF_STREAM is still put into the queue so that the consumer, audio_player(), does not wait forever for more samples, and the exception is then re-raised.
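
The byte-to-sample step described above can be sketched in isolation. This is a minimal illustration using only the standard library, not the code from the example: the helper name `split_pcm16` is hypothetical, and it decodes a whole chunk at once with `struct.iter_unpack` instead of slicing two bytes at a time. Any trailing odd byte is carried over to the next call.

```python
import struct


def split_pcm16(pending: bytes, chunk: bytes) -> tuple[list[int], bytes]:
    """Decode little-endian signed 16-bit samples from pending + chunk.

    Returns the decoded samples and any leftover byte that does not yet
    form a complete 2-byte sample.
    """
    data = pending + chunk
    usable = len(data) - (len(data) % 2)  # largest even prefix length
    samples = [value for (value,) in struct.iter_unpack("<h", data[:usable])]
    return samples, data[usable:]
```

In a streaming loop, the leftover returned by one call would be passed as `pending` to the next, so a sample split across two network chunks is still decoded correctly.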
### audio_player()

This consumer function initialises a sounddevice OutputStream, which is responsible for streaming the audio data to the default audio output device. Within the OutputStream context, the while True loop continuously processes the incoming audio data.
sample = await asyncio.wait_for(play_queue.get(), timeout=0.1) fetches the next sample from the queue, waiting up to 0.1 seconds if the queue is empty.
If the sample is END_OF_STREAM, any samples still buffered are written out, the while loop breaks, and the audio player exits.
If the sample is not END_OF_STREAM, it is appended to a local buffer; once CHUNK_SIZE samples have accumulated, they are converted to a numpy array of int16 values and written to the audio output device using the sounddevice OutputStream.
play_queue.task_done() is called to signal that the sample has been processed.
If an error occurs during audio playback, the error is printed and the exception is re-raised; sd.stop() is called in the finally block.
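
The producer/consumer hand-off with a sentinel can be sketched without any audio dependencies. This is an illustrative outline only: the names `producer`, `consumer`, and `run_pipeline` are hypothetical, and plain integers stand in for audio samples.

```python
import asyncio

END_OF_STREAM = None  # sentinel, as in the example


async def producer(queue: asyncio.Queue, items: list[int]) -> None:
    try:
        for item in items:
            await queue.put(item)
    finally:
        # Always signal completion, even if producing failed part-way,
        # so the consumer never waits forever.
        await queue.put(END_OF_STREAM)


async def consumer(queue: asyncio.Queue) -> list[int]:
    received = []
    while True:
        item = await queue.get()
        if item is END_OF_STREAM:
            break
        received.append(item)
    return received


async def run_pipeline(items: list[int]) -> list[int]:
    queue: asyncio.Queue = asyncio.Queue()
    # Run both coroutines concurrently; gather returns results in order.
    _, received = await asyncio.gather(producer(queue, items), consumer(queue))
    return received
```

Putting the sentinel in a `finally` block is one way to guarantee the consumer terminates; the example in this PR achieves the same effect by putting the sentinel in its `except` handler before re-raising.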

## Installation

```bash
pip install -r requirements.txt
```

## Usage

To run the example, use the following command:

```bash
python tts_stream_example.py
```

## Environment Variables

The client supports the following environment variables:

- `SPEECHMATICS_API_KEY`: Your Speechmatics API key
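
Before running the example, it can help to fail early when the key is missing. The following is a small sketch under assumptions: the helper `load_api_key` is hypothetical and is not part of the SDK, which is described above as reading `SPEECHMATICS_API_KEY` itself.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    key = env.get("SPEECHMATICS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SPEECHMATICS_API_KEY is not set")
    return key
```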
3 changes: 3 additions & 0 deletions examples/tts/tts_autoplay/requirements.txt
@@ -0,0 +1,3 @@
sounddevice>=0.4.6
numpy>=1.24.3
speechmatics-tts>=0.1.0
119 changes: 119 additions & 0 deletions examples/tts/tts_autoplay/tts_stream_example.py
@@ -0,0 +1,119 @@
import asyncio
import sounddevice as sd
import numpy as np
from speechmatics.tts import AsyncClient, Voice, OutputFormat

# Configuration
TEXT = "Welcome to the future of audio generation from text! This audio is a demo of the async streaming Speechmatics' text to speech API."
VOICE = Voice.JACK
OUTPUT_FORMAT = OutputFormat.RAW_PCM_16000

# Audio Parameters
SAMPLE_RATE = 16000 #Hz
SAMPLE_WIDTH = 2 # 16-bit audio
CHANNELS = 1 # Mono audio
CHUNK_SIZE = 2048 # Size of audio chunks
BUFFER_SIZE = 4096 # Size of buffer

# Sentinel value to signal end of stream
END_OF_STREAM = None


# Core Async Functions

# 1. Producer: Generates audio and puts chunks into the queue:

async def audio_generator(audio_queue: asyncio.Queue, text: str, voice: str, output_format: str) -> None:
    try:
        async with AsyncClient() as client, await client.generate(
            text=text,
            voice=voice,
            output_format=output_format
        ) as response:
            buffer=bytearray()
            async for chunk in response.content.iter_chunked(BUFFER_SIZE):
                if not chunk:
                    continue
                buffer.extend(chunk)

                # Process complete frames (2 bytes per sample for 16-bit audio)
                # Decode each little-endian 16-bit signed sample to a Python int
                while len(buffer) >= 2:
                    sample = int.from_bytes(buffer[:2], byteorder='little', signed=True)
                    await audio_queue.put(sample)
                    buffer = buffer[2:]

        await audio_queue.put(END_OF_STREAM)
        print("Audio generated and put into queue.")

    except Exception as e:
        print(f"[{'Generator'}] An error occurred in the audio generator: {e}")
        await audio_queue.put(END_OF_STREAM)
        raise

# 2. Consumer: Read audio data from queue and play it in real-time using sounddevice.
async def audio_player(play_queue: asyncio.Queue) -> None:
    try:
        with sd.OutputStream(
            samplerate=SAMPLE_RATE,
            channels=CHANNELS,
            dtype='int16', # 16-bit PCM
            blocksize=CHUNK_SIZE,
            latency='high',
        ) as stream:
            buffer=[]
            while True:
                try:
                    sample = await asyncio.wait_for(play_queue.get(), timeout=0.1)
                    if sample is END_OF_STREAM:
                        if buffer:
                            audio_data=np.array(buffer, dtype=np.int16)
                            stream.write(audio_data)
                            buffer=[]
                        break

                    buffer.append(sample)
                    if len(buffer) >= CHUNK_SIZE:
                        audio_data=np.array(buffer[:CHUNK_SIZE], dtype=np.int16)
                        stream.write(audio_data)
                        buffer=buffer[CHUNK_SIZE:]

                    play_queue.task_done()

                except asyncio.TimeoutError:
                    if buffer:
                        audio_data=np.array(buffer, dtype=np.int16)
                        stream.write(audio_data)
                        buffer=[]
                    continue

                except Exception as e:
                    print(f"[{'Player'}] An error occurred playing audio chunk {e}")
                    raise

    except Exception as e:
        print(f"[{'Player'}] An error occurred in the audio player: {e}")
        raise
    finally:
        sd.stop()

# 3. Main Function: Orchestrate audio generation and audio stream
async def main() -> None:
    play_queue = asyncio.Queue()

    # Create tasks
    tasks = [
        asyncio.create_task(audio_generator(play_queue, TEXT, VOICE, OUTPUT_FORMAT)),
        asyncio.create_task(audio_player(play_queue))
    ]

    try:
        await asyncio.gather(*tasks)

    except Exception as e:
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == "__main__":
    asyncio.run(main())