@coding-crying
## Summary

Adds an OpenAI-compatible TTS server with both HTTP and WebSocket endpoints for real-time streaming TTS applications.

## Endpoints

- `POST /v1/audio/speech` - HTTP streaming (OpenAI API compatible; see the client sketch below)
- `WS /v1/audio/speech/stream` - Bidirectional WebSocket streaming
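
As a quick illustration, here is a minimal streaming client for the HTTP endpoint. The JSON fields mirror the OpenAI `audio/speech` request schema; exactly which fields this server honors (e.g. `model`) is an assumption based on the OpenAI-compatibility claim, not verified against the PR code.

```python
# Minimal streaming client for POST /v1/audio/speech.
# Field names follow the OpenAI audio/speech schema; "model" may be
# ignored by this server (assumption).
import requests

resp = requests.post(
    "http://localhost:50000/v1/audio/speech",
    json={
        "model": "cosyvoice",             # assumed/optional field
        "input": "Hello from CosyVoice!",
        "voice": "speaker.wav",           # reference speaker, as in the WS config
        "speed": 1.0,
    },
    stream=True,
)
resp.raise_for_status()

# Write audio bytes to disk as they arrive instead of buffering the
# whole response, which is the point of the streaming endpoint.
with open("out.audio", "wb") as f:
    for chunk in resp.iter_content(chunk_size=4096):
        f.write(chunk)
```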

## WebSocket Protocol

1. Connect to `ws://host:port/v1/audio/speech/stream`
2. Send config: `{"voice": "speaker.wav", "speed": 1.0}`
3. Send text chunks: `{"text": "Hello"}`
4. Send end signal: `{"event": "end"}`
5. Receive binary PCM audio (int16, 24kHz, mono)
6. Server closes connection when complete

A client sketch implementing these steps follows.
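
For concreteness, here is a client implementing steps 1-6 with the third-party `websockets` package (the package choice is mine; any WebSocket client works). Buffering until the server closes keeps the example short; a real-time agent would feed each binary frame straight to audio output instead.

```python
# WebSocket client for /v1/audio/speech/stream, following the
# config -> text -> end -> binary PCM protocol described above.
import asyncio
import json

import websockets  # third-party: pip install websockets

async def synthesize(text: str) -> bytes:
    audio = bytearray()
    uri = "ws://localhost:50000/v1/audio/speech/stream"
    async with websockets.connect(uri) as ws:              # step 1
        await ws.send(json.dumps({"voice": "speaker.wav",
                                  "speed": 1.0}))          # step 2
        await ws.send(json.dumps({"text": text}))          # step 3
        await ws.send(json.dumps({"event": "end"}))        # step 4
        try:
            while True:
                msg = await ws.recv()                      # step 5
                if isinstance(msg, bytes):
                    audio.extend(msg)
        except websockets.exceptions.ConnectionClosed:     # step 6
            pass
    return bytes(audio)

pcm = asyncio.run(synthesize("Hello"))  # raw int16 mono PCM at 24 kHz
```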

## Files Added/Modified

- `openai_server.py`: FastAPI server with HTTP + WebSocket TTS endpoints (a rough skeleton of this wiring is sketched below)
- `run_openai_server.sh`: Launch script with venv activation
- `cosyvoice/llm/llm.py`: Add `inference_bistream` to the CosyVoice3LM class for streaming support
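
To show how the two endpoints can coexist in one FastAPI app, here is a hypothetical skeleton. It is not the actual `openai_server.py`: the `tts_stream` generator stands in for the real CosyVoice synthesis call, and the real server presumably synthesizes incrementally as text chunks arrive rather than buffering the full text as done here.

```python
# Hypothetical shape of a combined HTTP + WebSocket TTS server.
from fastapi import FastAPI, WebSocket
from fastapi.responses import StreamingResponse

app = FastAPI()

def tts_stream(text: str, voice: str, speed: float):
    """Placeholder generator yielding int16 PCM chunks at 24 kHz."""
    yield b""

@app.post("/v1/audio/speech")
async def speech(body: dict):
    # OpenAI-style JSON body in, streamed audio bytes out.
    return StreamingResponse(
        tts_stream(body["input"], body.get("voice", ""), body.get("speed", 1.0)),
        media_type="application/octet-stream",
    )

@app.websocket("/v1/audio/speech/stream")
async def speech_ws(ws: WebSocket):
    await ws.accept()
    config = await ws.receive_json()            # step 2: config frame
    text = ""
    while True:
        msg = await ws.receive_json()
        if msg.get("event") == "end":           # step 4: end signal
            break
        text += msg.get("text", "")             # step 3: text chunks
    for chunk in tts_stream(text, config.get("voice", ""),
                            config.get("speed", 1.0)):
        await ws.send_bytes(chunk)              # step 5: binary PCM
    await ws.close()                            # step 6
```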

## Usage

```bash
bash run_openai_server.sh
# Server runs on http://0.0.0.0:50000
```
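
The WebSocket endpoint returns raw int16 PCM at 24 kHz with no container, so it won't open directly in most audio tools. A stdlib helper can wrap the bytes in a WAV header (names here are illustrative):

```python
# Wrap raw server output (int16, 24 kHz, mono PCM) in a WAV container.
import wave

def pcm_to_wav(pcm: bytes, path: str = "out.wav") -> None:
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # int16 = 2 bytes per sample
        wav.setframerate(24000)   # 24 kHz
        wav.writeframes(pcm)
```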

## Use Case

This enables CosyVoice to be used as a drop-in TTS backend for:

- LiveKit voice agents
- Real-time conversational AI
- Any application needing streaming TTS with WebSocket support

Tested with the CosyVoice3-0.5B model for real-time voice agent applications.

@LongQIByte

Thanks for the PR! I’m not a maintainer, but I’ll pull the branch and try it out locally. I’ll share feedback if I find anything.

@coding-crying (Author)

> Thanks for the PR! I’m not a maintainer, but I’ll pull the branch and try it out locally. I’ll share feedback if I find anything.

After more testing, I was seeing some odd splits between audio chunks on my 3090, but I think that may be because the 3090's real-time factor (RTF) was above 1, i.e. synthesis was running slower than real time. If you have a beefier GPU, let me know whether the issue persists. :) -will
