20 changes: 19 additions & 1 deletion .env.example
Original file line number Diff line number Diff line change
@@ -6,9 +6,13 @@ REDIS_HOST=127.0.0.1
REDIS_PORT=6379
REDIS_PASSWORD=

# STT Provider: "gladia" (default)
# STT Provider: "gladia" (default) or "openai"
# STT_PROVIDER=gladia

# =============================================================================
# --- Gladia STT (STT_PROVIDER=gladia) ---
# =============================================================================

GLADIA_API_KEY=
# The following env vars serve as a translation locale mapper between
# <ISO 639-1> (Gladia) and <ISO 639-1>-<ISO 3166-1> (BBB) locale formats.
@@ -58,3 +62,17 @@ GLADIA_TRANSLATION_LANG_MAP="de:de-DE,en:en-US,es:es-ES,fr:fr-FR,hi:hi-IN,it:it-

#GLADIA_PRE_PROCESSING_AUDIO_ENHANCER=false
#GLADIA_PRE_PROCESSING_SPEECH_THRESHOLD=0.5

# =============================================================================
# --- OpenAI STT (STT_PROVIDER=openai) ---
# Supports the official OpenAI API and any OpenAI-compatible endpoint.
# =============================================================================

# OpenAI API key (required)
#OPENAI_API_KEY=

# Transcription model (default: gpt-4o-transcribe; use "whisper-1" for classic Whisper)
#OPENAI_STT_MODEL=gpt-4o-transcribe

# Base URL override — set this to use a compatible provider (e.g. a local Whisper server)
#OPENAI_BASE_URL=
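
The three OpenAI variables above could be read into a settings object roughly as follows. This is a minimal sketch, not the PR's actual `openai_config`: the `OpenAiSettings` class and `load_openai_settings` helper are hypothetical names, and the validation behavior (rejecting an empty key, treating empty values as unset) is an assumption based only on the comments in `.env.example`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_MODEL = "gpt-4o-transcribe"  # default named in .env.example


@dataclass(frozen=True)
class OpenAiSettings:
    """Hypothetical container; the real openai_config may be shaped differently."""

    api_key: str
    model: str = DEFAULT_MODEL
    base_url: Optional[str] = None


def load_openai_settings(env: Mapping[str, str] = os.environ) -> OpenAiSettings:
    """Build settings from OPENAI_API_KEY, OPENAI_STT_MODEL and OPENAI_BASE_URL."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Assumption: the key is mandatory, as the "(required)" comment suggests.
        raise ValueError("OPENAI_API_KEY is required when STT_PROVIDER=openai")
    # Assumption: empty values count as unset and fall back to the defaults.
    model = env.get("OPENAI_STT_MODEL", "").strip() or DEFAULT_MODEL
    base_url = env.get("OPENAI_BASE_URL", "").strip() or None
    return OpenAiSettings(api_key=api_key, model=model, base_url=base_url)
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in unit tests without touching the process environment.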
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@ Final releases will consolidate all intermediate changes in chronological order.

## UNRELEASED

* feat(openai): add OpenAI STT provider support (official and compatible endpoints)
* feat: add GladiaSttAgent provider and factory
* refactor: move GladiaConfig to providers package, delete old agent module
* feat(tests): add unit and integration tests with pytest
50 changes: 43 additions & 7 deletions README.md
@@ -3,18 +3,18 @@
This application provides Speech-to-Text (STT) for BigBlueButton meetings using LiveKit
as their audio bridge.

Initially, the only supported STT engine is Gladia through the official [LiveKit Gladia Plugin](https://docs.livekit.io/agents/integrations/stt/gladia/).
Supported STT engines:

It'll be expanded in the future to support other STT plugins from the LiveKit Agents
ecosystem.
- **Gladia** — via the official [LiveKit Gladia plugin](https://docs.livekit.io/agents/integrations/stt/gladia/) (default)
- **OpenAI** — via the [LiveKit OpenAI plugin](https://docs.livekit.io/agents/models/stt/openai/); supports the official OpenAI API and any OpenAI-compatible endpoint

## Getting Started

### Environment prerequisites

- Python 3.10+
- A LiveKit instance
- A Gladia API key
- A Gladia API key **or** an OpenAI API key (depending on your chosen STT provider)
- uv:
- See installation instructions: https://docs.astral.sh/uv/getting-started/installation/

@@ -48,13 +48,17 @@ ecosystem.
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...

# Gladia API Key
# For Gladia (default provider):
GLADIA_API_KEY=...

# For OpenAI (set STT_PROVIDER=openai):
# STT_PROVIDER=openai
# OPENAI_API_KEY=...
```

Feel free to check `.env.example` for any other configurations of interest.

**All options ingested by the Gladia STT plugin are exposed via env vars**.
**All options ingested by the Gladia and OpenAI STT plugins are exposed via env vars**.

### Running

@@ -98,6 +102,30 @@ docker run --network host --rm -it --env-file .env bbb-livekit-stt

Pre-built images are available via GitHub Container Registry as well.

### OpenAI STT provider

Set `STT_PROVIDER=openai` to use OpenAI STT instead of Gladia.

**Official OpenAI API:**

```bash
STT_PROVIDER=openai
OPENAI_API_KEY=your-key
# OPENAI_STT_MODEL=gpt-4o-transcribe # default; use "whisper-1" for classic Whisper
```

**OpenAI-compatible endpoint** (e.g. a self-hosted Whisper server):

```bash
STT_PROVIDER=openai
OPENAI_API_KEY=any-value
OPENAI_BASE_URL=http://your-server:8000
OPENAI_STT_MODEL=your-model-name
```

> **Note**: OpenAI STT does not support real-time translation. Only the original
> transcript language is returned, matching the user's BBB speech locale.

### Development

#### Testing
@@ -114,12 +142,20 @@ Run with coverage:
uv run pytest tests/ --ignore=tests/integration --cov --cov-report=term-missing
```

Integration tests require a real Gladia API key and make live requests to the Gladia service. Set `GLADIA_API_KEY` and run:
Integration tests require a real API key and make live requests to the STT service.

For Gladia, set `GLADIA_API_KEY` and run:

```bash
GLADIA_API_KEY=your-key uv run pytest tests/integration -m integration
```

For OpenAI, set `OPENAI_API_KEY` and run:

```bash
OPENAI_API_KEY=your-key uv run pytest tests/integration -m integration
```

#### Linting

This project uses [ruff](https://docs.astral.sh/ruff/) for linting and formatting. To check for issues:
8 changes: 5 additions & 3 deletions main.py
@@ -49,7 +49,7 @@ async def on_redis_message(message_data: str):
     meeting_id = routing.get("meetingId")
     user_id = routing.get("userId")

-    if meeting_id != agent.room.name:
+    if agent.room is None or meeting_id != agent.room.name:
         return

     if event_name == RedisManager.USER_SPEECH_LOCALE_CHANGED_EVT_MSG:
@@ -102,7 +102,8 @@ async def on_final_transcript(
     original_lang = original_locale.split("-")[0]

     for alternative in event.alternatives:
-        transcript_lang = alternative.language
+        # Some providers (e.g. OpenAI) may not report a language; fall back to original.
+        transcript_lang = alternative.language or original_lang
         text = alternative.text
         bbb_locale = None
         start_time_adjusted = math.floor(open_time + alternative.start_time)
@@ -171,7 +172,8 @@ async def on_interim_transcript(
     min_utterance_length = p_settings.get("min_utterance_length", 0)

     for alternative in event.alternatives:
-        transcript_lang = alternative.language
+        # Some providers (e.g. OpenAI) may not report a language; fall back to original.
+        transcript_lang = alternative.language or original_lang
         text = alternative.text
         start_time_adjusted = math.floor(open_time + alternative.start_time)
         end_time_adjusted = math.floor(open_time + alternative.end_time)
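
The language fallback added in both transcript handlers can be isolated as a small pure function. The sketch below is illustrative only: `resolve_transcript_lang` is a hypothetical helper that does not appear in the diff, and it assumes a missing language arrives as `None` or an empty string.

```python
from typing import Optional


def resolve_transcript_lang(reported: Optional[str], original_locale: str) -> str:
    """Return the provider-reported language, or the user's base language.

    `original_locale` is a BBB-style locale such as "en-US"; only the part
    before the first hyphen is used, as in `original_locale.split("-")[0]`.
    """
    original_lang = original_locale.split("-")[0]
    # `or` treats both None and "" as "not reported" (assumed provider behavior).
    return reported or original_lang
```

With this shape, `resolve_transcript_lang(None, "pt-BR")` yields `"pt"`, while a reported value such as `"de"` is passed through unchanged.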
4 changes: 4 additions & 0 deletions providers/__init__.py
@@ -6,4 +6,8 @@ def create_agent(provider: str) -> BaseSttAgent:
         from providers.gladia import GladiaSttAgent, gladia_config

         return GladiaSttAgent(gladia_config)
+    if provider == "openai":
+        from providers.openai import OpenAiSttAgent, openai_config
+
+        return OpenAiSttAgent(openai_config)
     raise ValueError(f"Unknown STT provider: {provider}")
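
The factory's dispatch can be exercised in isolation with stand-in classes. This is a self-contained sketch under stated assumptions: the three agent classes here are empty placeholders (the real ones take a config object and are imported lazily), and `agent_from_env` is a hypothetical helper showing how `STT_PROVIDER` might select a provider, with `gladia` as the default described in `.env.example`.

```python
import os
from typing import Mapping


class BaseSttAgent:
    """Placeholder for the project's real base class."""


class GladiaSttAgent(BaseSttAgent):
    """Placeholder; the real agent is constructed from gladia_config."""


class OpenAiSttAgent(BaseSttAgent):
    """Placeholder; the real agent is constructed from openai_config."""


def create_agent(provider: str) -> BaseSttAgent:
    # Same if-chain as the diff, minus the lazy imports and config arguments.
    if provider == "gladia":
        return GladiaSttAgent()
    if provider == "openai":
        return OpenAiSttAgent()
    raise ValueError(f"Unknown STT provider: {provider}")


def agent_from_env(env: Mapping[str, str] = os.environ) -> BaseSttAgent:
    # Assumption: an unset or empty STT_PROVIDER selects "gladia".
    provider = env.get("STT_PROVIDER", "").strip() or "gladia"
    return create_agent(provider)
```

Keeping the `raise` as the final statement means an unrecognized provider name fails loudly at startup rather than silently falling back to a default engine.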