feat(openai): add OpenAI STT provider with gpt-4o-transcribe support #13

Open
nathanael-h wants to merge 2 commits into bigbluebutton:stt/refactor/generic-providers from nathanael-h:generic-providers-openai

@nathanael-h

Hello @prlanzarin, here is a PR targeting the refactored branch, so we can consider it a replacement for #12.

To be transparent about LLM usage: I used Claude to help me with this. Maintainers can edit this branch. I tested on BigBlueButton Server 3.0.22 (3368) and also ran the tests locally. Live captions and subtitles in recordings are working. I used a self-hosted OpenAI-compatible service: OpenWebui + Speaches + Systran/faster-whisper-base.

I think this one is ready for review.

Related meta issue bigbluebutton/bigbluebutton#21059

Adds OpenAIConfig and OpenAISttAgent provider supporting the official
OpenAI API and any OpenAI-compatible endpoint. Includes unit tests,
integration tests, updated README, .env.example, and CHANGELOG.
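
As a rough illustration of what "the official OpenAI API and any OpenAI-compatible endpoint" can mean in configuration terms, here is a minimal sketch. It is not the `OpenAIConfig` from this PR: the class name `SttEndpointConfig`, its fields, and the `OPENAI_STT_MODEL` variable are hypothetical, chosen only for the example. The default base URL and the `/audio/transcriptions` path are those of the public OpenAI API; a self-hosted service is assumed to expose the same path under its own base URL.

```python
from dataclasses import dataclass
from typing import Mapping

# Public OpenAI API base URL; a self-hosted compatible server would override it.
DEFAULT_BASE_URL = "https://api.openai.com/v1"
# Model named in the PR title.
DEFAULT_MODEL = "gpt-4o-transcribe"


@dataclass(frozen=True)
class SttEndpointConfig:
    """Hypothetical config holder, not the PR's OpenAIConfig."""

    api_key: str
    base_url: str = DEFAULT_BASE_URL
    model: str = DEFAULT_MODEL

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "SttEndpointConfig":
        # Variable names are assumptions made for this sketch.
        return cls(
            api_key=env.get("OPENAI_API_KEY", ""),
            base_url=env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
            model=env.get("OPENAI_STT_MODEL", DEFAULT_MODEL),
        )

    def transcription_url(self) -> str:
        # Tolerate a trailing slash in the configured base URL.
        return self.base_url.rstrip("/") + "/audio/transcriptions"
```

Under these assumptions, pointing the provider at a local server only requires changing the base URL (and, usually, the model name) while the rest of the request shape stays the same.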
@prlanzarin self-requested a review March 11, 2026 16:09
@prlanzarin
Member

@nathanael-h Great, thanks! I'll review it.

@nathanael-h
Author

I found a bug (and a memory leak). I am working on fixing it, possibly along with a few other small things.

The safety flush for _MAX_BUFFER_DURATION_S only existed in the silence
branch (elif was_speaking), not during continuous speech. Audio that
stayed above the RMS threshold without pausing would accumulate frames
indefinitely, causing the process to exhaust memory over long sessions.

- Move the max-buffer safety flush into the `if is_speaking:` branch so
  it fires even during uninterrupted speech or sustained background noise
- Remove the unreachable third elif branch (was dead code: buffer_duration
  is always 0 when both is_speaking and was_speaking are False)
- Lower _MAX_BUFFER_DURATION_S from 30s to 12s to reduce peak allocation
- Scope open_time as a local variable captured by flush_segment closure,
  fixing a shared-state bug where multiple participants would overwrite
  self.open_time and corrupt each other's segment timestamps
- Close audio_stream in finally block to release track resources
  deterministically instead of relying on GC
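
The control flow described in this commit message can be sketched roughly as follows. This is a simplified stand-in, not the code from the PR: the function `segment_frames`, the `on_segment` and `close` callbacks, the RMS threshold, and the 20 ms frame duration are all assumptions made for illustration. Only the 12 s limit, the `is_speaking` / `was_speaking` branches, the local `open_time` captured by the `flush_segment` closure, and the cleanup in `finally` come from the description above.

```python
import math
from typing import Callable, Iterable, List, Optional, Sequence

MAX_BUFFER_DURATION_S = 12.0  # limit named in the commit message
FRAME_DURATION_S = 0.02       # assumption: 20 ms frames
RMS_THRESHOLD = 500.0         # assumption: arbitrary speech threshold
MAX_BUFFER_FRAMES = round(MAX_BUFFER_DURATION_S / FRAME_DURATION_S)

Frame = Sequence[int]


def rms(frame: Frame) -> float:
    """Root mean square of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_frames(
    frames: Iterable[Frame],
    on_segment: Callable[[float, List[Frame]], None],
    close: Callable[[], None] = lambda: None,
) -> None:
    buffer: List[Frame] = []
    # Local to this call, so concurrent participants cannot overwrite it.
    open_time: Optional[float] = None
    was_speaking = False

    def flush_segment() -> None:
        nonlocal buffer, open_time
        if buffer and open_time is not None:
            on_segment(open_time, buffer)
        buffer = []
        open_time = None

    try:
        for index, frame in enumerate(frames):
            is_speaking = rms(frame) >= RMS_THRESHOLD
            if is_speaking:
                if open_time is None:
                    open_time = index * FRAME_DURATION_S
                buffer.append(frame)
                # Safety flush fires even when speech never pauses.
                if len(buffer) >= MAX_BUFFER_FRAMES:
                    flush_segment()
            elif was_speaking:
                # Speech just ended: emit the completed segment.
                flush_segment()
            was_speaking = is_speaking
        flush_segment()  # emit whatever remains when the stream ends
    finally:
        close()  # release the stream deterministically, not via GC
```

With these assumed values, 30 s of uninterrupted speech yields segments of at most 600 frames each, so the buffer size is bounded regardless of how long the speaker goes without pausing.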