feat(openai): add OpenAI STT provider with gpt-4o-transcribe support #13
Open
nathanael-h wants to merge 2 commits into bigbluebutton:stt/refactor/generic-providers from
Adds OpenAIConfig and OpenAISttAgent provider supporting the official OpenAI API and any OpenAI-compatible endpoint. Includes unit tests, integration tests, updated README, .env.example, and CHANGELOG.
Member
@nathanael-h Great, thanks! I'll review it.
Author
I found a bug (and a memory leak). I am working on fixing it, along with possibly a few other small things.
The safety flush for `_MAX_BUFFER_DURATION_S` only existed in the silence branch (`elif was_speaking`), not during continuous speech. Audio that stayed above the RMS threshold without pausing would accumulate frames indefinitely, causing the process to exhaust memory over long sessions.

- Move the max-buffer safety flush into the `if is_speaking:` branch so it fires even during uninterrupted speech or sustained background noise
- Remove the unreachable third `elif` branch (it was dead code: `buffer_duration` is always 0 when both `is_speaking` and `was_speaking` are False)
- Lower `_MAX_BUFFER_DURATION_S` from 30s to 12s to reduce peak allocation
- Scope `open_time` as a local variable captured by the `flush_segment` closure, fixing a shared-state bug where multiple participants would overwrite `self.open_time` and corrupt each other's segment timestamps
- Close `audio_stream` in a `finally` block to release track resources deterministically instead of relying on GC
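For reviewers, the control flow described in the list above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `segment_stream`, `Segment`, the `(timestamp, rms, payload)` frame tuple, and the `rms_threshold` default are hypothetical names and values chosen for the example. Only the 12s limit, the `is_speaking` / `was_speaking` branches, and the local `open_time` captured by `flush_segment` come from the commit message. The real agent also closes `audio_stream` in a `finally` block, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

MAX_BUFFER_DURATION_S = 12.0  # limit named in the commit message


@dataclass
class Segment:
    """Hypothetical container for one flushed chunk of audio."""
    open_time: float
    duration_s: float
    frames: List[bytes]


def segment_stream(
    frames: Iterable[Tuple[float, float, bytes]],  # (timestamp, rms, payload)
    frame_duration_s: float,
    emit: Callable[[Segment], None],
    rms_threshold: float = 500.0,  # assumed value, for illustration only
    max_buffer_s: float = MAX_BUFFER_DURATION_S,
) -> None:
    buffer: List[bytes] = []
    # Local to this call, so each participant's stream has its own value
    # instead of sharing an attribute on the agent instance.
    open_time: Optional[float] = None
    was_speaking = False

    def flush_segment() -> None:
        nonlocal buffer, open_time
        if buffer and open_time is not None:
            emit(Segment(open_time, len(buffer) * frame_duration_s, buffer))
        buffer = []
        open_time = None

    for timestamp, rms, payload in frames:
        is_speaking = rms >= rms_threshold
        if is_speaking:
            if open_time is None:
                open_time = timestamp
            buffer.append(payload)
            # Safety flush inside the speaking branch: it fires even when
            # the audio never drops below the threshold.
            if len(buffer) * frame_duration_s >= max_buffer_s:
                flush_segment()
        elif was_speaking:
            # Speech just ended: flush whatever was collected.
            flush_segment()
        was_speaking = is_speaking

    # Flush any trailing audio once the stream ends.
    flush_segment()
```

With the flush inside the speaking branch, the buffer never holds more than `max_buffer_s` worth of frames, regardless of how long a participant talks without pausing.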
Hello @prlanzarin, here is a PR targeting the refactored branch, so we can consider that it replaces #12.
I want to be transparent about LLM usage: I used Claude to help me with this. Maintainers can edit this branch! I tested on BigBlueButton Server 3.0.22 (3368) and ran the tests locally as well. Live captions and subtitles in recordings are working. I used a self-hosted OpenAI-compatible service: OpenWebui + Speaches + Systran/faster-whisper-base.
I think this one is ready for review.
Related meta issue: bigbluebutton/bigbluebutton#21059