fix: prevent gateway race condition when switching providers #1190
Closed
Jah-yee wants to merge 6 commits into NousResearch:main from
Allows users to override the hardcoded 900s timeout when using local LLM providers like Ollama or LM Studio. Fixes NousResearch#1010
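The commit message does not say which setting carries the override. The following is a minimal sketch that assumes a hypothetical environment variable `LLM_TIMEOUT_SECONDS` and a hypothetical config value. Neither name comes from the PR, and the precedence order is also an assumption. It falls back to the hardcoded 900 s only when no valid override is present:

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 900.0  # the hardcoded value mentioned in the commit


def resolve_timeout(config_value: Optional[float],
                    environ: Mapping[str, str] = os.environ,
                    env_var: str = "LLM_TIMEOUT_SECONDS") -> float:
    """Pick a request timeout: env var first, then config value, then 900 s.

    Hypothetical helper: the env var name and the precedence are assumptions.
    """
    raw = environ.get(env_var)
    if raw:
        try:
            value = float(raw)
            if value > 0:
                return value
        except ValueError:
            pass  # ignore a malformed override and keep looking
    if config_value is not None and config_value > 0:
        return float(config_value)
    return DEFAULT_TIMEOUT_SECONDS
```

Slow local backends such as Ollama or LM Studio would then set a larger value, while remote providers keep the old default.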
When llama.cpp returns function-call responses, message.content can be a dict instead of a string, causing "AttributeError: 'dict' object has no attribute 'strip'". This fix adds a type check before calling .strip().
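A type check of this kind might look like the sketch below. The helper name `content_to_text` is hypothetical. Serializing dict/list payloads to JSON is an assumed policy, because the commit only says the type is checked before `.strip()`:

```python
import json
from typing import Any


def content_to_text(content: Any) -> str:
    """Return message content as stripped text, whatever its runtime type.

    Hypothetical helper: strings are stripped, None becomes "", and dict/list
    payloads (e.g. a function-call object from llama.cpp) are serialized to
    JSON instead of having .strip() called on them.
    """
    if content is None:
        return ""
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, (dict, list)):
        return json.dumps(content, ensure_ascii=False)
    return str(content).strip()
```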
- Bridge stt.enabled from config.yaml to the HERMES_STT_ENABLED env var
- Check the env var in _enrich_message_with_transcription before processing
- When stt.enabled: false, voice messages pass through without transcription

Fixes: NousResearch#1100
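The three steps above can be sketched as follows. `HERMES_STT_ENABLED` comes from the commit. `bridge_stt_enabled` and `maybe_transcribe` are hypothetical stand-ins, and `maybe_transcribe` takes the place of `_enrich_message_with_transcription`, whose real signature is not shown. The `"audio"`/`"text"` message keys and the "enabled when unset" default are also assumptions:

```python
import os
from typing import Any, Callable, Dict, Mapping, MutableMapping

ENV_VAR = "HERMES_STT_ENABLED"  # env var named in the commit message


def bridge_stt_enabled(config: Mapping[str, Any],
                       environ: MutableMapping[str, str] = os.environ) -> None:
    """Copy stt.enabled from the parsed config.yaml into the environment."""
    stt = config.get("stt") or {}
    enabled = stt.get("enabled")
    if enabled is None:
        enabled = True  # assumption: STT stays on unless explicitly disabled
    environ[ENV_VAR] = "true" if enabled else "false"


def stt_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    value = environ.get(ENV_VAR, "true").strip().lower()
    return value not in {"false", "0", "no", "off"}


def maybe_transcribe(message: Dict[str, Any],
                     transcribe: Callable[[bytes], str],
                     environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Hypothetical stand-in for _enrich_message_with_transcription."""
    if not stt_enabled(environ) or "audio" not in message:
        return message  # voice message passes through without transcription
    enriched = dict(message)
    enriched["text"] = transcribe(message["audio"])
    return enriched
```

Passing the environment mapping explicitly keeps the sketch testable without touching the real process environment.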
When users explicitly set at_hour or idle_minutes to null in their config.yaml, the from_dict() method now correctly applies default values instead of passing None to validation logic. Fixes: NousResearch#1119
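The null-handling pattern can be illustrated with a small dataclass. `ResetPolicy` is a hypothetical stand-in for `SessionResetPolicy`, and its default values (4 and 1440) are placeholders rather than the project's real defaults. The point is that a missing key and an explicit YAML `null` both resolve to the default, so validation never sees `None`:

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class ResetPolicy:
    """Hypothetical stand-in for SessionResetPolicy; defaults are placeholders."""
    at_hour: int = 4
    idle_minutes: int = 1440

    @classmethod
    def from_dict(cls, data: Optional[Mapping[str, Any]]) -> "ResetPolicy":
        data = data or {}
        kwargs = {}
        for f in fields(cls):
            value = data.get(f.name)
            # A missing key and an explicit null both fall back to the default.
            if value is not None:
                kwargs[f.name] = value
        return cls(**kwargs)

    def __post_init__(self) -> None:
        # Validation only ever sees concrete values, never None.
        if not 0 <= self.at_hour <= 23:
            raise ValueError(f"at_hour must be 0-23, got {self.at_hour}")
        if self.idle_minutes <= 0:
            raise ValueError(f"idle_minutes must be positive, got {self.idle_minutes}")
```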
When running 'hermes setup' or 'hermes model' while the gateway is running, _update_config_for_provider() writes to config.yaml immediately with the new provider/base_url but preserves the old model name. This creates a race condition where the gateway can send requests with an incompatible model name to the new provider.

This fix:
1. Adds an optional 'default_model' parameter to _update_config_for_provider()
2. When switching to an affected provider (minimax, minimax-cn, zai, kimi-coding), passes a sensible default model to prevent the race
3. Leaves the later model selection step free to override this default

Affected providers: MiniMax, MiniMax-CN, Z.AI, Kimi. These providers use different model name formats than OpenRouter.
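The `default_model` idea can be sketched on a plain dict. The function name and flat config layout are hypothetical, and so are the model IDs, which are placeholders because the commit does not name the defaults it uses. Provider, base URL and model are set in one update, so as long as the result is persisted in a single write, the gateway never sees the new provider paired with the old model:

```python
from typing import Any, Dict, Mapping, Optional

# Placeholder IDs: the commit message does not name the defaults it uses.
PROVIDER_DEFAULT_MODELS: Dict[str, str] = {
    "minimax": "<minimax-default-model>",
    "minimax-cn": "<minimax-cn-default-model>",
    "zai": "<zai-default-model>",
    "kimi-coding": "<kimi-default-model>",
}


def update_config_for_provider(config: Mapping[str, Any],
                               provider: str,
                               base_url: str,
                               default_model: Optional[str] = None) -> Dict[str, Any]:
    """Return a new config with provider, base_url and (optionally) model set together.

    When default_model is None the existing model is kept, matching the
    behaviour for providers that share OpenRouter-style model names.
    """
    updated = dict(config)
    updated["provider"] = provider
    updated["base_url"] = base_url
    if default_model is not None:
        updated["model"] = default_model
    return updated
```

A caller would pass `PROVIDER_DEFAULT_MODELS.get(provider)`, and the later model-selection step can still overwrite `model`. The default only has to be valid until then.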
In setup.py, _update_config_for_provider was called without default_model for OpenAI Codex, causing a race condition:
1. Provider is updated to openai-codex in config.yaml
2. Gateway picks up the new provider
3. But the model is still the old one (e.g., anthropic/claude-opus-4.6 from OpenRouter)
4. Gateway sends the wrong model to Codex → fails

This fix:
- Line ~598: Pass 'gpt-5.3-codex' as the default when first setting up Codex
- Line ~936: Pass the selected model (or fall back to the default) so the config always has a valid model for the current provider

This prevents the race condition where the gateway uses a model name from a different provider after a provider switch.
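The "selected model, else default" rule from the second bullet can be sketched as a tiny helper. `model_for_codex` is a hypothetical name, and `gpt-5.3-codex` is the default stated in the commit. Its return value would be passed as `default_model` when the provider is written:

```python
from typing import Optional

CODEX_DEFAULT_MODEL = "gpt-5.3-codex"  # default named in the commit message


def model_for_codex(selected_model: Optional[str]) -> str:
    """Model to persist together with provider=openai-codex.

    Uses the user's selection when there is one, otherwise the Codex default,
    so the config never pairs openai-codex with another provider's model.
    """
    if selected_model and selected_model.strip():
        return selected_model.strip()
    return CODEX_DEFAULT_MODEL
```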
Closing: the core race condition fix (…).

The other changes bundled in this PR (context compressor non-string handling, SessionResetPolicy null values, STT enable/disable, configurable timeout) have also been addressed independently in the 1084 commits since this PR was opened. Thank you for identifying the race condition, @Jah-yee!
When running `hermes setup` or `hermes model` while the gateway is running, `_update_config_for_provider()` writes to config.yaml immediately with the new provider/base_url but preserves the old model name. This creates a race condition where the gateway can send requests with an incompatible model name to the new provider.

The Problem
1. User has `anthropic/claude-opus-4.6` configured
2. User runs `hermes setup` and selects MiniMax as provider
3. `_update_config_for_provider()` writes: provider=minimax, base_url=... but model still = `anthropic/claude-opus-4.6`
4. Gateway sends `anthropic/claude-opus-4.6` to MiniMax API → fails

The Fix
- Add optional `default_model` parameter to `_update_config_for_provider()` in `auth.py`

Affected Providers

- MiniMax
- MiniMax-CN
- Z.AI
- Kimi
These providers use different model name formats than OpenRouter, so the old model name is always incompatible.
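The failure sequence described in this PR can be reproduced deterministically, without real threads or file I/O, by recording what a gateway reload would see between writes. This is an illustrative simulation under simplifying assumptions. The flat config dict, the function names and the `mismatched` heuristic are all hypothetical:

```python
import copy
from typing import Any, Dict, List

Config = Dict[str, Any]


def switch_in_two_writes(config: Config, reads: List[Config],
                         provider: str, base_url: str, model: str) -> None:
    """Old behaviour: provider/base_url written first, model written later."""
    config["provider"], config["base_url"] = provider, base_url
    reads.append(copy.deepcopy(config))  # the running gateway reloads config here
    config["model"] = model              # model chosen in a later setup step
    reads.append(copy.deepcopy(config))


def switch_in_one_write(config: Config, reads: List[Config],
                        provider: str, base_url: str, model: str) -> None:
    """Fixed behaviour: a usable model is written together with the provider."""
    config.update(provider=provider, base_url=base_url, model=model)
    reads.append(copy.deepcopy(config))


def mismatched(snapshot: Config) -> bool:
    # Simplified check for this example: an OpenRouter-style "vendor/model"
    # ID is treated as invalid for any provider other than openrouter.
    return snapshot["provider"] != "openrouter" and "/" in snapshot["model"]
```

With the two-write switch, the snapshot taken between the writes pairs `minimax` with `anthropic/claude-opus-4.6`. With the single write, no snapshot shows that mismatch.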