fix: prevent gateway race condition when switching providers by Jah-yee · Pull Request #1190 · NousResearch/hermes-agent

Jah-yee · 2026-03-13T17:19:31Z

When running hermes setup or hermes model while the gateway is running, _update_config_for_provider() writes to config.yaml immediately with the new provider/base_url but preserves the old model name. This creates a race condition where the gateway can send requests with an incompatible model name to the new provider.

The Problem

User has OpenRouter with anthropic/claude-opus-4.6 configured
User runs hermes setup and selects MiniMax as provider
_update_config_for_provider() writes: provider=minimax, base_url=... but model still = anthropic/claude-opus-4.6
Gateway picks up the config change and sends anthropic/claude-opus-4.6 to MiniMax API → fails
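
The sequence above can be reproduced with a minimal sketch. The dictionary stands in for config.yaml, and update_provider_only is a hypothetical stand-in for the pre-fix behavior, not the actual code in auth.py. The MiniMax URL is a placeholder.

```python
# Hypothetical in-memory stand-in for config.yaml.
config = {
    "provider": "openrouter",
    "base_url": "https://openrouter.ai/api/v1",
    "model": "anthropic/claude-opus-4.6",
}


def update_provider_only(cfg, provider, base_url):
    """Mimic the pre-fix behavior: provider and base_url change, model does not."""
    cfg["provider"] = provider
    cfg["base_url"] = base_url
    return cfg


# Placeholder URL, not MiniMax's real endpoint.
update_provider_only(config, "minimax", "https://minimax.example/v1")

# A gateway that reads the config at this point sees a mismatched pair:
# the new provider together with the old OpenRouter-style model name.
stale = config["provider"] == "minimax" and config["model"].startswith("anthropic/")
print(stale)
```

Until the later model selection step rewrites the model, any request the gateway builds from this state carries the old model name.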

The Fix

Adds optional default_model parameter to _update_config_for_provider() in auth.py
When switching to affected providers (minimax, minimax-cn, zai, kimi-coding), pass a sensible default model
The model selection step later can still override this default
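
A rough sketch of the idea, assuming the config is a plain dictionary: provider, base_url, and model are changed in the same update, so a reader never sees the new provider paired with the old model. The function below is illustrative only; its name, signature, and body are assumptions and do not reproduce the real _update_config_for_provider() in auth.py.

```python
from typing import Optional


def update_config_for_provider(
    cfg: dict,
    provider: str,
    base_url: str,
    default_model: Optional[str] = None,
) -> dict:
    """Hypothetical sketch: switch provider and, if given, model in one update."""
    updated = dict(cfg)  # copy so the caller's dictionary is left untouched
    updated["provider"] = provider
    updated["base_url"] = base_url
    if default_model is not None:
        # Replace the old model so it cannot be sent to the new provider.
        updated["model"] = default_model
    return updated


old = {
    "provider": "openrouter",
    "base_url": "https://openrouter.ai/api/v1",
    "model": "anthropic/claude-opus-4.6",
}
# Placeholder URL, not MiniMax's real endpoint.
new = update_config_for_provider(
    old, "minimax", "https://minimax.example/v1", default_model="MiniMax-M2.5"
)
unchanged_model = update_config_for_provider(old, "minimax", "https://minimax.example/v1")
```

Leaving default_model as None keeps the previous behavior, which is why the parameter can be optional without affecting other callers.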

Affected Providers

MiniMax (default: MiniMax-M2.5)
MiniMax-CN (default: MiniMax-M2.5)
Z.AI (default: glm-4.7)
Kimi (default: kimi-k2.5)

These providers use different model name formats than OpenRouter, so the old model name is always incompatible.
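
One way to keep these defaults in a single place is a lookup table keyed by provider id. The values below are the ones listed in this PR; the dictionary name and both helpers are hypothetical. The slash check assumes OpenRouter-style names take the vendor/model form seen in anthropic/claude-opus-4.6.

```python
from typing import Optional

# Defaults as listed in this PR; the names in this block are hypothetical.
PROVIDER_DEFAULT_MODELS = {
    "minimax": "MiniMax-M2.5",
    "minimax-cn": "MiniMax-M2.5",
    "zai": "glm-4.7",
    "kimi-coding": "kimi-k2.5",
}


def default_model_for(provider: str) -> Optional[str]:
    """Return the default model for an affected provider, or None otherwise."""
    return PROVIDER_DEFAULT_MODELS.get(provider)


def is_vendor_prefixed(model: str) -> bool:
    """Assumption: OpenRouter-style names look like 'vendor/model'."""
    return "/" in model


old_model = "anthropic/claude-opus-4.6"
replacement = default_model_for("zai")
```

None of the listed defaults contains a vendor prefix, which illustrates why a name carried over from OpenRouter does not match them.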

- Bridge stt.enabled from config.yaml to the HERMES_STT_ENABLED env var
- Check the env var in _enrich_message_with_transcription before processing
- When stt.enabled: false, voice messages pass through without transcription

Fixes: NousResearch#1100
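
The bridging described above might look roughly like the following. Only stt.enabled and HERMES_STT_ENABLED come from the commit message. The helper names, the "true"/"false" string encoding, and the choice to treat an unset variable as enabled are assumptions.

```python
import os
from typing import MutableMapping


def bridge_stt_enabled(config: dict, env: MutableMapping[str, str] = os.environ) -> None:
    """Copy stt.enabled from the parsed config into HERMES_STT_ENABLED."""
    stt = config.get("stt") or {}
    if "enabled" in stt:
        # Assumption: the flag is stored as the strings "true" / "false".
        env["HERMES_STT_ENABLED"] = "true" if stt["enabled"] else "false"


def stt_is_enabled(env: MutableMapping[str, str] = os.environ) -> bool:
    """Assumption: transcription stays on unless the variable says otherwise."""
    value = env.get("HERMES_STT_ENABLED", "true").strip().lower()
    return value not in ("false", "0", "no", "off")


fake_env: dict = {}  # stand-in for os.environ so the example has no side effects
bridge_stt_enabled({"stt": {"enabled": False}}, fake_env)
skip_transcription = not stt_is_enabled(fake_env)
```

A transcription step would check stt_is_enabled() first and return the message unchanged when it is False.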

In setup.py, _update_config_for_provider was called without default_model for OpenAI Codex, causing a race condition where:

1. Provider is updated to openai-codex in config.yaml
2. Gateway picks up new provider
3. But model is still the old one (e.g., anthropic/claude-opus-4.6 from OpenRouter)
4. Gateway sends wrong model to Codex → fails

This fix:

- Line ~598: Pass 'gpt-5.3-codex' as default when first setting up Codex
- Line ~936: Pass the selected model (or fallback to default) to ensure the config always has a valid model for the current provider

This prevents the race condition where the gateway uses a model name from a different provider after provider switch.
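
The "selected model, or fall back to default" rule can be sketched as a small helper. The default value gpt-5.3-codex is taken from the commit message; the helper name and the whitespace handling are assumptions, and this is not the code in setup.py.

```python
from typing import Optional

DEFAULT_CODEX_MODEL = "gpt-5.3-codex"  # default named in the commit message


def codex_model_to_write(selected_model: Optional[str]) -> str:
    """Return the user's selection, or the Codex default if nothing usable was chosen."""
    if selected_model and selected_model.strip():
        return selected_model.strip()
    return DEFAULT_CODEX_MODEL


first_setup = codex_model_to_write(None)               # no selection made yet
after_selection = codex_model_to_write("some-model")   # hypothetical user choice
```

Either way, the value passed as default_model is never empty, so the config holds a model name that belongs to the current provider.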

teknium1 · 2026-03-17T11:26:15Z

Closing — the core race condition fix (default_model parameter on _update_config_for_provider()) is already on main. The function now writes a valid default model when switching providers to prevent the gateway from using an incompatible model name.

The other changes bundled in this PR (context compressor non-string handling, SessionResetPolicy null values, STT enable/disable, configurable timeout) have also been addressed independently in the 1084 commits since this PR was opened.

Thank you for identifying the race condition @Jah-yee!

Jah-yee and others added 6 commits March 12, 2026 16:38

feat: Add OPEN_AI_LLM_TIMEOUT env var for LLM timeout configuration

2b442e4

Allows users to override the hardcoded 900s timeout when using local LLM providers like Ollama or LM Studio. Fixes NousResearch#1010
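
A minimal sketch of reading such an override, assuming the variable holds a number of seconds and that missing or invalid values fall back to the 900s default. Only the variable name and the 900s figure come from the commit message; the helper is hypothetical.

```python
import os
from typing import Mapping

DEFAULT_TIMEOUT_SECONDS = 900.0  # the hardcoded value the commit refers to


def llm_timeout_seconds(env: Mapping[str, str] = os.environ) -> float:
    """Read OPEN_AI_LLM_TIMEOUT, falling back to the default when unusable."""
    raw = env.get("OPEN_AI_LLM_TIMEOUT")
    if raw is None:
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        # Assumption: a non-numeric value is ignored rather than raising.
        return DEFAULT_TIMEOUT_SECONDS
    # Assumption: zero or negative timeouts are treated as invalid.
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS


slow_local_model = llm_timeout_seconds({"OPEN_AI_LLM_TIMEOUT": "3600"})
```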

fix: Handle non-string content in context compressor response

6c85b9e

When llama.cpp returns function call responses, message.content can be a dict instead of a string, causing 'dict' object has no attribute 'strip' error. This fix adds type checking before calling .strip().
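
The type check could take a shape like the one below. The commit message only says that a check is added before .strip(); serializing non-string content to JSON and mapping None to an empty string are assumptions, and the helper name is hypothetical.

```python
import json
from typing import Any


def content_to_text(content: Any) -> str:
    """Return stripped text for string content; never call .strip() on other types."""
    if isinstance(content, str):
        return content.strip()
    if content is None:
        return ""
    # Assumption: dicts and lists are kept as JSON text instead of being dropped.
    # default=str lets json.dumps cope with values it cannot encode natively.
    return json.dumps(content, default=str)


plain = content_to_text("  summary text \n")
structured = content_to_text({"name": "search"})
```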

fix: handle null values in SessionResetPolicy.from_dict()

edf0a4e

When users explicitly set at_hour or idle_minutes to null in their config.yaml, the from_dict() method now correctly applies default values instead of passing None to validation logic. Fixes: NousResearch#1119
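
The underlying pitfall is that dict.get(key, default) returns None, not the default, when the key is present with a null value. A hedged sketch follows: the class is a stand-in rather than the real SessionResetPolicy, and the default values and validation range are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical defaults; the real values are not given in this PR.
DEFAULT_AT_HOUR = 4
DEFAULT_IDLE_MINUTES = 1440


@dataclass
class ResetPolicySketch:
    at_hour: int = DEFAULT_AT_HOUR
    idle_minutes: int = DEFAULT_IDLE_MINUTES

    def __post_init__(self) -> None:
        # Illustrative validation that would raise TypeError if given None.
        if not 0 <= self.at_hour <= 23:
            raise ValueError("at_hour must be between 0 and 23")
        if self.idle_minutes <= 0:
            raise ValueError("idle_minutes must be positive")

    @classmethod
    def from_dict(cls, data: dict) -> "ResetPolicySketch":
        at_hour = data.get("at_hour")
        idle_minutes = data.get("idle_minutes")
        # Treat an explicit null the same as a missing key.
        return cls(
            at_hour=DEFAULT_AT_HOUR if at_hour is None else at_hour,
            idle_minutes=DEFAULT_IDLE_MINUTES if idle_minutes is None else idle_minutes,
        )


# "at_hour: null" in YAML parses to None in Python.
policy = ResetPolicySketch.from_dict({"at_hour": None, "idle_minutes": 30})
```

Checking with "is None" rather than "or" keeps a legitimate value of 0 for at_hour from being replaced by the default.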

teknium1 closed this Mar 17, 2026

Jah-yee wants to merge 6 commits into NousResearch:main from Jah-yee:fix/provider-race-condition
