
Refine temperature/top_p handling for reasoning models #2277

Open
mayeco wants to merge 4 commits into OpenHands:main from mayeco:patch-3

Conversation

@mayeco
Contributor

@mayeco mayeco commented Mar 3, 2026

Update handling of temperature and top_p for OpenAI models.

This fixes issue #2278.

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the GitHub CI passing?

Update handling of temperature and top_p for OpenAI models.
Copilot AI review requested due to automatic review settings March 3, 2026 13:33
Contributor

Copilot AI left a comment


Pull request overview

Updates chat-completion option normalization to avoid stripping sampling params (temperature, top_p) for non-OpenAI “reasoning effort” models (notably Gemini), while still handling OpenAI reasoning-model quirks.

Changes:

  • Only remove temperature/top_p for OpenAI reasoning models (o1*, o3*) instead of all models that support reasoning_effort.
  • Add model-name normalization logic in select_chat_options to gate this behavior.


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

openhands-sdk/openhands/sdk/llm/options/chat_options.py:59

  • The Gemini 2.5-pro substring check is case-sensitive and bypasses the model_lower normalization introduced above. For consistency with other model checks in this function (and with model_features.model_matches, which is case-insensitive), use model_lower (or another normalized form) so variants like provider-prefixed or differently-cased model IDs don’t skip this defaulting behavior.
        # Gemini 2.5-pro default to low if not set
        if "gemini-2.5-pro" in llm.model:
            if llm.reasoning_effort in {None, "none"}:
                out["reasoning_effort"] = "low"


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@mayeco mayeco requested a review from Copilot March 3, 2026 13:52
@enyst enyst requested review from enyst and removed request for Copilot March 3, 2026 13:59
model_lower = llm.model.lower()
# Normalize to basename so provider-prefixed IDs like "openai/o1" are handled
model_name = model_lower.split("/")[-1]
if model_name.startswith(("o1", "o3")):
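The normalization in the diff above can be sketched as a self-contained predicate (an assumed shape; the real code sits inline in `select_chat_options` rather than in a named helper):

```python
def is_openai_reasoning_model(model: str) -> bool:
    """Sketch of the diff's normalization: lowercase the model ID, then
    take the basename so provider-prefixed IDs like "openai/o1" match."""
    model_lower = model.lower()
    model_name = model_lower.split("/")[-1]
    return model_name.startswith(("o1", "o3"))

print(is_openai_reasoning_model("openai/o1-mini"))   # True
print(is_openai_reasoning_model("gemini-2.5-pro"))   # False
```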
Collaborator


I think it’s not only o1 and o3; there are also Claudes, open models, GPTs.

I’d actually suggest that since we removed temperature by default, we could remove these temperature special cases for all models.

The other models will also get nothing set (from the SDK), so they all should respect the user's choice.

Collaborator

@enyst enyst left a comment


Thank you for the contribution! This bit of code might be unnecessary now, since we just merged #1989

@mayeco
Contributor Author

mayeco commented Mar 5, 2026

Yes, I understand @enyst, but #1989 still does not fix my issue.

If I have a reasoning model like Gemini and set the temperature to 0.0, it ends up as 1 (the provider default):

https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/options/chat_options.py#L43

{
  "generationConfig": {
    "maxOutputTokens": 65536,
    "temperature": 1,  // ← Incorrect: should be 0.0
    "thinkingConfig": {
      "includeThoughts": true,
      "thinkingLevel": "HIGH"
    }
  }
}

Collaborator

@enyst enyst left a comment


Oh, I agree. I just don't fully understand why we keep the code for o1 and o3?

@mayeco
Contributor Author

mayeco commented Mar 7, 2026

Yes, maybe it is not only o1/o3; we need to make a list of models whose reasoning fails when temperature is set.
Is that better, @enyst?

@enyst
Collaborator

enyst commented Mar 8, 2026

Sounds good!
