
Conversation


@christineyu123 christineyu123 commented Nov 6, 2025

Changes

  1. Added tool calling tests with 8 parameter combinations:

    • stream: True/False
    • tool_choice: "required"/"auto"
    • strict: True/False
  2. Run against featured models whose use case is tool calling
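As a hypothetical sketch (names are assumptions, not the PR's actual code), the 8 combinations described above are the cross product of the three flags:

```python
from itertools import product

# Enumerate the 8 parameter combinations: 2 stream values x
# 2 tool_choice values x 2 strict values.
TOOL_CALLING_CONFIGS = [
    {"stream": stream, "tool_choice": tool_choice, "strict": strict}
    for stream, tool_choice, strict in product(
        [True, False], ["required", "auto"], [True, False]
    )
]

print(len(TOOL_CALLING_CONFIGS))  # 8
```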

# Tool calling test suite

# Hardcoded list of models to test for tool calling support
# These should be models from the featured list that support OpenAI tool calling

Agreed, we should test the featured models that support OpenAI tool calling; we can filter for OpenAI tool calling support using the use_cases field.


@christineyu123 christineyu123 Nov 7, 2025


OK, so the flow would be: 1) obtain our featured model list (these are existing models), 2) filter for use_case = "supports OpenAI tool calling", 3) run tool calling tests against the models kept from step 2. Right?
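To make step 2 concrete, here is a minimal sketch; the model-record shape and the exact use_case value are assumptions, not confirmed API details:

```python
# Hypothetical filter; the "use_cases" field shape and the
# "function-calling" value are assumptions for illustration only.

def filter_tool_calling_models(featured_models):
    """Keep only featured models whose use_cases list tool calling support."""
    return [
        m for m in featured_models
        if "function-calling" in m.get("use_cases", [])
    ]

# Stand-in data for illustration:
featured = [
    {"id": "model-a", "use_cases": ["function-calling", "chat"]},
    {"id": "model-b", "use_cases": ["chat"]},
]
print([m["id"] for m in filter_tool_calling_models(featured)])  # ['model-a']
```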

Question 1: Does anyone know the use_case value to use? Any pointer to a reference doc?
Question 2: When I upload a model, I don't remember needing to indicate whether the model supports tool calling; I'm wondering why, and how use_case gets set.
Question 3: If there are models that partially support tool calling (e.g. our buggy qwen and gpt-oss support certain configurations but not others), what do we do about them in this test suite?

cc @ackizilkale please check the 3 questions


  1. and 2. Answered you in Slack.
  3. We need to verify whether tool calling works; if it returns buggy output, we need to fix it ASAP and maybe reupload. For reference, see the PR for the OpenRouter-reported issue: https://github.com/Clarifai/model-uploads/pull/114

@christineyu123 christineyu123 marked this pull request as ready for review November 12, 2025 20:39
@christineyu123

[Screenshot 2025-11-21 at 2:03:03 PM]

Updates:

  • The current version gets all featured models whose use_case contains function-calling, and sends stream=false requests with different configs (e.g. strict: true/false, tool_choice: required/auto)
  • All tests are passing
  • stream=true was removed since tool calling with streaming is brittle for most models

@luv-bansal @mogith-pn @ackizilkale let me know if all good or any other consideration

Comment on lines +25 to +30
TOOL_CALLING_CONFIGS = [
    {"stream": False, "tool_choice": "required", "strict": True},
    {"stream": False, "tool_choice": "required", "strict": False},
    {"stream": False, "tool_choice": "auto", "strict": True},
    {"stream": False, "tool_choice": "auto", "strict": False},
]
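For context, a hedged sketch of how each config above might be applied to an OpenAI-style chat-completions payload; the build_request helper is a stand-in, not the PR's test code. In the OpenAI schema, strict lives inside each tool's function definition, while tool_choice and stream are top-level request fields:

```python
# Stand-in helper for illustration, not the PR's actual code.

def build_request(model_id, messages, tools, config):
    """Apply one test config to an OpenAI-style chat-completions payload."""
    # "strict" belongs inside each tool's function definition;
    # "stream" and "tool_choice" are top-level request fields.
    tools_with_strict = [
        {**t, "function": {**t["function"], "strict": config["strict"]}}
        for t in tools
    ]
    return {
        "model": model_id,
        "messages": messages,
        "tools": tools_with_strict,
        "stream": config["stream"],
        "tool_choice": config["tool_choice"],
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {}},
    },
}
req = build_request(
    "model-a",
    [{"role": "user", "content": "What's the weather?"}],
    [weather_tool],
    {"stream": False, "tool_choice": "required", "strict": True},
)
print(req["tool_choice"], req["tools"][0]["function"]["strict"])  # required True
```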

are we only having tests for stream: False?


Yes. Given the flakiness we've observed across models with streaming plus tool calling, for now we only test tool calling with non-streaming requests.


@luv-bansal luv-bansal left a comment


Looks good to me

@christineyu123 christineyu123 merged commit 2b48310 into master Nov 24, 2025
3 of 7 checks passed
@christineyu123 christineyu123 deleted the openai_tool_calling_tests branch November 24, 2025 17:43