OpenAI tool calling tests #235
Conversation
```python
# Tool calling test suite

# Hardcoded list of models to test for tool calling support
# These should be models from the featured list that support OpenAI tool calling
```
Agreed, we should test the featured models that support OpenAI tool calling; we can filter for OpenAI tool calling support using the use_cases field.
OK, so the flow would be: 1) obtain our featured model list (these are existing models), 2) filter for use_case = "SUPPORT open ai tool", 3) run the tool calling tests against the models kept from step 2. Right? (A rough sketch of this flow is below.)
Question 1: Does anyone know the use_case value to use? Any pointer to reference docs?
Question 2: When I upload a model, I don't remember having to indicate whether the model supports tool calling, so I'm wondering why and how the use_case gets set.
Question 3: If there are models that only partially support tool calling, e.g. our buggy qwen and gpt-oss support certain configurations but not others, what do we do about them in this test suite?
cc @ackizilkale please check the 3 questions
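For reference, a minimal sketch of that flow, assuming a hypothetical list of featured model records that each carry a use_cases field; the helper names and the "tool_calling" value are placeholders, not confirmed API names (see Question 1):

```python
# Sketch only: select_tool_calling_models(), model["use_cases"], and the
# TOOL_CALLING_USE_CASE value below are assumptions, not confirmed names.
TOOL_CALLING_USE_CASE = "tool_calling"  # assumed value, see Question 1


def select_tool_calling_models(featured_models):
    """Keep only featured models whose use_cases include tool calling."""
    return [
        m for m in featured_models
        if TOOL_CALLING_USE_CASE in m.get("use_cases", [])
    ]


def run_tool_calling_suite(featured_models, configs):
    """Run every config from TOOL_CALLING_CONFIGS against every kept model."""
    for model in select_tool_calling_models(featured_models):
        for config in configs:
            # run_single_tool_calling_test is a hypothetical test helper.
            run_single_tool_calling_test(model, **config)
```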
- 1) and 2) Answered you in Slack.
- 3) I think we need to verify whether tool calling works; if it returns buggy output, we need to fix it ASAP and maybe reupload. For reference, the PR for the OpenRouter-reported issue: https://github.com/Clarifai/model-uploads/pull/114
Updates:
@luv-bansal @mogith-pn @ackizilkale let me know if this is all good or if there are any other considerations.
```python
TOOL_CALLING_CONFIGS = [
    {"stream": False, "tool_choice": "required", "strict": True},
    {"stream": False, "tool_choice": "required", "strict": False},
    {"stream": False, "tool_choice": "auto", "strict": True},
    {"stream": False, "tool_choice": "auto", "strict": False},
]
```
Are we only adding tests for stream: False?
Yes. Given that we observe flakiness across models with different streaming + tool calling problems, for now we only test tool calling with non-streaming requests.
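For context, a minimal sketch of how one entry of TOOL_CALLING_CONFIGS might be exercised against an OpenAI-compatible endpoint; the base_url, model name, and get_weather tool are placeholder values, not taken from this PR:

```python
from openai import OpenAI

# Placeholder endpoint and credentials; swap in the featured model under test.
client = OpenAI(base_url="https://api.example.com/v1", api_key="...")

config = {"stream": False, "tool_choice": "required", "strict": True}

# "strict" applies to the function schema; "tool_choice" and "stream"
# are request-level parameters.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "strict": config["strict"],
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}]

response = client.chat.completions.create(
    model="placeholder-featured-model",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice=config["tool_choice"],
    stream=config["stream"],
)

# A passing test would assert the model returned a well-formed tool call.
assert response.choices[0].message.tool_calls
```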
luv-bansal left a comment
Looks good to me

Changes
Added tool calling tests with 8 parameter combinations, run against featured models with use_case = tool_calling.