
Conversation


@christineyu123 christineyu123 commented Nov 6, 2025

Changes

  1. Added tool calling tests with 8 parameter combinations:

    • stream: True/False
    • tool_choice: "required"/"auto"
    • strict: True/False
  2. Run against featured models whose use case is tool calling
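As a hypothetical sketch (names are assumptions, not the PR's actual code), the 8 combinations described above are the cross product of the three flags:

```python
from itertools import product

# Enumerate the 8 parameter combinations: 2 stream values x
# 2 tool_choice values x 2 strict values.
TOOL_CALLING_CONFIGS = [
    {"stream": stream, "tool_choice": tool_choice, "strict": strict}
    for stream, tool_choice, strict in product(
        [True, False], ["required", "auto"], [True, False]
    )
]

print(len(TOOL_CALLING_CONFIGS))  # 8
```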

# Tool calling test suite

# Hardcoded list of models to test for tool calling support
# These should be models from the featured list that support OpenAI tool calling

Agreed, we should test the featured models that support OpenAI tool calling; we can filter for OpenAI tool calling support using the use_cases field.


@christineyu123 christineyu123 Nov 7, 2025


OK, so the flow would be: 1) obtain our featured model list (these are existing models), 2) filter for use_case = "supports OpenAI tool calling", 3) run tool calling tests against the models kept from step 2. Right?
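To make step 2 concrete, here is a minimal sketch; the model-record shape and the exact use_case value are assumptions, not confirmed API details:

```python
# Hypothetical filter; the "use_cases" field shape and the
# "function-calling" value are assumptions for illustration only.

def filter_tool_calling_models(featured_models):
    """Keep only featured models whose use_cases list tool calling support."""
    return [
        m for m in featured_models
        if "function-calling" in m.get("use_cases", [])
    ]

# Stand-in data for illustration:
featured = [
    {"id": "model-a", "use_cases": ["function-calling", "chat"]},
    {"id": "model-b", "use_cases": ["chat"]},
]
print([m["id"] for m in filter_tool_calling_models(featured)])  # ['model-a']
```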

Question 1: Does anyone know the use_case value to use? Any pointer to a reference doc?
Question 2: When I upload a model, I don't remember needing to indicate whether the model supports tool calling; I'm wondering why, and how use_case gets set.
Question 3: If there are models that partially support tool calling (e.g. our buggy qwen and gpt-oss support certain configurations but not others), what do we do about them in this test suite?

cc @ackizilkale please check the 3 questions


  1. and 2. Answered you in Slack.
  3. We need to verify whether tool calling works; if it returns buggy output, we need to fix it ASAP and maybe reupload. For reference, see the PR for the OpenRouter-reported issue: https://github.com/Clarifai/model-uploads/pull/114

@christineyu123 christineyu123 marked this pull request as ready for review November 12, 2025 20:39
@christineyu123

[Screenshot 2025-11-21 at 2:03:03 PM]

Updates:

  • The current version gets all featured models whose use_case contains function-calling, and sends stream=false requests with different configs (e.g. strict: true/false, tool_choice: required/auto)
  • All tests are passing
  • stream=true was removed since tool calling with streaming is brittle for most models

@luv-bansal @mogith-pn @ackizilkale let me know if all good or any other consideration

Comment on lines +25 to +30
TOOL_CALLING_CONFIGS = [
    {"stream": False, "tool_choice": "required", "strict": True},
    {"stream": False, "tool_choice": "required", "strict": False},
    {"stream": False, "tool_choice": "auto", "strict": True},
    {"stream": False, "tool_choice": "auto", "strict": False},
]
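For context, a hedged sketch of how each config above might be applied to an OpenAI-style chat-completions payload; the build_request helper is a stand-in, not the PR's test code. In the OpenAI schema, strict lives inside each tool's function definition, while tool_choice and stream are top-level request fields:

```python
# Stand-in helper for illustration, not the PR's actual code.

def build_request(model_id, messages, tools, config):
    """Apply one test config to an OpenAI-style chat-completions payload."""
    # "strict" belongs inside each tool's function definition;
    # "stream" and "tool_choice" are top-level request fields.
    tools_with_strict = [
        {**t, "function": {**t["function"], "strict": config["strict"]}}
        for t in tools
    ]
    return {
        "model": model_id,
        "messages": messages,
        "tools": tools_with_strict,
        "stream": config["stream"],
        "tool_choice": config["tool_choice"],
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {}},
    },
}
req = build_request(
    "model-a",
    [{"role": "user", "content": "What's the weather?"}],
    [weather_tool],
    {"stream": False, "tool_choice": "required", "strict": True},
)
print(req["tool_choice"], req["tools"][0]["function"]["strict"])  # required True
```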

are we only having tests for stream: False?


Yes. Given the flakiness we've observed across models with streaming plus tool calling, for now we only test tool calling with non-streaming requests.


@luv-bansal luv-bansal left a comment


Looks good to me

@christineyu123 christineyu123 merged commit 2b48310 into master Nov 24, 2025
3 of 7 checks passed
@christineyu123 christineyu123 deleted the openai_tool_calling_tests branch November 24, 2025 17:43