Add FMAPI tool calling contract tests for DatabricksOpenAI by dhruv0811 · Pull Request #348 · databricks/databricks-ai-bridge

dhruv0811 · 2026-02-24T23:27:09Z

Summary

End-to-end FMAPI tool calling integration tests for DatabricksOpenAI and ChatDatabricks (LangGraph), mirroring user code patterns from app-templates. Both #269 (strict field) and #333 (empty assistant content) were caught by customers; these tests ensure those CUJs (critical user journeys) don't regress.

Also includes a bug fix for Gemini models on FMAPI (details below).

Tests

Models are dynamically discovered via workspace_client.serving_endpoints.list(), filtered to databricks-* + llm/v1/chat, then probed for tool calling support. Tests retry up to 3 times.
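A minimal sketch of the discovery filter and retry behavior described above, not the code in this PR. The `Endpoint` dataclass is a hypothetical stand-in for whatever `serving_endpoints.list()` returns, and the field names `name` and `task`, the helper names, and the example endpoint names are assumptions made for illustration. The tool calling probe step is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Endpoint:
    """Hypothetical stand-in for one entry of the serving endpoint listing."""

    name: str
    task: Optional[str] = None


def select_chat_models(endpoints: Iterable[Endpoint]) -> list[str]:
    """Keep databricks-* endpoints whose task is llm/v1/chat, sorted by name."""
    return sorted(
        ep.name
        for ep in endpoints
        if ep.name.startswith("databricks-") and ep.task == "llm/v1/chat"
    )


def call_with_retries(fn: Callable[[], T], attempts: int = 3) -> T:
    """Call fn up to `attempts` times and re-raise the last error if all fail."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
    assert last_error is not None
    raise last_error
```

Sorting the names keeps parametrized test IDs stable across runs even if the listing order changes.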

OpenAI (Agents SDK + McpServer):

Single-turn, multi-turn, streaming via Runner.run / Runner.run_streamed

LangChain (LangGraph create_react_agent):

Single-turn, multi-turn, streaming via agent.invoke / agent.ainvoke / agent.stream / agent.astream

Gated behind RUN_FMAPI_TOOL_CALLING_TESTS=1.
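One way such an opt-in gate could be expressed, as a sketch only. The helper name `fmapi_tests_enabled` is hypothetical, and the PR may implement the check differently.

```python
import os
from typing import Mapping, Optional


def fmapi_tests_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when RUN_FMAPI_TOOL_CALLING_TESTS is set to exactly "1"."""
    env = os.environ if env is None else env
    return env.get("RUN_FMAPI_TOOL_CALLING_TESTS") == "1"
```

In a pytest module this could feed `pytest.mark.skipif(not fmapi_tests_enabled(), reason="...")`, so the tests are skipped unless the variable is set.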

Bug fix: Gemini FMAPI tool calling compatibility

Gemini FMAPI doesn't conform to the OpenAI API spec in two ways during tool calling:

1. Request side: Gemini FMAPI rejects tool messages whose content is a list of content blocks (e.g. [{"type": "text", "text": "hello"}]). The Agents SDK always produces this format when using MCP tools. Fix: _flatten_list_content_in_messages() flattens the list to a plain string before sending.

2. Response side (streaming): Gemini FMAPI returns delta.content as a list instead of a string in streaming responses, and the Agents SDK crashes with ValidationError: Input should be a valid string, input_type=list. Fix: _GeminiStreamWrapper / _AsyncGeminiStreamWrapper intercept stream chunks and flatten list content.
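The two fixes can be illustrated with a simplified sketch. This is not the PR's implementation: messages and stream chunks are modeled as plain dicts (the real clients use typed objects), the function names are hypothetical, and restricting the request-side flattening to tool messages is an assumption.

```python
from typing import Any, Iterable, Iterator


def flatten_content(content: Any) -> Any:
    """Join the text blocks of list-shaped content; return other values unchanged."""
    if not isinstance(content, list):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(str(block.get("text") or ""))
    return "".join(parts)


def flatten_tool_messages(messages: list[dict]) -> list[dict]:
    """Request side: return a new list in which tool messages carry string content."""
    out = []
    for msg in messages:
        if msg.get("role") == "tool" and isinstance(msg.get("content"), list):
            # Copy the message so the caller's original stays untouched.
            msg = {**msg, "content": flatten_content(msg["content"])}
        out.append(msg)
    return out


def flatten_stream(chunks: Iterable[dict]) -> Iterator[dict]:
    """Response side: yield chunks with list-valued delta.content turned into a string.

    Chunks are modified in place before being yielded.
    """
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if isinstance(delta.get("content"), list):
                delta["content"] = flatten_content(delta["content"])
        yield chunk
```

Wrapping the stream in a generator keeps it lazy: each chunk is fixed up as it arrives rather than after the whole response has been buffered.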

Reproduce request-side issue:

# Tool result with content as LIST — FAILS on Gemini FMAPI
curl -s -X POST "$HOST/serving-endpoints/databricks-gemini-2-5-flash/invocations" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"model":"databricks-gemini-2-5-flash","messages":[
    {"role":"user","content":"Echo hello"},
    {"role":"assistant","content":null,"tool_calls":[{"id":"echo","type":"function","function":{"name":"echo","arguments":"{\"msg\":\"hello\"}"}}]},
    {"role":"tool","tool_call_id":"echo","content":[{"type":"text","text":"hello"}]}
  ],"tools":[{"type":"function","function":{"name":"echo","description":"Echo","parameters":{"type":"object","properties":{"msg":{"type":"string"}},"required":["msg"]}}}],"max_tokens":100}'
# → 400: "Expecting 'content' to be a String"

# Same request with content as STRING — WORKS
# (change "content":[{"type":"text","text":"hello"}] to "content":"hello")

Note: Gemini 2.5 Pro + LangChain

Gemini 2.5 Pro is a reasoning model that consumes 200-600 reasoning tokens from the max_tokens budget before producing output. LangChain tests use max_tokens=1000 for this model (vs 200 for others) to accommodate the reasoning overhead.
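A small sketch of how a per-model token budget like this might be encoded. The endpoint name `databricks-gemini-2-5-pro` is an assumption extrapolated from the `databricks-gemini-2-5-flash` name used in the curl example, and the constant and function names are hypothetical.

```python
# Default budget for models that do not spend tokens on reasoning.
DEFAULT_MAX_TOKENS = 200

# Overrides for reasoning models; the endpoint name here is an assumed example.
MAX_TOKENS_OVERRIDES = {"databricks-gemini-2-5-pro": 1000}


def max_tokens_for(model: str) -> int:
    """Return the max_tokens budget to use for a given endpoint name."""
    return MAX_TOKENS_OVERRIDES.get(model, DEFAULT_MAX_TOKENS)
```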

Known model issues (skipped)

| Model | Issue |
| --- | --- |
| `gpt-5-nano` | Too small for reliable tool calling |
| `gpt-oss-20b`, `gpt-oss-120b`, `llama-4-maverick` | Hallucinate tool names |
| `gemini-3-flash`, `gemini-3-pro`, `gemini-3-1-pro` | Require `thought_signature` (Gemini 3.x) |
| `gemma-3-12b` (LangChain only) | Outputs raw tool call text instead of executing tools |

Test plan

Full CI run via ai-oss runner: https://github.com/databricks-eng/ai-oss-integration-tests-runner/actions/runs/22597021048

integrations/openai/tests/integration_tests/test_fmapi_tool_calling.py

bbqiu

this looks great! left two comments

will there be a separate PR for the payloads that langgraph agents will generate?

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Move import logging and log = logging.getLogger(__name__) to module level
- Remove inline import logging from retry functions and _discover_foundation_models
- Fix stale _XFAIL_MODELS references in docstrings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dhruv0811 force-pushed the fmapi-tool-calling-contract-tests branch 6 times, most recently from ab0acbd to 01579ee on February 24, 2026 23:56

dhruv0811 requested a review from bbqiu February 25, 2026 00:02

bbqiu reviewed Feb 25, 2026

bbqiu reviewed Feb 25, 2026

bbqiu reviewed Feb 25, 2026

dhruv0811 force-pushed the fmapi-tool-calling-contract-tests branch 4 times, most recently from c269d7d to 888f3e9 on February 26, 2026 20:59

dhruv0811 requested a review from annzhang-db February 26, 2026 21:10

dhruv0811 force-pushed the fmapi-tool-calling-contract-tests branch 15 times, most recently from aaf77fe to 80a7a4e on February 27, 2026 00:51

dhruv0811 force-pushed the fmapi-tool-calling-contract-tests branch 5 times, most recently from ab54296 to bc97991 on February 27, 2026 01:03

dhruv0811 requested a review from bbqiu February 27, 2026 03:43

dhruv0811 force-pushed the fmapi-tool-calling-contract-tests branch 8 times, most recently from be72c8c to 2c2cb38 on March 2, 2026 21:31

Add FMAPI tool calling contract tests for DatabricksOpenAI

aa23d97

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dhruv0811 force-pushed the fmapi-tool-calling-contract-tests branch from e768ccb to aa23d97 on March 2, 2026 21:42

dhruv0811 and others added 3 commits March 2, 2026 13:56

Merge branch 'main' into fmapi-tool-calling-contract-tests

3644164

Fix missing Iterator/AsyncIterator imports for Gemini stream wrappers

6ebbfdc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dhruv0811 wants to merge 4 commits into main from fmapi-tool-calling-contract-tests
