Add FMAPI tool calling contract tests for DatabricksOpenAI#348
Open
Add FMAPI tool calling contract tests for DatabricksOpenAI#348
Conversation
ab0acbd to
01579ee
Compare
bbqiu
reviewed
Feb 25, 2026
integrations/openai/tests/integration_tests/test_fmapi_tool_calling.py
Outdated
Show resolved
Hide resolved
bbqiu
reviewed
Feb 25, 2026
bbqiu
reviewed
Feb 25, 2026
Collaborator
bbqiu
left a comment
There was a problem hiding this comment.
this looks great! left two comments
will there be a separate PR for the payloads that langgraph agents will generate?
c269d7d to
888f3e9
Compare
aaf77fe to
80a7a4e
Compare
ab54296 to
bc97991
Compare
be72c8c to
2c2cb38
Compare
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e768ccb to
aa23d97
Compare
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move import logging and log = logging.getLogger(__name__) to module level - Remove inline import logging from retry functions and _discover_foundation_models - Fix stale _XFAIL_MODELS references in docstrings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
End-to-end FMAPI tool calling integration tests for
DatabricksOpenAIandChatDatabricks(LangGraph), mirroring user code patterns from app-templates. Both #269 (strict field) and #333 (empty assistant content) were caught by customers — these tests ensure those CUJs don't regress.Also includes a bug fix for Gemini models on FMAPI (details below).
Tests
Models are dynamically discovered via
workspace_client.serving_endpoints.list(), filtered todatabricks-*+llm/v1/chat, then probed for tool calling support. Tests retry up to 3 times.OpenAI (Agents SDK + McpServer):
Runner.run/Runner.run_streamedLangChain (LangGraph
create_react_agent):agent.invoke/agent.ainvoke/agent.stream/agent.astreamGated behind
RUN_FMAPI_TOOL_CALLING_TESTS=1.Bug fix: Gemini FMAPI tool calling compatibility
Gemini FMAPI doesn't conform to the OpenAI API spec in two ways during tool calling:
1. Request side — Rejects tool messages where
contentis a list of content blocks (e.g.[{"type": "text", "text": "hello"}]). The Agents SDK always produces this format when using MCP tools. Fix:_flatten_list_content_in_messages()flattens to a plain string before sending.2. Response side (streaming) — Returns
delta.contentas a list instead of a string in streaming responses. The Agents SDK crashes withValidationError: Input should be a valid string, input_type=list. Fix:_GeminiStreamWrapper/_AsyncGeminiStreamWrapperintercept stream chunks and flatten list content.Reproduce request-side issue:
Note: Gemini 2.5 Pro + LangChain
Gemini 2.5 Pro is a reasoning model that consumes 200-600 reasoning tokens from the
max_tokensbudget before producing output. LangChain tests usemax_tokens=1000for this model (vs 200 for others) to accommodate the reasoning overhead.Known model issues (skipped)
gpt-5-nanogpt-oss-20b,gpt-oss-120b,llama-4-maverickgemini-3-flash,gemini-3-pro,gemini-3-1-prothought_signature(Gemini 3.x)gemma-3-12b(LangChain only)Test plan