bug: with_structured_output(method="json_schema") fails due to missing name, additionalProperties, and required mismatch in response format #350

@ktzsh

Description

When using ChatDatabricks.with_structured_output() with method="json_schema", the API returns errors because the constructed response_format payload is missing required fields that the OpenAI-compatible API expects. There are three cascading issues, all stemming from the same root cause.

from pydantic import BaseModel, Field
from databricks_langchain import ChatDatabricks

class TerminationReason(BaseModel):
    """Structured termination reason of a conversation between agent and user."""
    non_english: bool = Field(description="Whether or not the conversation is in English")
    frustration: bool = Field(description="If user sounds frustrated, angry or threatening, return True.")

conversation = [
    {"role": "user", "content": "I'm really upset that my order hasn't arrived yet."},
    {"role": "ai", "content": "I'm sorry to hear that. Let me check the status for you."}
]

llm = ChatDatabricks(endpoint="databricks-gpt-5")

result = llm.with_structured_output(
    schema=TerminationReason,
    method="json_schema",
).invoke(conversation)

Errors

  1. Missing response_format.json_schema.name
    The API requires a name field in the json_schema object. The current code does not include one.
BadRequestError: Error code: 400 "Missing required parameter: 'response_format.json_schema.name'."
  2. Missing additionalProperties: false (once name is added to the response format)
    When strict: true is set, the OpenAI API spec requires additionalProperties: false on every object-level node. model_json_schema() does not include this by default.
BadRequestError: Error code: 400 "Invalid schema for response_format 'json_schema': In context=(), 'additionalProperties' is required to be supplied and to be false."
  3. required array mismatch (once both name and additionalProperties are specified)
    In strict mode, the required array must list exactly the keys present in properties.
BadRequestError: Error code: 400 "Invalid schema for response_format 'generic-schema-name': In context=(), 'required' is required to be supplied and to be an array including every key in properties. Extra required key 'x' supplied."
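The three checks above can be reproduced locally before sending a request. The following is a minimal sketch, not the API's actual validator: find_strict_violations is a hypothetical helper, and the rules it applies are inferred only from the error messages quoted in this issue. The example schema is hand-written to resemble model_json_schema() output for TerminationReason.

```python
from typing import Any


def find_strict_violations(response_format: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list problems that strict json_schema mode appears to reject."""
    problems: list[str] = []
    json_schema = response_format.get("json_schema", {})

    # Check 1: the json_schema object needs a name.
    if not json_schema.get("name"):
        problems.append("json_schema.name is missing")

    def walk(node: Any, path: str) -> None:
        if isinstance(node, dict):
            if node.get("type") == "object" and "properties" in node:
                # Check 2: every object node must set additionalProperties to false.
                if node.get("additionalProperties") is not False:
                    problems.append(f"{path}: additionalProperties must be false")
                # Check 3: required must contain exactly the keys of properties.
                if set(node.get("required", [])) != set(node["properties"]):
                    problems.append(f"{path}: required must match properties")
            for key, value in node.items():
                walk(value, f"{path}/{key}")
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}/{index}")

    walk(json_schema.get("schema", {}), "#")
    return problems


# Payload shaped like the one the current implementation builds (no name,
# no additionalProperties). The schema is written by hand for illustration.
raw_response_format = {
    "type": "json_schema",
    "json_schema": {
        "strict": True,
        "schema": {
            "title": "TerminationReason",
            "type": "object",
            "properties": {
                "non_english": {"type": "boolean"},
                "frustration": {"type": "boolean"},
            },
            "required": ["non_english", "frustration"],
        },
    },
}

problems = find_strict_violations(raw_response_format)
for problem in problems:
    print(problem)
```

For this payload the sketch reports the missing name and the missing additionalProperties; the required array already matches because both fields are mandatory in the model.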

Root Cause

The current implementation constructs the response_format manually using raw model_json_schema() output, which is not compliant with the OpenAI structured output API requirements:

response_format = {
    "type": "json_schema",
    "json_schema": {
        "strict": True,
        "schema": (pydantic_schema.model_json_schema() if pydantic_schema else schema),
    },
}

The errors raised also depend on the model and the provider behind the endpoint. For example:

  1. With databricks-gpt-oss, adding only the name field fixes everything.
  2. With databricks-gpt-5..., OpenAI enforces the other attributes as well, so the request still fails.

Possible Fixes

  1. Add the required fields directly to the response_format.
  2. Follow langchain-openai, which uses _convert_to_openai_response_format. That helper calls convert_to_openai_function from langchain_core with strict=True, which handles all three requirements:
    name: extracted from the Pydantic class name or the JSON schema title key
    additionalProperties: false: set recursively on all object nodes via _recursive_set_additional_properties_false
    required: set to exactly list(properties.keys()) so it matches every property
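As a rough illustration of the first option, the three requirements could be applied to a plain JSON schema dict as sketched below. This is not the langchain_core implementation: build_strict_response_format and _make_strict are hypothetical names, the "structured_output" fallback name is an assumption, and the walk only covers properties, items, $defs/definitions, and anyOf/allOf/oneOf.

```python
import copy
from typing import Any, Optional


def _make_strict(node: Any) -> None:
    """Hypothetical helper: enforce strict-mode rules on a schema node, in place."""
    if not isinstance(node, dict):
        return
    props = node.get("properties")
    if isinstance(props, dict):
        # Requirement 2: forbid extra keys on every object node.
        node["additionalProperties"] = False
        # Requirement 3: required lists exactly the declared properties.
        node["required"] = list(props.keys())
        for sub in props.values():
            _make_strict(sub)
    # Recurse into the places where nested object schemas commonly appear.
    for key in ("$defs", "definitions"):
        for sub in node.get(key, {}).values():
            _make_strict(sub)
    _make_strict(node.get("items"))
    for key in ("anyOf", "allOf", "oneOf"):
        for sub in node.get(key, []):
            _make_strict(sub)


def build_strict_response_format(
    schema: dict[str, Any], name: Optional[str] = None
) -> dict[str, Any]:
    """Hypothetical helper: wrap a JSON schema in a strict response_format payload."""
    strict_schema = copy.deepcopy(schema)  # leave the caller's schema untouched
    _make_strict(strict_schema)
    return {
        "type": "json_schema",
        "json_schema": {
            # Requirement 1: a name, taken from the schema title when not given.
            # The final fallback string is an arbitrary placeholder.
            "name": name or schema.get("title") or "structured_output",
            "strict": True,
            "schema": strict_schema,
        },
    }


# Hand-written schema resembling model_json_schema() output, with one
# extra optional field ("note") added only to show the required rewrite.
example_schema = {
    "title": "TerminationReason",
    "type": "object",
    "properties": {
        "non_english": {"type": "boolean"},
        "frustration": {"type": "boolean"},
        "note": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": None},
    },
    "required": ["non_english", "frustration"],
}

response_format = build_strict_response_format(example_schema)
```

Forcing every property into required changes the meaning of optional fields, so a field that may be absent would need to be modeled as nullable instead, as the note field is here. With a Pydantic model, the input would be TerminationReason.model_json_schema().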
