Fixes #211 with_structured_output for json_schema method #238
WIll-Xu35 wants to merge 2 commits into databricks:main from
Conversation
Thanks for the contribution! It looks like this only fixes the first of several cascading errors users hit with method="json_schema" (as reported in #211 (comment)).
When strict: true is set, the OpenAI-compatible API also requires:
- additionalProperties: false at every object-level node in the schema
- required to match all keys in properties (which can break with nested models / $ref)
The raw model_json_schema() output doesn't satisfy either of these, so users will still hit 400 errors after this fix. Could you take a look at my comment here, and include test coverage (unit tests plus manual testing info in the PR description) after you have made updates?
Also make sure you rebase onto the newest changes, as there is currently a merge conflict.
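To illustrate the two strict-mode requirements above, here is a minimal sketch (the helper name `enforce_strict` is hypothetical, not part of the library) of a recursive normalization that sets `additionalProperties: false` and makes `required` cover every key in `properties` at each object node:

```python
def enforce_strict(schema):
    """Hypothetical helper: recursively enforce the two strict-mode rules
    on a JSON schema dict, in place. For illustration only."""
    if isinstance(schema, dict):
        if schema.get("type") == "object":
            props = schema.get("properties", {})
            # Rule 1: forbid extra keys at every object node.
            schema["additionalProperties"] = False
            # Rule 2: required must list every key in properties.
            schema["required"] = list(props)
        # Recurse into nested subschemas (properties, items, anyOf, ...).
        for value in schema.values():
            if isinstance(value, dict):
                enforce_strict(value)
            elif isinstance(value, list):
                for item in value:
                    if isinstance(item, dict):
                        enforce_strict(item)
    return schema

raw = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "detail": {
            "type": "object",
            "properties": {"score": {"type": "integer"}},
        },
    },
}
strict = enforce_strict(raw)
print(strict["additionalProperties"], strict["required"])
# → False ['answer', 'detail']
```

Note that the nested "detail" object gets the same treatment, which is exactly where the raw model_json_schema() output falls short.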
```python
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": kwargs.get("schema_name", "json_schema"),
```
The name should be derived, not hardcoded. Here you should use convert_to_openai_function() (add the import from langchain_core.utils.function_calling import convert_to_openai_function). We should also do this to address the other cascading errors (see my overall review comment).
You can refer to how the partner library does this as a reference; it calls convert_to_openai_function:
https://github.com/langchain-ai/langchain/blob/202d7f6c4a2ca8c7e5949d935bcf0ba9b0c23fb0/libs/partners/openai/langchain_openai/chat_models/base.py#L1449
A suggested fix may look something like this, but please be sure to test:
```python
from langchain_core.utils.function_calling import (
    convert_to_openai_function,  # add this import
    convert_to_openai_tool,
)

# ... in with_structured_output, json_schema branch:
elif method == "json_schema":
    if schema is None:
        raise ValueError(
            "schema must be specified when method is 'json_schema'. Received None."
        )
    function = convert_to_openai_function(
        pydantic_schema if pydantic_schema else schema, strict=True
    )
    function["schema"] = function.pop("parameters")
    response_format = {"type": "json_schema", "json_schema": function}
    llm = self.bind(response_format=response_format)
```
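For reference, the payload this approach produces would look roughly like the following (a sketch with illustrative field values, using the AnswerWithJustification model from the existing unit test):

```python
# Illustrative shape of the bound response_format kwarg; values are
# examples, not output captured from the library.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "AnswerWithJustification",  # derived from the model name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "justification": {"type": "string"},
            },
            "required": ["answer", "justification"],
            "additionalProperties": False,
        },
    },
}
```

This is the shape the OpenAI-compatible API expects when strict mode is on: name and strict at the json_schema level, and the constrained schema nested under "schema".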
```python
        schema.model_json_schema() if is_pydantic_schema else schema  # type: ignore[union-attr]
    ),
    "schema": schema.model_json_schema() if is_pydantic_schema else schema,  # type: ignore[union-attr]
},
```
The PR is also missing unit test coverage. We have an existing test at tests/unit_tests/test_chat_models.py (test_chat_model_with_structured_output) in which this line:
assert bind["response_format"]["json_schema"]["schema"] == JSON_SCHEMA
would likely fail.
Can you update this test and include asserts for the name/strict/schema fields? It may look something like:
```python
elif method == "json_schema":
    js = bind["response_format"]["json_schema"]
    assert js["name"] == "AnswerWithJustification"
    assert js["strict"] is True
    assert js["schema"]["additionalProperties"] is False
```
Can you also add the following unit tests:
- pydantic schema produces a valid response_format
- raw dict schema produces a valid response_format
- nested Pydantic model to validate $ref dereferencing
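On the last point: nested Pydantic models emit local "$ref" pointers into a "$defs" block, which strict mode does not accept, so the references must be inlined. A minimal sketch of what that dereferencing looks like (the helper name `inline_refs` is hypothetical, for illustration only):

```python
def inline_refs(schema, defs=None):
    """Hypothetical helper: replace local '#/$defs/...' references with
    the referenced definitions, as nested Pydantic models emit them."""
    if defs is None:
        defs = schema.get("$defs", {})
    if isinstance(schema, dict):
        ref = schema.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            # Substitute the referenced definition (copy, then recurse
            # in case it contains further refs).
            return inline_refs(dict(defs[ref.split("/")[-1]]), defs)
        # Rebuild the dict, dropping the now-unneeded $defs block.
        return {
            k: inline_refs(v, defs) if isinstance(v, (dict, list)) else v
            for k, v in schema.items()
            if k != "$defs"
        }
    if isinstance(schema, list):
        return [
            inline_refs(v, defs) if isinstance(v, (dict, list)) else v
            for v in schema
        ]
    return schema
```

A nested-model unit test could then assert that the emitted schema contains no "$ref" or "$defs" keys and that the inner object carries its own properties inline.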
See #351.
Fixes #211
Add a "name" key to response_format. It can optionally be provided via the "schema_name" key; otherwise it defaults to "json_schema".
Change the schema value type from a tuple to a dict.