
[LangChain-Core v1] Does the OpenAI convert_to_openai_data_block function miss video handling? Other providers' converters also do not handle video #33652

@Yzhhh0828

Description


Checked other resources

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Example Code

```python
import warnings
from typing import Literal

# convert_to_openai_image_block is defined alongside this function in
# langchain-core; it is not reproduced here.


def convert_to_openai_data_block(
    block: dict, api: Literal["chat/completions", "responses"] = "chat/completions"
) -> dict:
    """Format standard data content block to format expected by OpenAI.

    "Standard data content block" can include old-style LangChain v0 blocks
    (URLContentBlock, Base64ContentBlock, IDContentBlock) or new ones.
    """
    if block["type"] == "image":
        chat_completions_block = convert_to_openai_image_block(block)
        if api == "responses":
            formatted_block = {
                "type": "input_image",
                "image_url": chat_completions_block["image_url"]["url"],
            }
            if chat_completions_block["image_url"].get("detail"):
                formatted_block["detail"] = chat_completions_block["image_url"][
                    "detail"
                ]
        else:
            formatted_block = chat_completions_block

    elif block["type"] == "file":
        if block.get("source_type") == "base64" or "base64" in block:
            # Handle v0 format (Base64CB): {"source_type": "base64", "data": "...", ...}
            # Handle v1 format (IDCB): {"base64": "...", ...}
            base64_data = block["data"] if "source_type" in block else block["base64"]
            file = {"file_data": f"data:{block['mime_type']};base64,{base64_data}"}
            if filename := block.get("filename"):
                file["filename"] = filename
            elif (extras := block.get("extras")) and ("filename" in extras):
                file["filename"] = extras["filename"]
            elif (extras := block.get("metadata")) and ("filename" in extras):
                # Backward compat
                file["filename"] = extras["filename"]
            else:
                # Can't infer filename
                warnings.warn(
                    "OpenAI may require a filename for file uploads. Specify a filename"
                    " in the content block, e.g.: {'type': 'file', 'mime_type': "
                    "'...', 'base64': '...', 'filename': 'my-file.pdf'}",
                    stacklevel=1,
                )
            formatted_block = {"type": "file", "file": file}
            if api == "responses":
                formatted_block = {"type": "input_file", **formatted_block["file"]}
        elif block.get("source_type") == "id" or "file_id" in block:
            # Handle v0 format (IDContentBlock): {"source_type": "id", "id": "...", ...}
            # Handle v1 format (IDCB): {"file_id": "...", ...}
            file_id = block["id"] if "source_type" in block else block["file_id"]
            formatted_block = {"type": "file", "file": {"file_id": file_id}}
            if api == "responses":
                formatted_block = {"type": "input_file", **formatted_block["file"]}
        elif "url" in block:  # Intentionally do not check for source_type="url"
            if api == "chat/completions":
                error_msg = "OpenAI Chat Completions does not support file URLs."
                raise ValueError(error_msg)
            # Only supported by Responses API; return in that format
            formatted_block = {"type": "input_file", "file_url": block["url"]}
        else:
            error_msg = "Keys base64, url, or file_id required for file blocks."
            raise ValueError(error_msg)

    elif block["type"] == "audio":
        if "base64" in block or block.get("source_type") == "base64":
            # Handle v0 format: {"source_type": "base64", "data": "...", ...}
            # Handle v1 format: {"base64": "...", ...}
            base64_data = block["data"] if "source_type" in block else block["base64"]
            audio_format = block["mime_type"].split("/")[-1]
            formatted_block = {
                "type": "input_audio",
                "input_audio": {"data": base64_data, "format": audio_format},
            }
        else:
            error_msg = "Key base64 is required for audio blocks."
            raise ValueError(error_msg)

    elif block["type"] == "video":
        # .....  Placeholder for the branch this report asks about. No video
        # handling exists, so this stub only makes the example runnable.
        raise NotImplementedError("Video blocks are not converted.")

    else:
        error_msg = f"Block of type {block['type']} is not supported."
        raise ValueError(error_msg)

    return formatted_block
```
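The file and audio branches both accept two payload shapes: the v0 form (`source_type="base64"` with the bytes under `data`) and the v1 form (bytes under `base64`). A video branch would presumably need the same normalization. The helper below is a hypothetical, standalone sketch of that step (`extract_base64_payload` and `to_data_uri` are not LangChain functions); it follows the same two cases as the `block["data"] if "source_type" in block else block["base64"]` expression and builds the same `data:` URI the file branch uses.

```python
from typing import Any


def extract_base64_payload(block: dict[str, Any]) -> str:
    """Return the base64 string from a v0 or v1 standard data block.

    Hypothetical helper covering the same two cases as the file and audio
    branches of convert_to_openai_data_block.
    """
    if block.get("source_type") == "base64":
        # v0 format: {"source_type": "base64", "data": "...", "mime_type": "..."}
        return block["data"]
    if "base64" in block:
        # v1 format: {"base64": "...", "mime_type": "..."}
        return block["base64"]
    raise ValueError("Block has no base64 payload (expected 'data' or 'base64').")


def to_data_uri(block: dict[str, Any]) -> str:
    """Build a data URI the way the file branch does: data:<mime>;base64,<payload>."""
    return f"data:{block['mime_type']};base64,{extract_base64_payload(block)}"
```

With this, a v0 block `{"type": "video", "source_type": "base64", "data": "AAAA", "mime_type": "video/mp4"}` and a v1 block `{"type": "video", "base64": "AAAA", "mime_type": "video/mp4"}` both map to `data:video/mp4;base64,AAAA`. Whether any provider accepts that URI for video is a separate question the sketch does not answer.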

Error Message and Stack Trace (if applicable)

No response

Description

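In LangChain-Core v1, the OpenAI convert_to_openai_data_block function has branches for image, file, and audio blocks but none for video, so a video block falls through to the final else and raises "Block of type video is not supported." Was video handling missed? The converters for other providers also do not handle video.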
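Until video is handled, one possible workaround is to separate video blocks from a message's content before it reaches the converter. The sketch below is hypothetical: `split_unhandled_blocks` and `UNHANDLED_TYPES` are not LangChain APIs, and treating only `video` as unhandled is an assumption based on the branches the reported function has. It inspects plain dict content blocks only.

```python
from collections.abc import Iterable
from typing import Any

# Assumption: block types the converter (as reported) has no branch for.
# Extend this set if other types turn out to be unsupported.
UNHANDLED_TYPES = frozenset({"video"})


def split_unhandled_blocks(
    content: Iterable[Any],
    unhandled: frozenset[str] = UNHANDLED_TYPES,
) -> tuple[list[Any], list[dict[str, Any]]]:
    """Separate content blocks the converter cannot handle from the rest.

    Strings and dict blocks of other types are kept in their original order;
    dict blocks whose "type" is in `unhandled` are returned separately so the
    caller can drop them, log them, or route them elsewhere.
    """
    kept: list[Any] = []
    dropped: list[dict[str, Any]] = []
    for block in content:
        if isinstance(block, dict) and block.get("type") in unhandled:
            dropped.append(block)
        else:
            kept.append(block)
    return kept, dropped
```

Returning the dropped blocks instead of discarding them silently keeps the decision (skip, warn, or fail) with the caller.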

System Info

langchain v1.0.0
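Because the converter lives in langchain-core, the installed versions of the individual packages can be more informative than the langchain version alone. A minimal standard-library sketch for collecting them (the `installed_versions` helper and the package list are illustrative; adjust the list to the integrations in use):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages: list[str]) -> dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    result: dict[str, str] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


if __name__ == "__main__":
    for pkg, ver in installed_versions(
        ["langchain", "langchain-core", "langchain-openai"]
    ).items():
        print(f"{pkg}: {ver}")
```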

Metadata


Assignees: No one assigned
Labels: bug (Related to a bug, vulnerability, unexpected error with an existing feature)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
