
feat: serialize LangChain multimodal content to ingest content blocks#487

Open
pratduv wants to merge 1 commit into main from feature/sc-56110/multimodal-content-blocks-langchain

Conversation

Contributor

pratduv commented Feb 25, 2026

User description

Summary

  • Convert LangChain's multimodal message format (image_url, audio_url, video_url, etc.) into Galileo's IngestContentBlock schema (TextContentBlock, DataContentBlock) instead of flattening list content to a plain string
  • Add IngestTraces client for the orbit ingest service (opt-in via GALILEO_INGEST_URL env var)
  • Add debug logging at serialization and ingest payload boundaries

Changes

Serialization (src/galileo/utils/serialization.py)

  • New _convert_langchain_content_block() maps LangChain's {"type": "image_url", "image_url": {"url": "..."}} format to DataContentBlock(type="image", url="..."), with base64 data URI detection
  • EventSerializer now converts list content on both AIMessage and BaseMessage subclasses to content block arrays instead of extracting only the first text element
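The mapping in the first bullet can be sketched roughly as follows. This is an illustrative, dict-based approximation: the real _convert_langchain_content_block in src/galileo/utils/serialization.py returns TextContentBlock/DataContentBlock instances from galileo_core, and its exact fallback behavior may differ.

```python
def convert_langchain_content_block(block: dict) -> dict:
    """Map one LangChain content dict to a Galileo-style content block (sketch)."""
    block_type = block.get("type")
    if block_type == "text":
        return {"type": "text", "text": block.get("text", "")}
    if block_type == "image_url":
        url = block.get("image_url", {}).get("url", "")
        # base64 data URIs are detected by their scheme prefix
        if url.startswith("data:"):
            return {"type": "image", "data": url}
        return {"type": "image", "url": url}
    # unknown block types fall back to a text representation
    return {"type": "text", "text": str(block)}
```

The same shape generalizes to audio_url/video_url via the modality mapping table shown later in the review thread.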

Ingest client (src/galileo/traces.py)

  • New IngestTraces class sends traces directly to the orbit ingest service via httpx.AsyncClient
  • _log_ingest_content_blocks() helper logs content block types at DEBUG level before HTTP POST
  • New Routes.ingest_traces constant
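The _log_ingest_content_blocks idea can be sketched like this. The payload shape and helper name here are illustrative stand-ins; the real client works with a TracesIngestRequest model rather than raw dicts before the httpx POST.

```python
import logging
from collections import Counter

logger = logging.getLogger("galileo.traces")


def log_ingest_content_blocks(payload: dict) -> Counter:
    """Tally content-block types in an ingest payload and log them at DEBUG (sketch)."""
    counts: Counter = Counter()
    for trace in payload.get("traces", []):
        for span in trace.get("spans", []):
            content = span.get("input")
            # only list content carries structured blocks; strings are legacy
            if isinstance(content, list):
                counts.update(block.get("type", "unknown") for block in content)
    logger.debug("ingest content block types: %s", dict(counts))
    return counts
```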

Logger integration (src/galileo/logger/logger.py)

  • GalileoLogger creates an IngestTraces client when GALILEO_INGEST_URL is set, preferring it over the API-proxied path

Handler (src/galileo/handlers/langchain/handler.py)

  • Debug log in on_chat_model_start showing multimodal message count

Test plan

  • TestConvertLangchainContentBlock -- unit tests for text, image URL, base64, audio, unknown fallback
  • TestMultimodalContentSerialization -- round-trip serialization of AIMessage/HumanMessage with multimodal content
  • test_on_chat_model_start_multimodal -- integration test verifying LangChain callback produces structured content blocks
  • Updated existing Responses API test to expect content block arrays instead of flattened string

Dependencies

Requires galileo-core#feature/sc-56110 to be merged and released first (adds IngestContentBlock, TextContentBlock, DataContentBlock to galileo_core.schemas.shared.content_blocks).



Generated description

Below is a concise technical summary of the changes proposed in this PR:
Convert LangChain message serialization in EventSerializer to emit TextContentBlock/DataContentBlock arrays so multimodal content_blocks map directly to the ingest schema while preserving string-only legacy flows and surfacing debug logs for multimodal callbacks. Introduce the IngestTraces client along with the GALILEO_INGEST_URL wiring so GalileoLogger can post structured traces directly to the ingest service when available.

Ingest Client: Add the IngestTraces client, routes, and logger wiring so structured traces can be sent to the ingest service via GALILEO_INGEST_URL, including dependency updates.
Modified files (5)
  • poetry.lock
  • pyproject.toml
  • src/galileo/constants/routes.py
  • src/galileo/logger/logger.py
  • src/galileo/traces.py
Multimodal Serialization: Convert LangChain content blocks into IngestContentBlock arrays through EventSerializer, ensure handlers report multimodal counts, and cover the flow with new serialization tests and assertions in both sync/async callbacks.
Modified files (4)
  • src/galileo/utils/serialization.py
  • tests/test_langchain.py
  • tests/test_langchain_async.py
  • tests/utils/test_serialization.py

pratduv requested a review from a team as a code owner on February 25, 2026 23:21
pratduv requested review from dmcwhorter and removed the request for dmcwhorter on February 25, 2026 23:21
pratduv force-pushed the feature/sc-56110/multimodal-content-blocks-langchain branch from 5617134 to aefd885 on February 25, 2026 23:32
Comment on lines +223 to +227
@async_warn_catch_exception(logger=_logger)
async def ingest_traces(self, traces_ingest_request: TracesIngestRequest) -> dict[str, Any]:
    if self.experiment_id:
        traces_ingest_request.experiment_id = UUID(self.experiment_id)
    elif self.log_stream_id:
Contributor

IngestTraces.ingest_traces now reimplements the experiment/log_stream wiring (and by extension the model_dump/_log_ingest_content_blocks flow) that already exists in Traces.ingest_traces; keeping two copies means every future change to request preparation or logging must be applied twice. Can we extract a shared helper that sets experiment_id/log_stream_id, dumps the request, logs the content blocks, and optionally sets logging_method, then call it from both ingestion clients so we only maintain that plumbing in one place?

Finding type: Code Dedup and Conventions



Contributor Author

Will clean this up in a follow-up, keeping it as-is for now.

Contributor

Understood, we can leave it as-is for now and plan the cleanup for a follow-up; I’ll remember this once the PR is merged.

elif self.experiment_id:
    self._traces_client = Traces(project_id=self.project_id, experiment_id=self.experiment_id)

if os.environ.get("GALILEO_INGEST_URL"):
Contributor Author

Only created when this URL is set

pratduv force-pushed the feature/sc-56110/multimodal-content-blocks-langchain branch from 1d6a39e to 30195c9 on February 26, 2026 15:43
Contributor Author

pratduv commented Feb 26, 2026

"""
Multimodal content ingestion via LangChain + Galileo callback.

Invokes the model with REAL calls for every LangChain message type that can
carry multimodal content, then flushes to the ingest service. A successful
flush (no errors) proves the Go ingest service can parse every content-block
variant.

Tests:
  1. Plain text messages
  2. HumanMessage with image_url
  3. HumanMessage with base64 image
  4. Streaming with multimodal input
  5. ToolMessage with multimodal content in conversation history
  6. AIMessage with list content in conversation history

Expects these env vars in ../.env (or the logstreams/.env):
    GALILEO_CONSOLE_URL, GALILEO_API_KEY, GALILEO_PROJECT, GALILEO_LOG_STREAM, OPENAI_API_KEY

Optional:
    GALILEO_INGEST_URL  - if set, uses the orbit ingest service instead of v2 API

Usage:
    cd nbs/examples/py_files/logstreams
    python langchain_multimodal_example.py
"""

import logging
from pathlib import Path

from dotenv import load_dotenv

load_dotenv(Path(__file__).resolve().parents[3] / ".env")

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s [%(levelname)s] %(message)s")

import contextlib  # noqa: E402

from langchain_core.messages import (  # noqa: E402
    AIMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)
from langchain_openai import ChatOpenAI  # noqa: E402

from galileo import galileo_context  # noqa: E402
from galileo.handlers.langchain import GalileoCallback  # noqa: E402

SAMPLE_IMAGE_URL = (
    "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/"
    "PNG_transparency_demonstration_1.png/280px-PNG_transparency_demonstration_1.png"
)

RED_SQUARE_B64 = (
    "data:image/png;base64,"
    "iVBORw0KGgoAAAANSUhEUgAAAGQAAABkCAIAAAD/gAIDAAABFUlEQVR4nO3OUQkAIABEsetfWiv4"
    "Nx4IC7Cd7XvkByF+EOIHIX4Q4gchfhDiByF+EOIHIX4Q4gchfhDiByF+EOIHIX4Q4gchfhDiByF+"
    "EOIHIX4Q4gchfhDiByF+EOIHIX4Q4gchfhDiByF+EOIHIX4Q4gchfhDiByF+EOIHIX4Q4gchfhDi"
    "ByF+EOIHIX4Q4gchfhDiByF+EOIHIX4Q4gchfhDiByF+EOIHIX4Q4gchfhDiByF+EOIHIX4Q4gch"
    "fhDiByF+EOIHIX4Q4gchfhDiByF+EOIHIX4Q4gchfhDiByF+EOIHIX4Q4gchfhDiByF+EOIHIX4Q"
    "4gchfhDiByF+EOIHIX4Q4gchfhDiByF+EOIHIReeLesrH9s1agAAAABJRU5ErkJggg=="
)


def run_llm_tests() -> None:
    """Run live LLM calls through the Galileo callback, covering all message types."""
    galileo_context.init()

    callback = GalileoCallback()
    model = ChatOpenAI(model="gpt-4o", temperature=0.0)

    # --- Test 1: Plain text ---
    print("\n--- Test 1: Text-only ---")  # noqa: T201
    resp = model.invoke(
        [SystemMessage(content="You are a helpful assistant. Be concise."), HumanMessage(content="What is 2 + 2?")],
        config={"callbacks": [callback]},
    )
    print(f"Response: {resp.content}")  # noqa: T201

    # --- Test 2: HumanMessage with image URL ---
    print("\n--- Test 2: HumanMessage + image URL ---")  # noqa: T201
    resp = model.invoke(
        [
            SystemMessage(content="You are a helpful assistant that can analyze images. Be concise."),
            HumanMessage(
                content=[
                    {"type": "text", "text": "What do you see in this image? Describe it briefly."},
                    {"type": "image_url", "image_url": {"url": SAMPLE_IMAGE_URL}},
                ]
            ),
        ],
        config={"callbacks": [callback]},
    )
    print(f"Response: {resp.content}")  # noqa: T201

    # --- Test 3: HumanMessage with base64 image ---
    print("\n--- Test 3: HumanMessage + base64 image ---")  # noqa: T201
    with contextlib.suppress(Exception):
        resp = model.invoke(
            [
                SystemMessage(content="You are a helpful assistant. Be concise."),
                HumanMessage(
                    content=[
                        {"type": "text", "text": "What color is this image?"},
                        {"type": "image_url", "image_url": {"url": RED_SQUARE_B64}},
                    ]
                ),
            ],
            config={"callbacks": [callback]},
        )
        print(f"Response: {resp.content}")  # noqa: T201

    # --- Test 4: Streaming with multimodal input ---
    print("\n--- Test 4: Streaming multimodal ---")  # noqa: T201
    chunks = []
    for chunk in model.stream(
        [
            SystemMessage(content="You are a helpful assistant that can analyze images. Be concise."),
            HumanMessage(
                content=[
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": SAMPLE_IMAGE_URL}},
                ]
            ),
        ],
        config={"callbacks": [callback]},
    ):
        chunks.append(chunk)
    print(f"Streaming: {len(chunks)} chunks received")  # noqa: T201

    # --- Test 5: ToolMessage with multimodal content in conversation history ---
    # LangChain passes the entire conversation via on_chat_model_start, which
    # serializes every message through EventSerializer. Placing a ToolMessage
    # with list[dict] content in history tests that code path end-to-end.
    print("\n--- Test 5: ToolMessage (multimodal) in history ---")  # noqa: T201
    resp = model.invoke(
        [
            SystemMessage(content="You are a helpful assistant. Be concise."),
            HumanMessage(content="Generate a chart of our sales data."),
            AIMessage(
                content="",
                tool_calls=[{"id": "call_chart", "name": "generate_chart", "args": {"data": "sales"}}],
            ),
            ToolMessage(
                content=[
                    {"type": "text", "text": "Chart generated successfully:"},
                    {"type": "image_url", "image_url": {"url": SAMPLE_IMAGE_URL}},
                ],
                tool_call_id="call_chart",
            ),
            HumanMessage(content="Summarize what you see in the chart you generated."),
        ],
        config={"callbacks": [callback]},
    )
    print(f"Response: {resp.content}")  # noqa: T201

    # --- Test 6: AIMessage with list content in conversation history ---
    # Tests serialization of AIMessage where .content is a list of dicts
    # (e.g., from a model that returns structured multimodal output).
    print("\n--- Test 6: AIMessage (list content) in history ---")  # noqa: T201
    resp = model.invoke(
        [
            SystemMessage(content="You are a helpful assistant. Be concise."),
            HumanMessage(
                content=[
                    {"type": "text", "text": "What is in this image?"},
                    {"type": "image_url", "image_url": {"url": SAMPLE_IMAGE_URL}},
                ]
            ),
            AIMessage(
                content=[
                    {"type": "text", "text": "The image shows overlapping colored dice on a transparent background."},
                ]
            ),
            HumanMessage(content="Are the dice all the same color?"),
        ],
        config={"callbacks": [callback]},
    )
    print(f"Response: {resp.content}")  # noqa: T201

    # --- Flush to Galileo ---
    print("\n--- Flushing to Galileo ---")  # noqa: T201
    galileo_context.flush()
    print("Flush complete.")  # noqa: T201


def main() -> None:
    run_llm_tests()


if __name__ == "__main__":
    main()

pratduv force-pushed the feature/sc-56110/multimodal-content-blocks-langchain branch from 30195c9 to 3a955c2 on March 5, 2026 19:28
pratduv requested a review from savula15 on March 9, 2026 15:43
pratduv force-pushed the feature/sc-56110/multimodal-content-blocks-langchain branch 3 times, most recently from 0819bbc to d714676 on March 10, 2026 01:51
Comment on lines 969 to +975
"""
# Auto-convert non-string metadata values to strings
if metadata:
metadata = {k: GalileoLogger._convert_metadata_value(v) for k, v in metadata.items()}
if dataset_metadata:
dataset_metadata = {k: GalileoLogger._convert_metadata_value(v) for k, v in dataset_metadata.items()}

Contributor

The same metadata normalization block (if metadata: metadata = {k: GalileoLogger._convert_metadata_value(v) ...}) appears in add_single_llm_span_trace, add_llm_span, add_tool_span, add_protect_span, add_agent_span, add_workflow_span, and add_retriever_span; keeping 7 copies in sync if the MetadataValue handling changes is error prone. Can we extract a helper like _normalize_metadata(metadata) and call it before building each kwargs dict so all spans share the logic?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low
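The proposed _normalize_metadata helper could look like this minimal sketch; the converter defaults to str here purely for illustration, whereas the SDK would pass GalileoLogger._convert_metadata_value.

```python
from typing import Callable, Optional


def normalize_metadata(
    metadata: Optional[dict],
    convert: Callable = str,
) -> Optional[dict]:
    """Convert every metadata value with the given converter (sketch).

    None and empty dicts pass through unchanged so call sites keep
    their optional semantics.
    """
    if not metadata:
        return metadata
    return {k: convert(v) for k, v in metadata.items()}
```

Each of the seven span methods would then call normalize_metadata(metadata) before building its kwargs dict.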



Contributor Author

Pre-existing pattern, out of scope for this PR.

Contributor

Understood, I’ll treat this as an existing pattern and out of scope for this PR, and I’ll remember this for future reviews once the PR is merged.

pratduv force-pushed the feature/sc-56110/multimodal-content-blocks-langchain branch from 6335fd3 to c89bd88 on March 10, 2026 02:11
Comment on lines +61 to +72
# LangChain multimodal message format mapping.
# See https://python.langchain.com/docs/concepts/multimodality/
# LangChain uses {"type": "<modality>_url", "<modality>_url": {"url": "..."}} for media,
# and {"type": "text", "text": "..."} for text segments.
_LANGCHAIN_TYPE_TO_MODALITY = {
    "image_url": "image",
    "audio_url": "audio",
    "video_url": "video",
    "document_url": "document",
    "input_image": "image",
    "input_audio": "audio",
}
Contributor

Message.content is being converted to structured content blocks (TextContentBlock / DataContentBlock) when LangChain multimodal messages are encountered. The new functions _convert_langchain_content_block and _normalize_multimodal_content convert list-of-dict LangChain multimodal content into IngestContentBlock instances (TextContentBlock/DataContentBlock). This causes the runtime payloads to include arrays/objects for message.content rather than the legacy string. The OpenAPI schema (openapi.yaml) still declares galileo_core__schemas__logging__llm__Message.content as a string, so traces with multimodal content will no longer conform to the public ingest contract and may be rejected. The schema and downstream validation should be updated to accept content block arrays (or the serialization should continue sending a string representation). Also review the EventSerializer branches that call _normalize_multimodal_content (further down in this file) as related call sites.

Finding type: Breaking Changes | Severity: 🔴 High



Contributor Author

Intentional. The schema is being widened across the stack (galileo-core, orbit, API). The openapi.yaml here is auto-generated from the API via scripts/import-openapi-yaml.sh, so it'll be updated once the API PR lands and we re-import.

Contributor

Got it, thanks for the clarification on the intentional schema widening and OpenAPI auto-generation; I’ll remember this behavior and treat it as expected once the PR is merged.

Comment on lines +1105 to +1108
# Auto-convert non-string metadata values to strings
if metadata:
    metadata = {k: GalileoLogger._convert_metadata_value(v) for k, v in metadata.items()}

Contributor

add_llm_span now normalizes metadata with {k: _convert_metadata_value(v)} before every span, and the same 3-line snippet appears at least in add_retriever_span (1189-1290), add_tool_span (1277-1281), add_protect_span (1355-1358), add_workflow_span (1442-1445) and add_agent_span (1532-1535). This is copy-paste plumbing that will need the same edit every time metadata handling changes. Can we extract a helper such as _normalize_metadata(metadata: Optional[dict[str,MetadataValue]]) -> Optional[dict[str,str]] and call it from each span method so the conversion logic lives in one place?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low



Contributor Author

Pre-existing pattern, out of scope for this PR.

Contributor

Understood, I’ll treat this as an existing pattern and keep that in mind for future suggestions, and I’ll save this feedback to memory after the PR is merged.

pratduv force-pushed the feature/sc-56110/multimodal-content-blocks-langchain branch 2 times, most recently from 5796034 to 107aeeb on March 10, 2026 15:31
Convert LangChain multimodal message format (image_url, audio_url, etc.)
into Galileo IngestContentBlock schema (TextContentBlock, DataContentBlock).
Add IngestTraces client for the orbit ingest service.
pratduv force-pushed the feature/sc-56110/multimodal-content-blocks-langchain branch from 107aeeb to 576744f on March 10, 2026 15:33