CallbackHandler.on_chain_start does not pass trace_name to propagate_attributes, causing non-deterministic trace names on LangGraph resume #1602

@Edison-A-N

Bug Description

When using CallbackHandler with LangGraph, trace names are non-deterministic on graph resume (e.g., after a human-in-the-loop interrupt). Sometimes the trace gets the correct compiled graph name (e.g., "my-agent"), sometimes it gets an empty string "".

Root Cause

In CallbackHandler.on_chain_start (lines 320-325), when the root chain starts (parent_run_id is None), propagate_attributes() is called without the trace_name parameter:

# Current code (simplified):
span_name = self.get_langchain_run_name(serialized, **kwargs)  # line 310

if parent_run_id is None:
    self._propagation_context_manager = propagate_attributes(
        user_id=...,
        session_id=...,
        tags=...,
        metadata=...,
        # trace_name is NOT passed here
    )

The propagate_attributes() API does support a trace_name parameter, but it's not being used.

Why This Causes Non-Deterministic Trace Names

LangGraph's Pregel.stream() calls on_chain_start with:

# langgraph/pregel/main.py
name=config.get("run_name", self.get_name())  # self.get_name() = compiled graph name

On initial run, the first on_chain_start event comes from the root graph, so span_name correctly reflects the compiled graph name (e.g., "my-agent").

On resume (e.g., after HITL interrupt via Command(resume=...)), the graph may resume from an internal node. The first on_chain_start event can come from a subgraph node whose name is "", causing the trace to get an empty name.

Because trace_name is not propagated via propagate_attributes(), the trace name depends entirely on whichever on_chain_start fires first, which is non-deterministic on resume.
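The "first event wins" behavior described above can be illustrated with a toy model. This is only a sketch: ToyTrace and replay are hypothetical names, not Langfuse or LangGraph classes, and the event order used for the resume case is an assumption based on the description in this issue.

```python
from typing import Iterable, Optional


class ToyTrace:
    """Hypothetical stand-in for a trace; NOT a Langfuse class."""

    def __init__(self, explicit_name: Optional[str] = None) -> None:
        self.explicit_name = explicit_name
        self.first_span_name: Optional[str] = None

    def on_chain_start(self, name: str) -> None:
        # Only the first span to start is remembered.
        if self.first_span_name is None:
            self.first_span_name = name

    @property
    def name(self) -> str:
        # An explicitly propagated name takes precedence over the first span.
        if self.explicit_name is not None:
            return self.explicit_name
        return self.first_span_name or ""


def replay(events: Iterable[str], explicit_name: Optional[str] = None) -> str:
    """Feed on_chain_start names into a ToyTrace and return the trace name."""
    trace = ToyTrace(explicit_name)
    for event_name in events:
        trace.on_chain_start(event_name)
    return trace.name


# Assumed event orders, for illustration only.
initial_run = ["my-agent", "my_node"]   # root graph fires first
resume_run = ["", "my-agent", "my_node"]  # an unnamed internal node fires first

name_initial = replay(initial_run)
name_resume = replay(resume_run)
name_resume_fixed = replay(resume_run, explicit_name="my-agent")
```

In this model, the name only becomes independent of event order once it is supplied explicitly, which is the role trace_name would play in propagate_attributes().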

Reproduction

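from langchain_core.messages import AIMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, MessagesState, StateGraph
from langgraph.types import Command, interrupt
from langfuse.langchain import CallbackHandler

# Build a graph with a HITL interrupt
def my_node(state: MessagesState):
    answer = interrupt("question?")
    return {"messages": [AIMessage(content=answer)]}

builder = StateGraph(MessagesState)
builder.add_node("my_node", my_node)
builder.add_edge(START, "my_node")
builder.add_edge("my_node", END)

graph = builder.compile(checkpointer=MemorySaver(), name="my-agent")
handler = CallbackHandler()

# The checkpointer needs a thread_id so the second call resumes the same run
config = {"configurable": {"thread_id": "1"}, "callbacks": [handler]}

# Initial run: trace name = "my-agent" ✅
for chunk in graph.stream({"messages": []}, config=config):
    pass

# Resume: trace name is "" (non-deterministic) ❌
for chunk in graph.stream(Command(resume="yes"), config=config):
    pass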

Proposed Fix

Pass span_name as trace_name to propagate_attributes():

if parent_run_id is None:
    self._propagation_context_manager = propagate_attributes(
        trace_name=span_name,  # <-- add this
        user_id=parsed_trace_attributes.get("user_id", None),
        session_id=parsed_trace_attributes.get("session_id", None),
        tags=parsed_trace_attributes.get("tags", None),
        metadata=parsed_trace_attributes.get("metadata", None),
    )

This ensures the trace name is always set from the callback's name parameter, regardless of which internal node fires first on resume.

Workaround

Setting run_name in the LangGraph config forces a consistent name:

config = {"run_name": "my-agent", "callbacks": [handler]}
# stream() is lazy, so it must be iterated for the graph to run
for chunk in graph.stream(Command(resume="yes"), config=config):
    pass

This works because LangGraph uses config.get("run_name", self.get_name()), so an explicit run_name overrides the non-deterministic behavior. However, users shouldn't need to duplicate the compiled graph name into the config.
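To avoid hard-coding the graph name in every config, the workaround could be wrapped in a small helper. The sketch below is an assumption, not part of either library: config_with_graph_name is a hypothetical function, and FakeGraph is a stand-in so the example runs without langgraph installed. It relies only on the graph exposing get_name(), as in the LangGraph snippet quoted above.

```python
from typing import Any, Protocol


class HasName(Protocol):
    """Anything exposing get_name(), e.g. a compiled graph."""

    def get_name(self) -> str: ...


def config_with_graph_name(graph: HasName, config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config whose run_name defaults to the graph's own name."""
    merged = dict(config)  # shallow copy so the caller's dict is not mutated
    merged.setdefault("run_name", graph.get_name())
    return merged


class FakeGraph:
    """Stand-in for a compiled graph; hypothetical, for illustration only."""

    def get_name(self) -> str:
        return "my-agent"


config = config_with_graph_name(FakeGraph(), {"callbacks": []})
print(config["run_name"])  # prints "my-agent"
```

An explicit run_name passed by the caller is left untouched, since setdefault only fills the key when it is missing.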

Environment

  • langfuse SDK: 4.x (OTel-based)
  • langgraph: 0.4.x
  • Python: 3.12+

Related

  • propagate_attributes() already supports trace_name — it's just not used in the callback handler
  • _parse_langfuse_trace_attributes() parses langfuse_session_id, langfuse_user_id, and langfuse_tags from metadata, but has no support for langfuse_trace_name
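
As a rough illustration of the second point, a metadata parser that also recognizes langfuse_trace_name might look like the sketch below. This is not the SDK's implementation: parse_trace_attributes is a hypothetical standalone function, and its validation rules (non-empty strings, list-valued tags) are assumptions made for the example.

```python
from typing import Any, Optional


def parse_trace_attributes(metadata: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical parser: pull langfuse_* keys out of LangChain metadata."""
    attributes: dict[str, Any] = {}
    if not metadata:
        return attributes

    # String-valued attributes, including the proposed langfuse_trace_name.
    for key in ("session_id", "user_id", "trace_name"):
        value = metadata.get(f"langfuse_{key}")
        if isinstance(value, str) and value:
            attributes[key] = value

    # Tags are assumed to arrive as a list and are normalized to strings.
    tags = metadata.get("langfuse_tags")
    if isinstance(tags, list):
        attributes["tags"] = [str(tag) for tag in tags]

    return attributes


parsed = parse_trace_attributes(
    {
        "langfuse_session_id": "s-1",
        "langfuse_trace_name": "my-agent",
        "langfuse_tags": ["hitl"],
        "unrelated": 42,
    }
)
```

With something like this in place, a caller could set the trace name through metadata in the same way session_id and user_id are set today.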
