fix(openai): tool_calls dropped when content chunk precedes tool deltas in stream#1611

Open
STHITAPRAJNAS wants to merge 2 commits into langfuse:main from STHITAPRAJNAS:fix/streaming-tool-calls-dropped-after-content-chunk

Conversation

@STHITAPRAJNAS commented Apr 4, 2026

Problem

When using the Langfuse-wrapped OpenAI client with models that emit a content chunk before streaming tool-call deltas (Qwen, DeepSeek, and other reasoning models often send "\n\n" or a short preamble first), the generation output logged by Langfuse shows only the content string — all tool_call data is silently dropped.

Root cause: `get_response_for_chat()` inside `_extract_streamed_openai_response` used a Python `or` chain to decide what to return:

return (
    completion["content"]                  # "\n\n" is truthy → short-circuits here
    or (completion["function_call"] and …)
    or (completion["tool_calls"] and …)    # never reached
    or None
)

Because "\n\n" is a truthy string, the expression short-circuited at the first branch and returned the whitespace string, discarding the accumulated tool_calls.
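The short-circuit is easy to reproduce in isolation. A minimal sketch, assuming an accumulator dict of the same shape as in the snippet above (the shape is illustrative, not the exact internal structure):

```python
# Minimal repro of the short-circuit, independent of Langfuse internals.
completion = {
    "content": "\n\n",  # whitespace preamble chunk emitted before tool deltas
    "function_call": None,
    "tool_calls": [{"name": "get_weather", "arguments": "{}"}],
}

# The buggy `or` chain: "\n\n" is a non-empty string, hence truthy,
# so evaluation stops at the first branch.
result = (
    completion["content"]
    or (completion["tool_calls"] and {"tool_calls": completion["tool_calls"]})
    or None
)

assert result == "\n\n"  # the accumulated tool_calls never reach the output
```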

Fixes langfuse/langfuse#12490.

Fix

Check tool_calls before content. When tool calls are present they become the primary output. Non-whitespace content (e.g. a genuine reasoning preamble) is preserved alongside the tool calls rather than being discarded. The function_call (legacy format) and plain-content paths are unchanged.

if completion["tool_calls"]:
    result = {
        "role": "assistant",
        "tool_calls": [{"function": data} for data in completion["tool_calls"]],
    }
    if completion["content"] and completion["content"].strip():
        result["content"] = completion["content"]
    return result

if completion["function_call"]:
    return {"role": "assistant", "function_call": completion["function_call"]}

return completion["content"] or None

Tests

Added tests/test_openai_streaming_unit.py — 9 unit tests, no API calls:

| Test | Scenario |
| --- | --- |
| test_tool_calls_not_dropped_when_whitespace_content_precedes_them | Primary bug: "\n\n" before tool deltas |
| test_whitespace_only_content_not_included_in_result | Leading whitespace is omitted from output |
| test_meaningful_content_preserved_alongside_tool_calls | Real preamble text is kept with tool_calls |
| test_non_whitespace_content_before_tool_calls_preserves_both | Multi-chunk preamble + tools |
| test_plain_text_response_returned_as_string | No regression on plain content |
| test_empty_stream_returns_none | Empty stream |
| test_tool_calls_returned_without_content | Pure tool call, no content |
| test_multiple_tool_calls_all_returned | Multiple sequential tool calls |
| test_function_call_returned_when_no_tool_calls | Legacy function_call path |
9 passed in 1.54s
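A self-contained sketch of what the primary regression test could look like. The `extract` helper is a stand-in that repeats the fixed selection logic from the Fix section; the real tests live in tests/test_openai_streaming_unit.py and their internals may differ:

```python
def extract(completion):
    # Stand-in mirroring the fixed get_response_for_chat() branch order:
    # tool_calls first, then legacy function_call, then plain content.
    if completion["tool_calls"]:
        result = {
            "role": "assistant",
            "tool_calls": [{"function": d} for d in completion["tool_calls"]],
        }
        if completion["content"] and completion["content"].strip():
            result["content"] = completion["content"]
        return result
    if completion["function_call"]:
        return {"role": "assistant", "function_call": completion["function_call"]}
    return completion["content"] or None


def test_tool_calls_not_dropped_when_whitespace_content_precedes_them():
    completion = {
        "content": "\n\n",  # whitespace preamble streamed before the tool deltas
        "function_call": None,
        "tool_calls": [{"name": "get_weather", "arguments": "{}"}],
    }
    out = extract(completion)
    assert out["tool_calls"][0]["function"]["name"] == "get_weather"
    assert "content" not in out  # whitespace-only content is omitted
```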

Disclaimer: Experimental PR review

Greptile Summary

This PR bundles two independent fixes: a bug fix for the OpenAI streaming path where tool_calls were silently dropped when a whitespace content chunk preceded tool-call deltas, and a new feature for the LangChain callback handler that allows users to pass langfuse_trace_name via metadata to control the propagated trace name.

  • OpenAI streaming (langfuse/openai.py): The original get_response_for_chat() used a Python or chain that short-circuited at the first truthy value — so a "\n\n" content chunk (common from Qwen/DeepSeek) caused all accumulated tool_calls to be discarded. The fix checks tool_calls first and preserves non-whitespace content alongside them when it is meaningful.
  • LangChain callback (langfuse/langchain/CallbackHandler.py): _parse_langfuse_trace_attributes now extracts langfuse_trace_name from metadata; on_chain_start forwards it to propagate_attributes() with span_name as a fallback via or; _strip_langfuse_keys_from_dict strips the key to prevent it leaking into stored metadata.
  • Both changes are covered by new, self-contained unit tests (tests/test_openai_streaming_unit.py and tests/test_langchain_callback_unit.py) that mock away all external API calls.
  • Scope note: The PR title and description focus on the OpenAI fix; the LangChain langfuse_trace_name feature is a non-trivial separate change only briefly mentioned. Consider splitting unrelated changes across separate PRs for easier review and bisectability.

Confidence Score: 5/5

Safe to merge — both fixes are correct, logically sound, and thoroughly covered by unit tests with no P0 or P1 issues found.

All findings are P2 (process/style). The OpenAI or-chain fix correctly prioritises tool_calls over content and handles the whitespace-only edge case cleanly. The LangChain trace_name feature wires correctly into the existing propagate_attributes API (which already accepts trace_name). Comprehensive unit tests cover the primary bug, edge cases, and regression scenarios.

No files require special attention — all four changed files are clean and well-tested.

Important Files Changed

| Filename | Overview |
| --- | --- |
| langfuse/openai.py | Rewrites get_response_for_chat() to check tool_calls before content, fixing the or-chain short-circuit that silently dropped tool_calls when whitespace content preceded them in a stream |
| langfuse/langchain/CallbackHandler.py | Adds langfuse_trace_name metadata key support: extracted in _parse_langfuse_trace_attributes, forwarded to propagate_attributes with span_name fallback, and stripped from stored metadata |
| tests/test_openai_streaming_unit.py | New unit tests covering the primary whitespace-before-tool-calls bug and edge cases (pure content, pure tool-calls, multiple tools, legacy function_call) — no real API calls |
| tests/test_langchain_callback_unit.py | New unit tests for langfuse_trace_name parsing, on_chain_start propagation priority, and _strip_langfuse_keys_from_dict — no real API calls |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Stream chunk received] --> B{resource.type == 'chat'?}
    B -- No --> C[Accumulate completion text]
    B -- Yes --> D[Extract delta]
    D --> E{delta has content?}
    E -- Yes --> F[Append to completion content]
    E -- No --> G{delta has function_call?}
    G -- Yes --> H[Accumulate function_call]
    G -- No --> I{delta has tool_calls?}
    I -- Yes --> J[Accumulate tool_calls list]
    F --> L[get_response_for_chat]
    H --> L
    J --> L
    L --> M{completion tool_calls non-empty?}
    M -- Yes --> N[Return dict with tool_calls + non-whitespace content]
    M -- No --> O{completion function_call?}
    O -- Yes --> P[Return dict with function_call]
    O -- No --> Q[Return content or None]
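The accumulation phase of the flowchart can be sketched as a small loop. This is a simplified model: field names follow the OpenAI streaming chunk format, but real tool-call deltas arrive as fragments merged by index, which is elided here:

```python
def accumulate(deltas):
    # Fold streamed deltas into one completion dict, mirroring the
    # branches in the flowchart above (content / function_call / tool_calls).
    completion = {"content": "", "function_call": None, "tool_calls": []}
    for delta in deltas:
        if delta.get("content"):
            completion["content"] += delta["content"]
        elif delta.get("function_call"):
            completion["function_call"] = delta["function_call"]
        elif delta.get("tool_calls"):
            completion["tool_calls"].extend(delta["tool_calls"])
    return completion


# The problematic stream shape: a whitespace content chunk, then tool deltas.
c = accumulate([
    {"content": "\n\n"},
    {"tool_calls": [{"function": {"name": "get_weather", "arguments": "{}"}}]},
])
assert c["content"] == "\n\n" and len(c["tool_calls"]) == 1
```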

…tart

When CallbackHandler.on_chain_start fires at the root of a chain
(parent_run_id is None), propagate_attributes was called without a
trace_name, so the trace name was determined by whichever internal node's
on_chain_start happened to fire first. On LangGraph resume (e.g. after a
human-in-the-loop interrupt) that node is often an internal subgraph whose
name is "", which produces a blank trace name.

The fix passes span_name — the name already computed from the serialized
runnable and kwargs — as trace_name to propagate_attributes. This ensures
the trace name is always pinned to the root chain's name regardless of
execution order on resume.

As a companion change, _parse_langfuse_trace_attributes now also reads a
langfuse_trace_name key from LangChain metadata, consistent with the
existing langfuse_session_id / langfuse_user_id / langfuse_tags pattern.
When present, metadata langfuse_trace_name takes priority over the
computed span_name. The key is also added to the strip-list in
_strip_langfuse_keys_from_dict so it does not leak into observation
metadata.

Fixes langfuse#1602
…as in stream

get_response_for_chat() built its return value with a Python `or` chain:

    return completion["content"] or (completion["tool_calls"] and {...}) or None

Models like Qwen and DeepSeek emit a non-empty content chunk (often "\n\n"
or a brief reasoning prefix) before streaming the tool-call deltas. Because
a non-empty string is truthy, the `or` chain short-circuited at the content
branch and returned just the whitespace string, silently discarding all
accumulated tool_call data.

Fix: check tool_calls first. When tool_calls are present, return them as the
primary output. If the content is non-whitespace (e.g. a genuine reasoning
preamble) it is included alongside the tool_calls rather than dropped.
The function_call (legacy OpenAI format) and plain content paths are
unchanged.

Fixes langfuse/langfuse#12490

@claude bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.
