Replies: 3 comments
This appears to be related to how the LlamaIndex OpenInference instrumentor emits spans, rather than a Langfuse-specific issue. The Langfuse SDK integrates with third-party OTEL-based instrumentation libraries such as OpenInference. From your screenshot, it looks like both the parent and the child spans report token usage, which is why the totals come out inflated. Potential approaches:
```python
from langfuse import Langfuse
from langfuse.span_filter import is_default_export_span

langfuse = Langfuse(
    should_export_span=lambda span: (
        is_default_export_span(span)
        and not span.name.endswith(".astream")  # Example filter logic
    )
)
```
You can also enable debug logging:

```python
from langfuse import Langfuse

langfuse = Langfuse(debug=True)
```

This will help you identify exactly which spans from the LlamaIndex instrumentor are carrying the usage details, so you can determine whether to filter them or raise the issue with the OpenInference maintainers. The core issue is that the LlamaIndex instrumentor appears to be setting usage attributes on more than one level of the nested spans.

Have another question? Just tag @inkeep.
Hi @rayw-lr, in this case it's best to contact OpenInference regarding their LlamaIndexInstrumentor and ask for support there. That said, the code proposed above is a good workaround for a quick fix.
Problem: duplicated token counts on generation spans when using the LlamaIndex instrumentor with Langfuse. Langfuse aggregates tokens across generation spans, and because both the parent and child spans report token usage, the totals are double counted. This is likely due to the instrumentation emitting usage at multiple levels (e.g., astream and astream_chat). Possible remedies: disable token reporting on parent spans, configure Langfuse not to sum tokens from certain spans, or adjust the instrumentation to avoid the duplication. OpenInference's LlamaIndex instrumentation captures token usage from LLM calls and records it as span attributes, so the duplication likely originates there.
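The first remedy above (disabling token reporting on parent spans) can be sketched as a post-processing step over span records before aggregation. Everything below — the dict shape, the `usage` key, the `strip_parent_usage` helper — is a hypothetical illustration, not a real Langfuse or OpenInference API:

```python
def strip_parent_usage(spans):
    """Drop the token-usage attribute from any span that is the parent
    of another span, so aggregation counts each LLM call exactly once."""
    parent_names = {s["parent"] for s in spans if s.get("parent") is not None}
    cleaned = []
    for span in spans:
        span = dict(span)  # copy, to avoid mutating the caller's records
        if span["name"] in parent_names:
            span.pop("usage", None)  # parent duplicates the child's usage
        cleaned.append(span)
    return cleaned

trace = [
    {"name": "AzureOpenAI.astream", "parent": None, "usage": 250},
    {"name": "AzureOpenAI.astream_chat", "parent": "AzureOpenAI.astream", "usage": 250},
]
total = sum(s.get("usage", 0) for s in strip_parent_usage(trace))
print(total)  # 250, the actual usage of the single LLM call
```

The same leaf-only rule generalizes to deeper nesting, since any span that appears as another span's parent loses its usage attribute.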
Describe your question
Hi there,
I'm using the following to generate answers based on a user's query:
- RetrieverQueryEngine
- Response Synthesizer with streaming=True and use_async=True

We use a combination of Langfuse's OTEL instrumentation (the observe decorator and/or context manager). However, this alone does not capture the OTEL spans emitted from LlamaIndex-specific function calls. For example, any step conducted from RetrieverQueryEngine.aquery would be missed if we do not use LlamaIndexInstrumentor().instrument().

In the attached screenshot, see the AzureOpenAI.* generation spans, which report the tokens used at those steps. Langfuse seems to automatically aggregate all tokens reported from generation spans, resulting in over-inflated reported token usage. I'm not sure these parent spans (i.e., astream) should be counting these tokens when the actual usage seems to come from the child span (astream_chat). I believe this affects all nested generation spans, and I'm not sure whether the issue is inherent to Langfuse or to LlamaIndex's instrumentor.
Could I get some help navigating this issue?
Langfuse Cloud or Self-Hosted?
Self-Hosted
If Self-Hosted
3.162.0
If Langfuse Cloud
No response
SDK and integration versions
Langfuse Python SDK v4.01
Latest versions of these
Pre-Submission Checklist