
Conversation

@trial2onyx trial2onyx (Collaborator) commented Oct 24, 2025

Description

Updates the invoke llm and stream llm log events to include token counts. For streaming events, the relevant tokenizer is used to estimate the token counts, and is_estimate metadata is included so the estimates can be filtered out in situations where accuracy matters; exact counts via include_usage are used when applicable.

I don't think the braintrust SDK upgrade is strictly relevant here, but it is being upgraded in #5918 anyway, so it is included for convenience.

How Has This Been Tested?

Confirmed that disabling Braintrust does not cause errors.

Confirmed that the values appear in Braintrust as expected, with reasonable numbers.

[Screenshot taken 2025-10-24 at 3:45 PM]

Additional Options

  • Override Linear Check

Summary by cubic

Add token usage metrics to LLM tracing. “invoke llm” logs exact counts when available; “stream llm” logs estimated counts with an is_estimate flag.

  • New Features

    • Log prompt_tokens, completion_tokens, and total_tokens via current_span for invoke responses (from response.usage when present); see the sketch after this list.
    • Estimate streaming token counts with the model’s tokenizer; include metadata is_estimate: true and estimate_method: tokenizer_estimate.
    • Logging is best-effort and never impacts the request path.
  • Dependencies

    • Upgrade braintrust[openai-agents] to 0.3.5.
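To make the invoke-path change concrete, here is a minimal sketch of logging usage from a LiteLLM response to the active Braintrust span. This is not the PR's exact code, and `log_invoke_usage` is a made-up helper name for illustration.

```python
# Minimal sketch of the invoke-path logging; not the PR's exact code.
# Assumes a LiteLLM ModelResponse whose `usage` follows the OpenAI shape.
from braintrust import current_span


def log_invoke_usage(response) -> None:
    """Best-effort token logging; must never break the request path."""
    try:
        usage = getattr(response, "usage", None)
        if usage is None:
            return
        current_span().log(
            metrics={
                "prompt_tokens": usage.prompt_tokens,
                "completion_tokens": usage.completion_tokens,
                "total_tokens": usage.total_tokens,
            }
        )
    except Exception:
        # Swallow tracing errors so observability never impacts LLM calls.
        pass
```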

@trial2onyx trial2onyx requested a review from a team as a code owner October 24, 2025 22:44
@vercel vercel bot commented Oct 24, 2025

The latest updates on your projects.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| internal-search | Ready | Preview | Comment | Oct 25, 2025 1:41am |

@greptile-apps greptile-apps bot (Contributor) left a comment

Greptile Overview

Greptile Summary

Updates LLM tracing to include token counts in both invoke llm and stream llm events. For non-streaming invocations, it extracts actual token usage from the model response. For streaming responses where usage data isn't readily available, it estimates token counts using the model-specific tokenizer and marks them with is_estimate: true metadata.

Key changes:

  • Added current_span().log() calls in _invoke_implementation to log actual prompt/completion/total tokens from response usage
  • Added token estimation logic in _stream_implementation that tokenizes prompts and output to approximate token counts (see the sketch after this list)
  • Upgraded braintrust SDK from 0.2.6 to 0.3.5
  • Wrapped both logging calls in try/except to ensure logging failures don't break LLM calls
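As an illustration only, a rough sketch of the streaming estimation follows, assuming a tokenizer object that exposes an `encode()` method (the PR itself goes through Onyx's get_tokenizer / check_number_of_tokens helpers, per the sequence diagram below); `log_stream_usage_estimate` is a hypothetical name.

```python
# Rough sketch of tokenizer-based estimation for streamed responses.
# Assumes `tokenizer` exposes an encode() method; the exact Onyx helpers differ.
from collections.abc import Sequence

from braintrust import current_span


def log_stream_usage_estimate(
    prompt: str | Sequence, output_text: str, tokenizer
) -> None:
    try:
        if isinstance(prompt, str):
            prompt_tokens = len(tokenizer.encode(prompt))
        else:
            # Message sequence: sum a per-message estimate of the content.
            prompt_tokens = sum(
                len(tokenizer.encode(str(m.content))) for m in prompt
            )
        completion_tokens = len(tokenizer.encode(output_text))
        current_span().log(
            metrics={
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens,
            },
            metadata={
                "is_estimate": True,
                "estimate_method": "tokenizer_estimate",
            },
        )
    except Exception:
        # Estimation failures must not affect the streamed response.
        pass
```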

Confidence Score: 4/5

  • This PR is safe to merge with low risk - changes are well-isolated to tracing/logging and properly wrapped in error handlers
  • Score reflects defensive error handling (all tracing wrapped in try/except), non-critical observability improvements, and reasonable approach to token estimation. Minor concern is that current_span() assumes braintrust is initialized, but this is mitigated by the try/except blocks. The braintrust upgrade is noted by the author as part of PR #5918 and appears stable.
  • No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| backend/onyx/llm/chat_llm.py | 4/5 | Adds token count logging to the invoke and stream LLM methods: invoked responses use actual usage data, streaming responses use tokenizer estimates with is_estimate metadata |
| backend/requirements/default.txt | 5/5 | Updates braintrust[openai-agents] from 0.2.6 to 0.3.5 |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant DefaultMultiLLM
    participant LiteLLM
    participant Braintrust
    participant Tokenizer

    Note over Client,Tokenizer: Non-Streaming Invocation Flow
    Client->>DefaultMultiLLM: invoke(prompt, tools, ...)
    DefaultMultiLLM->>LiteLLM: completion(stream=False)
    LiteLLM-->>DefaultMultiLLM: ModelResponse with usage
    DefaultMultiLLM->>Braintrust: current_span().log(metrics={prompt_tokens, completion_tokens, total_tokens})
    Note over Braintrust: Logs actual token counts from response.usage
    DefaultMultiLLM-->>Client: BaseMessage (output)

    Note over Client,Tokenizer: Streaming Invocation Flow
    Client->>DefaultMultiLLM: stream(prompt, tools, ...)
    DefaultMultiLLM->>LiteLLM: completion(stream=True)
    loop For each chunk
        LiteLLM-->>DefaultMultiLLM: StreamChunk
        DefaultMultiLLM->>DefaultMultiLLM: Accumulate chunks into output
        DefaultMultiLLM-->>Client: BaseMessageChunk
    end
    
    DefaultMultiLLM->>Tokenizer: get_tokenizer(model_name, provider_type)
    Tokenizer-->>DefaultMultiLLM: BaseTokenizer
    
    alt prompt is str
        DefaultMultiLLM->>Tokenizer: check_number_of_tokens(prompt)
        Tokenizer-->>DefaultMultiLLM: prompt_tokens_est
    else prompt is list/Sequence
        loop For each message
            DefaultMultiLLM->>Tokenizer: check_message_tokens(msg)
            Tokenizer-->>DefaultMultiLLM: message_tokens
        end
    end
    
    DefaultMultiLLM->>Tokenizer: check_number_of_tokens(output.content)
    Tokenizer-->>DefaultMultiLLM: completion_tokens_est
    
    DefaultMultiLLM->>Braintrust: current_span().log(metrics={...}, metadata={is_estimate: true})
    Note over Braintrust: Logs estimated token counts
```

2 files reviewed, no comments


@cubic-dev-ai cubic-dev-ai bot (Contributor) left a comment

No issues found across 2 files

@trial2onyx trial2onyx (Collaborator, Author) commented

Replaced the estimated token counting with provider usage data. Confirmed it works with stream llm events using mistralai/devstral-small & OpenRouter, and with clarifier stream and process events using gpt-5-nano & OpenRouter. Also confirmed the option is ignored when streaming is disabled (including the case DISABLE_LITELLM_STREAMING=True).
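For context on what "provider usage data" for streams could look like, here is a hedged sketch using LiteLLM's `stream_options` pass-through. `stream_with_provider_usage` is a hypothetical helper, the field names follow the OpenAI-compatible usage shape, and provider support for `include_usage` varies; this is not the PR's exact implementation.

```python
# Hedged sketch: request usage data on a streamed completion via LiteLLM.
# Not the PR's exact code; provider support for include_usage varies.
import litellm
from braintrust import current_span


def stream_with_provider_usage(model: str, messages: list[dict]) -> str:
    response = litellm.completion(
        model=model,
        messages=messages,
        stream=True,
        stream_options={"include_usage": True},  # ask for a final usage chunk
    )
    output = ""
    usage = None
    for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content:
            output += chunk.choices[0].delta.content
        # The usage-bearing chunk typically arrives last with empty choices.
        if getattr(chunk, "usage", None):
            usage = chunk.usage
    try:
        if usage is not None:
            current_span().log(
                metrics={
                    "prompt_tokens": usage.prompt_tokens,
                    "completion_tokens": usage.completion_tokens,
                    "total_tokens": usage.total_tokens,
                }
            )
    except Exception:
        # Keep tracing best-effort.
        pass
    return output
```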

