fix(tracing): include token count with "invoke llm" and "stream llm" #5921
Conversation
Greptile Overview
Greptile Summary
Updates LLM tracing to include token counts in both invoke llm and stream llm events. For non-streaming invocations, it extracts actual token usage from the model response. For streaming responses where usage data isn't readily available, it estimates token counts using the model-specific tokenizer and marks them with is_estimate: true metadata.
Key changes:
- Added `current_span().log()` calls in `_invoke_implementation` to log actual prompt/completion/total tokens from response usage (sketched below)
- Added token estimation logic in `_stream_implementation` that tokenizes prompts and output to approximate token counts
- Upgraded braintrust SDK from 0.2.6 to 0.3.5
- Wrapped both logging calls in try/except to ensure logging failures don't break LLM calls
Confidence Score: 4/5
- This PR is safe to merge with low risk - changes are well-isolated to tracing/logging and properly wrapped in error handlers
- Score reflects defensive error handling (all tracing wrapped in try/except), non-critical observability improvements, and reasonable approach to token estimation. Minor concern is that current_span() assumes braintrust is initialized, but this is mitigated by the try/except blocks. The braintrust upgrade is noted by the author as part of PR #5918 and appears stable.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| backend/onyx/llm/chat_llm.py | 4/5 | Adds token count logging to invoke and stream LLM methods - invoked responses use actual usage data, streaming responses use tokenizer estimates with is_estimate metadata |
| backend/requirements/default.txt | 5/5 | Updates braintrust[openai-agents] from 0.2.6 to 0.3.5 |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant DefaultMultiLLM
    participant LiteLLM
    participant Braintrust
    participant Tokenizer

    Note over Client,Tokenizer: Non-Streaming Invocation Flow
    Client->>DefaultMultiLLM: invoke(prompt, tools, ...)
    DefaultMultiLLM->>LiteLLM: completion(stream=False)
    LiteLLM-->>DefaultMultiLLM: ModelResponse with usage
    DefaultMultiLLM->>Braintrust: current_span().log(metrics={prompt_tokens, completion_tokens, total_tokens})
    Note over Braintrust: Logs actual token counts from response.usage
    DefaultMultiLLM-->>Client: BaseMessage (output)

    Note over Client,Tokenizer: Streaming Invocation Flow
    Client->>DefaultMultiLLM: stream(prompt, tools, ...)
    DefaultMultiLLM->>LiteLLM: completion(stream=True)
    loop For each chunk
        LiteLLM-->>DefaultMultiLLM: StreamChunk
        DefaultMultiLLM->>DefaultMultiLLM: Accumulate chunks into output
        DefaultMultiLLM-->>Client: BaseMessageChunk
    end
    DefaultMultiLLM->>Tokenizer: get_tokenizer(model_name, provider_type)
    Tokenizer-->>DefaultMultiLLM: BaseTokenizer
    alt prompt is str
        DefaultMultiLLM->>Tokenizer: check_number_of_tokens(prompt)
        Tokenizer-->>DefaultMultiLLM: prompt_tokens_est
    else prompt is list/Sequence
        loop For each message
            DefaultMultiLLM->>Tokenizer: check_message_tokens(msg)
            Tokenizer-->>DefaultMultiLLM: message_tokens
        end
    end
    DefaultMultiLLM->>Tokenizer: check_number_of_tokens(output.content)
    Tokenizer-->>DefaultMultiLLM: completion_tokens_est
    DefaultMultiLLM->>Braintrust: current_span().log(metrics={...}, metadata={is_estimate: true})
    Note over Braintrust: Logs estimated token counts
```
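A minimal sketch of the streaming estimation path shown in the diagram, assuming no usage data arrives with the stream. tiktoken's `cl100k_base` encoding stands in for the model-specific tokenizer the diagram obtains via `get_tokenizer()`, and the helper name is illustrative rather than the actual `chat_llm.py` code.

```python
from typing import Sequence, Union

import tiktoken
from braintrust import current_span
from langchain_core.messages import BaseMessage


def log_estimated_token_usage(
    prompt: Union[str, Sequence[BaseMessage]], output_text: str
) -> None:
    """Estimate prompt/completion tokens and log them with is_estimate metadata."""
    try:
        encoding = tiktoken.get_encoding("cl100k_base")  # stand-in tokenizer
        if isinstance(prompt, str):
            prompt_tokens = len(encoding.encode(prompt))
        else:
            # Rough per-message estimate; ignores role/formatting overhead tokens.
            prompt_tokens = sum(
                len(encoding.encode(str(msg.content))) for msg in prompt
            )
        completion_tokens = len(encoding.encode(output_text))
        current_span().log(
            metrics={
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens,
            },
            metadata={"is_estimate": True},
        )
    except Exception:
        # Estimation is best-effort; tracing failures must not break streaming.
        pass
```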
2 files reviewed, no comments
Force-pushed from e26428f to d06c1e9
No issues found across 2 files
Replaced the estimated token counting with provider usage data. Confirmed it worked with
Force-pushed from 1a0bdbf to a9bc302
Force-pushed from a9bc302 to 6b2441f
Description
Updates the `invoke llm` and `stream llm` log events to include the token counts. For streaming events, ~~use the relevant tokenizer to estimate the token counts and include `is_estimate` metadata, which can be filtered out in situations where accuracy of the counts matters~~ use `include_usage` when applicable.

I don't think the braintrust SDK upgrade is relevant here, but it is getting upgraded in #5918, so it is included for my convenience.
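A minimal sketch of the `include_usage` approach, assuming an OpenAI-compatible provider and that litellm forwards the provider's final usage chunk when `stream_options={"include_usage": True}` is set; the model name and message are illustrative.

```python
import litellm
from braintrust import current_span

response = litellm.completion(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},  # ask the provider to report usage
)

usage = None
for chunk in response:
    # Content chunks would be yielded to the caller here; the final chunk may
    # carry provider-reported usage instead of content.
    usage = getattr(chunk, "usage", None) or usage

if usage is not None:
    try:
        current_span().log(
            metrics={
                "prompt_tokens": usage.prompt_tokens,
                "completion_tokens": usage.completion_tokens,
                "total_tokens": usage.total_tokens,
            }
        )
    except Exception:
        # Tracing failures must not surface to the caller.
        pass
```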
How Has This Been Tested?
Confirmed disabling braintrust does not cause errors.
Confirmed that values appear in Braintrust as expected, with reasonable magnitudes.
Summary by cubic
Add token usage metrics to LLM tracing. “invoke llm” logs exact counts when available; “stream llm” logs estimated counts with an is_estimate flag.
New Features
- Token usage metrics added to the "invoke llm" and "stream llm" trace events.
Dependencies
- Upgraded braintrust[openai-agents] from 0.2.6 to 0.3.5.