fix: correct token double-counting for Anthropic and Bedrock providers #2861

Merged
tusharmath merged 3 commits into main from fix/token-counting-bugs on Apr 6, 2026

@amitksingh1490 amitksingh1490 commented Apr 6, 2026

Summary

Fix token double-counting for Anthropic providers and the incorrect prompt_tokens mapping for Bedrock.

Context

  1. Anthropic double-counting: Anthropic streams usage as cumulative values across message_start and message_delta events (ref: https://platform.claude.com/docs/en/build-with-claude/streaming#event-types). Our code combined them with accumulate() (sum), so when message_start reported output_tokens=1 the result was 1 + N output tokens instead of the correct N.

  2. Bedrock prompt_tokens: prompt_tokens was set to total_tokens instead of input_tokens, inflating the reported input count by including output tokens.
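The double-counting in item 1 can be reproduced with plain numbers. The following is a minimal sketch, not the project's code: `UsageEvent`, `combine_by_sum`, and `combine_by_max` are hypothetical names, and each event carries only an output token count.

```rust
/// Hypothetical, simplified stand-in for the usage carried by one
/// streamed event (message_start or message_delta).
#[derive(Debug, Clone, Copy)]
pub struct UsageEvent {
    pub output_tokens: u64,
}

/// Buggy combination: treats cumulative values as if they were increments.
pub fn combine_by_sum(events: &[UsageEvent]) -> u64 {
    events.iter().map(|e| e.output_tokens).sum()
}

/// Fixed combination: with cumulative reporting, the largest value seen
/// is already the total for the response.
pub fn combine_by_max(events: &[UsageEvent]) -> u64 {
    events.iter().map(|e| e.output_tokens).max().unwrap_or(0)
}
```

With a message_start event reporting 1 output token and a final message_delta reporting 42, the sum yields 43 while the max yields 42.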

Changes

Bug 1 – Anthropic cumulative usage (6 providers affected)

  • crates/forge_domain/src/message.rs: Added Usage::merge() which uses max() per token field instead of +. Cost is still summed since cost events are additive.
  • crates/forge_domain/src/context.rs: Added TokenCount::max() for comparing two token counts by inner value while preserving Actual/Approx semantics.
  • crates/forge_domain/src/result_stream_ext.rs: Changed partial-usage branch from accumulate() to merge(). Also fixed cost-only events to sum costs instead of replacing.

Affected providers: anthropic, claude_code, anthropic_compatible, vertex_ai_anthropic, minimax, alibaba_coding.
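The difference between summing and merging can be sketched as follows. This is an illustration under stated assumptions, not the code in crates/forge_domain: the real `Usage` stores token counts as `TokenCount` values, whereas this sketch uses plain `u64` fields, and the `completion_tokens` field name and the `sum_cost` helper are hypothetical.

```rust
/// Simplified, hypothetical version of the usage record.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct Usage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub cost: Option<f64>,
}

/// Adds two optional costs; a missing cost does not erase a present one.
fn sum_cost(a: Option<f64>, b: Option<f64>) -> Option<f64> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x + y),
        (x, None) => x,
        (None, y) => y,
    }
}

impl Usage {
    /// Sums every field: suited to totals across independent requests.
    pub fn accumulate(self, other: Usage) -> Usage {
        Usage {
            prompt_tokens: self.prompt_tokens + other.prompt_tokens,
            completion_tokens: self.completion_tokens + other.completion_tokens,
            cost: sum_cost(self.cost, other.cost),
        }
    }

    /// Takes the max of each token field but still sums cost: suited to
    /// partial, cumulative usage events within a single response.
    pub fn merge(self, other: Usage) -> Usage {
        Usage {
            prompt_tokens: self.prompt_tokens.max(other.prompt_tokens),
            completion_tokens: self.completion_tokens.max(other.completion_tokens),
            cost: sum_cost(self.cost, other.cost),
        }
    }
}
```

Merging a start event (100 prompt tokens, 1 completion token) with a later event (50 completion tokens) gives 100 and 50, where accumulating would give 100 and 51.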

Bug 2 – Bedrock prompt_tokens

  • crates/forge_repo/src/provider/bedrock.rs: Changed u.total_tokens → u.input_tokens.

Affected providers: all Bedrock models.
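The Bedrock fix amounts to a one-field change in the mapping. A minimal sketch, assuming hypothetical `BedrockUsage` and `Usage` structs and a hypothetical `to_usage` function; only `u.total_tokens`, `u.input_tokens`, and `prompt_tokens` come from this PR:

```rust
/// Hypothetical stand-in for the usage block returned by Bedrock.
pub struct BedrockUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub total_tokens: u64,
}

/// Hypothetical, simplified domain-side usage record.
#[derive(Debug, PartialEq)]
pub struct Usage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub total_tokens: u64,
}

pub fn to_usage(u: &BedrockUsage) -> Usage {
    Usage {
        // Before the fix this read `u.total_tokens`, which folded the
        // output tokens into the reported input count.
        prompt_tokens: u.input_tokens,
        completion_tokens: u.output_tokens,
        total_tokens: u.total_tokens,
    }
}
```

For a response with 120 input and 30 output tokens, the old mapping would have reported 150 prompt tokens; the corrected one reports 120.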

Key Implementation Details

The distinction between accumulate and merge:

| Method | Strategy | Use case |
| --- | --- | --- |
| accumulate() | Sum all fields | Session-level totals across independent requests |
| merge() | Max per token field, sum cost | Combining partial streaming events within one response |

Documentation references

Testing

All existing tests updated + new tests added:

  • test_into_full_anthropic_streaming_usage_merge – covers real Anthropic pattern where message_start has output_tokens=1
  • test_into_full_anthropic_streaming_usage_merge_zero_output – covers Vertex AI pattern where message_start has output_tokens=0
  • test_usage_merge_anthropic_cumulative – unit test for merge logic
  • test_usage_merge_preserves_costs – verifies cost summation in merge
cargo test -p forge_domain --lib
cargo test -p forge_app --lib
cargo test -p forge_repo --lib

All 1,381 lib tests pass.

Bug 1 - Anthropic double-counting:
Anthropic streams usage as CUMULATIVE values across message_start and
message_delta events. The code was using accumulate (sum) to combine them,
causing output_tokens to be over-counted (1 + N instead of N) when
message_start includes output_tokens=1.

Fix: Introduced Usage::merge() which uses max() instead of sum for token
fields. This correctly handles cumulative values - the larger value wins.
ref: https://platform.claude.com/docs/en/build-with-claude/streaming#event-types

Affected providers: anthropic, claude_code, anthropic_compatible,
vertex_ai_anthropic, minimax, alibaba_coding, opencode_zen (claude-* models)

Bug 2 - Bedrock prompt_tokens:
Bedrock was setting prompt_tokens to total_tokens instead of input_tokens,
inflating the reported input token count by including output tokens.

Fix: Changed u.total_tokens to u.input_tokens.

Affected providers: All bedrock models.

Also fixed: cost-only events now properly accumulate costs (sum) instead of
replacing them.

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
@github-actions github-actions bot added the type: fix Iterations on existing features or infrastructure. label Apr 6, 2026
The previous implementation used Deref comparison which returned the
original variant unchanged. When Actual(200) was compared with Approx(100),
it returned Actual(200) - violating the documented contract that the result
should be Approx if either input is Approx.

Now uses explicit match to ensure Approx propagation matches documentation.
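The Approx-propagation rule described in this commit can be sketched with an explicit match. This is an assumption-laden illustration rather than the merged code: the `usize` payload and the `value` helper are hypothetical, and the result is assumed to carry the larger inner value.

```rust
/// Simplified, hypothetical version of the token count type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenCount {
    Actual(usize),
    Approx(usize),
}

impl TokenCount {
    /// Returns the inner count regardless of variant.
    fn value(self) -> usize {
        match self {
            TokenCount::Actual(n) | TokenCount::Approx(n) => n,
        }
    }

    /// Returns the larger inner value. The result is Actual only when
    /// both inputs are Actual; otherwise it is Approx.
    pub fn max(self, other: Self) -> Self {
        let larger = self.value().max(other.value());
        match (self, other) {
            (TokenCount::Actual(_), TokenCount::Actual(_)) => TokenCount::Actual(larger),
            _ => TokenCount::Approx(larger),
        }
    }
}
```

Under this sketch, comparing Actual(200) with Approx(100) yields Approx(200), not Actual(200) as the Deref-based comparison did.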

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
@tusharmath tusharmath merged commit caf374e into main Apr 6, 2026
14 checks passed
@tusharmath tusharmath deleted the fix/token-counting-bugs branch April 6, 2026 05:22
