fix: correct token double-counting for Anthropic and Bedrock providers by amitksingh1490 · Pull Request #2861 · antinomyhq/forgecode

amitksingh1490 · 2026-04-06T04:04:20Z

Summary

Fix token double-counting for Anthropic providers and the incorrect prompt_tokens mapping for Bedrock, bringing token reporting in line with each provider's documented usage semantics.

Context

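Anthropic double-counting: Anthropic streams usage as cumulative values across message_start and message_delta events (see the streaming event types link under Documentation references). Our code combined them with accumulate(), which sums every field. When message_start reported output_tokens=1 and message_delta reported the cumulative total N, the result was 1 + N instead of the correct N.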
Bedrock prompt_tokens: prompt_tokens was set to total_tokens instead of input_tokens, inflating the reported input count by including output tokens.
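To make the Anthropic arithmetic concrete, here is a minimal Rust sketch that combines the output_tokens snapshots from a message_start/message_delta pair both ways. EventUsage, combine_by_sum, and combine_by_max are hypothetical names for illustration, not forge_domain code, and the sketch assumes cumulative values never decrease within a single response.

```rust
/// Hypothetical, simplified view of the usage carried by one Anthropic
/// streaming event. Values are cumulative: the message_delta snapshot
/// already includes whatever message_start reported.
#[derive(Debug, Clone, Copy)]
pub struct EventUsage {
    pub output_tokens: u64,
}

/// Buggy strategy: summing cumulative snapshots counts the
/// message_start value a second time.
pub fn combine_by_sum(events: &[EventUsage]) -> u64 {
    events.iter().map(|e| e.output_tokens).sum()
}

/// Fixed strategy: for non-decreasing cumulative values, the largest
/// snapshot is the final total. An empty stream yields 0.
pub fn combine_by_max(events: &[EventUsage]) -> u64 {
    events.iter().map(|e| e.output_tokens).max().unwrap_or(0)
}
```

With message_start reporting 1 and message_delta reporting 42, summing yields 43 while max yields 42. When message_start reports 0 (the Vertex AI pattern the tests cover), both strategies agree, so that pattern alone would not surface the bug.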

Changes

Bug 1 – Anthropic cumulative usage (6 providers affected)

crates/forge_domain/src/message.rs: Added Usage::merge(), which takes the max() of each token field instead of adding them. Cost is still summed, since cost events are additive.
crates/forge_domain/src/context.rs: Added TokenCount::max(), which compares two token counts by inner value while preserving Actual/Approx semantics.
crates/forge_domain/src/result_stream_ext.rs: Switched the partial-usage branch from accumulate() to merge(). Cost-only events now add to the running cost instead of replacing it.

Affected providers: anthropic, claude_code, anthropic_compatible, vertex_ai_anthropic, minimax, alibaba_coding.
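The Bug 1 changes can be sketched as follows, assuming a simplified Usage with plain u64 token fields and an optional f64 cost (the real forge_domain type uses TokenCount and has more fields). UsageEvent and fold_usage are hypothetical stand-ins for the partial-usage and cost-only branches in result_stream_ext.rs, not its actual code.

```rust
/// Hypothetical, simplified stand-in for forge_domain's Usage.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Usage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub cost: Option<f64>,
}

/// Sum two optional costs; the result is None only when both are None.
fn add_costs(a: Option<f64>, b: Option<f64>) -> Option<f64> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x + y),
        (x, None) => x,
        (None, y) => y,
    }
}

impl Usage {
    /// Combine partial usage events from one streamed response.
    /// Token fields are cumulative snapshots, so the larger value wins;
    /// costs are additive, so they are summed.
    pub fn merge(self, other: Usage) -> Usage {
        Usage {
            prompt_tokens: self.prompt_tokens.max(other.prompt_tokens),
            completion_tokens: self.completion_tokens.max(other.completion_tokens),
            cost: add_costs(self.cost, other.cost),
        }
    }
}

/// Hypothetical stream item: a partial usage snapshot or a cost-only event.
pub enum UsageEvent {
    Partial(Usage),
    CostOnly(f64),
}

/// Fold all usage events of one response into a single Usage.
pub fn fold_usage(events: &[UsageEvent]) -> Usage {
    events.iter().fold(Usage::default(), |acc, event| match event {
        // Partial snapshots are merged (max per token field).
        UsageEvent::Partial(u) => acc.merge(*u),
        // Cost-only events add to the running cost rather than replacing it.
        UsageEvent::CostOnly(c) => Usage {
            cost: add_costs(acc.cost, Some(*c)),
            ..acc
        },
    })
}
```

Folding snapshots of (100, 1) and (100, 50) followed by cost-only events of 0.5 and 0.25 yields 100 prompt tokens, 50 completion tokens, and a cost of 0.75.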

Bug 2 – Bedrock prompt_tokens

crates/forge_repo/src/provider/bedrock.rs: prompt_tokens is now mapped from u.input_tokens instead of u.total_tokens.

Affected providers: all Bedrock models.
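The Bedrock fix amounts to reading the right field when mapping the provider's usage. This sketch assumes a hypothetical BedrockTokenUsage mirroring the input, output, and total counts Bedrock reports, plus a hypothetical provider-neutral Usage; the real mapping in bedrock.rs may use different types.

```rust
/// Hypothetical mirror of the token usage Bedrock reports.
pub struct BedrockTokenUsage {
    pub input_tokens: i32,
    pub output_tokens: i32,
    pub total_tokens: i32,
}

/// Hypothetical provider-neutral usage record.
#[derive(Debug, PartialEq)]
pub struct Usage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub total_tokens: u64,
}

/// Map Bedrock usage into the neutral record, clamping negatives to 0.
pub fn to_usage(u: &BedrockTokenUsage) -> Usage {
    Usage {
        // Fixed mapping: prompt tokens come from input_tokens. Reading
        // total_tokens here would fold the output tokens into the prompt count.
        prompt_tokens: u.input_tokens.max(0) as u64,
        completion_tokens: u.output_tokens.max(0) as u64,
        total_tokens: u.total_tokens.max(0) as u64,
    }
}
```

For a response with 120 input and 30 output tokens (150 total), the old mapping would have reported 150 prompt tokens; this one reports 120.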

Key Implementation Details

The distinction between accumulate and merge:

| Method | Strategy | Use case |
| --- | --- | --- |
| `accumulate()` | Sum all fields | Session-level totals across independent requests |
| `merge()` | Max per token field, sum cost | Combining partial streaming events within one response |
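The two strategies compose: merge collapses each response's streaming snapshots into one value, and accumulate then sums those per-response values across the session. This is a sketch under a simplified-Usage assumption (token fields only, no cost), and session_total is a hypothetical helper for illustration.

```rust
/// Hypothetical, simplified Usage with token fields only.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Usage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

impl Usage {
    /// Session level: independent requests consume their own tokens,
    /// so every field is summed.
    pub fn accumulate(self, other: Usage) -> Usage {
        Usage {
            prompt_tokens: self.prompt_tokens + other.prompt_tokens,
            completion_tokens: self.completion_tokens + other.completion_tokens,
        }
    }

    /// Within one streamed response: events are cumulative snapshots,
    /// so the larger value per field wins.
    pub fn merge(self, other: Usage) -> Usage {
        Usage {
            prompt_tokens: self.prompt_tokens.max(other.prompt_tokens),
            completion_tokens: self.completion_tokens.max(other.completion_tokens),
        }
    }
}

/// Merge each response's events, then accumulate across responses.
pub fn session_total(responses: &[Vec<Usage>]) -> Usage {
    responses
        .iter()
        .map(|events| events.iter().copied().fold(Usage::default(), Usage::merge))
        .fold(Usage::default(), Usage::accumulate)
}
```

Two responses whose snapshots end at (100, 40) and (200, 10) give a session total of (300, 50). Using accumulate inside a response would inflate the counts, and using merge across responses would drop all but the largest one.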

Documentation references

Anthropic streaming event types: https://platform.claude.com/docs/en/build-with-claude/streaming#event-types
OpenAI Responses API usage schema: https://developers.openai.com/api/reference/resources/responses#(resource)%20responses%20%3E%20(model)%20response_usage%20%3E%20(schema)

Testing

All existing tests were updated and the following tests were added:

test_into_full_anthropic_streaming_usage_merge – covers the real Anthropic pattern, where message_start reports output_tokens=1
test_into_full_anthropic_streaming_usage_merge_zero_output – covers the Vertex AI pattern, where message_start reports output_tokens=0
test_usage_merge_anthropic_cumulative – unit test for the merge logic
test_usage_merge_preserves_costs – verifies that merge sums costs

cargo test -p forge_domain --lib
cargo test -p forge_app --lib
cargo test -p forge_repo --lib

All 1,381 lib tests pass.

Bug 1 - Anthropic double-counting: Anthropic streams usage as CUMULATIVE values across message_start and message_delta events. The code was using accumulate (sum) to combine them, causing output_tokens to be over-counted (1 + N instead of N) when message_start includes output_tokens=1.

Fix: Introduced Usage::merge() which uses max() instead of sum for token fields. This correctly handles cumulative values - the larger value wins.

ref: https://platform.claude.com/docs/en/build-with-claude/streaming#event-types

Affected providers: anthropic, claude_code, anthropic_compatible, vertex_ai_anthropic, minimax, alibaba_coding, opencode_zen (claude-* models)

Bug 2 - Bedrock prompt_tokens: Bedrock was setting prompt_tokens to total_tokens instead of input_tokens, inflating the reported input token count by including output tokens.

Fix: Changed u.total_tokens to u.input_tokens.

Affected providers: All bedrock models.

Also fixed: cost-only events now properly accumulate costs (sum) instead of replacing them.

Co-Authored-By: ForgeCode <noreply@forgecode.dev>

crates/forge_domain/src/context.rs

The previous implementation compared the counts via Deref, which returned the larger input with its variant unchanged. When Actual(200) was compared with Approx(100), it returned Actual(200), violating the documented contract that the result should be Approx if either input is Approx. It now uses an explicit match so that Approx propagates as documented.

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
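A sketch of the corrected contract, assuming TokenCount is an enum with Actual and Approx variants wrapping a count (the u64 payload type is an assumption). The result takes the larger inner value and is Actual only when both inputs are Actual.

```rust
/// Hypothetical mirror of forge_domain's TokenCount: an exact count
/// reported by the provider, or an estimate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenCount {
    Actual(u64),
    Approx(u64),
}

impl TokenCount {
    /// Inner value regardless of variant.
    fn value(self) -> u64 {
        match self {
            TokenCount::Actual(v) | TokenCount::Approx(v) => v,
        }
    }

    /// Larger inner value; Approx if either input is Approx.
    pub fn max(self, other: TokenCount) -> TokenCount {
        let v = self.value().max(other.value());
        match (self, other) {
            (TokenCount::Actual(_), TokenCount::Actual(_)) => TokenCount::Actual(v),
            _ => TokenCount::Approx(v),
        }
    }
}
```

With this match, Actual(200) compared against Approx(100) yields Approx(200) rather than Actual(200).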

github-actions bot added the "type: fix" label (Iterations on existing features or infrastructure.) on Apr 6, 2026

[autofix.ci] apply automated fixes (fa7d746)

graphite-app bot reviewed Apr 6, 2026


crates/forge_domain/src/context.rs (resolved)

tusharmath merged commit caf374e into main Apr 6, 2026
14 checks passed

tusharmath deleted the fix/token-counting-bugs branch April 6, 2026 05:22

tusharmath merged 3 commits into main from fix/token-counting-bugs
