
[3598] Fix incorrect 8k total-token ceiling in gateway OpenResponses preflight #3599

Merged
njfio merged 5 commits into master from 3598-tui-token-budget
Mar 19, 2026
Conversation

@njfio (Owner) commented Mar 19, 2026

Closes #3598

Spec: specs/3598-gateway-openresponses-token-budget.md

What changed:

  • stopped reusing max_input_chars-derived preflight tokens as AgentConfig.max_estimated_total_tokens in the /v1/responses gateway execution path
  • kept transport oversize enforcement at request translation time via input_too_large
  • updated the broader agent/session flow matrix to assert the corrected 413 transport contract instead of the old bogus 502

Why:

  • the gateway was using a transport/input-size guardrail as a total-token ceiling
  • that incorrectly counted system prompt and persisted session history against an ~8k cap
  • the result was TUI failures like estimated_total_tokens=8029, max_total_tokens=8000 under gpt-5.3-codex
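A minimal sketch of how the old behavior produced that failure. Every name here is hypothetical, and the 4-characters-per-token ratio, the 32,000-character cap, and the per-component token counts are illustrative assumptions, chosen only so the totals match the reported `estimated_total_tokens=8029, max_total_tokens=8000`:

```rust
/// Hypothetical stand-in for the removed helper: converts a character cap
/// into a token estimate. The 4-chars-per-token ratio is an assumption.
fn tokens_from_char_cap(max_input_chars: usize) -> usize {
    (max_input_chars + 3) / 4
}

/// Estimated total for one turn: system prompt, persisted session history,
/// and the new user input all count toward the total.
fn estimated_total_tokens(system: usize, history: usize, input: usize) -> usize {
    system + history + input
}

/// Budget check. `None` means no total-token ceiling is configured.
fn exceeds_budget(estimated: usize, max_total: Option<usize>) -> bool {
    matches!(max_total, Some(max) if estimated > max)
}
```

With an assumed cap of 32,000 characters, `tokens_from_char_cap` yields 8,000. A session with 1,500 system-prompt tokens, 6,400 history tokens, and a 129-token input then estimates 8,029 and is rejected, even though the new input alone is far below the transport cap.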

Test evidence:

  • cargo test -p tau-gateway 3598_openresponses -- --nocapture
  • cargo test -p tau-gateway tier_pr_a2_agent_session_flow_matrix -- --nocapture
  • cargo test -p tau-gateway -- --nocapture

Copilot AI review requested due to automatic review settings March 19, 2026 20:55
@greptile-apps greptile-apps Bot left a comment

njfio has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@njfio njfio merged commit 174452f into master Mar 19, 2026
7 checks passed
@njfio njfio deleted the 3598-tui-token-budget branch March 19, 2026 20:57
Copilot AI left a comment

Pull request overview

Fixes the /v1/responses OpenResponses gateway preflight so that the transport input-size guardrail (max_input_chars) is no longer reused as a total-token ceiling for the execution agent. This prevents spurious token-budget-exceeded failures on large-context models.

Changes:

  • Stop deriving and applying an ~8k max_estimated_total_tokens ceiling from max_input_chars in the OpenResponses execution handler.
  • Update integration tests (and the broader flow matrix assertions) to validate the corrected success path and the 413 input_too_large transport contract.
  • Add a spec doc describing the corrected budgeting behavior and test plan.
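The separation described above can be sketched as follows. This is not the tau-gateway code: `ApiError`, the trimmed-down `AgentConfig`, `check_input_size`, and `build_agent_config` are hypothetical, and sourcing the ceiling from an explicit setting is an assumption. Only the 413 status, the `input_too_large` code, and the `max_estimated_total_tokens` field name come from the PR.

```rust
/// Hypothetical error type. Only the 413 status and the `input_too_large`
/// code are taken from the PR.
#[derive(Debug, PartialEq)]
struct ApiError {
    status: u16,
    code: &'static str,
}

/// Hypothetical, trimmed-down agent config with a single budget field.
#[derive(Debug, PartialEq)]
struct AgentConfig {
    max_estimated_total_tokens: Option<usize>,
}

/// Transport guardrail: rejects an oversized payload at translation time.
fn check_input_size(input: &str, max_input_chars: usize) -> Result<(), ApiError> {
    if input.chars().count() > max_input_chars {
        return Err(ApiError { status: 413, code: "input_too_large" });
    }
    Ok(())
}

/// Builds the agent config. The total-token ceiling comes from an explicit
/// setting (an assumption here) and is never derived from the character cap.
fn build_agent_config(
    input: &str,
    max_input_chars: usize,
    explicit_ceiling: Option<usize>,
) -> Result<AgentConfig, ApiError> {
    check_input_size(input, max_input_chars)?;
    Ok(AgentConfig { max_estimated_total_tokens: explicit_ceiling })
}
```

In this sketch an input under the character cap always reaches the agent, whatever the size of the system prompt or session history, while an oversized input fails before any agent is built.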

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| specs/3598-gateway-openresponses-token-budget.md | Adds a spec for the corrected budgeting behavior and expected error semantics. |
| crates/tau-gateway/src/gateway_openresponses/tests.rs | Reworks integration/spec tests to assert success under the char cap and 413 input_too_large on oversize payloads; updates flow matrix expectations. |
| crates/tau-gateway/src/gateway_openresponses/root_utilities.rs | Removes the now-unused helper that derived a token limit from max_input_chars. |
| crates/tau-gateway/src/gateway_openresponses/openresponses_execution_handler.rs | Removes reuse of the derived preflight token limit as AgentConfig.max_estimated_total_tokens. |
| crates/tau-gateway/src/gateway_openresponses.rs | Removes the unused import of the deleted preflight token-limit helper. |

Comment on lines +1 to +5
# 3598 Gateway OpenResponses Token Budget

## Objective

Fix the `/v1/responses` gateway execution path so preflight token budgeting distinguishes between:
Copilot AI Mar 19, 2026

This spec won’t be picked up by the repo’s spec tooling/contract as-is. The repository contract expects the binding per-issue spec at specs/<issue-id>/spec.md and the archive indexer only scans specs/*/spec.md, so this file will be ignored unless it’s moved/renamed accordingly (and should also include a Status: line).
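A rough illustration of the path shape described above, assuming the indexer matches exactly `specs/<issue-id>/spec.md`. The function name is hypothetical and the real tooling's glob handling is not shown in this PR:

```rust
/// Returns true when `path` has the shape `specs/<issue-id>/spec.md`,
/// with a non-empty directory name standing in for the issue id.
fn is_indexed_spec_path(path: &str) -> bool {
    let parts: Vec<&str> = path.split('/').collect();
    matches!(parts.as_slice(), ["specs", id, "spec.md"] if !id.is_empty())
}
```

Under this assumption a path such as `specs/3598/spec.md` would be indexed, while `specs/3598-gateway-openresponses-token-budget.md` would not.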

Outputs:
- `AgentConfig` passed into the OpenResponses execution agent with correct preflight fields
- HTTP/SSE success for requests that fit within the real gateway preflight intent
- hard-fail `gateway_runtime_error` for genuinely oversized requests
Copilot AI Mar 19, 2026

The spec’s stated error semantics appear inconsistent with the code/tests in this PR: it says genuinely oversized requests should hard-fail with gateway_runtime_error, but the updated tests assert a 413 Payload Too Large with error code input_too_large (via OpenResponsesApiError::payload_too_large). Consider updating this section to match the actual HTTP/error contract.

Suggested change
- hard-fail `gateway_runtime_error` for genuinely oversized requests
- HTTP 413 Payload Too Large with error code `input_too_large` (via `OpenResponsesApiError::payload_too_large`) for genuinely oversized requests
