[3598] Fix incorrect 8k total-token ceiling in gateway OpenResponses preflight #3599
Pull request overview
Fixes the `/v1/responses` OpenResponses gateway preflight so that the transport input-size guardrail (`max_input_chars`) is no longer incorrectly reused as a total-token ceiling for the execution agent. This prevents false `token budget exceeded` failures on large-context models.
Changes:
- Stop deriving and applying an ~8k `max_estimated_total_tokens` ceiling from `max_input_chars` in the OpenResponses execution handler.
- Update integration tests (and the broader flow matrix assertions) to validate the corrected success path and the `413 input_too_large` transport contract.
- Add a spec doc describing the corrected budgeting behavior and test plan.
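The separation described in the changes above can be sketched as follows. This is a minimal illustration, not the code from the PR: `GatewayLimits`, the reduced `AgentConfig`, `fits_transport_cap`, and `build_agent_config` are hypothetical stand-ins, and using `None` for the ceiling is an assumption, since the overview does not say what the handler sets instead.

```rust
/// Hypothetical stand-in for the gateway's request-level limits.
pub struct GatewayLimits {
    /// Transport guardrail: maximum number of characters accepted as input.
    pub max_input_chars: usize,
}

/// Hypothetical, heavily reduced stand-in for the agent configuration.
#[derive(Debug, PartialEq)]
pub struct AgentConfig {
    /// Total-token ceiling for the execution agent.
    /// `None` models "no ceiling imposed by the gateway" (an assumption).
    pub max_estimated_total_tokens: Option<usize>,
}

/// Transport-level check: only looks at the size of the incoming input.
pub fn fits_transport_cap(input: &str, limits: &GatewayLimits) -> bool {
    input.chars().count() <= limits.max_input_chars
}

/// Builds the agent config without deriving a token ceiling from the
/// character cap; the two limits stay independent of each other.
pub fn build_agent_config(_limits: &GatewayLimits) -> AgentConfig {
    AgentConfig {
        max_estimated_total_tokens: None,
    }
}
```

The point of the sketch is that `max_input_chars` is consulted only by the transport check and never feeds into `AgentConfig`.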
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| specs/3598-gateway-openresponses-token-budget.md | Adds a spec for the corrected budgeting behavior and expected error semantics. |
| crates/tau-gateway/src/gateway_openresponses/tests.rs | Reworks integration/spec tests to assert success under the char cap and 413 input_too_large on oversize payloads; updates flow matrix expectations. |
| crates/tau-gateway/src/gateway_openresponses/root_utilities.rs | Removes the now-unused helper that derived a token limit from max_input_chars. |
| crates/tau-gateway/src/gateway_openresponses/openresponses_execution_handler.rs | Removes reuse of the derived preflight token limit as AgentConfig.max_estimated_total_tokens. |
| crates/tau-gateway/src/gateway_openresponses.rs | Removes the unused import of the deleted preflight token-limit helper. |
| # 3598 Gateway OpenResponses Token Budget |
| ## Objective |
| Fix the `/v1/responses` gateway execution path so preflight token budgeting distinguishes between: |
This spec won’t be picked up by the repo’s spec tooling/contract as-is. The repository contract expects the binding per-issue spec at `specs/<issue-id>/spec.md`, and the archive indexer only scans `specs/*/spec.md`, so this file will be ignored unless it is moved/renamed accordingly. It should also include a `Status:` line.
| Outputs: |
| - `AgentConfig` passed into the OpenResponses execution agent with correct preflight fields |
| - HTTP/SSE success for requests that fit within the real gateway preflight intent |
| - hard-fail `gateway_runtime_error` for genuinely oversized requests |
The spec’s stated error semantics appear inconsistent with the code/tests in this PR: it says genuinely oversized requests should hard-fail with `gateway_runtime_error`, but the updated tests assert a `413 Payload Too Large` with error code `input_too_large` (via `OpenResponsesApiError::payload_too_large`). Consider updating this section to match the actual HTTP/error contract.
Suggested change (replace the first line with the second):
| - hard-fail `gateway_runtime_error` for genuinely oversized requests |
| - HTTP 413 Payload Too Large with error code `input_too_large` (via `OpenResponsesApiError::payload_too_large`) for genuinely oversized requests |
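The error contract referred to in the comment above might look roughly like the sketch below. `ApiError` and `validate_input` are hypothetical stand-ins; only the name `payload_too_large`, the status `413`, and the code `input_too_large` come from the review comment, and the real signature of `OpenResponsesApiError::payload_too_large` is not shown there.

```rust
/// Hypothetical stand-in for the gateway's API error type.
#[derive(Debug, PartialEq)]
pub struct ApiError {
    /// HTTP status code returned to the client.
    pub status: u16,
    /// Machine-readable error code in the response body.
    pub code: &'static str,
    /// Human-readable detail.
    pub message: String,
}

impl ApiError {
    /// Named after the constructor mentioned in the review comment;
    /// the argument list here is an assumption.
    pub fn payload_too_large(message: String) -> Self {
        ApiError {
            status: 413,
            code: "input_too_large",
            message,
        }
    }
}

/// Rejects input over the character cap with a 413-style error
/// instead of a generic runtime error.
pub fn validate_input(input: &str, max_input_chars: usize) -> Result<(), ApiError> {
    let chars = input.chars().count();
    if chars > max_input_chars {
        return Err(ApiError::payload_too_large(format!(
            "input has {chars} characters; the limit is {max_input_chars}"
        )));
    }
    Ok(())
}
```

Under this sketch an oversized payload surfaces as a client error tied to the transport limit, which is the behavior the updated tests are said to assert.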
Closes #3598
Spec: specs/3598-gateway-openresponses-token-budget.md
What changed:
- […] `max_input_chars`-derived preflight tokens as `AgentConfig.max_estimated_total_tokens` in the `/v1/responses` gateway execution path
- […] `input_too_large` `413` transport contract instead of the old bogus `502`
Why:
- […] `estimated_total_tokens=8029, max_total_tokens=8000` under `gpt-5.3-codex`
Test evidence:
- `cargo test -p tau-gateway 3598_openresponses -- --nocapture`
- `cargo test -p tau-gateway tier_pr_a2_agent_session_flow_matrix -- --nocapture`
- `cargo test -p tau-gateway -- --nocapture`
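The figures quoted under Why (an estimate of 8029 tokens against a ceiling of 8000) can be reproduced in miniature. The functions below are hypothetical and only illustrate the comparison; they are not the agent's actual budget check, and modelling "no gateway-imposed ceiling" as `None` is an assumption.

```rust
/// Hypothetical budget check: true when the estimate is over the ceiling.
/// `None` models the absence of a gateway-imposed total-token ceiling.
pub fn exceeds_budget(estimated_total_tokens: usize, max_total_tokens: Option<usize>) -> bool {
    match max_total_tokens {
        Some(ceiling) => estimated_total_tokens > ceiling,
        None => false,
    }
}

/// Number of tokens by which the estimate overshoots the ceiling (0 if it fits).
pub fn over_by(estimated_total_tokens: usize, ceiling: usize) -> usize {
    estimated_total_tokens.saturating_sub(ceiling)
}
```

With the derived ceiling in place, `exceeds_budget(8029, Some(8000))` is true and the request overshoots by 29 tokens; with no derived ceiling, the same estimate passes.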