Add context-window management middleware for ReAct agents#523
Draft
hassan11196 wants to merge 2 commits into archi-physics:main from
Conversation
Add defense-in-depth against context overflow that crashes the agent when tool results accumulate beyond the model's token limit (621K tokens vs. the 272K limit).

Layer 1 - Tool output size limits (local_files.py):
- Add MAX_OUTPUT_CHARS (50K), MAX_RESULTS_HARD_LIMIT (20), MAX_DOCUMENT_CHARS (8K)
- Truncate search_local_files, search_metadata, and fetch_catalog_document outputs
- Reduce background document fetch size from 4000 to 2000 chars

Layer 2 - Context overflow error recovery (base_react.py):
- Add _is_context_overflow_error() to detect overflow errors from any LLM provider
- Catch context overflow in the invoke(), stream(), and astream() methods
- Generate a graceful wrap-up response using RunMemory context instead of crashing
- Truncate message snippets in the wrap-up prompt to prevent re-triggering the overflow

https://claude.ai/code/session_01VWeDnRy2NqVTk3YxX55Dvn
Implements industry-standard context-window management following the
LangGraph AgentMiddleware.before_model pattern and LangChain's context
engineering best practices.
Changes:
1. ContextWindowMiddleware (new, middleware/context_window.py):
- Runs before every LLM call inside the ReAct loop via before_model hook
- 3-phase condensation: truncate large tool results → LLM-summarize old
tool results → drop oldest messages preserving AI↔ToolMessage pairing
- Triggers at 80% of context window usage
- Automatically enabled in _build_static_middleware()
2. Context condensation utilities (new, utils/context_condensation.py):
- condense_messages(): Multi-phase message condensation with tool-call
pairing awareness to prevent LLM API errors
- wrap_tool_with_output_limit(): Wraps external tools (MCP) with 50K
char output truncation
3. MCP tool output limits (base_react.py):
- All MCP server tools are wrapped with output size limits to prevent
unbounded external tool outputs from overflowing the context
- Applied during _build_mcp_tools() after sync patching
4. Replaced hard truncation in _prepare_agent_inputs() (base_react.py):
- Initial history trimming now uses the same condense_messages() logic
instead of the old compress→crop→brutal-delete approach
- Preserves tool-call pairing integrity throughout
https://claude.ai/code/session_01VWeDnRy2NqVTk3YxX55Dvn
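The before_model hook in item 1 could be sketched as follows. This is a plain-Python stand-in (the real class would subclass LangGraph's AgentMiddleware); the ~4-chars-per-token estimate and the one-line condensation placeholder are assumptions for illustration, while the 272K limit and 80% trigger come from the PR description.

```python
# Standalone sketch of the before_model trigger logic.
CONTEXT_LIMIT_TOKENS = 272_000   # model limit cited in the PR
TRIGGER_RATIO = 0.8              # condense at 80% of the window

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(m) for m in messages) // 4

class ContextWindowMiddleware:
    def before_model(self, state):
        messages = state["messages"]
        if estimate_tokens(messages) < TRIGGER_RATIO * CONTEXT_LIMIT_TOKENS:
            return None  # under budget: leave state untouched
        # Placeholder for the 3-phase condensation
        # (truncate -> summarize -> drop), shown here as a crude cut.
        condensed = [m[:2000] for m in messages]
        return {"messages": condensed}
```

Returning None when under budget keeps the hook cheap on the common path; only overweight states pay for condensation.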
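The tool-output limiting in item 2 could look like this minimal sketch, assuming the wrapper simply truncates string results at the 50K-char limit named above (the marker text appended after truncation is an assumption):

```python
MAX_OUTPUT_CHARS = 50_000  # limit named in the PR description

def wrap_tool_with_output_limit(tool_fn, max_chars=MAX_OUTPUT_CHARS):
    """Wrap a tool callable so oversized string output is truncated."""
    def wrapped(*args, **kwargs):
        result = tool_fn(*args, **kwargs)
        if isinstance(result, str) and len(result) > max_chars:
            # Append a marker so the model knows the output was cut.
            return result[:max_chars] + f"\n...[truncated at {max_chars} chars]"
        return result
    return wrapped
```

Applied to every MCP tool during _build_mcp_tools(), this caps each tool call's contribution to the context regardless of what the external server returns.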
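The pairing-preserving drop in item 4 could be sketched as below. Plain dicts stand in for LangChain AIMessage/ToolMessage objects, and the function name is illustrative; the key invariant is that a ToolMessage must never survive without the AI message that issued its tool call.

```python
# Sketch of pairing-aware dropping of the oldest messages.
def drop_oldest_preserving_pairing(messages, keep_last):
    """Keep the newest messages, but never orphan a ToolMessage."""
    kept = messages[-keep_last:]
    # A ToolMessage at the head lost its AI tool_calls parent to the
    # cut, which would trigger an LLM API error: drop it as well.
    while kept and kept[0].get("type") == "tool":
        kept = kept[1:]
    return kept
```

Because messages are only cut from the front, the head is the only place an AI-to-ToolMessage pair can be split.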
Summary
Implements industry-standard context-window overflow prevention for LangGraph ReAct agents using a multi-phase condensation strategy. This prevents LLM API errors caused by exceeding the model's context window during agent execution.
Fix for

Key Changes
1. New context_condensation.py module: Implements the three-phase message condensation strategy.
2. New ContextWindowMiddleware class: a LangGraph AgentMiddleware that hooks into the before_model lifecycle to condense messages before every LLM call during the ReAct loop, not just at initialization.
3. Tool output limiting: Added wrap_tool_with_output_limit() to enforce size limits on external MCP tool outputs, preventing unbounded context growth.
4. Enhanced base_react.py: wires ContextWindowMiddleware into _build_static_middleware(), replaces the trimming in _prepare_agent_inputs() with the new condense_messages() function, and catches context overflow in the invoke() method.
5. Local file search hardening: Added output truncation limits to the _search_local_files() and _search_metadata() tools to prevent unbounded result sizes.
Implementation Details
The condensation logic preserves the pairing between AI tool_calls and their corresponding ToolMessages, preventing LLM API errors.
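The overall shape of condense_messages() could be a loop over the three phases in escalation order, stopping as soon as the messages fit the budget. This sketch takes the phase functions and token estimator as parameters since their real implementations live elsewhere in the module; the signature is an assumption.

```python
# Hypothetical orchestration of the three condensation phases.
# `phases` is expected in escalation order:
# truncate large tool results -> LLM-summarize old results -> drop oldest.
def condense_messages(messages, budget_tokens, estimate, phases):
    for phase in phases:
        if estimate(messages) <= budget_tokens:
            break  # already fits: no need to escalate further
        messages = phase(messages)
    return messages
```

Ordering the phases from cheap/lossless to expensive/lossy means the destructive drop phase only runs when truncation and summarization were not enough.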