
Add context-window management middleware for ReAct agents#523

Draft
hassan11196 wants to merge 2 commits into archi-physics:main from hassan11196:claude/fix-initialization-logging-yBVw0

Conversation

hassan11196 (Collaborator) commented Mar 16, 2026

Summary

Implements industry-standard context-window overflow prevention for LangGraph ReAct agents using a multi-phase condensation strategy. This prevents LLM API errors caused by exceeding the model's context window during agent execution.

Fix for the context-overflow error shown in the attached screenshot (a 621K-token request against the model's 272K-token limit).

Key Changes

  • New context_condensation.py module: Implements a three-phase message condensation strategy:

    • Phase 1: Truncate oversized tool-result messages (cheap, often sufficient)
    • Phase 2: Summarize older tool results via LLM (preserves recent reasoning)
    • Phase 3: Drop oldest non-essential messages (last resort, preserves tool-call pairing)
  • New ContextWindowMiddleware class: LangGraph AgentMiddleware that hooks into the before_model lifecycle to condense messages before every LLM call during the ReAct loop, not just at initialization.

  • Tool output limiting: Added wrap_tool_with_output_limit() to enforce size limits on external MCP tool outputs, preventing unbounded context growth.

  • Enhanced base_react.py:

    • Integrated ContextWindowMiddleware into _build_static_middleware()
    • Replaced ad-hoc compression logic in _prepare_agent_inputs() with the new condense_messages() function
    • Added context-overflow error handling in invoke() method
    • Improved wrap-up prompt generation to distinguish between recursion-limit and context-overflow scenarios
    • Applied output limits to MCP tools during initialization
  • Local file search hardening: Added output truncation limits to _search_local_files() and _search_metadata() tools to prevent unbounded result sizes.

Implementation Details

  • Preserves tool-call pairing: The condensation logic carefully maintains the pairing between AIMessages with tool_calls and their corresponding ToolMessages, preventing LLM API errors.
  • Configurable thresholds: Uses 80% of context window as the condensation trigger (industry standard per LangChain's context-engineering guide).
  • Graceful degradation: Works with or without LLM summarization; falls back to truncation and dropping if summarization is unavailable.
  • Comprehensive logging: Tracks condensation phases and token counts for debugging and monitoring.
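The "graceful degradation" point can be sketched as follows. The function name, the dict-based message shape, and the 2,000-char cutoff are hypothetical; the point is the shape of the fallback, not the real API.

```python
def condense_with_fallback(messages, summarizer=None, max_chars=2000):
    """Hypothetical sketch: summarize oversized tool results via an LLM
    when a summarizer is available, else fall back to plain truncation."""
    out = []
    for m in messages:
        if m["role"] == "tool" and len(m["content"]) > max_chars:
            if summarizer is not None:
                # Phase 2: LLM summarization preserves meaning at low size.
                out.append({**m, "content": summarizer(m["content"])})
            else:
                # Fallback: hard truncation when no summarizer is configured.
                out.append({**m, "content": m["content"][:max_chars] + "...[truncated]"})
        else:
            out.append(m)
    return out
```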

Add defense-in-depth against context overflow that crashes the agent
when tool results accumulate beyond the model's token limit (621K tokens
vs 272K limit).

Layer 1 - Tool output size limits (local_files.py):
- Add MAX_OUTPUT_CHARS (50K), MAX_RESULTS_HARD_LIMIT (20), MAX_DOCUMENT_CHARS (8K)
- Truncate search_local_files, search_metadata, fetch_catalog_document outputs
- Reduce background document fetch size from 4000 to 2000 chars
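The Layer 1 limits compose naturally: cap the result count first, then each document, then the total output. The constants below come from the commit message; the helper function itself is a hypothetical sketch, not the code in local_files.py.

```python
# Constants from the commit message; the helper below is a sketch of how
# they might be applied together in a search tool's output path.
MAX_OUTPUT_CHARS = 50_000
MAX_RESULTS_HARD_LIMIT = 20
MAX_DOCUMENT_CHARS = 8_000

def limit_search_output(results: list[str]) -> str:
    """Cap result count, per-document size, and total output size."""
    capped = results[:MAX_RESULTS_HARD_LIMIT]
    joined = "\n".join(doc[:MAX_DOCUMENT_CHARS] for doc in capped)
    if len(joined) > MAX_OUTPUT_CHARS:
        joined = joined[:MAX_OUTPUT_CHARS] + "\n...[output truncated]"
    return joined
```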

Layer 2 - Context overflow error recovery (base_react.py):
- Add _is_context_overflow_error() to detect overflow from any LLM provider
- Catch context overflow in invoke(), stream(), and astream() methods
- Generate graceful wrap-up response using RunMemory context instead of crashing
- Truncate message snippets in wrap-up prompt to prevent re-triggering overflow
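Provider-agnostic overflow detection typically reduces to substring matching on the exception message, since each LLM provider words the error differently. The marker strings below are common phrasings, not an exhaustive or verified list, and the function name mirrors the commit's `_is_context_overflow_error()` only as a sketch.

```python
# Hypothetical sketch: detect context-window overflow from any provider
# by matching common phrasings in the raised exception's message.
_OVERFLOW_MARKERS = (
    "context_length_exceeded",
    "maximum context length",
    "prompt is too long",
    "input is too long",
)

def is_context_overflow_error(exc: Exception) -> bool:
    msg = str(exc).lower()
    return any(marker in msg for marker in _OVERFLOW_MARKERS)
```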

https://claude.ai/code/session_01VWeDnRy2NqVTk3YxX55Dvn
Implements industry-standard context-window management following the
LangGraph AgentMiddleware.before_model pattern and LangChain's context
engineering best practices.

Changes:

1. ContextWindowMiddleware (new, middleware/context_window.py):
   - Runs before every LLM call inside the ReAct loop via before_model hook
   - 3-phase condensation: truncate large tool results → LLM-summarize old
     tool results → drop oldest messages preserving AI↔ToolMessage pairing
   - Triggers at 80% of context window usage
   - Automatically enabled in _build_static_middleware()

2. Context condensation utilities (new, utils/context_condensation.py):
   - condense_messages(): Multi-phase message condensation with tool-call
     pairing awareness to prevent LLM API errors
   - wrap_tool_with_output_limit(): Wraps external tools (MCP) with 50K
     char output truncation

3. MCP tool output limits (base_react.py):
   - All MCP server tools are wrapped with output size limits to prevent
     unbounded external tool outputs from overflowing the context
   - Applied during _build_mcp_tools() after sync patching

4. Replaced hard truncation in _prepare_agent_inputs() (base_react.py):
   - Initial history trimming now uses the same condense_messages() logic
     instead of the old compress→crop→brutal-delete approach
   - Preserves tool-call pairing integrity throughout
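The pairing-integrity constraint in point 4 can be sketched as: when dropping the oldest messages, never let the kept window begin on a dangling tool result, because a ToolMessage without its preceding tool-calling AIMessage is rejected by LLM APIs. The function and the dict-based message shape are hypothetical.

```python
def drop_oldest_preserving_pairs(messages: list[dict], keep_last: int) -> list[dict]:
    """Hypothetical sketch of the drop phase: keep the last `keep_last`
    messages, but walk the cut point back so the kept window never starts
    on a tool result separated from its tool-calling AI message."""
    if len(messages) <= keep_last:
        return messages
    cut = len(messages) - keep_last
    # If the cut would land on a tool result, back up to the AI message
    # that issued the tool calls, keeping the pair together.
    while cut > 0 and messages[cut]["role"] == "tool":
        cut -= 1
    return messages[cut:]
```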

https://claude.ai/code/session_01VWeDnRy2NqVTk3YxX55Dvn
@hassan11196 hassan11196 force-pushed the claude/fix-initialization-logging-yBVw0 branch from c3cd55c to fdbbe2a Compare March 16, 2026 03:47
@pmlugato pmlugato self-assigned this Mar 18, 2026