
Add context-window management middleware for ReAct agents#523

Draft
hassan11196 wants to merge 2 commits into archi-physics:main from hassan11196:claude/fix-initialization-logging-yBVw0

Conversation

hassan11196 (Collaborator) commented Mar 16, 2026

Summary

Implements industry-standard context-window overflow prevention for LangGraph ReAct agents using a multi-phase condensation strategy. This prevents LLM API errors caused by exceeding the model's context window during agent execution.

Fix for the context-overflow error shown in the attached screenshot (a 621K-token request against the model's 272K-token limit).

Key Changes

  • New context_condensation.py module: Implements a three-phase message condensation strategy:

    • Phase 1: Truncate oversized tool-result messages (cheap, often sufficient)
    • Phase 2: Summarize older tool results via LLM (preserves recent reasoning)
    • Phase 3: Drop oldest non-essential messages (last resort, preserves tool-call pairing)
  • New ContextWindowMiddleware class: LangGraph AgentMiddleware that hooks into the before_model lifecycle to condense messages before every LLM call during the ReAct loop, not just at initialization.

  • Tool output limiting: Added wrap_tool_with_output_limit() to enforce size limits on external MCP tool outputs, preventing unbounded context growth.

  • Enhanced base_react.py:

    • Integrated ContextWindowMiddleware into _build_static_middleware()
    • Replaced ad-hoc compression logic in _prepare_agent_inputs() with the new condense_messages() function
    • Added context-overflow error handling in invoke() method
    • Improved wrap-up prompt generation to distinguish between recursion-limit and context-overflow scenarios
    • Applied output limits to MCP tools during initialization
  • Local file search hardening: Added output truncation limits to _search_local_files() and _search_metadata() tools to prevent unbounded result sizes.

Implementation Details

  • Preserves tool-call pairing: The condensation logic carefully maintains the pairing between AIMessages with tool_calls and their corresponding ToolMessages, preventing LLM API errors.
  • Configurable thresholds: Uses 80% of context window as the condensation trigger (industry standard per LangChain's context-engineering guide).
  • Graceful degradation: Works with or without LLM summarization; falls back to truncation and dropping if summarization is unavailable.
  • Comprehensive logging: Tracks condensation phases and token counts for debugging and monitoring.
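The "graceful degradation" point can be sketched as follows. The function name, the dict-based message shape, and the 2,000-char cutoff are hypothetical; the point is the shape of the fallback, not the real API.

```python
def condense_with_fallback(messages, summarizer=None, max_chars=2000):
    """Hypothetical sketch: summarize oversized tool results via an LLM
    when a summarizer is available, else fall back to plain truncation."""
    out = []
    for m in messages:
        if m["role"] == "tool" and len(m["content"]) > max_chars:
            if summarizer is not None:
                # Phase 2: LLM summarization preserves meaning at low size.
                out.append({**m, "content": summarizer(m["content"])})
            else:
                # Fallback: hard truncation when no summarizer is configured.
                out.append({**m, "content": m["content"][:max_chars] + "...[truncated]"})
        else:
            out.append(m)
    return out
```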

Add defense-in-depth against context overflow that crashes the agent
when tool results accumulate beyond the model's token limit (621K tokens
vs 272K limit).

Layer 1 - Tool output size limits (local_files.py):
- Add MAX_OUTPUT_CHARS (50K), MAX_RESULTS_HARD_LIMIT (20), MAX_DOCUMENT_CHARS (8K)
- Truncate search_local_files, search_metadata, fetch_catalog_document outputs
- Reduce background document fetch size from 4000 to 2000 chars
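The Layer 1 limits compose naturally: cap the result count first, then each document, then the total output. The constants below come from the commit message; the helper function itself is a hypothetical sketch, not the code in local_files.py.

```python
# Constants from the commit message; the helper below is a sketch of how
# they might be applied together in a search tool's output path.
MAX_OUTPUT_CHARS = 50_000
MAX_RESULTS_HARD_LIMIT = 20
MAX_DOCUMENT_CHARS = 8_000

def limit_search_output(results: list[str]) -> str:
    """Cap result count, per-document size, and total output size."""
    capped = results[:MAX_RESULTS_HARD_LIMIT]
    joined = "\n".join(doc[:MAX_DOCUMENT_CHARS] for doc in capped)
    if len(joined) > MAX_OUTPUT_CHARS:
        joined = joined[:MAX_OUTPUT_CHARS] + "\n...[output truncated]"
    return joined
```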

Layer 2 - Context overflow error recovery (base_react.py):
- Add _is_context_overflow_error() to detect overflow from any LLM provider
- Catch context overflow in invoke(), stream(), and astream() methods
- Generate graceful wrap-up response using RunMemory context instead of crashing
- Truncate message snippets in wrap-up prompt to prevent re-triggering overflow
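Provider-agnostic overflow detection typically reduces to substring matching on the exception message, since each LLM provider words the error differently. The marker strings below are common phrasings, not an exhaustive or verified list, and the function name mirrors the commit's `_is_context_overflow_error()` only as a sketch.

```python
# Hypothetical sketch: detect context-window overflow from any provider
# by matching common phrasings in the raised exception's message.
_OVERFLOW_MARKERS = (
    "context_length_exceeded",
    "maximum context length",
    "prompt is too long",
    "input is too long",
)

def is_context_overflow_error(exc: Exception) -> bool:
    msg = str(exc).lower()
    return any(marker in msg for marker in _OVERFLOW_MARKERS)
```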

https://claude.ai/code/session_01VWeDnRy2NqVTk3YxX55Dvn
Implements industry-standard context-window management following the
LangGraph AgentMiddleware.before_model pattern and LangChain's context
engineering best practices.

Changes:

1. ContextWindowMiddleware (new, middleware/context_window.py):
   - Runs before every LLM call inside the ReAct loop via before_model hook
   - 3-phase condensation: truncate large tool results → LLM-summarize old
     tool results → drop oldest messages preserving AI↔ToolMessage pairing
   - Triggers at 80% of context window usage
   - Automatically enabled in _build_static_middleware()

2. Context condensation utilities (new, utils/context_condensation.py):
   - condense_messages(): Multi-phase message condensation with tool-call
     pairing awareness to prevent LLM API errors
   - wrap_tool_with_output_limit(): Wraps external tools (MCP) with 50K
     char output truncation

3. MCP tool output limits (base_react.py):
   - All MCP server tools are wrapped with output size limits to prevent
     unbounded external tool outputs from overflowing the context
   - Applied during _build_mcp_tools() after sync patching

4. Replaced hard truncation in _prepare_agent_inputs() (base_react.py):
   - Initial history trimming now uses the same condense_messages() logic
     instead of the old compress→crop→brutal-delete approach
   - Preserves tool-call pairing integrity throughout
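The pairing-integrity constraint in point 4 can be sketched as: when dropping the oldest messages, never let the kept window begin on a dangling tool result, because a ToolMessage without its preceding tool-calling AIMessage is rejected by LLM APIs. The function and the dict-based message shape are hypothetical.

```python
def drop_oldest_preserving_pairs(messages: list[dict], keep_last: int) -> list[dict]:
    """Hypothetical sketch of the drop phase: keep the last `keep_last`
    messages, but walk the cut point back so the kept window never starts
    on a tool result separated from its tool-calling AI message."""
    if len(messages) <= keep_last:
        return messages
    cut = len(messages) - keep_last
    # If the cut would land on a tool result, back up to the AI message
    # that issued the tool calls, keeping the pair together.
    while cut > 0 and messages[cut]["role"] == "tool":
        cut -= 1
    return messages[cut:]
```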

https://claude.ai/code/session_01VWeDnRy2NqVTk3YxX55Dvn
@hassan11196 hassan11196 force-pushed the claude/fix-initialization-logging-yBVw0 branch from c3cd55c to fdbbe2a Compare March 16, 2026 03:47
@pmlugato pmlugato self-assigned this Mar 18, 2026