feat(agent): context-aware auto-compaction with retry on context_window_exceeded #3

@Rexopia

Description

Background

Currently, auto_compact_history triggers on message count alone (max_history_messages, default 200). This is a rough proxy for context size: it accounts for neither actual token usage nor the model's context window limit.

Current Behavior

  1. auto_compact_history runs once at the start of each turn, compacting old messages into an LLM-generated summary when message count exceeds the limit
  2. reliable.rs already detects context_window_exceeded errors via is_context_window_exceeded() (matching keywords like "exceeds the context window", "maximum context length", "too many tokens", etc.)
  3. But on detection, it bail!s immediately, with no recovery and no retry
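For reference, the keyword matching in is_context_window_exceeded() might look roughly like the following sketch. The exact marker list in reliable.rs is an assumption based on the examples above:

```rust
// Hypothetical sketch of reliable.rs's is_context_window_exceeded();
// the real keyword list may differ.
fn is_context_window_exceeded(err_msg: &str) -> bool {
    const MARKERS: [&str; 3] = [
        "exceeds the context window",
        "maximum context length",
        "too many tokens",
    ];
    // Case-insensitive substring match against known provider phrasings.
    let msg = err_msg.to_lowercase();
    MARKERS.iter().any(|m| msg.contains(m))
}
```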

Proposed Improvement

Phase 1: Compact-then-retry on context exceeded

In orchestrate.rs, when the LLM call returns a context_window_exceeded error:

  1. Force-run auto_compact_history (even if message count is within limit)
  2. Retry the LLM call
  3. If the retry still fails, surface the error as fatal

This is the minimal useful change — leverage the existing error detection in reliable.rs as a reactive trigger.
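The compact-then-retry flow could be sketched as below. The function and closure signatures are hypothetical illustrations, not the actual orchestrate.rs API:

```rust
// Sketch of Phase 1: on a context_window_exceeded error, force-run
// compaction once and retry; any second failure propagates as fatal.
// All three parameters are stand-ins for the real orchestrator pieces.
fn call_with_compaction_retry<F, D, C>(
    mut call_llm: F,
    is_context_window_exceeded: D,
    mut force_compact: C,
) -> Result<String, String>
where
    F: FnMut() -> Result<String, String>,
    D: Fn(&str) -> bool,
    C: FnMut(),
{
    match call_llm() {
        Ok(resp) => Ok(resp),
        Err(e) if is_context_window_exceeded(&e) => {
            // Force compaction even if message count is within limit,
            // then retry exactly once.
            force_compact();
            call_llm()
        }
        Err(e) => Err(e),
    }
}
```

Retrying only once keeps the failure mode bounded: if a freshly compacted history still overflows the window, no further compaction pass is likely to help.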

Phase 2 (future): Token-based proactive compaction

  • Track cumulative input_tokens from provider responses across turns
  • Store per-model context window sizes (from registry or model metadata)
  • Trigger compaction when input_tokens > context_window * 0.8
  • Requires providers to reliably return token usage (Codex SSE may not always include it)
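The Phase 2 trigger condition is simple enough to sketch directly. The 0.8 threshold comes from the bullet above; how the per-model window size is looked up is left open:

```rust
// Hypothetical Phase 2 check: compact once cumulative input tokens
// pass 80% of the model's context window. Uses integer arithmetic
// (tokens > window * 0.8  <=>  tokens * 10 > window * 8) to avoid
// float comparison edge cases.
fn should_compact(cumulative_input_tokens: u64, context_window: u64) -> bool {
    cumulative_input_tokens * 10 > context_window * 8
}
```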

Reference

  • crewforge-rs/src/agent/orchestrate.rs:274-287 — current fatal error on LLM failure
  • crewforge-rs/src/agent/history.rs:114-172 — auto_compact_history implementation
  • crewforge-rs/src/provider/reliable.rs:82-88 — is_context_window_exceeded detection
  • Upstream (vendor/zeroclaw) uses the same message-count approach, with no token-based compaction either

Related Constants

  • COMPACTION_KEEP_RECENT_MESSAGES = 20
  • COMPACTION_MAX_SOURCE_CHARS = 12,000
  • COMPACTION_MAX_SUMMARY_CHARS = 2,000
  • Token estimation heuristic: ~4 chars/token (from upstream)
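Under the ~4 chars/token heuristic, a minimal estimator might look like the following (a hypothetical helper, not an existing function in the codebase):

```rust
// Rough token estimate at ~4 chars/token, per the upstream heuristic.
// Real tokenizers vary by model, so treat this as an approximation only.
fn estimate_tokens(text: &str) -> usize {
    // Round up so any non-empty string counts as at least one token.
    (text.chars().count() + 3) / 4
}
```

At that rate, COMPACTION_MAX_SOURCE_CHARS = 12,000 corresponds to roughly 3,000 tokens of summarization input.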

Labels: enhancement (New feature or request)