feat(agent): context-aware auto-compaction with retry on context_window_exceeded #3
Status: Open
Labels
enhancement (New feature or request)
Description
Background
Currently `auto_compact_history` triggers based on message count alone (`max_history_messages`, default 200). This is a rough proxy for context size: it doesn't account for actual token usage or the model's context window limit.
Current Behavior
- `auto_compact_history` runs once at the start of each turn, compacting old messages into an LLM-generated summary when the message count exceeds the limit
- `reliable.rs` already detects `context_window_exceeded` errors via `is_context_window_exceeded()` (matching keywords like "exceeds the context window", "maximum context length", "too many tokens", etc.)
- But on detection, it `bail!`s immediately: no recovery, no retry
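For illustration, that keyword-based detection amounts to a case-insensitive substring match. This is a sketch only; the actual keyword list and signature live in `provider/reliable.rs:82-88` and may differ:

```rust
/// Rough sketch of keyword-based context-overflow detection.
/// The real keyword list in reliable.rs is longer; this subset is assumed.
fn is_context_window_exceeded(message: &str) -> bool {
    const KEYWORDS: &[&str] = &[
        "exceeds the context window",
        "maximum context length",
        "too many tokens",
    ];
    // Normalize to lowercase so provider error messages match regardless of casing.
    let lower = message.to_lowercase();
    KEYWORDS.iter().any(|kw| lower.contains(kw))
}
```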
Proposed Improvement
Phase 1: Compact-then-retry on context exceeded
In `orchestrate.rs`, when the LLM call returns a `context_window_exceeded` error:
- Force-run `auto_compact_history` (even if the message count is within the limit)
- Retry the LLM call
- If it still fails, raise a fatal error
This is the minimal useful change: it leverages the existing error detection in `reliable.rs` as a reactive trigger.
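The compact-then-retry control flow above can be sketched as a small wrapper. `call_llm`, `force_compact`, and the error enum are stand-ins for the real `orchestrate.rs`/`history.rs` machinery, not the actual API:

```rust
/// Stand-in for the provider error classification done in reliable.rs.
#[derive(Debug, PartialEq)]
enum LlmError {
    ContextWindowExceeded,
    Other(String),
}

/// Phase 1 sketch: on context overflow, force a compaction and retry once.
/// A second failure propagates to the caller as the fatal error.
fn call_with_compaction_retry<F, C>(
    mut call_llm: F,
    mut force_compact: C,
) -> Result<String, LlmError>
where
    F: FnMut() -> Result<String, LlmError>,
    C: FnMut(),
{
    match call_llm() {
        Err(LlmError::ContextWindowExceeded) => {
            // Reactive trigger: compact regardless of message count, then retry.
            force_compact();
            call_llm()
        }
        other => other,
    }
}
```

The single retry keeps the failure mode bounded: if compaction cannot free enough context, the error surfaces exactly as it does today.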
Phase 2 (future): Token-based proactive compaction
- Track cumulative `input_tokens` from provider responses across turns
- Store per-model context window sizes (from registry or model metadata)
- Trigger compaction when `input_tokens > context_window * 0.8`
- Requires providers to reliably return token usage (Codex SSE may not always include it)
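The 80% trigger condition is a one-line predicate. Names here are illustrative, not the actual crewforge-rs API; integer arithmetic sidesteps a float comparison:

```rust
/// Hypothetical Phase 2 trigger: compact once cumulative input tokens
/// cross 80% of the model's context window.
fn should_compact(cumulative_input_tokens: u64, context_window: u64) -> bool {
    // Equivalent to tokens > window * 0.8, kept in integer math.
    cumulative_input_tokens * 10 > context_window * 8
}
```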
Reference
- `crewforge-rs/src/agent/orchestrate.rs:274-287`: current fatal error on LLM failure
- `crewforge-rs/src/agent/history.rs:114-172`: `auto_compact_history` implementation
- `crewforge-rs/src/provider/reliable.rs:82-88`: `is_context_window_exceeded` detection
- Upstream (vendor/zeroclaw) uses the same message-count approach; no token-based compaction either
Related Constants
- `COMPACTION_KEEP_RECENT_MESSAGES` = 20
- `COMPACTION_MAX_SOURCE_CHARS` = 12,000
- `COMPACTION_MAX_SUMMARY_CHARS` = 2,000
- Token estimation heuristic: ~4 chars/token (from upstream)
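Under the ~4 chars/token heuristic, a token estimate is just a rounded-up division. A sketch (the upstream helper may be shaped differently):

```rust
/// Approximate token count from character count using the ~4 chars/token
/// heuristic noted above. Rounds up so short non-empty strings never
/// estimate to zero tokens.
fn estimate_tokens(text: &str) -> usize {
    const CHARS_PER_TOKEN: usize = 4;
    (text.chars().count() + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN
}
```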