
fix(opencode): reduce memory usage during prompting with lazy boundary scan and context windowing#18137

Open
BYK wants to merge 1 commit into anomalyco:dev from BYK:perf/session-memory-windowing

Conversation


@BYK BYK commented Mar 18, 2026

Issue for this PR

Closes #18136

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Three targeted optimizations to reduce peak RSS during prompting from ~4-8GB down to ~1.2GB:

1. Lazy compaction boundary scan (filterCompactedLazy)

The prompt loop calls filterCompacted(stream(sessionID)) which streams ALL messages newest→oldest, loading parts for every message. For compacted sessions, most of those parts are discarded once the boundary is found.

New approach: probe the newest 50 message infos (1 DB query, no parts). If a compaction summary is detected, use a two-phase scan — info-only scan to find the boundary, then hydrate parts only for messages after it. If no compaction summary is found, fall back to the original single-pass filterCompacted(stream()) to avoid wasted info-only queries.
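The probe-then-fallback strategy can be sketched as follows. This is a minimal illustration of the scan logic only: `MessageInfo`, the `hydrate` callback, and the in-memory array stand in for the real storage layer and its queries, and are not the PR's actual API.

```typescript
interface MessageInfo {
  id: number;
  isCompactionSummary: boolean;
}

interface Message extends MessageInfo {
  parts: string[];
}

// Number of newest message infos fetched in the initial info-only probe.
const PROBE_SIZE = 50;

// Given message infos ordered newest -> oldest, return the messages at or
// after the newest compaction boundary, loading parts (via `hydrate`) only
// for those messages. If no compaction summary is found in the probe
// window, fall back to hydrating everything in a single pass, mirroring
// the original filterCompacted behaviour.
function filterCompactedLazy(
  infosNewestFirst: MessageInfo[],
  hydrate: (info: MessageInfo) => Message,
): Message[] {
  const probe = infosNewestFirst.slice(0, PROBE_SIZE);
  const boundary = probe.findIndex((m) => m.isCompactionSummary);
  if (boundary === -1) {
    // Uncompacted session: single-pass fallback, no wasted info-only queries.
    return infosNewestFirst.map(hydrate);
  }
  // Compacted session: the boundary is known from infos alone, so only the
  // messages from the tail up to (and including) the summary get parts.
  return infosNewestFirst.slice(0, boundary + 1).map(hydrate);
}
```

The key saving is that for compacted sessions, parts for everything older than the boundary are never loaded at all.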

2. Context-window message windowing

toModelMessages was called with ALL messages (e.g., 7,704 for a long session), creating ModelMessage wrapper objects for every one. These flow through 4-5 copy layers (toModelMessages → convertToModelMessages → ProviderTransform.message → convertToLanguageModelPrompt), each creating ~60MB of wrapper objects.

Now the prompt loop estimates which messages from the tail fit in the LLM context window (model.limit.context × 4 chars/token) and only passes those to toModelMessages. For a 7,704-message session where ~200 fit, this cuts the conversion pipeline from ~300MB to ~10MB.
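The tail-windowing estimate can be sketched like this, assuming a flat 4-chars-per-token heuristic and a simplified `Msg` shape; the real code operates on full message objects and `model.limit.context`.

```typescript
interface Msg {
  text: string;
}

// Rough heuristic from the PR description: ~4 characters per token.
const CHARS_PER_TOKEN = 4;

// Walk newest -> oldest, keeping only the tail of the conversation whose
// estimated size fits in the model's context window. Everything older is
// excluded before toModelMessages ever sees it.
function windowForContext(messages: Msg[], contextTokens: number): Msg[] {
  const budgetChars = contextTokens * CHARS_PER_TOKEN;
  let usedChars = 0;
  let start = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    usedChars += messages[i].text.length;
    if (usedChars > budgetChars) break; // older messages no longer fit
    start = i;
  }
  return messages.slice(start);
}
```

Because the estimate only gates what enters the conversion pipeline (the provider still enforces the real limit), a rough character count is sufficient here.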

3. Prompt loop caching

The conversation is loaded once before the loop. On normal tool-call iterations, only the latest 200-message page is fetched and merged into the cache. Full reload only happens after compaction.
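The incremental merge step might look like the following sketch. The shapes and names here (`CachedMessage`, `rev`, `mergePage`) are illustrative assumptions, not the PR's actual types; only the merge-by-id strategy reflects the description above.

```typescript
interface CachedMessage {
  id: string;
  rev: number; // monotonically increasing revision of this message
  text: string;
}

// Page size matching the "latest 200-message page" mentioned above.
const PAGE_SIZE = 200;

// Merge a freshly fetched page into the cached conversation, keeping the
// newer revision of any message present in both. Messages absent from the
// page are left untouched in the cache.
function mergePage(
  cache: Map<string, CachedMessage>,
  page: CachedMessage[],
): void {
  for (const msg of page.slice(0, PAGE_SIZE)) {
    const existing = cache.get(msg.id);
    if (!existing || msg.rev > existing.rev) cache.set(msg.id, msg);
  }
}
```

After compaction, the cache would simply be discarded and rebuilt by a full reload, since the boundary invalidates most cached entries.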

How did you verify your code works?

  • Monitored RSS with /proc/<PID>/status every 30s during active prompting
  • Before: peak 4.8GB, idle 1GB
  • After: peak 1.2GB, idle ~580MB
  • All session tests pass (118 pass, 4 skip, 0 fail)
  • Tested with both compacted and uncompacted sessions

Screenshots / recordings

Memory monitoring (30s intervals) after fix:

time,rss_mb,hwm_mb
20:46:02,942,1236    ← active prompting
20:48:02,1020,1236   ← peak during tool calls
20:50:02,606,1236    ← settled after activity
20:55:32,568,1236    ← stable idle

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Two optimizations to drastically reduce memory during prompting:

1. filterCompactedLazy: probe newest 50 message infos (1 query, no
   parts) to detect compaction. If none found, fall back to original
   single-pass filterCompacted(stream()) — avoids 155+ wasted info-only
   queries for uncompacted sessions. Compacted sessions still use the
   efficient two-pass scan.

2. Context-window windowing: before calling toModelMessages, estimate
   which messages from the tail fit in the LLM context window using
   model.limit.context * 4 chars/token. Only convert those messages to
   ModelMessage format. For a 7,704-message session where ~200 fit in
   context, this reduces toModelMessages input from 7,704 to ~200
   messages — cutting ~300MB of wrapper objects across 4-5 copy layers
   down to ~10MB.

Also caches conversation across prompt loop iterations — full reload
only after compaction, incremental merge for tool-call steps.
@github-actions
Contributor

Hey! Your PR title perf(session): reduce memory usage during prompting with lazy boundary scan and context windowing doesn't follow conventional commit format.

Please update it to start with one of:

  • feat: or feat(scope): new feature
  • fix: or fix(scope): bug fix
  • docs: or docs(scope): documentation changes
  • chore: or chore(scope): maintenance tasks
  • refactor: or refactor(scope): code refactoring
  • test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

@BYK BYK changed the title perf(session): reduce memory usage during prompting with lazy boundary scan and context windowing fix(opencode): reduce memory usage during prompting with lazy boundary scan and context windowing Mar 18, 2026


Development

Successfully merging this pull request may close these issues.

perf: prompt loop loads entire conversation history into memory on every step
