fix(honcho): restore cache-stable system prompt for turn-varying recall #1203

Closed
erosika wants to merge 2 commits into NousResearch:main from erosika:feat/honcho-async-memory

Conversation


@erosika erosika commented Mar 13, 2026

Summary

  • Moves Honcho turn-varying recall out of the system prompt into a separate injected user message
  • Keeps the system prompt prefix stable across turns for Anthropic/OpenRouter/OpenAI prompt caching
  • Continuing turns still receive fresh Honcho recall via a <honcho-context> message positioned after system + prefill

Detail

When _honcho_turn_context is appended directly to effective_system, the outbound system message changes on every turn. All three major caching mechanisms (Anthropic prompt caching, OpenRouter sticky routing, OpenAI prefix matching) depend on an exact-match system prefix, so any per-turn variation in the system prompt invalidates the cache.

This change injects the turn-varying recall as a standalone user-role message instead, preserving cache-hit eligibility while still surfacing Honcho context to the model. First-turn behavior is unchanged (context is baked into the cached system prompt once at session start).
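A minimal sketch of the assembly order described above. `build_messages`, the dict message shape, and the parameter names are illustrative assumptions based on the PR's wording (`effective_system`, `_honcho_turn_context`, prefill), not the repo's actual API:

```python
def build_messages(effective_system, prefill_messages, history, honcho_turn_context=None):
    """Assemble the outbound message list (hypothetical helper).

    The system prompt stays byte-identical across turns so provider-side
    prompt caching (exact-prefix match) keeps hitting. Turn-varying Honcho
    recall rides in a standalone user-role message instead of being
    appended to the system prompt.
    """
    messages = [{"role": "system", "content": effective_system}]
    # Prefill messages (if any) come right after the system prompt.
    messages.extend(prefill_messages)
    if honcho_turn_context:
        # Injected after system + prefill, before the conversation history,
        # wrapped in <honcho-context> tags so the model can identify it.
        messages.append({
            "role": "user",
            "content": f"<honcho-context>\n{honcho_turn_context}\n</honcho-context>",
        })
    messages.extend(history)
    return messages
```

Because the system message is built from `effective_system` alone, two consecutive turns with different recall produce identical system prefixes, which is the property the caching layers key on.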

Test plan

  • Verify prompt cache hit rates on multi-turn conversations (system prefix should be identical across turns)
  • Verify Honcho recall still reaches the model on continuing turns (<honcho-context> visible in debug logs)
  • Existing test suite passes

erosika added 2 commits March 13, 2026 16:21
…stability

Commit 047b118 reintroduced per-turn Honcho context injection into the
system prompt, reverting the cache-stability fix from aedb773. The system
prefix must stay identical across turns for Anthropic/OpenRouter/OpenAI
prompt caching to work.

Move _honcho_turn_context into a separate <honcho-context> user message
injected after system + prefill, keeping the system prefix stable while
still delivering fresh recall to the model on continuing turns.

erosika commented Mar 13, 2026

Closing in favor of #1201, which takes the better approach of appending recall to the current-turn user message rather than injecting a synthetic user message. It also handles multimodal content and includes regression tests.

@erosika erosika closed this Mar 13, 2026