feat(loop): integrate evolution, memory, and mid-loop critique #9
Closed
electronicBlacksmith wants to merge 7 commits into main
Closes #5.

The feedback pipeline in LoopRunner already existed but was gated on loop.channelId, which was always null: the agent never plumbed channel_id/conversation_id into the in-process MCP tool call, so that context only lived in the router.

- AsyncLocalStorage<SlackContext> captures the Slack channel/thread/trigger-message for the current turn so phantom_loop can auto-fill them when the agent omits them. Explicit tool args still win.
- Reaction ladder on the operator's original message: hourglass -> cycle -> terminal (check/stop/warning/x). Restart-safe via an iteration === 1 check, no in-memory flag.
- Inline unicode progress bar in the edited status message.
- New trigger_message_ts column on loops, appended as migration #11.
- Extracted LoopNotifier into src/loop/notifications.ts; runner.ts was already at the 300-line cap.

34 new tests, 938 pass / 0 fail.
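The auto-fill behaviour described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the fields of SlackContext, the LoopArgs shape, and the resolveLoopArgs helper are all assumptions made for the example. Only AsyncLocalStorage itself is the real Node.js API.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Assumed shape: the PR names SlackContext but does not show its fields.
interface SlackContext {
  channelId: string;
  conversationId?: string;
  triggerMessageTs?: string;
}

// Hypothetical subset of the phantom_loop tool arguments.
interface LoopArgs {
  goal: string;
  channel_id?: string;
  conversation_id?: string;
  trigger_message_ts?: string;
}

const slackContext = new AsyncLocalStorage<SlackContext>();

// Fill missing Slack fields from the ambient per-turn context.
// Explicit tool args take precedence over the stored context.
function resolveLoopArgs(args: LoopArgs): LoopArgs {
  const ctx = slackContext.getStore(); // undefined outside a run() scope
  return {
    ...args,
    channel_id: args.channel_id ?? ctx?.channelId,
    conversation_id: args.conversation_id ?? ctx?.conversationId,
    trigger_message_ts: args.trigger_message_ts ?? ctx?.triggerMessageTs,
  };
}

// The router would wrap each turn in run() so nested tool calls see the context.
const filled = slackContext.run(
  { channelId: "C123", triggerMessageTs: "1700000000.000200" },
  () => resolveLoopArgs({ goal: "tidy the backlog" }),
);
console.log(filled.channel_id);
```

Because the store is scoped to the run() callback, a tool call made outside any Slack turn simply sees no context and the fields stay undefined.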
…tion

Two defects surfaced during the first Slack end-to-end test of the loop feedback fix:

1. Stop button disappeared after the first tick. Slack's chat.update replaces the message wholesale and strips any blocks the caller does not include. postStartNotice attached the button, but postTickUpdate called updateMessage without blocks, so the button was wiped on the first progress edit. Extract buildStatusBlocks() and re-send it on every tick edit. The final notice still omits blocks intentionally so the button disappears when the loop is no longer interruptible.
2. No end-of-loop summary. The agent curates the state.md body every tick (Goal, Progress, Next Action, Notes), but that content never reached the operator. Post it as a threaded reply when the loop finalizes. No extra agent cost: we surface content the agent already wrote. Frontmatter stripped, truncated at 3500 chars, silently skipped if the file is missing or empty.

+7 tests covering both regressions. 945 pass / 0 fail.
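The summary rules in item 2 (strip frontmatter, truncate at 3500 chars, skip silently when missing or empty) could look something like the sketch below. The function names are hypothetical, and the frontmatter format is assumed to be a leading block delimited by --- lines, which the commit message does not spell out.

```typescript
const SUMMARY_LIMIT = 3500; // character cap stated in the commit message

// Remove a leading frontmatter block delimited by `---` lines, if present.
// Assumption: state.md uses YAML-style `---` delimiters.
function stripFrontmatter(markdown: string): string {
  const match = /^---\r?\n(?:[\s\S]*?\r?\n)?---(?:\r?\n|$)/.exec(markdown);
  return match ? markdown.slice(match[0].length) : markdown;
}

// Returns null when there is nothing worth posting, so the caller can skip silently.
// `stateFile` is null when the file could not be read.
function buildLoopSummary(stateFile: string | null): string | null {
  if (stateFile === null) return null;
  const body = stripFrontmatter(stateFile).trim();
  if (body.length === 0) return null;
  return body.length > SUMMARY_LIMIT ? body.slice(0, SUMMARY_LIMIT) : body;
}
```

Returning null rather than throwing keeps the finalize path quiet when the state file is absent, which matches the "silently skipped" wording above.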
…l message

1. Tick update race: postTickUpdate was fire-and-forget, so a stop on tick N+1 could race with tick N's Slack write. If the tick update's HTTP response arrived after postFinalNotice, it overwrote the final message and re-sent the Stop button blocks. Awaiting postTickUpdate serializes Slack writes so finalize always runs after the last tick update completes.
2. Final message now includes the progress bar at its halted position, visually consistent with tick updates. A stopped loop at 3/10 shows the bar frozen at 3/10 with "stopped" instead of a terse one-liner.
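For illustration, a progress bar frozen at its halted position might be rendered as in this sketch. The glyphs, the bar width, and the renderProgressBar name are assumptions; the PR only says the bar is inline unicode and that a stopped loop at 3/10 shows 3/10 with "stopped".

```typescript
// Render an inline unicode progress bar: filled cells, empty cells,
// then "current/total" and an optional status label such as "stopped".
function renderProgressBar(current: number, total: number, label = "", width = 10): string {
  const safeTotal = Math.max(1, total); // avoid division by zero
  const done = Math.min(Math.max(current, 0), safeTotal); // clamp into [0, total]
  const filled = Math.round((done * width) / safeTotal);
  const bar = "█".repeat(filled) + "░".repeat(width - filled);
  const suffix = label ? ` ${label}` : "";
  return `${bar} ${done}/${safeTotal}${suffix}`;
}

// A loop stopped at tick 3 of 10 keeps the bar at 3/10 and appends the status.
console.log(renderProgressBar(3, 10, "stopped"));
```

Using the same renderer for tick updates and the final notice is what keeps the two messages visually consistent.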
…oop ticks

Loop ticks now use Phantom's full intelligence stack instead of running blind:

Phase 1 - Memory context injection: cached once at loop start from the goal, injected into every tick prompt via TickPromptOptions. Cleared on finalize, rebuilt on resume.

Phase 2 - Post-loop evolution and consolidation: bounded transcript accumulation (first tick + rolling 10 summaries + last tick), SessionData synthesis in finalize(), fire-and-forget evolution pipeline and LLM/heuristic memory consolidation with cost-cap guards matching the interactive path.

Phase 3 - Mid-loop critique checkpoints: optional checkpoint_interval param lets the agent request Sonnet 4.6 review every N ticks. Guard requires evolution enabled, LLM judges active, and cost cap not exceeded. Critique is awaited before next tick to avoid race conditions.

Closes #8
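The bounded transcript from Phase 2 (first tick + rolling 10 summaries + last tick) can be pictured with a small sketch. The BoundedTranscript type, recordTick, and the trivial summarize helper are hypothetical stand-ins; the PR's own recordTranscript/clamp helpers are not shown on this page and may differ.

```typescript
const ROLLING_LIMIT = 10; // "rolling 10 summaries" from the commit message

interface BoundedTranscript {
  firstTick: string | null;   // full output of the first tick
  rollingSummaries: string[]; // at most ROLLING_LIMIT recent summaries
  lastTick: string | null;    // full output of the most recent tick
}

function createTranscript(): BoundedTranscript {
  return { firstTick: null, rollingSummaries: [], lastTick: null };
}

// Placeholder summarizer (assumption): keep the first 80 characters.
function summarize(output: string): string {
  return output.length > 80 ? output.slice(0, 80) : output;
}

// Record one tick while keeping memory use independent of loop length.
function recordTick(transcript: BoundedTranscript, output: string): void {
  if (transcript.firstTick === null) transcript.firstTick = output;
  transcript.lastTick = output;
  transcript.rollingSummaries.push(summarize(output));
  if (transcript.rollingSummaries.length > ROLLING_LIMIT) {
    transcript.rollingSummaries.shift(); // drop the oldest summary
  }
}
```

However many ticks run, the structure holds at most two full outputs and ten summaries, which is what keeps SessionData synthesis in finalize() bounded.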
- Decouple postLoopDeps so evolution and memory run independently (evolution works when memory is down and vice versa)
- Skip mid-loop critique on terminal ticks to avoid wasted Sonnet calls
- Track judge cost on failure paths via JudgeParseError carrying usage data
- Extract recordTranscript/clamp from runner.ts to post-loop.ts (292 < 300 lines)
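Taken together with the Phase 3 guard, the "skip on terminal ticks" rule amounts to a small predicate. The sketch below is one way to express it under assumed names (CritiqueGuardInput, shouldRunCritique); the actual check in src/loop/critique.ts is not shown here and may be structured differently.

```typescript
// Hypothetical inputs to the critique checkpoint decision.
interface CritiqueGuardInput {
  iteration: number;           // 1-based number of the tick that just finished
  checkpointInterval?: number; // optional checkpoint_interval tool param
  isTerminalTick: boolean;     // loop is finishing on this tick
  evolutionEnabled: boolean;
  llmJudgesActive: boolean;
  costCapExceeded: boolean;
}

// Decide whether to run (and await) a critique before the next tick.
function shouldRunCritique(input: CritiqueGuardInput): boolean {
  const interval = input.checkpointInterval;
  // No interval requested, or an unusable one: never critique.
  if (interval === undefined || !Number.isInteger(interval) || interval < 1) return false;
  // A critique on the final tick cannot influence any later tick.
  if (input.isTerminalTick) return false;
  if (!input.evolutionEnabled || !input.llmJudgesActive || input.costCapExceeded) return false;
  return input.iteration % interval === 0;
}
```

Checking the terminal flag before the modulo test is what avoids the wasted judge call when the last tick happens to land on a checkpoint.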
9 tasks
electronicBlacksmith added a commit that referenced this pull request on Apr 6, 2026
PR #7 was squash-merged into main while PR #9's branch still had the original commits. Conflicts were all additive - kept PR #9's features (checkpoint_interval, memory context, critique, post-loop pipeline) while adopting main's improved error formatting and race condition comment in the tick update await.
Wire setTriggerDeps before startServer so the handler is ready on the first request. Use server.url.origin instead of manually building the URL from server.port which can race in CI. Add a health check fetch to confirm the server is accepting connections before tests run.
electronicBlacksmith added a commit that referenced this pull request on Apr 6, 2026
- Decouple postLoopDeps so evolution and memory run independently (evolution works when memory is down and vice versa)
- Skip mid-loop critique on terminal ticks to avoid wasted Sonnet calls
- Track judge cost on failure paths via JudgeParseError carrying usage data
- Extract recordTranscript/clamp from runner.ts to post-loop.ts (292 < 300 lines)
5 tasks
Owner, Author
Superseded by #14 (consolidated clean branch)
electronicBlacksmith added a commit that referenced this pull request on Apr 7, 2026
* feat(loop): integrate evolution, memory, and mid-loop critique into loop ticks

Loop ticks now use Phantom's full intelligence stack instead of running blind:

Phase 1 - Memory context injection: cached once at loop start from the goal, injected into every tick prompt via TickPromptOptions. Cleared on finalize, rebuilt on resume.

Phase 2 - Post-loop evolution and consolidation: bounded transcript accumulation (first tick + rolling 10 summaries + last tick), SessionData synthesis in finalize(), fire-and-forget evolution pipeline and LLM/heuristic memory consolidation with cost-cap guards matching the interactive path.

Phase 3 - Mid-loop critique checkpoints: optional checkpoint_interval param lets the agent request Sonnet 4.6 review every N ticks. Guard requires evolution enabled, LLM judges active, and cost cap not exceeded. Critique is awaited before next tick to avoid race conditions.

Closes #8

* fix(loop): address code review findings from PR #9

- Decouple postLoopDeps so evolution and memory run independently (evolution works when memory is down and vice versa)
- Skip mid-loop critique on terminal ticks to avoid wasted Sonnet calls
- Track judge cost on failure paths via JudgeParseError carrying usage data
- Extract recordTranscript/clamp from runner.ts to post-loop.ts (292 < 300 lines)

* fix(evolution): support OAuth tokens for LLM judge auth

resolveJudgeMode() and the judge client now check ANTHROPIC_AUTH_TOKEN and CLAUDE_CODE_OAUTH_TOKEN in addition to ANTHROPIC_API_KEY. Enables LLM judges on Max subscription deployments using OAuth bearer tokens.

* docs: add phantom_loop documentation for upstream PR

Covers MCP tool parameters, state file contract, tick lifecycle, Slack integration, mid-loop critique, post-loop evolution pipeline, memory context injection, and tips for writing effective goals.

Closes #12

* fix(test): stabilize trigger-auth and judge-activation tests for CI

trigger-auth: use inline Bun.serve instead of startServer to avoid module-level globals and disk I/O that can race across test files.

judge-activation: save/restore ANTHROPIC_AUTH_TOKEN and CLAUDE_CODE_OAUTH_TOKEN alongside ANTHROPIC_API_KEY so tests that expect "no credentials" actually clear all auth env vars.

---------

Co-authored-by: electronicBlacksmith <electronicBlacksmith@users.noreply.github.com>
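The credential check and the test-side save/restore described in the last two commits can be sketched as below. The three environment variable names come from the commit message. Everything else is an assumption for illustration: the "llm" and "heuristic" mode names, the resolveJudgeModeSketch function (the real resolveJudgeMode() signature is not shown here), and the withoutJudgeCredentials helper.

```typescript
// Assumed mode names; the real return type of resolveJudgeMode() is not shown in the PR.
type JudgeMode = "llm" | "heuristic";

type Env = Record<string, string | undefined>;

// Variable names taken from the commit message.
const JUDGE_AUTH_VARS = [
  "ANTHROPIC_API_KEY",
  "ANTHROPIC_AUTH_TOKEN",
  "CLAUDE_CODE_OAUTH_TOKEN",
] as const;

// Any non-empty credential, API key or OAuth token, enables LLM judges.
function resolveJudgeModeSketch(env: Env): JudgeMode {
  const hasCredential = JUDGE_AUTH_VARS.some((name) => (env[name] ?? "").trim().length > 0);
  return hasCredential ? "llm" : "heuristic";
}

// Test helper: clear every auth variable while fn runs, then restore the originals.
function withoutJudgeCredentials<T>(env: Env, fn: () => T): T {
  const saved = JUDGE_AUTH_VARS.map((name) => [name, env[name]] as const);
  for (const name of JUDGE_AUTH_VARS) delete env[name];
  try {
    return fn();
  } finally {
    for (const [name, value] of saved) {
      if (value === undefined) delete env[name];
      else env[name] = value;
    }
  }
}
```

In a test, passing process.env as the env argument would give the "no credentials" case a clean slate even on a machine where only an OAuth token is set, which is the failure mode the judge-activation fix describes.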
electronicBlacksmith added a commit that referenced this pull request on Apr 8, 2026
Summary
checkpoint_interval triggers Sonnet 4.6 review every N ticks. Guarded by judge availability and cost cap. Awaited before next tick to prevent race conditions.

New files: src/loop/critique.ts, src/loop/post-loop.ts, and 3 test files.

Test plan
Closes #8