From 54a872a59d6bb5b12fbdccab28afefcaca41a32f Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Thu, 12 Mar 2026 23:29:42 -0700
Subject: [PATCH 01/18] feat(skills): add ce:compound-refresh skill for
 learning and pattern maintenance

Adds a new skill that reviews existing docs/solutions/ learnings against the
current codebase and decides whether to keep, update, replace, or archive them.
Also enhances ce:compound with Phase 2.5 selective refresh checks.

Co-Authored-By: Claude <noreply@anthropic.com>
---
 plugins/compound-engineering/README.md        |   3 +-
 .../skills/ce-compound-refresh/SKILL.md       | 380 ++++++++++++++++++
 .../skills/ce-compound/SKILL.md               |  52 ++-
 3 files changed, 433 insertions(+), 2 deletions(-)
 create mode 100644 plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
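
Note for reviewers: the Scope Selection rules this patch adds to ce-compound-refresh/SKILL.md (discover every `.md` file under `docs/solutions/`, skip `README.md` and `_archived/`, then try directory, frontmatter, filename, and content matching in that order, stopping at the first strategy that yields results) can be read as the minimal Python sketch below. It is an illustration only and assumes things the skill does not specify: the function names are hypothetical, matching is a case-insensitive substring test, and the frontmatter parser is a naive `key: value` reader that does not handle multi-line YAML lists.

```python
from pathlib import Path


def discover_candidates(root: Path) -> list[Path]:
    """Return every .md file under root, skipping README.md and _archived/."""
    return sorted(
        p for p in root.rglob("*.md")
        if p.name != "README.md" and "_archived" not in p.relative_to(root).parts
    )


def frontmatter(path: Path) -> dict[str, str]:
    """Naive parser (assumption): 'key: value' lines between the first pair of '---' markers."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def narrow_scope(root: Path, argument: str) -> list[Path]:
    """Try each matching strategy in order and stop at the first one with results."""
    docs = discover_candidates(root)
    needle = argument.lower()
    strategies = [
        # 1. Directory match: the argument names a subdirectory of root.
        lambda p: needle in (part.lower() for part in p.relative_to(root).parts[:-1]),
        # 2. Frontmatter match: the argument appears in module, component, or tags.
        lambda p: any(
            needle in frontmatter(p).get(key, "").lower()
            for key in ("module", "component", "tags")
        ),
        # 3. Filename match: partial matches are fine.
        lambda p: needle in p.name.lower(),
        # 4. Content search: the argument appears anywhere in the file.
        lambda p: needle in p.read_text(encoding="utf-8").lower(),
    ]
    for matches in strategies:
        hits = [p for p in docs if matches(p)]
        if hits:
            return hits
    return []  # caller reports "no matches" and asks the user to clarify
```

Stopping at the first productive strategy keeps a hint such as `performance-issues` from also pulling in every doc that merely mentions the phrase in its body.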

diff --git a/plugins/compound-engineering/README.md b/plugins/compound-engineering/README.md
index f41f5777..a574b37d 100644
--- a/plugins/compound-engineering/README.md
+++ b/plugins/compound-engineering/README.md
@@ -7,7 +7,7 @@ AI-powered development tools that get smarter with every use. Make each unit of
 | Component | Count |
 |-----------|-------|
 | Agents | 28 |
-| Commands | 22 |
+| Commands | 23 |
 | Skills | 20 |
 | MCP Servers | 1 |
 
@@ -81,6 +81,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
 | `/ce:review` | Run comprehensive code reviews |
 | `/ce:work` | Execute work items systematically |
 | `/ce:compound` | Document solved problems to compound team knowledge |
+| `/ce:compound-refresh` | Refresh stale or drifting learnings and decide whether to keep, update, replace, or archive them |
 
 > **Deprecated aliases:** `/workflows:plan`, `/workflows:work`, `/workflows:review`, `/workflows:brainstorm`, `/workflows:compound` still work but show a deprecation warning. Use `ce:*` equivalents.
 
diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
new file mode 100644
index 00000000..0de631ce
--- /dev/null
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -0,0 +1,380 @@
+---
+name: ce:compound-refresh
+description: Refresh stale or drifting learnings and pattern docs in docs/solutions/ by reviewing, updating, replacing, or archiving them against the current codebase. Use after refactors, migrations, dependency upgrades, or when a retrieved learning feels outdated or wrong. Also use when reviewing docs/solutions/ for accuracy, when a recently solved problem contradicts an existing learning, or when pattern docs no longer reflect current code.
+argument-hint: "[optional: scope hint]"
+disable-model-invocation: true
+---
+
+# Compound Refresh
+
+Maintain the quality of `docs/solutions/` over time. This workflow reviews existing learnings against the current codebase, then refreshes any derived pattern docs that depend on them.
+
+## Interaction Principles
+
+Follow the same interaction style as `ce:brainstorm`:
+
+- Ask questions **one at a time**
+- Prefer **multiple choice** when natural options exist
+- Start with **scope and intent**, then narrow only when needed
+- Do **not** ask the user to make decisions before you have evidence
+- Lead with a recommendation and explain it briefly
+
+The goal is not to force the user through a checklist. The goal is to help them make a good maintenance decision with the smallest amount of friction.
+
+## Refresh Order
+
+Refresh in this order:
+
+1. Review the relevant individual learning docs first
+2. Note which learnings stayed valid, were updated, were replaced, or were archived
+3. Then review any pattern docs that depend on those learnings
+
+Why this order:
+
+- learning docs are the primary evidence
+- pattern docs are derived from one or more learnings
+- stale learnings can make a pattern look more valid than it really is
+
+If the user starts by naming a pattern doc, you may begin there to understand the concern, but inspect the supporting learning docs before changing the pattern.
+
+## Maintenance Model
+
+For each candidate artifact, classify it into one of four outcomes:
+
+| Outcome | Meaning | Default action |
+|---------|---------|----------------|
+| **Keep** | Still accurate and still useful | No file edit by default; report that it was reviewed and remains trustworthy |
+| **Update** | Core solution is still correct, but references drifted | Apply evidence-backed in-place edits |
+| **Replace** | The old artifact is now misleading, but there is a known better replacement | Create a trustworthy successor or revised pattern, then mark/archive the old artifact as needed |
+| **Archive** | No longer useful or applicable | Move the obsolete artifact to `docs/solutions/_archived/` with archive metadata when appropriate |
+
+## Core Rules
+
+1. **Evidence informs judgment.** The signals below are inputs, not a mechanical scorecard. Use engineering judgment to decide whether the artifact is still trustworthy.
+2. **Prefer no-write Keep.** Do not update a doc just to leave a review breadcrumb.
+3. **Match docs to reality, not the reverse.** When current code differs from a learning, update the learning to reflect the current code. The skill's job is doc accuracy, not code review — do not ask the user whether code changes were "intentional" or "a regression." If the code changed, the doc should match. If the user thinks the code is wrong, that is a separate concern outside this workflow.
+4. **Be decisive, minimize questions.** When evidence is clear (file renamed, class moved, reference broken), apply the update. Only ask the user when the right maintenance action is genuinely ambiguous — not to confirm obvious fixes. The goal is automated maintenance with human oversight on judgment calls, not a question for every finding.
+5. **Avoid low-value churn.** Do not edit a doc just to fix a typo, polish wording, or make cosmetic changes that do not materially improve accuracy or usability.
+6. **Use Update only for meaningful, evidence-backed drift.** Paths, module names, related links, category metadata, code snippets, and clearly stale wording are fair game when fixing them materially improves accuracy.
+7. **Use Replace only when there is a real replacement.** That means either:
+   - the current conversation contains a recently solved, verified replacement fix, or
+   - the user provides enough concrete replacement context to document the successor honestly, or
+   - newer docs, pattern docs, PRs, or issues provide strong successor evidence.
+8. **Archive when the code is gone.** If the referenced code, controller, or workflow no longer exists in the codebase and no successor can be found, recommend Archive — don't default to Keep just because the general advice is still "sound." A learning about a deleted feature misleads readers into thinking that feature still exists. When in doubt between Keep and Archive, ask the user — but missing referenced files with no matching code is strong Archive evidence, not a reason to Keep with "medium confidence."
+
+## Scope Selection
+
+Start by discovering learnings and pattern docs under `docs/solutions/`.
+
+Exclude:
+
+- `README.md`
+- `docs/solutions/_archived/`
+
+Find all `.md` files under `docs/solutions/`, excluding `README.md` files and anything under `_archived/`.
+
+If `$ARGUMENTS` is provided, use it to narrow scope before proceeding. Try these matching strategies in order, stopping at the first that produces results:
+
+1. **Directory match** — check if the argument matches a subdirectory name under `docs/solutions/` (e.g., `performance-issues`, `database-issues`)
+2. **Frontmatter match** — search `module`, `component`, or `tags` fields in learning frontmatter for the argument
+3. **Filename match** — match against filenames (partial matches are fine)
+4. **Content search** — search file contents for the argument as a keyword (useful for feature names or feature areas)
+
+If no matches are found, report that and ask the user to clarify.
+
+If no candidate docs are found, report:
+
+```text
+No candidate docs found in docs/solutions/.
+Run `ce:compound` after solving problems to start building your knowledge base.
+```
+
+## Phase 0: Assess and Route
+
+Before asking the user to classify anything:
+
+1. Discover candidate artifacts
+2. Estimate scope
+3. Choose the lightest interaction path that fits
+
+### Route by Scope
+
+| Scope | When to use it | Interaction style |
+|-------|----------------|-------------------|
+| **Focused** | 1-2 likely files or user named a specific doc | Investigate directly, then present a recommendation |
+| **Batch** | 3-8 mostly independent docs | Investigate first, then present grouped recommendations |
+| **Broad** | Large, ambiguous, or repo-wide stale-doc sweep | Ask one narrowing question before deep investigation |
+
+If scope is broad or ambiguous, ask one question to narrow it before scanning deeply. Prefer multiple choice when possible:
+
+```text
+I found a broad refresh scope. Which area should we review first?
+
+1. A specific file
+2. A category or module
+3. Pattern docs first
+4. Everything in scope
+```
+
+Do not ask action-selection questions yet. First gather evidence.
+
+## Phase 1: Investigate Candidate Learnings
+
+For each learning in scope, read it, cross-reference its claims against the current codebase, and form a recommendation.
+
+A learning has several dimensions that can independently go stale. Surface-level checks catch the obvious drift, but staleness often hides deeper:
+
+- **References** — do the file paths, class names, and modules it mentions still exist or have they moved?
+- **Recommended solution** — does the fix still match how the code actually works today? A renamed file with a completely different implementation pattern is not just a path update.
+- **Code examples** — if the learning includes code snippets, do they still reflect the current implementation?
+- **Related docs** — are cross-referenced learnings and patterns still present and consistent?
+
+Match investigation depth to the learning's specificity — a learning referencing exact file paths and code snippets needs more verification than one describing a general principle.
+
+Three judgment guidelines that are easy to get wrong:
+
+1. **Contradiction = strong Replace signal.** If the learning's recommendation conflicts with current code patterns or a recently verified fix, that is not a minor drift — the learning is actively misleading.
+2. **Age alone is not a stale signal.** A 2-year-old learning that still matches current code is fine. Only use age as a prompt to inspect more carefully.
+3. **Check for successors before archiving.** Before recommending Replace or Archive, look for newer learnings, pattern docs, PRs, or issues covering the same problem space. If successor evidence exists, prefer Replace over Archive so readers are directed to the newer guidance.
+
+## Phase 1.5: Investigate Pattern Docs
+
+After reviewing the underlying learning docs, investigate any relevant pattern docs under `docs/solutions/patterns/`.
+
+Pattern docs are high-leverage — a stale pattern is more dangerous than a stale individual learning because future work may treat it as broadly applicable guidance. Evaluate whether the generalized rule still holds given the refreshed state of the learnings it depends on.
+
+A pattern doc with no clear supporting learnings is a stale signal — investigate carefully before keeping it unchanged.
+
+## Subagent Strategy
+
+Use subagents for context isolation when investigating multiple artifacts — not just because the task sounds complex. Choose the lightest approach that fits:
+
+| Approach | When to use |
+|----------|-------------|
+| **Main thread only** | Small scope, short docs |
+| **Sequential subagents** | 1-2 artifacts with many supporting files to read |
+| **Parallel subagents** | 3+ truly independent artifacts with low overlap |
+| **Batched subagents** | Broad sweeps — narrow scope first, then investigate in batches |
+
+Subagents are **read-only investigators**. They must not edit files, create successors, or archive anything. Each returns: file path, evidence, recommended action, confidence, and open questions.
+
+The orchestrator merges results, detects contradictions, asks the user questions, and performs all edits centrally. If two artifacts overlap or discuss the same root issue, investigate them together rather than parallelizing.
+
+## Phase 2: Classify the Right Maintenance Action
+
+After gathering evidence, assign one recommended action.
+
+### Keep
+
+The learning is still accurate and useful. Do not edit the file — report that it was reviewed and remains trustworthy. Only add `last_refreshed` if you are already making a meaningful update for another reason.
+
+### Update
+
+The core solution is still valid but references have drifted (paths, class names, links, code snippets, metadata). Apply the fixes directly.
+
+### Replace
+
+Choose **Replace** when the learning's core guidance is now misleading — the recommended fix changed materially, the root cause or architecture shifted, or the preferred pattern is different.
+
+Replace requires real replacement context. Investigate before asking the user — they may have invoked the refresh months after the original learning was written and not have this context themselves.
+
+**Investigation order:**
+
+1. Check if the current conversation already contains replacement context (e.g., user just solved the problem differently)
+2. If not, spawn a read-only subagent to investigate deeper — git history, related PRs, newer learnings, current code patterns — to find what replaced the old approach. Use a subagent to protect the main session context window from the volume of evidence.
+3. If the conversation or codebase provides sufficient replacement context → proceed:
+   - Create a successor learning through `ce:compound`
+   - Add `superseded_by` metadata to the old learning
+   - Move the old learning to `docs/solutions/_archived/`
+4. If replacement context is insufficient → do **not** force Replace. Mark the learning as stale in place so readers know not to rely on it:
+   - Add `status: stale`, `stale_reason`, and `stale_date` to the frontmatter
+   - Report to the user what you found and suggest they come back with `ce:compound` after solving the problem fresh
+
+Only ask the user for replacement context if they clearly have it (e.g., they mentioned a recent migration or refactor). Do not default to asking — default to investigating.
+
+### Archive
+
+Choose **Archive** when:
+
+- The code or workflow no longer exists
+- The learning is obsolete and has no modern replacement worth documenting
+- The learning is redundant and no longer useful on its own
+- There is no meaningful successor evidence suggesting it should be replaced instead
+
+Action:
+
+- Move the file to `docs/solutions/_archived/`, preserving directory structure when helpful
+- Add:
+  - `archived_date: YYYY-MM-DD`
+  - `archive_reason: [why it was archived]`
+
+Auto-archive when evidence is unambiguous:
+
+- the referenced code, controller, or workflow is gone and no successor exists in the codebase
+- the learning is fully superseded by a clearly better successor
+- the document is plainly redundant and adds no distinct value
+
+Do not keep a learning just because its general advice is "still sound" — if the specific code it references is gone, the learning misleads readers. Archive it.
+
+If there is a clearly better successor, strongly consider **Replace** before **Archive** so the old artifact points readers toward the newer guidance.
+
+## Pattern Guidance
+
+Apply the same four outcomes (Keep, Update, Replace, Archive) to pattern docs, but evaluate them as **derived guidance** rather than incident-level learnings. Key differences:
+
+- **Keep**: the underlying learnings still support the generalized rule and examples remain representative
+- **Update**: the rule holds but examples, links, scope, or supporting references drifted
+- **Replace**: the generalized rule is now misleading, or the underlying learnings support a different synthesis. Base the replacement on the refreshed learning set — do not invent new rules from guesswork
+- **Archive**: the pattern is no longer valid, no longer recurring, or fully subsumed by a stronger pattern doc
+
+If "archive" feels too strong but the pattern should no longer be elevated, reduce its prominence in place if the docs structure supports that.
+
+## Phase 3: Ask for Decisions
+
+Most Updates should be applied directly without asking. Only ask the user when:
+
+- The right action is genuinely ambiguous (Update vs Replace vs Archive)
+- You are about to Archive a document
+- You are about to create a successor via `ce:compound`
+
+Do **not** ask questions about whether code changes were intentional, whether the user wants to fix bugs in the code, or other concerns outside doc maintenance. Stay in your lane — doc accuracy.
+
+### Question Style
+
+Use the **AskUserQuestion tool** when available.
+
+If the environment does not support interactive prompts, present numbered options in plain text and wait for the user's response before proceeding.
+
+Question rules:
+
+- Ask **one question at a time**
+- Prefer **multiple choice**
+- Lead with the **recommended option**
+- Explain the rationale for the recommendation in one concise sentence
+- Avoid asking the user to choose from actions that are not actually plausible
+
+### Focused Scope
+
+For a single artifact, present:
+
+- file path
+- 2-4 bullets of evidence
+- recommended action
+
+Then ask:
+
+```text
+This [learning/pattern] looks like a [Update/Keep/Replace/Archive].
+
+Why: [one-sentence rationale based on the evidence]
+
+What would you like to do?
+
+1. [Recommended action]
+2. [Second plausible action]
+3. Skip for now
+```
+
+Do not list all four actions unless all four are genuinely plausible.
+
+### Batch Scope
+
+For several learnings:
+
+1. Group obvious **Keep** cases together
+2. Group obvious **Update** cases together when the fixes are straightforward
+3. Present **Replace** cases individually or in very small groups
+4. Present **Archive** cases individually unless they are strong auto-archive candidates
+
+Ask for confirmation in stages:
+
+1. Confirm grouped Keep/Update recommendations
+2. Then handle Replace one at a time
+3. Then handle Archive one at a time unless the archive is unambiguous and safe to auto-apply
+
+### Broad Scope
+
+If the user asked for a sweeping refresh, keep the interaction incremental:
+
+1. Narrow scope first
+2. Investigate a manageable batch
+3. Present recommendations
+4. Ask whether to continue to the next batch
+
+Do not front-load the user with a full maintenance queue.
+
+## Phase 4: Execute the Chosen Action
+
+### Keep Flow
+
+No file edit by default. Summarize why the learning remains trustworthy.
+
+### Update Flow
+
+Apply in-place edits only when the solution is still substantively correct.
+
+Examples of valid in-place updates:
+
+- Rename `app/models/auth_token.rb` reference to `app/models/session_token.rb`
+- Update `module: AuthToken` to `module: SessionToken`
+- Fix outdated links to related docs
+- Refresh implementation notes after a directory move
+
+Examples that should **not** be in-place updates:
+
+- Fixing a typo with no effect on understanding
+- Rewording prose for style alone
+- Small cleanup that does not materially improve accuracy or usability
+- The old fix is now an anti-pattern
+- The system architecture changed enough that the old guidance is misleading
+- The troubleshooting path is materially different
+
+The last three cases require **Replace**, not Update. The first three are low-value churn and should simply be skipped.
+
+### Replace Flow
+
+Follow the investigation order defined in Phase 2's Replace section. The key principle: exhaust codebase investigation before asking the user for context they may not have.
+
+If replacement context is found and sufficient:
+
+1. Run `ce:compound` with a short context summary for the replacement learning
+2. Create the new learning
+3. Update the old doc with `superseded_by`
+4. Move the old doc to `docs/solutions/_archived/`
+
+If replacement context is insufficient, mark the learning as stale in place:
+
+1. Add to frontmatter: `status: stale`, `stale_reason: [what you found]`, `stale_date: YYYY-MM-DD`
+2. Report to the user what evidence you found and what's missing
+3. Suggest they revisit with `ce:compound` after solving the problem fresh
+
+### Archive Flow
+
+Archive only when a learning is clearly obsolete or redundant. Do not archive a document just because it is old.
+
+## Output Format
+
+After processing the selected scope, report:
+
+```text
+Compound Refresh Summary
+========================
+Scanned: N learnings
+
+Kept: X
+Updated: Y
+Replaced: Z
+Archived: W
+Skipped: V
+```
+
+Then list the affected files and what changed.
+
+For **Keep** outcomes, list them under a reviewed-without-edits section so the result is visible without creating git churn.
+
+## Relationship to ce:compound
+
+- `ce:compound` captures a newly solved, verified problem
+- `ce:compound-refresh` maintains older learnings as the codebase evolves
+
+Use **Replace** only when the refresh process has enough real replacement context to hand off honestly into `ce:compound`.
diff --git a/plugins/compound-engineering/skills/ce-compound/SKILL.md b/plugins/compound-engineering/skills/ce-compound/SKILL.md
index ca94c50f..98ef7b34 100644
--- a/plugins/compound-engineering/skills/ce-compound/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound/SKILL.md
@@ -89,7 +89,8 @@ Launch these subagents IN PARALLEL. Each returns text data to the orchestrator.
    - Searches `docs/solutions/` for related documentation
    - Identifies cross-references and links
    - Finds related GitHub issues
-   - Returns: Links and relationships
+   - Flags any related learning or pattern docs that may now be stale, contradicted, or overly broad
+   - Returns: Links, relationships, and any refresh candidates
 
 #### 4. **Prevention Strategist**
    - Develops prevention strategies
@@ -121,6 +122,53 @@ The orchestrating agent (main conversation) performs these steps:
 
 </sequential_tasks>
 
+### Phase 2.5: Selective Refresh Check
+
+After writing the new learning, decide whether this new solution is evidence that older docs should be refreshed.
+
+`ce:compound-refresh` is **not** a default follow-up. Use it selectively when the new learning suggests an older learning or pattern doc may now be inaccurate.
+
+It makes sense to invoke `ce:compound-refresh` when one or more of these are true:
+
+1. A related learning or pattern doc recommends an approach that the new fix now contradicts
+2. The new fix clearly supersedes an older documented solution
+3. The current work involved a refactor, migration, rename, or dependency upgrade that likely invalidated references in older docs
+4. A pattern doc now looks overly broad, outdated, or no longer supported by the refreshed reality
+5. The Related Docs Finder surfaced high-confidence refresh candidates in the same problem space
+
+It does **not** make sense to invoke `ce:compound-refresh` when:
+
+1. No related docs were found
+2. Related docs still appear consistent with the new learning
+3. The overlap is superficial and does not change prior guidance
+4. Refresh would require a broad historical review with weak evidence
+
+Use these rules:
+
+- If there is **one obvious stale candidate**, invoke `ce:compound-refresh` with a narrow scope hint after the new learning is written
+- If there are **multiple candidates in the same area**, ask the user whether to run a targeted refresh for that module, category, or pattern set
+- If context is already tight or you are in compact-safe mode, do not expand into a broad refresh automatically; instead recommend `ce:compound-refresh` as the next step with a scope hint
+
+When invoking or recommending `ce:compound-refresh`, be explicit about the argument to pass. Prefer the narrowest useful scope:
+
+- **Specific file** when one learning or pattern doc is the likely stale artifact
+- **Module or component name** when several related docs may need review
+- **Category name** when the drift is concentrated in one solutions area
+- **Pattern filename or pattern topic** when the stale guidance lives in `docs/solutions/patterns/`
+
+Examples:
+
+- `/ce:compound-refresh plugin-versioning-requirements`
+- `/ce:compound-refresh payments`
+- `/ce:compound-refresh performance-issues`
+- `/ce:compound-refresh critical-patterns`
+
+A single scope hint may still expand to multiple related docs when the change is cross-cutting within one domain, category, or pattern area.
+
+Do not invoke `ce:compound-refresh` without an argument unless the user explicitly wants a broad sweep.
+
+Always capture the new learning first. Refresh is a targeted maintenance follow-up, not a prerequisite for documentation.
+
 ### Phase 3: Optional Enhancement
 
 **WAIT for Phase 2 to complete before proceeding.**
@@ -173,6 +221,8 @@ re-run /compound in a fresh session.
 
 **No subagents are launched. No parallel tasks. One file written.**
 
+In compact-safe mode, only suggest `ce:compound-refresh` if there is an obvious narrow refresh target. Do not broaden into a large refresh sweep from a compact-safe session.
+
 ---
 
 ## What It Captures

From 816a17992d77d04b0712639d6c583ba3e6aca5cc Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 12:07:16 -0700
Subject: [PATCH 02/18] fix(skills): improve ce:compound-refresh interaction
 and auto-archive behavior

- Use platform-agnostic interactive question tool phrasing with examples
  for Claude Code and Codex instead of hardcoding AskUserQuestion
- Fix contradiction between Phase 2 auto-archive criteria and Phase 3
  always-ask-before-archive rule so unambiguous archives proceed without
  unnecessary user prompts
---
 .../skills/ce-compound-refresh/SKILL.md                   | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index 0de631ce..61644fe8 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -13,7 +13,7 @@ Maintain the quality of `docs/solutions/` over time. This workflow reviews exist
 
 Follow the same interaction style as `ce:brainstorm`:
 
-- Ask questions **one at a time**
+- Ask questions **one at a time** — use the platform's interactive question tool (e.g. `AskUserQuestion` in Claude Code, `request_user_input` in Codex) and **stop to wait for the answer** before continuing
 - Prefer **multiple choice** when natural options exist
 - Start with **scope and intent**, then narrow only when needed
 - Do **not** ask the user to make decisions before you have evidence
@@ -234,16 +234,14 @@ If "archive" feels too strong but the pattern should no longer be elevated, redu
 Most Updates should be applied directly without asking. Only ask the user when:
 
 - The right action is genuinely ambiguous (Update vs Replace vs Archive)
-- You are about to Archive a document
+- You are about to Archive a document **and** the evidence is ambiguous (see auto-archive criteria in Phase 2). When the auto-archive criteria are met, proceed without asking.
 - You are about to create a successor via `ce:compound`
 
 Do **not** ask questions about whether code changes were intentional, whether the user wants to fix bugs in the code, or other concerns outside doc maintenance. Stay in your lane — doc accuracy.
 
 ### Question Style
 
-Use the **AskUserQuestion tool** when available.
-
-If the environment does not support interactive prompts, present numbered options in plain text and wait for the user's response before proceeding.
+Always present choices using the platform's interactive question tool (e.g. `AskUserQuestion` in Claude Code, `request_user_input` in Codex). If the environment has no interactive prompt tool, present numbered options in plain text and wait for the user's response before proceeding.
 
 Question rules:
 

From f3d4f48a54548437293308117e1dfe9d71bb46b6 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 12:17:26 -0700
Subject: [PATCH 03/18] fix(skills): steer compound-refresh subagents toward
 file tools over shell commands

Avoids unnecessary permission prompts during investigation by
preferring dedicated file search and read tools instead of bash.
---
 .../compound-engineering/skills/ce-compound-refresh/SKILL.md    | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index 61644fe8..2fae9638 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -156,7 +156,7 @@ Use subagents for context isolation when investigating multiple artifacts — no
 | **Parallel subagents** | 3+ truly independent artifacts with low overlap |
 | **Batched subagents** | Broad sweeps — narrow scope first, then investigate in batches |
 
-Subagents are **read-only investigators**. They must not edit files, create successors, or archive anything. Each returns: file path, evidence, recommended action, confidence, and open questions.
+Subagents are **read-only investigators**. They must not edit files, create successors, or archive anything. Each returns: file path, evidence, recommended action, confidence, and open questions. Subagents should use dedicated file search and read tools for investigation — not shell commands. This avoids unnecessary permission prompts and is more reliable across platforms.
 
 The orchestrator merges results, detects contradictions, asks the user questions, and performs all edits centrally. If two artifacts overlap or discuss the same root issue, investigate them together rather than parallelizing.
 

From 4cd08ad3282996a39a1fc816f6b258f19895bf37 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 12:37:18 -0700
Subject: [PATCH 04/18] feat(skills): add smart triage, drift classification,
 and replacement subagents to ce:compound-refresh

- Broad scope triage: for 9+ docs, inventory, cluster by impact, and
  spot-check drift, then recommend the highest-impact area instead of
  asking the user blindly
- Drift classification: sharp boundary between Update (fix references
  in-skill) and Replace (subagent writes successor learning)
- Replacement subagents: sequential subagents write new learnings using
  ce:compound's document format with investigation evidence already
  gathered, avoiding redundant research
- Stale fallback: when evidence is insufficient for a confident
  replacement, mark as stale and recommend ce:compound later
---
 .../skills/ce-compound-refresh/SKILL.md       | 97 ++++++++++++-------
 1 file changed, 63 insertions(+), 34 deletions(-)
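
Note for reviewers: the routing thresholds and the Broad Scope Triage steps in this patch (inventory, impact clustering, spot-check drift, recommend a starting area) could be sketched roughly as follows. This is a hedged illustration, not the skill's implementation: the `Doc` record, the function names, and the ranking key (clusters with more drifted docs first, then larger clusters) are assumptions, and the "ambiguous scope" route to Broad is not modelled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    path: str
    module: str        # grouping key read from frontmatter (hypothetical field choice)
    missing_refs: int  # referenced files that no longer exist in the codebase


def route(doc_count: int, user_named_doc: bool = False) -> str:
    """Pick the lightest interaction path for the number of candidate docs."""
    if user_named_doc or doc_count <= 2:
        return "focused"
    if doc_count <= 8:
        return "batch"
    return "broad"  # 9+ docs: triage first, then investigate in batches


def rank_clusters(docs: list[Doc]) -> list[tuple[str, int, int]]:
    """Return (module, doc_count, drifted_doc_count) tuples, highest impact first."""
    sizes = Counter(d.module for d in docs)
    drifted = Counter(d.module for d in docs if d.missing_refs > 0)
    return sorted(
        ((module, count, drifted[module]) for module, count in sizes.items()),
        key=lambda cluster: (cluster[2], cluster[1]),
        reverse=True,
    )
```

The first tuple returned by `rank_clusters` corresponds to the area the skill would recommend starting with; the user can still confirm or redirect.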

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index 2fae9638..b552fbcb 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -102,18 +102,30 @@ Before asking the user to classify anything:
 | Scope | When to use it | Interaction style |
 |-------|----------------|-------------------|
 | **Focused** | 1-2 likely files or user named a specific doc | Investigate directly, then present a recommendation |
-| **Batch** | 3-8 mostly independent docs | Investigate first, then present grouped recommendations |
-| **Broad** | Large, ambiguous, or repo-wide stale-doc sweep | Ask one narrowing question before deep investigation |
+| **Batch** | Up to ~8 mostly independent docs | Investigate first, then present grouped recommendations |
+| **Broad** | 9+ docs, ambiguous, or repo-wide stale-doc sweep | Triage first, then investigate in batches |
 
-If scope is broad or ambiguous, ask one question to narrow it before scanning deeply. Prefer multiple choice when possible:
+### Broad Scope Triage
+
+When scope is broad (9+ candidate docs), do a lightweight triage before deep investigation:
+
+1. **Inventory** — read frontmatter of all candidate docs, group by module/component/category
+2. **Impact clustering** — identify areas with the densest clusters of learnings + pattern docs. A cluster of 5 learnings and 2 patterns covering the same module is higher-impact than 5 isolated single-doc areas, because staleness in one doc is likely to affect the others.
+3. **Spot-check drift** — for each cluster, check whether the primary referenced files still exist. Missing references in a high-impact cluster = strongest signal for where to start.
+4. **Recommend a starting area** — present the highest-impact cluster with a brief rationale and ask the user to confirm or redirect.
+
+Example:
 
 ```text
-I found a broad refresh scope. Which area should we review first?
+Found 24 learnings across 5 areas.
 
-1. A specific file
-2. A category or module
-3. Pattern docs first
-4. Everything in scope
+The auth module has 5 learnings and 2 pattern docs that cross-reference
+each other — and 3 of those reference files that no longer exist.
+I'd start there.
+
+1. Start with auth (recommended)
+2. Pick a different area
+3. Review everything
 ```
 
 Do not ask action-selection questions yet. First gather evidence.
@@ -131,9 +143,20 @@ A learning has several dimensions that can independently go stale. Surface-level
 
 Match investigation depth to the learning's specificity — a learning referencing exact file paths and code snippets needs more verification than one describing a general principle.
 
-Three judgment guidelines that are easy to get wrong:
+### Drift Classification: Update vs Replace
+
+The critical distinction is whether the drift is **cosmetic** (references moved but the solution is the same) or **substantive** (the solution itself changed):
+
+- **Update territory** — file paths moved, classes renamed, links broke, metadata drifted, but the core recommended approach is still how the code works. `ce:compound-refresh` fixes these directly.
+- **Replace territory** — the recommended solution conflicts with current code, the architectural approach changed, or the pattern is no longer the preferred way. This means a new learning needs to be written. A replacement subagent writes the successor following `ce:compound`'s document format (frontmatter, problem, root cause, solution, prevention), using the investigation evidence already gathered. The orchestrator does not rewrite learnings inline — it delegates to a subagent for context isolation.
 
-1. **Contradiction = strong Replace signal.** If the learning's recommendation conflicts with current code patterns or a recently verified fix, that is not a minor drift — the learning is actively misleading.
+**The boundary:** if you find yourself rewriting the solution section or changing what the learning recommends, stop — that is Replace, not Update.
+
+### Judgment Guidelines
+
+Three guidelines that are easy to get wrong:
+
+1. **Contradiction = strong Replace signal.** If the learning's recommendation conflicts with current code patterns or a recently verified fix, that is not a minor drift — the learning is actively misleading. Classify as Replace.
 2. **Age alone is not a stale signal.** A 2-year-old learning that still matches current code is fine. Only use age as a prompt to inspect more carefully.
 3. **Check for successors before archiving.** Before recommending Replace or Archive, look for newer learnings, pattern docs, PRs, or issues covering the same problem space. If successor evidence exists, prefer Replace over Archive so readers are directed to the newer guidance.
 
@@ -156,9 +179,14 @@ Use subagents for context isolation when investigating multiple artifacts — no
 | **Parallel subagents** | 3+ truly independent artifacts with low overlap |
 | **Batched subagents** | Broad sweeps — narrow scope first, then investigate in batches |
 
-Subagents are **read-only investigators**. They must not edit files, create successors, or archive anything. Each returns: file path, evidence, recommended action, confidence, and open questions. Subagents should use dedicated file search and read tools for investigation — not shell commands. This avoids unnecessary permission prompts and is more reliable across platforms.
+Subagents should use dedicated file search and read tools for investigation — not shell commands. This avoids unnecessary permission prompts and is more reliable across platforms.
+
+There are two subagent roles:
+
+1. **Investigation subagents** — read-only. They must not edit files, create successors, or archive anything. Each returns: file path, evidence, recommended action, confidence, and open questions. These can run in parallel when artifacts are independent.
+2. **Replacement subagents** — write a single new learning to replace a stale one. These run **one at a time, sequentially** (each replacement subagent may need to read significant code, and running multiple in parallel risks context exhaustion). The orchestrator handles all archival and metadata updates after each replacement completes.
 
-The orchestrator merges results, detects contradictions, asks the user questions, and performs all edits centrally. If two artifacts overlap or discuss the same root issue, investigate them together rather than parallelizing.
+The orchestrator merges investigation results, detects contradictions, asks the user questions, coordinates replacement subagents, and performs all archival/metadata edits centrally. If two artifacts overlap or discuss the same root issue, investigate them together rather than parallelizing.
 
 ## Phase 2: Classify the Right Maintenance Action
 
@@ -176,21 +204,17 @@ The core solution is still valid but references have drifted (paths, class names
 
 Choose **Replace** when the learning's core guidance is now misleading — the recommended fix changed materially, the root cause or architecture shifted, or the preferred pattern is different.
 
-Replace requires real replacement context. Investigate before asking the user — they may have invoked the refresh months after the original learning was written and not have this context themselves.
+The user may have invoked the refresh months after the original learning was written. Do not ask them for replacement context they are unlikely to have — use agent intelligence to investigate the codebase and synthesize the replacement.
 
-**Investigation order:**
+**Evidence assessment:**
 
-1. Check if the current conversation already contains replacement context (e.g., user just solved the problem differently)
-2. If not, spawn a read-only subagent to investigate deeper — git history, related PRs, newer learnings, current code patterns — to find what replaced the old approach. Use a subagent to protect the main session context window from the volume of evidence.
-3. If the conversation or codebase provides sufficient replacement context → proceed:
-   - Create a successor learning through `ce:compound`
-   - Add `superseded_by` metadata to the old learning
-   - Move the old learning to `docs/solutions/_archived/`
-4. If replacement context is insufficient → do **not** force Replace. Mark the learning as stale in place so readers know not to rely on it:
-   - Add `status: stale`, `stale_reason`, and `stale_date` to the frontmatter
-   - Report to the user what you found and suggest they come back with `ce:compound` after solving the problem fresh
+By the time you identify a Replace candidate, Phase 1 investigation has already gathered significant evidence: the old learning's claims, what the current code actually does, and where the drift occurred. Assess whether this evidence is sufficient to write a trustworthy replacement:
 
-Only ask the user for replacement context if they clearly have it (e.g., they mentioned a recent migration or refactor). Do not default to asking — default to investigating.
+- **Sufficient evidence** — you understand both what the old learning recommended AND what the current approach is. The investigation found the current code patterns, the new file locations, the changed architecture. → Proceed to write the replacement (see Phase 4 Replace Flow).
+- **Insufficient evidence** — the drift is so fundamental that you cannot confidently document the current approach. The entire subsystem was replaced, or the new architecture is too complex to understand from a file scan alone. → Mark as stale in place:
+   - Add `status: stale`, `stale_reason: [what you found]`, `stale_date: YYYY-MM-DD` to the frontmatter
+   - Report what evidence you found and what is missing
+   - Recommend the user run `ce:compound` after their next encounter with that area, when they have fresh problem-solving context
 
 ### Archive
 
@@ -331,20 +355,25 @@ Those cases require **Replace**, not Update.
 
 ### Replace Flow
 
-Follow the investigation order defined in Phase 2's Replace section. The key principle: exhaust codebase investigation before asking the user for context they may not have.
+Process Replace candidates **one at a time, sequentially**. Each replacement is written by a subagent to protect the main context window.
 
-If replacement context is found and sufficient:
+**When evidence is sufficient:**
 
-1. Run `ce:compound` with a short context summary for the replacement learning
-2. Create the new learning
-3. Update the old doc with `superseded_by`
-4. Move the old doc to `docs/solutions/_archived/`
+1. Spawn a single subagent to write the replacement learning. Pass it:
+   - The old learning's full content
+   - A summary of the investigation evidence (what changed, what the current code does, why the old guidance is misleading)
+   - The target path and category (same category as the old learning unless the category itself changed)
+2. The subagent writes the new learning following `ce:compound`'s document format: YAML frontmatter (title, category, date, module, component, tags), problem description, root cause, current solution with code examples, and prevention tips. It should use dedicated file search and read tools if it needs additional context beyond what was passed.
+3. After the subagent completes, the orchestrator:
+   - Adds `superseded_by: [new learning path]` to the old learning's frontmatter
+   - Moves the old learning to `docs/solutions/_archived/`
 
-If replacement context is insufficient, mark the learning as stale in place:
+**When evidence is insufficient:**
 
-1. Add to frontmatter: `status: stale`, `stale_reason: [what you found]`, `stale_date: YYYY-MM-DD`
-2. Report to the user what evidence you found and what's missing
-3. Suggest they revisit with `ce:compound` after solving the problem fresh
+1. Mark the learning as stale in place:
+   - Add to frontmatter: `status: stale`, `stale_reason: [what you found]`, `stale_date: YYYY-MM-DD`
+2. Report what evidence was found and what is missing
+3. Recommend the user run `ce:compound` after their next encounter with that area
 
 ### Archive Flow
 

From f8b591426181a918075c17c1c205ff0a39ab5d96 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 12:45:58 -0700
Subject: [PATCH 05/18] docs(solutions): compound learning from
 ce:compound-refresh skill redesign

Documents five skill design patterns discovered during testing:
platform-agnostic tool references, auto-archive consistency,
smart triage for broad scope, replacement subagents over
ce:compound handoff, and file tools over shell commands.
---
 .../compound-refresh-skill-improvements.md    | 127 ++++++++++++++++++
 1 file changed, 127 insertions(+)
 create mode 100644 docs/solutions/skill-design/compound-refresh-skill-improvements.md

diff --git a/docs/solutions/skill-design/compound-refresh-skill-improvements.md b/docs/solutions/skill-design/compound-refresh-skill-improvements.md
new file mode 100644
index 00000000..29a50bf4
--- /dev/null
+++ b/docs/solutions/skill-design/compound-refresh-skill-improvements.md
@@ -0,0 +1,127 @@
+---
+title: "ce:compound-refresh skill redesign for autonomous maintenance without live user context"
+category: skill-design
+date: 2026-03-13
+module: plugins/compound-engineering/skills/ce-compound-refresh
+component: SKILL.md
+tags:
+  - skill-design
+  - compound-refresh
+  - maintenance-workflow
+  - drift-classification
+  - subagent-architecture
+  - platform-agnostic
+severity: medium
+description: "Redesign ce:compound-refresh to handle autonomous drift triage, in-skill replacement via subagents, and smart scoping without relying on live problem-solving context that ce:compound expects."
+related:
+  - docs/solutions/plugin-versioning-requirements.md
+  - https://github.com/EveryInc/compound-engineering-plugin/pull/260
+  - https://github.com/EveryInc/compound-engineering-plugin/issues/204
+  - https://github.com/EveryInc/compound-engineering-plugin/issues/221
+---
+
+## Problem
+
+The initial `ce:compound-refresh` skill had several design issues discovered during real-world testing:
+
+1. Interactive questions never triggered the proper tool (AskUserQuestion) because the instruction used a weak "when available" qualifier
+2. Auto-archive criteria contradicted an "always ask before archiving" rule in a later phase
+3. Broad scope (9+ docs) asked the user to choose an area blindly without providing analysis
+4. The Replace flow tried to hand off to `ce:compound`, which expects fresh problem-solving context the user doesn't have months later
+5. Subagents used shell commands for file existence checks, triggering permission prompts
+
+## Root Cause
+
+Five independent design issues, each with a distinct root cause:
+
+1. **Hardcoded tool name with escape hatch.** Saying "Use AskUserQuestion when available" gave the model permission to skip the tool and just output text. The hardcoded name was also non-portable to Codex and other platforms.
+2. **Contradictory rules across phases.** Phase 2 defined auto-archive criteria. Phase 3 said "always ask before archiving" with no exception. The model followed Phase 3.
+3. **Question before evidence.** The skill prompted scope selection before gathering any information about which areas were most stale or interconnected.
+4. **Unsatisfied precondition in cross-skill handoff.** `ce:compound` expects a recently solved problem with fresh context. A maintenance refresh has investigation evidence instead — equivalent data, different shape.
+5. **No tool preference guidance for subagents.** Without explicit instruction, subagents defaulted to bash for file operations.
+
+## Solution
+
+### 1. Platform-agnostic interactive questions
+
+Reference "the platform's interactive question tool" as the concept, with concrete examples:
+
+```markdown
+Ask questions **one at a time** — use the platform's interactive question tool
+(e.g. `AskUserQuestion` in Claude Code, `request_user_input` in Codex) and
+**stop to wait for the answer** before continuing.
+```
+
+The "stop to wait" language removes the escape hatch. The examples help each platform's model select the right tool.
+
+### 2. Auto-archive exemption for unambiguous cases
+
+Phase 3 now defers to Phase 2's auto-archive criteria:
+
+```markdown
+You are about to Archive a document **and** the evidence is not unambiguous
+(see auto-archive criteria in Phase 2). When auto-archive criteria are met,
+proceed without asking.
+```
+
+### 3. Smart triage for broad scope
+
+When 9+ candidate docs are found, triage before asking:
+
+1. **Inventory** — read frontmatter, group by module/component/category
+2. **Impact clustering** — dense clusters of interconnected learnings + pattern docs are higher-impact than isolated docs
+3. **Spot-check drift** — check whether primary referenced files still exist
+4. **Recommend** — present the highest-impact cluster with rationale
+
+Key insight: "code changed recently" is NOT a reliable staleness signal. Missing references in a high-impact cluster are the strongest signal.
+
+### 4. Replacement subagents instead of ce:compound handoff
+
+By the time a Replace is identified, Phase 1 investigation has already gathered the evidence that `ce:compound` would research:
+- The old learning's claims
+- What the current code actually does
+- Where and why the drift occurred
+
+A replacement subagent writes the successor directly using `ce:compound`'s document format (frontmatter, problem, root cause, solution, prevention). Run sequentially — one at a time — because each may read significant code.
+
+When evidence is insufficient (e.g., entire subsystem replaced, new architecture too complex to understand from investigation alone), mark as stale and recommend `ce:compound` after the user's next encounter with that area.
+
+### 5. Dedicated file tools over shell commands
+
+Added to subagent strategy:
+
+```markdown
+Subagents should use dedicated file search and read tools for investigation —
+not shell commands. This avoids unnecessary permission prompts and is more
+reliable across platforms.
+```
+
+## Prevention
+
+### Skill review checklist additions
+
+These five patterns should be checked during any skill review:
+
+1. **No hardcoded tool names** — All tool references use capability-first language with platform examples and a plain-text fallback
+2. **No contradictory rules across phases** — Trace each action type through all phases; verify absolute language ("always," "never") is not contradicted elsewhere
+3. **No blind user questions** — Every question presented to the user is informed by evidence the agent gathered first
+4. **No unsatisfied cross-skill preconditions** — Every skill handoff verifies the target skill's preconditions are met by the calling context
+5. **No shell commands for file operations in subagents** — Subagent instructions explicitly prefer dedicated tools over shell commands
+
+### Key anti-patterns
+
+| Anti-pattern | Better pattern |
+|---|---|
+| "Use the AskUserQuestion tool when available" | "Use the platform's interactive question tool (e.g. AskUserQuestion in Claude Code, request_user_input in Codex)" |
+| Defining auto-archive conditions, then "always ask before archiving" | Single-source-of-truth: define the rule once, reference it elsewhere |
+| "Which area should we review?" before any investigation | Triage first, recommend with evidence, let user confirm or redirect |
+| "Create a successor learning through ce:compound" during a refresh | Replacement subagent writes directly using gathered evidence |
+| No tool guidance for subagents | "Use dedicated file search and read tools, not shell commands" |
+
+## Cross-References
+
+- **PR #260**: The PR containing all these improvements
+- **Issue #204**: Platform-agnostic tool references (AskUserQuestion dependency)
+- **Issue #221**: Motivating issue for maintenance at scale
+- **PR #242**: ce:audit (detection counterpart, closed)
+- **PR #150**: Established subagent context-isolation pattern

From 88735cdb10c8ec360b0d4a9f71f326cd5922579a Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 12:57:39 -0700
Subject: [PATCH 06/18] feat(skills): add autonomous mode to
 ce:compound-refresh

Support mode:autonomous argument for unattended/scheduled runs.
In autonomous mode: skip all user questions, apply safe actions
directly, mark ambiguous cases as stale with conservative confidence,
and generate a detailed report for after-the-fact human review.
---
 .../compound-refresh-skill-improvements.md    | 14 ++++++
 .../skills/ce-compound-refresh/SKILL.md       | 44 ++++++++++++++++---
 2 files changed, 53 insertions(+), 5 deletions(-)
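
Note on Mode Detection: the rule "if `mode:autonomous` is present, strip it and use the remainder as a scope hint" amounts to the small sketch below. The skill itself is interpreted by the agent, so `parse_arguments` is a hypothetical helper shown only to pin down the intended behavior, and splitting on whitespace is an assumption.

```python
def parse_arguments(arguments: str) -> tuple[str, str]:
    """Split a raw argument string into (mode, scope_hint).

    Autonomous mode is explicit opt-in: it is selected only when the
    literal token `mode:autonomous` appears in the arguments.
    """
    tokens = arguments.split()
    mode = "autonomous" if "mode:autonomous" in tokens else "interactive"
    # Everything except the mode token is treated as the scope hint.
    scope_hint = " ".join(t for t in tokens if t != "mode:autonomous")
    return mode, scope_hint
```

For example, `parse_arguments("mode:autonomous auth module")` yields `("autonomous", "auth module")`, while an empty argument string stays interactive with no scope hint.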

diff --git a/docs/solutions/skill-design/compound-refresh-skill-improvements.md b/docs/solutions/skill-design/compound-refresh-skill-improvements.md
index 29a50bf4..21f0fab1 100644
--- a/docs/solutions/skill-design/compound-refresh-skill-improvements.md
+++ b/docs/solutions/skill-design/compound-refresh-skill-improvements.md
@@ -29,6 +29,7 @@ The initial `ce:compound-refresh` skill had several design issues discovered dur
 3. Broad scope (9+ docs) asked the user to choose an area blindly without providing analysis
 4. The Replace flow tried to hand off to `ce:compound`, which expects fresh problem-solving context the user doesn't have months later
 5. Subagents used shell commands for file existence checks, triggering permission prompts
+6. No way to run the skill unattended (e.g., on a schedule) — every run required user interaction
 
 ## Root Cause
 
@@ -39,6 +40,7 @@ Five independent design issues, each with a distinct root cause:
 3. **Question before evidence.** The skill prompted scope selection before gathering any information about which areas were most stale or interconnected.
 4. **Unsatisfied precondition in cross-skill handoff.** `ce:compound` expects a recently solved problem with fresh context. A maintenance refresh has investigation evidence instead — equivalent data, different shape.
 5. **No tool preference guidance for subagents.** Without explicit instruction, subagents defaulted to bash for file operations.
+6. **Interactive-only design.** Every phase assumed a user was present. No way to run autonomously for scheduled maintenance or hands-off sweeps.
 
 ## Solution
 
@@ -96,6 +98,16 @@ not shell commands. This avoids unnecessary permission prompts and is more
 reliable across platforms.
 ```
 
+### 6. Autonomous mode for scheduled/unattended runs
+
+Added `mode:autonomous` argument support so the skill can run without user interaction (e.g., on a schedule, in CI, or when the user just wants a hands-off sweep).
+
+Key design decisions:
+- **Explicit opt-in only.** `mode:autonomous` must be in the arguments. Auto-detection based on tool availability was rejected because a user in an interactive agent without a question tool (e.g., Cursor, Windsurf) is still interactive — they just use plain-text replies.
+- **Conservative confidence.** Borderline cases that would get a user question in interactive mode get marked stale in autonomous mode. Err toward stale-marking over incorrect action.
+- **Detailed report as deliverable.** Since no user was present, the output report includes full rationale for each action so a human can review after the fact.
+- **Process everything.** No scope narrowing questions — if no scope hint provided, process all docs. For broad scope, process clusters in impact order without asking.
+
 ## Prevention
 
 ### Skill review checklist additions
@@ -107,6 +119,7 @@ These five patterns should be checked during any skill review:
 3. **No blind user questions** — Every question presented to the user is informed by evidence the agent gathered first
 4. **No unsatisfied cross-skill preconditions** — Every skill handoff verifies the target skill's preconditions are met by the calling context
 5. **No shell commands for file operations in subagents** — Subagent instructions explicitly prefer dedicated tools over shell commands
+6. **Autonomous mode for long-running skills** — Any skill that could run unattended should support an explicit opt-in mode with conservative confidence and detailed reporting
 
 ### Key anti-patterns
 
@@ -117,6 +130,7 @@ These five patterns should be checked during any skill review:
 | "Which area should we review?" before any investigation | Triage first, recommend with evidence, let user confirm or redirect |
 | "Create a successor learning through ce:compound" during a refresh | Replacement subagent writes directly using gathered evidence |
 | No tool guidance for subagents | "Use dedicated file search and read tools, not shell commands" |
+| Auto-detecting "no question tool = headless" | Explicit `mode:autonomous` argument — interactive agents without question tools are still interactive |
 
 ## Cross-References
 
diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index b552fbcb..69b307d6 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -1,7 +1,7 @@
 ---
 name: ce:compound-refresh
 description: Refresh stale or drifting learnings and pattern docs in docs/solutions/ by reviewing, updating, replacing, or archiving them against the current codebase. Use after refactors, migrations, dependency upgrades, or when a retrieved learning feels outdated or wrong. Also use when reviewing docs/solutions/ for accuracy, when a recently solved problem contradicts an existing learning, or when pattern docs no longer reflect current code.
-argument-hint: "[optional: scope hint]"
+argument-hint: "[mode:autonomous] [optional: scope hint]"
 disable-model-invocation: true
 ---
 
@@ -9,8 +9,28 @@ disable-model-invocation: true
 
 Maintain the quality of `docs/solutions/` over time. This workflow reviews existing learnings against the current codebase, then refreshes any derived pattern docs that depend on them.
 
+## Mode Detection
+
+Check if `$ARGUMENTS` contains `mode:autonomous`. If present, strip it from arguments (use the remainder as a scope hint) and run in **autonomous mode**.
+
+| Mode | When | Behavior |
+|------|------|----------|
+| **Interactive** (default) | User is present and can answer questions | Ask for decisions on ambiguous cases, confirm actions |
+| **Autonomous** | `mode:autonomous` in arguments | No user interaction. Apply all unambiguous actions (Keep, Update, auto-Archive, Replace with sufficient evidence). Mark ambiguous cases as stale. Generate a summary report at the end. |
+
+### Autonomous mode rules
+
+- **Skip all user questions.** Never pause for input.
+- **Process all docs in scope.** No scope narrowing questions — if no scope hint was provided, process everything.
+- **Apply safe actions directly:** Keep (no-op), Update (fix references), auto-Archive (unambiguous criteria met), Replace (when evidence is sufficient).
+- **Mark as stale when uncertain.** If classification is genuinely ambiguous (Update vs Replace vs Archive) or Replace evidence is insufficient, mark as stale with `status: stale`, `stale_reason`, and `stale_date` in the frontmatter. Do not guess.
+- **Use conservative confidence.** In interactive mode, borderline cases get a user question. In autonomous mode, borderline cases get marked stale. Err toward stale-marking over incorrect action.
+- **Generate a full report.** The output report (see Output Format) lists all actions taken and all items marked stale with reasons, so a human can review the results after the fact.
+
 ## Interaction Principles
 
+**These principles apply to interactive mode only. In autonomous mode, skip all user questions and apply the autonomous mode rules above.**
+
 Follow the same interaction style as `ce:brainstorm`:
 
 - Ask questions **one at a time** — use the platform's interactive question tool (e.g. `AskUserQuestion` in Claude Code, `request_user_input` in Codex) and **stop to wait for the answer** before continuing
@@ -80,7 +100,7 @@ If `$ARGUMENTS` is provided, use it to narrow scope before proceeding. Try these
 3. **Filename match** — match against filenames (partial matches are fine)
 4. **Content search** — search file contents for the argument as a keyword (useful for feature names or feature areas)
 
-If no matches are found, report that and ask the user to clarify.
+If no matches are found, report that and ask the user to clarify. In autonomous mode, report the miss and stop — do not guess at scope.
 
 If no candidate docs are found, report:
 
@@ -112,7 +132,7 @@ When scope is broad (9+ candidate docs), do a lightweight triage before deep inv
 1. **Inventory** — read frontmatter of all candidate docs, group by module/component/category
 2. **Impact clustering** — identify areas with the densest clusters of learnings + pattern docs. A cluster of 5 learnings and 2 patterns covering the same module is higher-impact than 5 isolated single-doc areas, because staleness in one doc is likely to affect the others.
 3. **Spot-check drift** — for each cluster, check whether the primary referenced files still exist. Missing references in a high-impact cluster = strongest signal for where to start.
-4. **Recommend a starting area** — present the highest-impact cluster with a brief rationale and ask the user to confirm or redirect.
+4. **Recommend a starting area** — present the highest-impact cluster with a brief rationale and ask the user to confirm or redirect. In autonomous mode, skip the question and process all clusters in impact order.
 
 Example:
 
@@ -186,7 +206,7 @@ There are two subagent roles:
 1. **Investigation subagents** — read-only. They must not edit files, create successors, or archive anything. Each returns: file path, evidence, recommended action, confidence, and open questions. These can run in parallel when artifacts are independent.
 2. **Replacement subagents** — write a single new learning to replace a stale one. These run **one at a time, sequentially** (each replacement subagent may need to read significant code, and running multiple in parallel risks context exhaustion). The orchestrator handles all archival and metadata updates after each replacement completes.
 
-The orchestrator merges investigation results, detects contradictions, asks the user questions, coordinates replacement subagents, and performs all archival/metadata edits centrally. If two artifacts overlap or discuss the same root issue, investigate them together rather than parallelizing.
+The orchestrator merges investigation results, detects contradictions, coordinates replacement subagents, and performs all archival/metadata edits centrally. In interactive mode, it asks the user questions on ambiguous cases. In autonomous mode, it marks ambiguous cases as stale instead. If two artifacts overlap or discuss the same root issue, investigate them together rather than parallelizing.
 
 ## Phase 2: Classify the Right Maintenance Action
 
@@ -255,6 +275,8 @@ If "archive" feels too strong but the pattern should no longer be elevated, redu
 
 ## Phase 3: Ask for Decisions
 
+**In autonomous mode, skip this entire phase.** Apply all unambiguous actions directly and mark ambiguous cases as stale (see autonomous mode rules).
+
 Most Updates should be applied directly without asking. Only ask the user when:
 
 - The right action is genuinely ambiguous (Update vs Replace vs Archive)
@@ -393,15 +415,27 @@ Updated: Y
 Replaced: Z
 Archived: W
 Skipped: V
+Marked stale: S
 ```
 
 Then list the affected files and what changed.
 
 For **Keep** outcomes, list them under a reviewed-without-edits section so the result is visible without creating git churn.
 
+### Autonomous mode output
+
+In autonomous mode, the report is the primary deliverable since no user was present during execution. Include additional detail:
+
+- For each **Updated** file: what references were fixed and why
+- For each **Replaced** file: what the old learning recommended vs what the current code does, and the path to the new successor
+- For each **Archived** file: what referenced code/workflow is gone
+- For each **Marked stale** file: what evidence was found, what was ambiguous, and what action a human should consider
+
+This report gives a human reviewer enough context to verify the autonomous run's decisions after the fact.
+
 ## Relationship to ce:compound
 
 - `ce:compound` captures a newly solved, verified problem
 - `ce:compound-refresh` maintains older learnings as the codebase evolves
 
-Use **Replace** only when the refresh process has enough real replacement context to hand off honestly into `ce:compound`.
+Use **Replace** only when the refresh process has enough real evidence to write a trustworthy successor. When evidence is insufficient, mark as stale and recommend `ce:compound` for when the user next encounters that problem area.

From 3536ca0a932736c4df824a85aa1efe95774e776d Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 13:23:26 -0700
Subject: [PATCH 07/18] fix(skills): autonomous mode adapts to available
 permissions

Instead of requiring write permissions, autonomous mode attempts
writes and gracefully falls back to recommendations when denied.
Report splits into Applied (succeeded) and Recommended (could not
write) sections. Read-only invocations produce a maintenance plan.
---
 .../skills/ce-compound-refresh/SKILL.md         | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index 69b307d6..fd6b1820 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -22,10 +22,10 @@ Check if `$ARGUMENTS` contains `mode:autonomous`. If present, strip it from argu
 
 - **Skip all user questions.** Never pause for input.
 - **Process all docs in scope.** No scope narrowing questions — if no scope hint was provided, process everything.
-- **Apply safe actions directly:** Keep (no-op), Update (fix references), auto-Archive (unambiguous criteria met), Replace (when evidence is sufficient).
-- **Mark as stale when uncertain.** If classification is genuinely ambiguous (Update vs Replace vs Archive) or Replace evidence is insufficient, mark as stale with `status: stale`, `stale_reason`, and `stale_date` in the frontmatter. Do not guess.
+- **Attempt all safe actions:** Keep (no-op), Update (fix references), auto-Archive (unambiguous criteria met), Replace (when evidence is sufficient). If a write succeeds, record it as **applied**. If a write fails (e.g., permission denied), record the action as **recommended** in the report and continue — do not stop or ask for permissions.
+- **Mark as stale when uncertain.** If classification is genuinely ambiguous (Update vs Replace vs Archive) or Replace evidence is insufficient, mark as stale with `status: stale`, `stale_reason`, and `stale_date` in the frontmatter. If even the stale-marking write fails, include it as a recommendation.
 - **Use conservative confidence.** In interactive mode, borderline cases get a user question. In autonomous mode, borderline cases get marked stale. Err toward stale-marking over incorrect action.
-- **Generate a full report.** The output report (see Output Format) lists all actions taken and all items marked stale with reasons, so a human can review the results after the fact.
+- **Always generate a report.** The report is the primary deliverable. It has two sections: **Applied** (actions that were successfully written) and **Recommended** (actions that could not be written, with full rationale so a human can apply them or run the skill interactively). The report structure is the same regardless of what permissions were granted — the only difference is which section each action lands in.
 
 ## Interaction Principles
 
@@ -424,14 +424,19 @@ For **Keep** outcomes, list them under a reviewed-without-edits section so the r
 
 ### Autonomous mode output
 
-In autonomous mode, the report is the primary deliverable since no user was present during execution. Include additional detail:
+In autonomous mode, the report is the primary deliverable. Split actions into two sections:
 
+**Applied** (writes that succeeded):
 - For each **Updated** file: what references were fixed and why
 - For each **Replaced** file: what the old learning recommended vs what the current code does, and the path to the new successor
 - For each **Archived** file: what referenced code/workflow is gone
-- For each **Marked stale** file: what evidence was found, what was ambiguous, and what action a human should consider
+- For each **Marked stale** file: what evidence was found and why it was ambiguous
 
-This report gives a human reviewer enough context to verify the autonomous run's decisions after the fact.
+**Recommended** (actions that could not be written — e.g., permission denied):
+- Same detail as above, but framed as recommendations for a human to apply
+- Include enough context that the user can apply the change manually or re-run the skill interactively
+
+If all writes succeed, the Recommended section is empty. If no writes succeed (e.g., read-only invocation), all actions appear under Recommended — the report becomes a maintenance plan.
 
 ## Relationship to ce:compound
 

From a67dde82def541e21e99fbd9349dd77e88c55da9 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 13:36:28 -0700
Subject: [PATCH 08/18] fix(skills): strengthen autonomous mode to prevent
 blocking on user input

- Restructure Phase 3 with explicit autonomous skip section that says
  "do not ask, do not present, do not wait" before any interactive
  instructions
- Add autonomous caveats to Core Rules 4, 7, 8 which previously had
  unconditional "ask the user" language
- Clarify that missing referenced files are unambiguous Archive evidence,
  not a doubt case requiring user input
---
 .../skills/ce-compound-refresh/SKILL.md       | 25 +++++++++++++------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index fd6b1820..dad800c1 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -73,14 +73,15 @@ For each candidate artifact, classify it into one of four outcomes:
 1. **Evidence informs judgment.** The signals below are inputs, not a mechanical scorecard. Use engineering judgment to decide whether the artifact is still trustworthy.
 2. **Prefer no-write Keep.** Do not update a doc just to leave a review breadcrumb.
 3. **Match docs to reality, not the reverse.** When current code differs from a learning, update the learning to reflect the current code. The skill's job is doc accuracy, not code review — do not ask the user whether code changes were "intentional" or "a regression." If the code changed, the doc should match. If the user thinks the code is wrong, that is a separate concern outside this workflow.
-4. **Be decisive, minimize questions.** When evidence is clear (file renamed, class moved, reference broken), apply the update. Only ask the user when the right maintenance action is genuinely ambiguous — not to confirm obvious fixes. The goal is automated maintenance with human oversight on judgment calls, not a question for every finding.
+4. **Be decisive, minimize questions.** When evidence is clear (file renamed, class moved, reference broken), apply the update. In interactive mode, only ask the user when the right action is genuinely ambiguous. In autonomous mode, mark ambiguous cases as stale instead of asking. The goal is automated maintenance with human oversight on judgment calls, not a question for every finding.
 5. **Avoid low-value churn.** Do not edit a doc just to fix a typo, polish wording, or make cosmetic changes that do not materially improve accuracy or usability.
 6. **Use Update only for meaningful, evidence-backed drift.** Paths, module names, related links, category metadata, code snippets, and clearly stale wording are fair game when fixing them materially improves accuracy.
 7. **Use Replace only when there is a real replacement.** That means either:
    - the current conversation contains a recently solved, verified replacement fix, or
-   - the user provides enough concrete replacement context to document the successor honestly, or
+   - the user has provided enough concrete replacement context to document the successor honestly, or
+   - the codebase investigation found the current approach and can document it as the successor, or
    - newer docs, pattern docs, PRs, or issues provide strong successor evidence.
-8. **Archive when the code is gone.** If the referenced code, controller, or workflow no longer exists in the codebase and no successor can be found, recommend Archive — don't default to Keep just because the general advice is still "sound." A learning about a deleted feature misleads readers into thinking that feature still exists. When in doubt between Keep and Archive, ask the user — but missing referenced files with no matching code is strong Archive evidence, not a reason to Keep with "medium confidence."
+8. **Archive when the code is gone.** If the referenced code, controller, or workflow no longer exists in the codebase and no successor can be found, recommend Archive — don't default to Keep just because the general advice is still "sound." A learning about a deleted feature misleads readers into thinking that feature still exists. When in doubt between Keep and Archive, ask the user (in interactive mode) or mark as stale (in autonomous mode). But missing referenced files with no matching code is **not** a doubt case — it is strong, unambiguous Archive evidence. Auto-archive it.
 
 ## Scope Selection
 
@@ -275,7 +276,15 @@ If "archive" feels too strong but the pattern should no longer be elevated, redu
 
 ## Phase 3: Ask for Decisions
 
-**In autonomous mode, skip this entire phase.** Apply all unambiguous actions directly and mark ambiguous cases as stale (see autonomous mode rules).
+### Autonomous mode
+
+**Skip this entire phase. Do not ask any questions. Do not present options. Do not wait for input.** Proceed directly to Phase 4 and execute all actions based on the classifications from Phase 2:
+
+- Unambiguous Keep, Update, auto-Archive, and Replace (with sufficient evidence) → execute directly
+- Ambiguous cases → mark as stale
+- Then generate the report (see Output Format)
+
+### Interactive mode
 
 Most Updates should be applied directly without asking. Only ask the user when:
 
@@ -285,7 +294,7 @@ Most Updates should be applied directly without asking. Only ask the user when:
 
 Do **not** ask questions about whether code changes were intentional, whether the user wants to fix bugs in the code, or other concerns outside doc maintenance. Stay in your lane — doc accuracy.
 
-### Question Style
+#### Question Style
 
 Always present choices using the platform's interactive question tool (e.g. `AskUserQuestion` in Claude Code, `request_user_input` in Codex). If the environment has no interactive prompt tool, present numbered options in plain text and wait for the user's response before proceeding.
 
@@ -297,7 +306,7 @@ Question rules:
 - Explain the rationale for the recommendation in one concise sentence
 - Avoid asking the user to choose from actions that are not actually plausible
 
-### Focused Scope
+#### Focused Scope
 
 For a single artifact, present:
 
@@ -321,7 +330,7 @@ What would you like to do?
 
 Do not list all four actions unless all four are genuinely plausible.
 
-### Batch Scope
+#### Batch Scope
 
 For several learnings:
 
@@ -336,7 +345,7 @@ Ask for confirmation in stages:
 2. Then handle Replace one at a time
 3. Then handle Archive one at a time unless the archive is unambiguous and safe to auto-apply
 
-### Broad Scope
+#### Broad Scope
 
 If the user asked for a sweeping refresh, keep the interaction incremental:
 

From 3c88644656ef378e9a4661441dafe8a3ea420bdc Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 13:36:54 -0700
Subject: [PATCH 09/18] fix(skills): enforce full report output in autonomous
 mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The model was generating findings internally then outputting a
one-line summary. Added explicit instructions that the full report
must be printed as text output — every file, every classification,
every action. In autonomous mode, the report is the sole deliverable
and must be self-contained and complete.
---
 .../skills/ce-compound-refresh/SKILL.md       | 20 +++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index dad800c1..deb81a5f 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -412,7 +412,9 @@ Archive only when a learning is clearly obsolete or redundant. Do not archive a
 
 ## Output Format
 
-After processing the selected scope, report:
+**The full report MUST be printed as text output.** Do not summarize findings internally and then output a one-liner. The report is the deliverable — print every section in full.
+
+After processing the selected scope, output the following report:
 
 ```text
 Compound Refresh Summary
@@ -427,19 +429,25 @@ Skipped: V
 Marked stale: S
 ```
 
-Then list the affected files and what changed.
+Then for EVERY file processed, list:
+- The file path
+- The classification (Keep/Update/Replace/Archive/Stale)
+- What evidence was found
+- What action was taken (or recommended)
 
 For **Keep** outcomes, list them under a reviewed-without-edits section so the result is visible without creating git churn.
 
 ### Autonomous mode output
 
-In autonomous mode, the report is the primary deliverable. Split actions into two sections:
+In autonomous mode, the report is the sole deliverable — there is no user present to ask follow-up questions, so the report must be self-contained and complete. **Print the full report. Do not abbreviate, summarize, or skip sections.**
+
+Split actions into two sections:
 
 **Applied** (writes that succeeded):
-- For each **Updated** file: what references were fixed and why
+- For each **Updated** file: the file path, what references were fixed, and why
 - For each **Replaced** file: what the old learning recommended vs what the current code does, and the path to the new successor
-- For each **Archived** file: what referenced code/workflow is gone
-- For each **Marked stale** file: what evidence was found and why it was ambiguous
+- For each **Archived** file: the file path and what referenced code/workflow is gone
+- For each **Marked stale** file: the file path, what evidence was found, and why it was ambiguous
 
 **Recommended** (actions that could not be written — e.g., permission denied):
 - Same detail as above, but framed as recommendations for a human to apply

From 6fa31615203f68d54aff790d497271b41bb05ca2 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 13:37:06 -0700
Subject: [PATCH 10/18] fix(skills): specify markdown format for autonomous
 report output

---
 .../compound-engineering/skills/ce-compound-refresh/SKILL.md    | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index deb81a5f..95a92c64 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -412,7 +412,7 @@ Archive only when a learning is clearly obsolete or redundant. Do not archive a
 
 ## Output Format
 
-**The full report MUST be printed as text output.** Do not summarize findings internally and then output a one-liner. The report is the deliverable — print every section in full.
+**The full report MUST be printed as markdown output.** Do not summarize findings internally and then output a one-liner. The report is the deliverable — print every section in full, formatted as readable markdown with headers, tables, and bullet points.
 
 After processing the selected scope, output the following report:
 

From ed9b29b8e32bef7b1caecc09350b24ffbdd93990 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 13:47:52 -0700
Subject: [PATCH 11/18] fix(skills): prevent auto-archive when problem domain
 is still active
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Auto-archive now requires both the implementation AND the problem
domain to be gone. If referenced files are deleted but the application
still deals with the same problem (auth, payments, migrations), the
learning should be classified as Replace, not Archive, because the
knowledge gap needs to be filled. Uses agent reasoning about concepts,
not mechanical keyword searches.
---
 .../skills/ce-compound-refresh/SKILL.md         | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index 95a92c64..d5cba9f4 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -253,13 +253,24 @@ Action:
   - `archived_date: YYYY-MM-DD`
   - `archive_reason: [why it was archived]`
 
-Auto-archive when evidence is unambiguous:
+### Before archiving: check if the problem domain is still active
 
-- the referenced code, controller, or workflow is gone and no successor exists in the codebase
+When a learning's referenced files are gone, that is strong evidence — but only that the **implementation** is gone. Before archiving, reason about whether the **problem the learning solves** is still a concern in the codebase:
+
+- A learning about session token storage where `auth_token.rb` is gone — does the application still handle session tokens? If so, the concept persists under a new implementation. That is Replace, not Archive.
+- A learning about a deprecated API endpoint where the entire feature was removed — the problem domain is gone. That is Archive.
+
+Do not search mechanically for keywords from the old learning. Instead, understand what problem the learning addresses, then investigate whether that problem domain still exists in the codebase. The agent understands concepts — use that understanding to look for where the problem lives now, not where the old code used to be.
+
+**Auto-archive only when both the implementation AND the problem domain are gone:**
+
+- the referenced code is gone AND the application no longer deals with that problem domain
 - the learning is fully superseded by a clearly better successor
 - the document is plainly redundant and adds no distinct value
 
-Do not keep a learning just because its general advice is "still sound" — if the specific code it references is gone, the learning misleads readers. Archive it.
+If the implementation is gone but the problem domain persists (the app still does auth, still processes payments, still handles migrations), classify as **Replace** — the problem still matters and the current approach should be documented.
+
+Do not keep a learning just because its general advice is "still sound" — if the specific code it references is gone, the learning misleads readers. But do not archive a learning whose problem domain is still active — that knowledge gap should be filled with a replacement.
 
 If there is a clearly better successor, strongly consider **Replace** before **Archive** so the old artifact points readers toward the newer guidance.
 

From 42649e190ef5542612601f8d27a393ec5afd8077 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 14:54:27 -0700
Subject: [PATCH 12/18] fix(skills): include tool constraint in subagent task
 prompts

The file-tools-over-bash instruction was in the orchestrator's
context but not passed to spawned subagents. Changed to an explicit
quoted instruction block that must be included in each subagent's
task prompt so it's visible to the subagent, not just the orchestrator.
---
 .../compound-engineering/skills/ce-compound-refresh/SKILL.md  | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index d5cba9f4..f79f4bed 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -200,7 +200,9 @@ Use subagents for context isolation when investigating multiple artifacts — no
 | **Parallel subagents** | 3+ truly independent artifacts with low overlap |
 | **Batched subagents** | Broad sweeps — narrow scope first, then investigate in batches |
 
-Subagents should use dedicated file search and read tools for investigation — not shell commands. This avoids unnecessary permission prompts and is more reliable across platforms.
+**When spawning any subagent, include this instruction in its task prompt:**
+
+> Use dedicated file search and read tools (Glob, Grep, Read) for all investigation. Do NOT use shell commands (ls, find, cat, grep, test, bash) for file operations. This avoids permission prompts and is more reliable.
 
 There are two subagent roles:
 

From b960f7d57e8cf7e71579ebaec3b357f0a8b9494e Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 15:14:52 -0700
Subject: [PATCH 13/18] feat(skills): add Phase 5 commit workflow to
 ce:compound-refresh

Handles committing changes at the end of a refresh run so doc
maintenance doesn't sit uncommitted. Detects git context and adapts:
autonomous mode uses sensible defaults (branch + PR on main, separate
commit on feature branches), interactive mode presents options. Always
selectively stages only compound-refresh files to avoid mixing with
in-progress feature work.
---
 .../skills/ce-compound-refresh/SKILL.md       | 50 +++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index f79f4bed..99e3d952 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -468,6 +468,56 @@ Split actions into two sections:
 
 If all writes succeed, the Recommended section is empty. If no writes succeed (e.g., read-only invocation), all actions appear under Recommended — the report becomes a maintenance plan.
 
+## Phase 5: Commit Changes
+
+After all actions are executed and the report is generated, handle committing the changes. Skip this phase if no files were modified (all Keep, or all writes failed).
+
+### Detect git context
+
+Before offering options, check:
+1. Which branch is currently checked out (main/master vs feature branch)
+2. Whether the working tree has other uncommitted changes beyond what compound-refresh modified
+3. Recent commit messages to match the repo's commit style
+
+### Autonomous mode
+
+Use sensible defaults — no user to ask:
+
+| Context | Default action |
+|---------|---------------|
+| On main/master | Create a branch (`docs/compound-refresh-YYYY-MM-DD`), commit, attempt to open a PR. If PR creation fails, report the branch name. |
+| On a feature branch | Commit as a separate commit on the current branch |
+| Git operations fail | Include the recommended git commands in the report and continue |
+
+Stage only the files that compound-refresh modified — not other dirty files in the working tree.
+
+### Interactive mode
+
+Present options based on context. Stage only compound-refresh files regardless of which option the user picks.
+
+**On main/master (clean or dirty):**
+
+1. Create a branch, commit, and open a PR (recommended)
+2. Don't commit — I'll handle it
+
+**On a feature branch, clean working tree:**
+
+1. Commit to this branch as a separate commit (recommended)
+2. Create a separate branch and commit
+3. Don't commit
+
+**On a feature branch, dirty working tree (other uncommitted changes):**
+
+1. Commit only the compound-refresh changes to this branch (selective staging — other dirty files stay untouched)
+2. Don't commit
+
+### Commit message
+
+Write a descriptive commit message that:
+- Summarizes what was refreshed (e.g., "update 3 stale learnings, archive 1 obsolete doc")
+- Follows the repo's existing commit conventions (check recent git log for style)
+- Is succinct — the details are in the changed files themselves
+
 ## Relationship to ce:compound
 
 - `ce:compound` captures a newly solved, verified problem

From 7b5dd85a43814f52e02e6ab1e9323276647067e8 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 15:16:03 -0700
Subject: [PATCH 14/18] fix(skills): remove prescriptive branch naming in
 compound-refresh

Let the agent generate a reasonable branch name based on context
and repo conventions instead of prescribing a date-based format
that would collide on multiple runs per day.
---
 .../compound-engineering/skills/ce-compound-refresh/SKILL.md    | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index 99e3d952..e2ec303c 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -485,7 +485,7 @@ Use sensible defaults — no user to ask:
 
 | Context | Default action |
 |---------|---------------|
-| On main/master | Create a branch (`docs/compound-refresh-YYYY-MM-DD`), commit, attempt to open a PR. If PR creation fails, report the branch name. |
+| On main/master | Create a descriptively named branch, commit, attempt to open a PR. If PR creation fails, report the branch name. |
 | On a feature branch | Commit as a separate commit on the current branch |
 | Git operations fail | Include the recommended git commands in the report and continue |
 

From 4b8d206def0949b8f235bdd12d18e19469b2a419 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 15:26:18 -0700
Subject: [PATCH 15/18] fix(skills): enforce branch creation when committing on
 main

The model was offering "commit to current branch" on main instead
of "create a branch and PR." Added explicit branch detection step
and "Do NOT commit directly to main" instruction.
---
 .../skills/ce-compound-refresh/SKILL.md                | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index e2ec303c..1806b23e 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -493,20 +493,22 @@ Stage only the files that compound-refresh modified — not other dirty files in
 
 ### Interactive mode
 
-Present options based on context. Stage only compound-refresh files regardless of which option the user picks.
+First, run `git branch --show-current` to determine the current branch. Then present the correct options based on the result. Stage only compound-refresh files regardless of which option the user picks.
 
-**On main/master (clean or dirty):**
+**If the current branch is main, master, or the repo's default branch:**
+
+Do NOT offer to commit directly to main. Always offer a branch first:
 
 1. Create a branch, commit, and open a PR (recommended)
 2. Don't commit — I'll handle it
 
-**On a feature branch, clean working tree:**
+**If the current branch is a feature branch, clean working tree:**
 
 1. Commit to this branch as a separate commit (recommended)
 2. Create a separate branch and commit
 3. Don't commit
 
-**On a feature branch, dirty working tree (other uncommitted changes):**
+**If the current branch is a feature branch, dirty working tree (other uncommitted changes):**
 
 1. Commit only the compound-refresh changes to this branch (selective staging — other dirty files stay untouched)
 2. Don't commit

From a3697d3b30f454eaf840216b0f0f7f1ebf77d825 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 15:26:51 -0700
Subject: [PATCH 16/18] fix(skills): allow direct commit on main as non-default
 option

---
 .../compound-engineering/skills/ce-compound-refresh/SKILL.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index 1806b23e..7d26c6c9 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -497,10 +497,9 @@ First, run `git branch --show-current` to determine the current branch. Then pre
 
 **If the current branch is main, master, or the repo's default branch:**
 
-Do NOT offer to commit directly to main. Always offer a branch first:
-
 1. Create a branch, commit, and open a PR (recommended)
-2. Don't commit — I'll handle it
+2. Commit directly to this branch
+3. Don't commit — I'll handle it
 
 **If the current branch is a feature branch, clean working tree:**
 

From 583fb38ae40da60f80ec3528a445679705cf3c0f Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 15:27:41 -0700
Subject: [PATCH 17/18] fix(skills): use actual branch name in commit options
 instead of 'this branch'

---
 .../skills/ce-compound-refresh/SKILL.md                     | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index 7d26c6c9..759685bf 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -498,18 +498,18 @@ First, run `git branch --show-current` to determine the current branch. Then pre
 **If the current branch is main, master, or the repo's default branch:**
 
 1. Create a branch, commit, and open a PR (recommended)
-2. Commit directly to this branch
+2. Commit directly to `{current branch name}`
 3. Don't commit — I'll handle it
 
 **If the current branch is a feature branch, clean working tree:**
 
-1. Commit to this branch as a separate commit (recommended)
+1. Commit to `{current branch name}` as a separate commit (recommended)
 2. Create a separate branch and commit
 3. Don't commit
 
 **If the current branch is a feature branch, dirty working tree (other uncommitted changes):**
 
-1. Commit only the compound-refresh changes to this branch (selective staging — other dirty files stay untouched)
+1. Commit only the compound-refresh changes to `{current branch name}` (selective staging — other dirty files stay untouched)
 2. Don't commit
 
 ### Commit message

From cdb8de42a2c22ed55491dba1e8dda7553e000e77 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Fri, 13 Mar 2026 15:33:25 -0700
Subject: [PATCH 18/18] fix(skills): require specific branch names based on
 what was refreshed

---
 .../compound-engineering/skills/ce-compound-refresh/SKILL.md  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
index 759685bf..276aef4d 100644
--- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
@@ -485,7 +485,7 @@ Use sensible defaults — no user to ask:
 
 | Context | Default action |
 |---------|---------------|
-| On main/master | Create a descriptively named branch, commit, attempt to open a PR. If PR creation fails, report the branch name. |
+| On main/master | Create a branch named for what was refreshed (e.g., `docs/refresh-auth-and-ci-learnings`), commit, attempt to open a PR. If PR creation fails, report the branch name. |
 | On a feature branch | Commit as a separate commit on the current branch |
 | Git operations fail | Include the recommended git commands in the report and continue |
 
@@ -497,7 +497,7 @@ First, run `git branch --show-current` to determine the current branch. Then pre
 
 **If the current branch is main, master, or the repo's default branch:**
 
-1. Create a branch, commit, and open a PR (recommended)
+1. Create a branch, commit, and open a PR (recommended) — the branch name should be specific to what was refreshed, not generic (e.g., `docs/refresh-auth-learnings` not `docs/compound-refresh`)
 2. Commit directly to `{current branch name}`
 3. Don't commit — I'll handle it