diff --git a/docs/solutions/skill-design/compound-refresh-skill-improvements.md b/docs/solutions/skill-design/compound-refresh-skill-improvements.md new file mode 100644 index 00000000..21f0fab1 --- /dev/null +++ b/docs/solutions/skill-design/compound-refresh-skill-improvements.md @@ -0,0 +1,141 @@ +--- +title: "ce:compound-refresh skill redesign for autonomous maintenance without live user context" +category: skill-design +date: 2026-03-13 +module: plugins/compound-engineering/skills/ce-compound-refresh +component: SKILL.md +tags: + - skill-design + - compound-refresh + - maintenance-workflow + - drift-classification + - subagent-architecture + - platform-agnostic +severity: medium +description: "Redesign ce:compound-refresh to handle autonomous drift triage, in-skill replacement via subagents, and smart scoping without relying on live problem-solving context that ce:compound expects." +related: + - docs/solutions/plugin-versioning-requirements.md + - https://github.com/EveryInc/compound-engineering-plugin/pull/260 + - https://github.com/EveryInc/compound-engineering-plugin/issues/204 + - https://github.com/EveryInc/compound-engineering-plugin/issues/221 +--- + +## Problem + +The initial `ce:compound-refresh` skill had several design issues discovered during real-world testing: + +1. Interactive questions never triggered the proper tool (AskUserQuestion) because the instruction used a weak "when available" qualifier +2. Auto-archive criteria contradicted an "always ask before archiving" rule in a later phase +3. Broad scope (9+ docs) asked the user to choose an area blindly without providing analysis +4. The Replace flow tried to hand off to `ce:compound`, which expects fresh problem-solving context the user doesn't have months later +5. Subagents used shell commands for file existence checks, triggering permission prompts +6. 
No way to run the skill unattended (e.g., on a schedule) — every run required user interaction + +## Root Cause + +Six independent design issues, each with a distinct root cause: + +1. **Hardcoded tool name with escape hatch.** Saying "Use AskUserQuestion when available" gave the model permission to skip the tool and just output text. It was also non-portable to Codex and other platforms. +2. **Contradictory rules across phases.** Phase 2 defined auto-archive criteria. Phase 3 said "always ask before archiving" with no exception. The model followed Phase 3. +3. **Question before evidence.** The skill prompted scope selection before gathering any information about which areas were most stale or interconnected. +4. **Unsatisfied precondition in cross-skill handoff.** `ce:compound` expects a recently solved problem with fresh context. A maintenance refresh has investigation evidence instead — equivalent data, different shape. +5. **No tool preference guidance for subagents.** Without explicit instruction, subagents defaulted to bash for file operations. +6. **Interactive-only design.** Every phase assumed a user was present. No way to run autonomously for scheduled maintenance or hands-off sweeps. + +## Solution + +### 1. Platform-agnostic interactive questions + +Reference "the platform's interactive question tool" as the concept, with concrete examples: + +```markdown +Ask questions **one at a time** — use the platform's interactive question tool +(e.g. `AskUserQuestion` in Claude Code, `request_user_input` in Codex) and +**stop to wait for the answer** before continuing. +``` + +The "stop to wait" language removes the escape hatch. The examples help each platform's model select the right tool. + +### 2. Auto-archive exemption for unambiguous cases + +Phase 3 now defers to Phase 2's auto-archive criteria: + +```markdown +You are about to Archive a document **and** the evidence is not unambiguous +(see auto-archive criteria in Phase 2). 
When auto-archive criteria are met, +proceed without asking. +``` + +### 3. Smart triage for broad scope + +When 9+ candidate docs are found, triage before asking: + +1. **Inventory** — read frontmatter, group by module/component/category +2. **Impact clustering** — dense clusters of interconnected learnings + pattern docs are higher-impact than isolated docs +3. **Spot-check drift** — check whether primary referenced files still exist +4. **Recommend** — present the highest-impact cluster with rationale + +Key insight: "code changed recently" is NOT a reliable staleness signal. Missing references in a high-impact cluster are the strongest signal. + +### 4. Replacement subagents instead of ce:compound handoff + +By the time a Replace is identified, Phase 1 investigation has already gathered the evidence that `ce:compound` would research: +- The old learning's claims +- What the current code actually does +- Where and why the drift occurred + +A replacement subagent writes the successor directly using `ce:compound`'s document format (frontmatter, problem, root cause, solution, prevention). Run sequentially — one at a time — because each may read significant code. + +When evidence is insufficient (e.g., entire subsystem replaced, new architecture too complex to understand from investigation alone), mark as stale and recommend `ce:compound` after the user's next encounter with that area. + +### 5. Dedicated file tools over shell commands + +Added to subagent strategy: + +```markdown +Subagents should use dedicated file search and read tools for investigation — +not shell commands. This avoids unnecessary permission prompts and is more +reliable across platforms. +``` + +### 6. Autonomous mode for scheduled/unattended runs + +Added `mode:autonomous` argument support so the skill can run without user interaction (e.g., on a schedule, in CI, or when the user just wants a hands-off sweep). 
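The opt-in check described above can be sketched in a few lines. This is an illustrative helper, not part of the skill (which is prose instructions, not code); the function name and return shape are assumptions:

```python
def parse_refresh_args(arguments: str) -> tuple[bool, str]:
    """Split skill arguments into (autonomous, scope_hint).

    `mode:autonomous` is an explicit opt-in token; whatever remains
    after stripping it is treated as the scope hint.
    """
    tokens = arguments.split()
    autonomous = "mode:autonomous" in tokens
    scope_hint = " ".join(t for t in tokens if t != "mode:autonomous")
    return autonomous, scope_hint
```

Matching whole tokens rather than substrings keeps a scope hint that merely mentions autonomy from accidentally flipping the mode.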
+ +Key design decisions: +- **Explicit opt-in only.** `mode:autonomous` must be in the arguments. Auto-detection based on tool availability was rejected because a user in an interactive agent without a question tool (e.g., Cursor, Windsurf) is still interactive — they just use plain-text replies. +- **Conservative confidence.** Borderline cases that would get a user question in interactive mode get marked stale in autonomous mode. Err toward stale-marking over incorrect action. +- **Detailed report as deliverable.** Since no user was present, the output report includes full rationale for each action so a human can review after the fact. +- **Process everything.** No scope narrowing questions — if no scope hint provided, process all docs. For broad scope, process clusters in impact order without asking. + +## Prevention + +### Skill review checklist additions + +These six patterns should be checked during any skill review: + +1. **No hardcoded tool names** — All tool references use capability-first language with platform examples and a plain-text fallback +2. **No contradictory rules across phases** — Trace each action type through all phases; verify absolute language ("always," "never") is not contradicted elsewhere +3. **No blind user questions** — Every question presented to the user is informed by evidence the agent gathered first +4. **No unsatisfied cross-skill preconditions** — Every skill handoff verifies the target skill's preconditions are met by the calling context +5. **No shell commands for file operations in subagents** — Subagent instructions explicitly prefer dedicated tools over shell commands +6. **Autonomous mode for long-running skills** — Any skill that could run unattended should support an explicit opt-in mode with conservative confidence and detailed reporting + +### Key anti-patterns + +| Anti-pattern | Better pattern | +|---|---| +| "Use the AskUserQuestion tool when available" | "Use the platform's interactive question tool (e.g. 
AskUserQuestion in Claude Code, request_user_input in Codex)" | +| Defining auto-archive conditions, then "always ask before archiving" | Single-source-of-truth: define the rule once, reference it elsewhere | +| "Which area should we review?" before any investigation | Triage first, recommend with evidence, let user confirm or redirect | +| "Create a successor learning through ce:compound" during a refresh | Replacement subagent writes directly using gathered evidence | +| No tool guidance for subagents | "Use dedicated file search and read tools, not shell commands" | +| Auto-detecting "no question tool = headless" | Explicit `mode:autonomous` argument — interactive agents without question tools are still interactive | + +## Cross-References + +- **PR #260**: The PR containing all these improvements +- **Issue #204**: Platform-agnostic tool references (AskUserQuestion dependency) +- **Issue #221**: Motivating issue for maintenance at scale +- **PR #242**: ce:audit (detection counterpart, closed) +- **PR #150**: Established subagent context-isolation pattern diff --git a/plugins/compound-engineering/README.md b/plugins/compound-engineering/README.md index f41f5777..a574b37d 100644 --- a/plugins/compound-engineering/README.md +++ b/plugins/compound-engineering/README.md @@ -7,7 +7,7 @@ AI-powered development tools that get smarter with every use. 
Make each unit of | Component | Count | |-----------|-------| | Agents | 28 | -| Commands | 22 | +| Commands | 23 | | Skills | 20 | | MCP Servers | 1 | @@ -81,6 +81,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou | `/ce:review` | Run comprehensive code reviews | | `/ce:work` | Execute work items systematically | | `/ce:compound` | Document solved problems to compound team knowledge | +| `/ce:compound-refresh` | Refresh stale or drifting learnings and decide whether to keep, update, replace, or archive them | > **Deprecated aliases:** `/workflows:plan`, `/workflows:work`, `/workflows:review`, `/workflows:brainstorm`, `/workflows:compound` still work but show a deprecation warning. Use `ce:*` equivalents. diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md new file mode 100644 index 00000000..276aef4d --- /dev/null +++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md @@ -0,0 +1,527 @@ +--- +name: ce:compound-refresh +description: Refresh stale or drifting learnings and pattern docs in docs/solutions/ by reviewing, updating, replacing, or archiving them against the current codebase. Use after refactors, migrations, dependency upgrades, or when a retrieved learning feels outdated or wrong. Also use when reviewing docs/solutions/ for accuracy, when a recently solved problem contradicts an existing learning, or when pattern docs no longer reflect current code. +argument-hint: "[mode:autonomous] [optional: scope hint]" +disable-model-invocation: true +--- + +# Compound Refresh + +Maintain the quality of `docs/solutions/` over time. This workflow reviews existing learnings against the current codebase, then refreshes any derived pattern docs that depend on them. + +## Mode Detection + +Check if `$ARGUMENTS` contains `mode:autonomous`. 
If present, strip it from arguments (use the remainder as a scope hint) and run in **autonomous mode**. + +| Mode | When | Behavior | +|------|------|----------| +| **Interactive** (default) | User is present and can answer questions | Ask for decisions on ambiguous cases, confirm actions | +| **Autonomous** | `mode:autonomous` in arguments | No user interaction. Apply all unambiguous actions (Keep, Update, auto-Archive, Replace with sufficient evidence). Mark ambiguous cases as stale. Generate a summary report at the end. | + +### Autonomous mode rules + +- **Skip all user questions.** Never pause for input. +- **Process all docs in scope.** No scope narrowing questions — if no scope hint was provided, process everything. +- **Attempt all safe actions:** Keep (no-op), Update (fix references), auto-Archive (unambiguous criteria met), Replace (when evidence is sufficient). If a write succeeds, record it as **applied**. If a write fails (e.g., permission denied), record the action as **recommended** in the report and continue — do not stop or ask for permissions. +- **Mark as stale when uncertain.** If classification is genuinely ambiguous (Update vs Replace vs Archive) or Replace evidence is insufficient, mark as stale with `status: stale`, `stale_reason`, and `stale_date` in the frontmatter. If even the stale-marking write fails, include it as a recommendation. +- **Use conservative confidence.** In interactive mode, borderline cases get a user question. In autonomous mode, borderline cases get marked stale. Err toward stale-marking over incorrect action. +- **Always generate a report.** The report is the primary deliverable. It has two sections: **Applied** (actions that were successfully written) and **Recommended** (actions that could not be written, with full rationale so a human can apply them or run the skill interactively). 
The report structure is the same regardless of what permissions were granted — the only difference is which section each action lands in. + +## Interaction Principles + +**These principles apply to interactive mode only. In autonomous mode, skip all user questions and apply the autonomous mode rules above.** + +Follow the same interaction style as `ce:brainstorm`: + +- Ask questions **one at a time** — use the platform's interactive question tool (e.g. `AskUserQuestion` in Claude Code, `request_user_input` in Codex) and **stop to wait for the answer** before continuing +- Prefer **multiple choice** when natural options exist +- Start with **scope and intent**, then narrow only when needed +- Do **not** ask the user to make decisions before you have evidence +- Lead with a recommendation and explain it briefly + +The goal is not to force the user through a checklist. The goal is to help them make a good maintenance decision with the smallest amount of friction. + +## Refresh Order + +Refresh in this order: + +1. Review the relevant individual learning docs first +2. Note which learnings stayed valid, were updated, were replaced, or were archived +3. Then review any pattern docs that depend on those learnings + +Why this order: + +- learning docs are the primary evidence +- pattern docs are derived from one or more learnings +- stale learnings can make a pattern look more valid than it really is + +If the user starts by naming a pattern doc, you may begin there to understand the concern, but inspect the supporting learning docs before changing the pattern. 
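The refresh order above amounts to a stable sort: learning docs first, then derived pattern docs. A minimal sketch, assuming pattern docs live under `docs/solutions/patterns/` (the `is_pattern` test is an illustrative stand-in for however the repo distinguishes them):

```python
def refresh_order(doc_paths: list[str]) -> list[str]:
    """Order docs for review: learning docs first, then pattern docs,
    so derived guidance is evaluated against already-refreshed evidence.
    """
    def is_pattern(path: str) -> bool:
        return "/patterns/" in path

    # False sorts before True, so learnings keep their relative order up front.
    return sorted(doc_paths, key=is_pattern)
```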
+ +## Maintenance Model + +For each candidate artifact, classify it into one of four outcomes: + +| Outcome | Meaning | Default action | +|---------|---------|----------------| +| **Keep** | Still accurate and still useful | No file edit by default; report that it was reviewed and remains trustworthy | +| **Update** | Core solution is still correct, but references drifted | Apply evidence-backed in-place edits | +| **Replace** | The old artifact is now misleading, but there is a known better replacement | Create a trustworthy successor or revised pattern, then mark/archive the old artifact as needed | +| **Archive** | No longer useful or applicable | Move the obsolete artifact to `docs/solutions/_archived/` with archive metadata when appropriate | + +## Core Rules + +1. **Evidence informs judgment.** The signals below are inputs, not a mechanical scorecard. Use engineering judgment to decide whether the artifact is still trustworthy. +2. **Prefer no-write Keep.** Do not update a doc just to leave a review breadcrumb. +3. **Match docs to reality, not the reverse.** When current code differs from a learning, update the learning to reflect the current code. The skill's job is doc accuracy, not code review — do not ask the user whether code changes were "intentional" or "a regression." If the code changed, the doc should match. If the user thinks the code is wrong, that is a separate concern outside this workflow. +4. **Be decisive, minimize questions.** When evidence is clear (file renamed, class moved, reference broken), apply the update. In interactive mode, only ask the user when the right action is genuinely ambiguous. In autonomous mode, mark ambiguous cases as stale instead of asking. The goal is automated maintenance with human oversight on judgment calls, not a question for every finding. +5. **Avoid low-value churn.** Do not edit a doc just to fix a typo, polish wording, or make cosmetic changes that do not materially improve accuracy or usability. +6. 
**Use Update only for meaningful, evidence-backed drift.** Paths, module names, related links, category metadata, code snippets, and clearly stale wording are fair game when fixing them materially improves accuracy. +7. **Use Replace only when there is a real replacement.** That means either: + - the current conversation contains a recently solved, verified replacement fix, or + - the user has provided enough concrete replacement context to document the successor honestly, or + - the codebase investigation found the current approach and can document it as the successor, or + - newer docs, pattern docs, PRs, or issues provide strong successor evidence. +8. **Archive when the code is gone.** If the referenced code, controller, or workflow no longer exists in the codebase and no successor can be found, recommend Archive — don't default to Keep just because the general advice is still "sound." A learning about a deleted feature misleads readers into thinking that feature still exists. When in doubt between Keep and Archive, ask the user (in interactive mode) or mark as stale (in autonomous mode). But missing referenced files with no matching code is **not** a doubt case — it is strong, unambiguous Archive evidence. Auto-archive it. + +## Scope Selection + +Start by discovering learnings and pattern docs under `docs/solutions/`. + +Exclude: + +- `README.md` +- `docs/solutions/_archived/` + +Find all `.md` files under `docs/solutions/`, excluding `README.md` files and anything under `_archived/`. + +If `$ARGUMENTS` is provided, use it to narrow scope before proceeding. Try these matching strategies in order, stopping at the first that produces results: + +1. **Directory match** — check if the argument matches a subdirectory name under `docs/solutions/` (e.g., `performance-issues`, `database-issues`) +2. **Frontmatter match** — search `module`, `component`, or `tags` fields in learning frontmatter for the argument +3. 
**Filename match** — match against filenames (partial matches are fine) +4. **Content search** — search file contents for the argument as a keyword (useful for feature names or feature areas) + +If no matches are found, report that and ask the user to clarify. In autonomous mode, report the miss and stop — do not guess at scope. + +If no candidate docs are found, report: + +```text +No candidate docs found in docs/solutions/. +Run `ce:compound` after solving problems to start building your knowledge base. +``` + +## Phase 0: Assess and Route + +Before asking the user to classify anything: + +1. Discover candidate artifacts +2. Estimate scope +3. Choose the lightest interaction path that fits + +### Route by Scope + +| Scope | When to use it | Interaction style | +|-------|----------------|-------------------| +| **Focused** | 1-2 likely files or user named a specific doc | Investigate directly, then present a recommendation | +| **Batch** | Up to ~8 mostly independent docs | Investigate first, then present grouped recommendations | +| **Broad** | 9+ docs, ambiguous, or repo-wide stale-doc sweep | Triage first, then investigate in batches | + +### Broad Scope Triage + +When scope is broad (9+ candidate docs), do a lightweight triage before deep investigation: + +1. **Inventory** — read frontmatter of all candidate docs, group by module/component/category +2. **Impact clustering** — identify areas with the densest clusters of learnings + pattern docs. A cluster of 5 learnings and 2 patterns covering the same module is higher-impact than 5 isolated single-doc areas, because staleness in one doc is likely to affect the others. +3. **Spot-check drift** — for each cluster, check whether the primary referenced files still exist. Missing references in a high-impact cluster = strongest signal for where to start. +4. **Recommend a starting area** — present the highest-impact cluster with a brief rationale and ask the user to confirm or redirect. 
In autonomous mode, skip the question and process all clusters in impact order. + +Example: + +```text +Found 24 learnings across 5 areas. + +The auth module has 5 learnings and 2 pattern docs that cross-reference +each other — and 3 of those reference files that no longer exist. +I'd start there. + +1. Start with auth (recommended) +2. Pick a different area +3. Review everything +``` + +Do not ask action-selection questions yet. First gather evidence. + +## Phase 1: Investigate Candidate Learnings + +For each learning in scope, read it, cross-reference its claims against the current codebase, and form a recommendation. + +A learning has several dimensions that can independently go stale. Surface-level checks catch the obvious drift, but staleness often hides deeper: + +- **References** — do the file paths, class names, and modules it mentions still exist or have they moved? +- **Recommended solution** — does the fix still match how the code actually works today? A renamed file with a completely different implementation pattern is not just a path update. +- **Code examples** — if the learning includes code snippets, do they still reflect the current implementation? +- **Related docs** — are cross-referenced learnings and patterns still present and consistent? + +Match investigation depth to the learning's specificity — a learning referencing exact file paths and code snippets needs more verification than one describing a general principle. + +### Drift Classification: Update vs Replace + +The critical distinction is whether the drift is **cosmetic** (references moved but the solution is the same) or **substantive** (the solution itself changed): + +- **Update territory** — file paths moved, classes renamed, links broke, metadata drifted, but the core recommended approach is still how the code works. `ce:compound-refresh` fixes these directly. 
+- **Replace territory** — the recommended solution conflicts with current code, the architectural approach changed, or the pattern is no longer the preferred way. This means a new learning needs to be written. A replacement subagent writes the successor following `ce:compound`'s document format (frontmatter, problem, root cause, solution, prevention), using the investigation evidence already gathered. The orchestrator does not rewrite learnings inline — it delegates to a subagent for context isolation. + +**The boundary:** if you find yourself rewriting the solution section or changing what the learning recommends, stop — that is Replace, not Update. + +### Judgment Guidelines + +Three guidelines that are easy to get wrong: + +1. **Contradiction = strong Replace signal.** If the learning's recommendation conflicts with current code patterns or a recently verified fix, that is not a minor drift — the learning is actively misleading. Classify as Replace. +2. **Age alone is not a stale signal.** A 2-year-old learning that still matches current code is fine. Only use age as a prompt to inspect more carefully. +3. **Check for successors before archiving.** Before recommending Replace or Archive, look for newer learnings, pattern docs, PRs, or issues covering the same problem space. If successor evidence exists, prefer Replace over Archive so readers are directed to the newer guidance. + +## Phase 1.5: Investigate Pattern Docs + +After reviewing the underlying learning docs, investigate any relevant pattern docs under `docs/solutions/patterns/`. + +Pattern docs are high-leverage — a stale pattern is more dangerous than a stale individual learning because future work may treat it as broadly applicable guidance. Evaluate whether the generalized rule still holds given the refreshed state of the learnings it depends on. + +A pattern doc with no clear supporting learnings is a stale signal — investigate carefully before keeping it unchanged. 
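That stale signal can be expressed as a simple cross-check. A sketch, assuming a pattern doc's supporting learnings are listed in a frontmatter field such as `related` (the field name and helper are assumptions):

```python
def missing_supports(related_paths: list[str], existing_paths: list[str]) -> list[str]:
    """Return the supporting learnings a pattern doc references that no
    longer exist. A non-empty result is a stale signal: the pattern may
    rest on archived or replaced evidence and deserves careful investigation.
    """
    existing = set(existing_paths)
    return [p for p in related_paths if p not in existing]
```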
+ +## Subagent Strategy + +Use subagents for context isolation when investigating multiple artifacts — not just because the task sounds complex. Choose the lightest approach that fits: + +| Approach | When to use | +|----------|-------------| +| **Main thread only** | Small scope, short docs | +| **Sequential subagents** | 1-2 artifacts with many supporting files to read | +| **Parallel subagents** | 3+ truly independent artifacts with low overlap | +| **Batched subagents** | Broad sweeps — narrow scope first, then investigate in batches | + +**When spawning any subagent, include this instruction in its task prompt:** + +> Use dedicated file search and read tools (Glob, Grep, Read) for all investigation. Do NOT use shell commands (ls, find, cat, grep, test, bash) for file operations. This avoids permission prompts and is more reliable. + +There are two subagent roles: + +1. **Investigation subagents** — read-only. They must not edit files, create successors, or archive anything. Each returns: file path, evidence, recommended action, confidence, and open questions. These can run in parallel when artifacts are independent. +2. **Replacement subagents** — write a single new learning to replace a stale one. These run **one at a time, sequentially** (each replacement subagent may need to read significant code, and running multiple in parallel risks context exhaustion). The orchestrator handles all archival and metadata updates after each replacement completes. + +The orchestrator merges investigation results, detects contradictions, coordinates replacement subagents, and performs all archival/metadata edits centrally. In interactive mode, it asks the user questions on ambiguous cases. In autonomous mode, it marks ambiguous cases as stale instead. If two artifacts overlap or discuss the same root issue, investigate them together rather than parallelizing. + +## Phase 2: Classify the Right Maintenance Action + +After gathering evidence, assign one recommended action. 
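The overall judgment can be caricatured as a small decision sketch. This is illustrative only — the text stresses engineering judgment over a mechanical scorecard — and every parameter name is an assumption:

```python
def classify(refs_intact: bool, solution_matches_code: bool,
             domain_active: bool) -> str:
    """Map Phase 1 evidence onto the four outcomes (sketch)."""
    if not domain_active:
        return "Archive"   # implementation and problem domain both gone
    if not solution_matches_code:
        return "Replace"   # core guidance is now misleading
    if not refs_intact:
        return "Update"    # solution holds; references drifted
    return "Keep"
```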
+ +### Keep + +The learning is still accurate and useful. Do not edit the file — report that it was reviewed and remains trustworthy. Only add `last_refreshed` if you are already making a meaningful update for another reason. + +### Update + +The core solution is still valid but references have drifted (paths, class names, links, code snippets, metadata). Apply the fixes directly. + +### Replace + +Choose **Replace** when the learning's core guidance is now misleading — the recommended fix changed materially, the root cause or architecture shifted, or the preferred pattern is different. + +The user may have invoked the refresh months after the original learning was written. Do not ask them for replacement context they are unlikely to have — use agent intelligence to investigate the codebase and synthesize the replacement. + +**Evidence assessment:** + +By the time you identify a Replace candidate, Phase 1 investigation has already gathered significant evidence: the old learning's claims, what the current code actually does, and where the drift occurred. Assess whether this evidence is sufficient to write a trustworthy replacement: + +- **Sufficient evidence** — you understand both what the old learning recommended AND what the current approach is. The investigation found the current code patterns, the new file locations, the changed architecture. → Proceed to write the replacement (see Phase 4 Replace Flow). +- **Insufficient evidence** — the drift is so fundamental that you cannot confidently document the current approach. The entire subsystem was replaced, or the new architecture is too complex to understand from a file scan alone. 
→ Mark as stale in place: + - Add `status: stale`, `stale_reason: [what you found]`, `stale_date: YYYY-MM-DD` to the frontmatter + - Report what evidence you found and what is missing + - Recommend the user run `ce:compound` after their next encounter with that area, when they have fresh problem-solving context + +### Archive + +Choose **Archive** when: + +- The code or workflow no longer exists +- The learning is obsolete and has no modern replacement worth documenting +- The learning is redundant and no longer useful on its own +- There is no meaningful successor evidence suggesting it should be replaced instead + +Action: + +- Move the file to `docs/solutions/_archived/`, preserving directory structure when helpful +- Add: + - `archived_date: YYYY-MM-DD` + - `archive_reason: [why it was archived]` + +### Before archiving: check if the problem domain is still active + +When a learning's referenced files are gone, that is strong evidence — but only that the **implementation** is gone. Before archiving, reason about whether the **problem the learning solves** is still a concern in the codebase: + +- A learning about session token storage where `auth_token.rb` is gone — does the application still handle session tokens? If so, the concept persists under a new implementation. That is Replace, not Archive. +- A learning about a deprecated API endpoint where the entire feature was removed — the problem domain is gone. That is Archive. + +Do not search mechanically for keywords from the old learning. Instead, understand what problem the learning addresses, then investigate whether that problem domain still exists in the codebase. The agent understands concepts — use that understanding to look for where the problem lives now, not where the old code used to be. 
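The two examples above reduce to a two-factor rule. This is a sketch of the rule only; the real classification is the conceptual investigation described in the text, not a boolean check:

```python
def archive_or_replace(implementation_exists: bool, domain_active: bool) -> str:
    """Decide between Archive and Replace once drift is confirmed.

    Archive only when the referenced implementation AND its problem
    domain are gone; if the domain persists, document the successor.
    """
    if implementation_exists:
        return "not an archive candidate"
    return "Replace" if domain_active else "Archive"
```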
+ +**Auto-archive only when both the implementation AND the problem domain are gone:** + +- the referenced code is gone AND the application no longer deals with that problem domain +- the learning is fully superseded by a clearly better successor +- the document is plainly redundant and adds no distinct value + +If the implementation is gone but the problem domain persists (the app still does auth, still processes payments, still handles migrations), classify as **Replace** — the problem still matters and the current approach should be documented. + +Do not keep a learning just because its general advice is "still sound" — if the specific code it references is gone, the learning misleads readers. But do not archive a learning whose problem domain is still active — that knowledge gap should be filled with a replacement. + +If there is a clearly better successor, strongly consider **Replace** before **Archive** so the old artifact points readers toward the newer guidance. + +## Pattern Guidance + +Apply the same four outcomes (Keep, Update, Replace, Archive) to pattern docs, but evaluate them as **derived guidance** rather than incident-level learnings. Key differences: + +- **Keep**: the underlying learnings still support the generalized rule and examples remain representative +- **Update**: the rule holds but examples, links, scope, or supporting references drifted +- **Replace**: the generalized rule is now misleading, or the underlying learnings support a different synthesis. Base the replacement on the refreshed learning set — do not invent new rules from guesswork +- **Archive**: the pattern is no longer valid, no longer recurring, or fully subsumed by a stronger pattern doc + +If "archive" feels too strong but the pattern should no longer be elevated, reduce its prominence in place if the docs structure supports that. + +## Phase 3: Ask for Decisions + +### Autonomous mode + +**Skip this entire phase. Do not ask any questions. Do not present options. 
Do not wait for input.** Proceed directly to Phase 4 and execute all actions based on the classifications from Phase 2: + +- Unambiguous Keep, Update, auto-Archive, and Replace (with sufficient evidence) → execute directly +- Ambiguous cases → mark as stale +- Then generate the report (see Output Format) + +### Interactive mode + +Most Updates should be applied directly without asking. Only ask the user when: + +- The right action is genuinely ambiguous (Update vs Replace vs Archive) +- You are about to Archive a document **and** the evidence is not unambiguous (see auto-archive criteria in Phase 2). When auto-archive criteria are met, proceed without asking. +- You are about to create a successor via a replacement subagent + +Do **not** ask questions about whether code changes were intentional, whether the user wants to fix bugs in the code, or other concerns outside doc maintenance. Stay in your lane — doc accuracy. + +#### Question Style + +Always present choices using the platform's interactive question tool (e.g. `AskUserQuestion` in Claude Code, `request_user_input` in Codex). If the environment has no interactive prompt tool, present numbered options in plain text and wait for the user's response before proceeding. + +Question rules: + +- Ask **one question at a time** +- Prefer **multiple choice** +- Lead with the **recommended option** +- Explain the rationale for the recommendation in one concise sentence +- Avoid asking the user to choose from actions that are not actually plausible + +#### Focused Scope + +For a single artifact, present: + +- file path +- 2-4 bullets of evidence +- recommended action + +Then ask: + +```text +This [learning/pattern] looks like a candidate for [Update/Keep/Replace/Archive]. + +Why: [one-sentence rationale based on the evidence] + +What would you like to do? + +1. [Recommended action] +2. [Second plausible action] +3. Skip for now +``` + +Do not list all four actions unless all four are genuinely plausible. 
+
+#### Batch Scope
+
+For several learnings:
+
+1. Group obvious **Keep** cases together
+2. Group obvious **Update** cases together when the fixes are straightforward
+3. Present **Replace** cases individually or in very small groups
+4. Present **Archive** cases individually unless they are strong auto-archive candidates
+
+Ask for confirmation in stages:
+
+1. Confirm grouped Keep/Update recommendations
+2. Then handle Replace one at a time
+3. Then handle Archive one at a time unless the archive is unambiguous and safe to auto-apply
+
+#### Broad Scope
+
+If the user asked for a sweeping refresh, keep the interaction incremental:
+
+1. Narrow scope first
+2. Investigate a manageable batch
+3. Present recommendations
+4. Ask whether to continue to the next batch
+
+Do not front-load the user with a full maintenance queue.
+
+## Phase 4: Execute the Chosen Action
+
+### Keep Flow
+
+No file edit by default. Summarize why the learning remains trustworthy.
+
+### Update Flow
+
+Apply in-place edits only when the solution is still substantively correct.
+
+Examples of valid in-place updates:
+
+- Rename `app/models/auth_token.rb` reference to `app/models/session_token.rb`
+- Update `module: AuthToken` to `module: SessionToken`
+- Fix outdated links to related docs
+- Refresh implementation notes after a directory move
+
+Examples that should **not** be in-place updates because they are too trivial to justify an edit:
+
+- Fixing a typo with no effect on understanding
+- Rewording prose for style alone
+- Small cleanup that does not materially improve accuracy or usability
+
+Those cases are **Keep**, not Update.
+
+Examples that should **not** be in-place updates because the old guidance itself no longer holds:
+
+- The old fix is now an anti-pattern
+- The system architecture changed enough that the old guidance is misleading
+- The troubleshooting path is materially different
+
+Those cases require **Replace**, not Update.
+
+### Replace Flow
+
+Process Replace candidates **one at a time, sequentially**. Each replacement is written by a subagent to protect the main context window.
+
+**When evidence is sufficient:**
+
+1. 
Spawn a single subagent to write the replacement learning. Pass it: + - The old learning's full content + - A summary of the investigation evidence (what changed, what the current code does, why the old guidance is misleading) + - The target path and category (same category as the old learning unless the category itself changed) +2. The subagent writes the new learning following `ce:compound`'s document format: YAML frontmatter (title, category, date, module, component, tags), problem description, root cause, current solution with code examples, and prevention tips. It should use dedicated file search and read tools if it needs additional context beyond what was passed. +3. After the subagent completes, the orchestrator: + - Adds `superseded_by: [new learning path]` to the old learning's frontmatter + - Moves the old learning to `docs/solutions/_archived/` + +**When evidence is insufficient:** + +1. Mark the learning as stale in place: + - Add to frontmatter: `status: stale`, `stale_reason: [what you found]`, `stale_date: YYYY-MM-DD` +2. Report what evidence was found and what is missing +3. Recommend the user run `ce:compound` after their next encounter with that area + +### Archive Flow + +Archive only when a learning is clearly obsolete or redundant. Do not archive a document just because it is old. + +## Output Format + +**The full report MUST be printed as markdown output.** Do not summarize findings internally and then output a one-liner. The report is the deliverable — print every section in full, formatted as readable markdown with headers, tables, and bullet points. 
+ +After processing the selected scope, output the following report: + +```text +Compound Refresh Summary +======================== +Scanned: N learnings + +Kept: X +Updated: Y +Replaced: Z +Archived: W +Skipped: V +Marked stale: S +``` + +Then for EVERY file processed, list: +- The file path +- The classification (Keep/Update/Replace/Archive/Stale) +- What evidence was found +- What action was taken (or recommended) + +For **Keep** outcomes, list them under a reviewed-without-edits section so the result is visible without creating git churn. + +### Autonomous mode output + +In autonomous mode, the report is the sole deliverable — there is no user present to ask follow-up questions, so the report must be self-contained and complete. **Print the full report. Do not abbreviate, summarize, or skip sections.** + +Split actions into two sections: + +**Applied** (writes that succeeded): +- For each **Updated** file: the file path, what references were fixed, and why +- For each **Replaced** file: what the old learning recommended vs what the current code does, and the path to the new successor +- For each **Archived** file: the file path and what referenced code/workflow is gone +- For each **Marked stale** file: the file path, what evidence was found, and why it was ambiguous + +**Recommended** (actions that could not be written — e.g., permission denied): +- Same detail as above, but framed as recommendations for a human to apply +- Include enough context that the user can apply the change manually or re-run the skill interactively + +If all writes succeed, the Recommended section is empty. If no writes succeed (e.g., read-only invocation), all actions appear under Recommended — the report becomes a maintenance plan. + +## Phase 5: Commit Changes + +After all actions are executed and the report is generated, handle committing the changes. Skip this phase if no files were modified (all Keep, or all writes failed). 
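+
+The branch detection and selective staging used in this phase can be sketched in shell. A minimal sketch; the throwaway repo, file paths, and commit message are hypothetical stand-ins for a real working tree:
+
+```shell
+set -e
+# Work in a throwaway repo so the sketch is self-contained
+repo=$(mktemp -d)
+cd "$repo"
+git init -q
+git config user.email "refresh@example.com"
+git config user.name "Compound Refresh"
+
+mkdir -p docs/solutions/auth
+echo "refreshed learning" > docs/solutions/auth/session-token-learning.md
+echo "unrelated work" > notes.txt   # dirty file that must stay unstaged
+
+# Which branch is checked out? (prints nothing on a detached HEAD)
+git branch --show-current
+
+# Any other uncommitted changes beyond what compound-refresh touched?
+git status --porcelain
+
+# Stage ONLY the file compound-refresh modified, never the whole tree
+git add docs/solutions/auth/session-token-learning.md
+git commit -q -m "docs: update 1 stale auth learning"
+
+# notes.txt is still untracked: selective staging left it alone
+git status --porcelain
+```
+
+On a default branch the skill would branch first (e.g. `git switch -c docs/refresh-auth-learnings`) before committing.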
+ +### Detect git context + +Before offering options, check: +1. Which branch is currently checked out (main/master vs feature branch) +2. Whether the working tree has other uncommitted changes beyond what compound-refresh modified +3. Recent commit messages to match the repo's commit style + +### Autonomous mode + +Use sensible defaults — no user to ask: + +| Context | Default action | +|---------|---------------| +| On main/master | Create a branch named for what was refreshed (e.g., `docs/refresh-auth-and-ci-learnings`), commit, attempt to open a PR. If PR creation fails, report the branch name. | +| On a feature branch | Commit as a separate commit on the current branch | +| Git operations fail | Include the recommended git commands in the report and continue | + +Stage only the files that compound-refresh modified — not other dirty files in the working tree. + +### Interactive mode + +First, run `git branch --show-current` to determine the current branch. Then present the correct options based on the result. Stage only compound-refresh files regardless of which option the user picks. + +**If the current branch is main, master, or the repo's default branch:** + +1. Create a branch, commit, and open a PR (recommended) — the branch name should be specific to what was refreshed, not generic (e.g., `docs/refresh-auth-learnings` not `docs/compound-refresh`) +2. Commit directly to `{current branch name}` +3. Don't commit — I'll handle it + +**If the current branch is a feature branch, clean working tree:** + +1. Commit to `{current branch name}` as a separate commit (recommended) +2. Create a separate branch and commit +3. Don't commit + +**If the current branch is a feature branch, dirty working tree (other uncommitted changes):** + +1. Commit only the compound-refresh changes to `{current branch name}` (selective staging — other dirty files stay untouched) +2. 
Don't commit + +### Commit message + +Write a descriptive commit message that: +- Summarizes what was refreshed (e.g., "update 3 stale learnings, archive 1 obsolete doc") +- Follows the repo's existing commit conventions (check recent git log for style) +- Is succinct — the details are in the changed files themselves + +## Relationship to ce:compound + +- `ce:compound` captures a newly solved, verified problem +- `ce:compound-refresh` maintains older learnings as the codebase evolves + +Use **Replace** only when the refresh process has enough real evidence to write a trustworthy successor. When evidence is insufficient, mark as stale and recommend `ce:compound` for when the user next encounters that problem area. diff --git a/plugins/compound-engineering/skills/ce-compound/SKILL.md b/plugins/compound-engineering/skills/ce-compound/SKILL.md index ca94c50f..98ef7b34 100644 --- a/plugins/compound-engineering/skills/ce-compound/SKILL.md +++ b/plugins/compound-engineering/skills/ce-compound/SKILL.md @@ -89,7 +89,8 @@ Launch these subagents IN PARALLEL. Each returns text data to the orchestrator. - Searches `docs/solutions/` for related documentation - Identifies cross-references and links - Finds related GitHub issues - - Returns: Links and relationships + - Flags any related learning or pattern docs that may now be stale, contradicted, or overly broad + - Returns: Links, relationships, and any refresh candidates #### 4. **Prevention Strategist** - Develops prevention strategies @@ -121,6 +122,53 @@ The orchestrating agent (main conversation) performs these steps: +### Phase 2.5: Selective Refresh Check + +After writing the new learning, decide whether this new solution is evidence that older docs should be refreshed. + +`ce:compound-refresh` is **not** a default follow-up. Use it selectively when the new learning suggests an older learning or pattern doc may now be inaccurate. + +It makes sense to invoke `ce:compound-refresh` when one or more of these are true: + +1. 
A related learning or pattern doc recommends an approach that the new fix now contradicts +2. The new fix clearly supersedes an older documented solution +3. The current work involved a refactor, migration, rename, or dependency upgrade that likely invalidated references in older docs +4. A pattern doc now looks overly broad, outdated, or no longer supported by the refreshed reality +5. The Related Docs Finder surfaced high-confidence refresh candidates in the same problem space + +It does **not** make sense to invoke `ce:compound-refresh` when: + +1. No related docs were found +2. Related docs still appear consistent with the new learning +3. The overlap is superficial and does not change prior guidance +4. Refresh would require a broad historical review with weak evidence + +Use these rules: + +- If there is **one obvious stale candidate**, invoke `ce:compound-refresh` with a narrow scope hint after the new learning is written +- If there are **multiple candidates in the same area**, ask the user whether to run a targeted refresh for that module, category, or pattern set +- If context is already tight or you are in compact-safe mode, do not expand into a broad refresh automatically; instead recommend `ce:compound-refresh` as the next step with a scope hint + +When invoking or recommending `ce:compound-refresh`, be explicit about the argument to pass. 
Prefer the narrowest useful scope: + +- **Specific file** when one learning or pattern doc is the likely stale artifact +- **Module or component name** when several related docs may need review +- **Category name** when the drift is concentrated in one solutions area +- **Pattern filename or pattern topic** when the stale guidance lives in `docs/solutions/patterns/` + +Examples: + +- `/ce:compound-refresh plugin-versioning-requirements` +- `/ce:compound-refresh payments` +- `/ce:compound-refresh performance-issues` +- `/ce:compound-refresh critical-patterns` + +A single scope hint may still expand to multiple related docs when the change is cross-cutting within one domain, category, or pattern area. + +Do not invoke `ce:compound-refresh` without an argument unless the user explicitly wants a broad sweep. + +Always capture the new learning first. Refresh is a targeted maintenance follow-up, not a prerequisite for documentation. + ### Phase 3: Optional Enhancement **WAIT for Phase 2 to complete before proceeding.** @@ -173,6 +221,8 @@ re-run /compound in a fresh session. **No subagents are launched. No parallel tasks. One file written.** +In compact-safe mode, only suggest `ce:compound-refresh` if there is an obvious narrow refresh target. Do not broaden into a large refresh sweep from a compact-safe session. + --- ## What It Captures