From 820e385d263327b7b1532e7aed1735dd40e56880 Mon Sep 17 00:00:00 2001
From: rohan-tessl
Date: Tue, 24 Mar 2026 14:47:12 +0530
Subject: [PATCH 1/3] feat: improve skill scores across 10 plugins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hey 👋 @erudenko I ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the full before/after:

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| claudemem-orchestration | 0% | 80% | +80% |
| claudemem-search | 0% | 70% | +70% |
| claudish-integration | 0% | 69% | +69% |
| sequence-best-practices | 44% | 100% | +56% |
| email-deliverability | 43% | 96% | +53% |
| ab-testing-patterns | 44% | 96% | +52% |
| campaign-metrics | 48% | 100% | +52% |
| setup | 35% | 81% | +46% |
| status | 48% | 93% | +45% |
| proof-of-work | 49% | 93% | +44% |
| state-machine | 47% | 89% | +42% |
| task-complexity-router | 59% | 100% | +41% |
| deep-analysis | 56% | 96% | +40% |
| analytics-interpretation | 49% | 89% | +40% |
| data-extraction-patterns | 49% | 89% | +40% |
| search-interceptor | 55% | 93% | +38% |
| help | 43% | 81% | +38% |
| revert | 48% | 85% | +37% |
| brainstorming | 40% | 77% | +37% |
| ultrathink-detective | 60% | 96% | +36% |
| ui-analyse | 50% | 86% | +36% |
| architect-detective | 65% | 100% | +35% |
| error-recovery | 65% | 100% | +35% |
| performance-correlation | 49% | 84% | +35% |
| statusline-customization | 60% | 94% | +34% |
| developer-detective | 66% | 100% | +34% |
| implement | 55% | 89% | +34% |
| new-track | 55% | 89% | +34% |
| tag-command-mapping | 55% | 89% | +34% |
| code-search-selector | 68% | 100% | +32% |
| linear-integration | 66% | 96% | +30% |
| link-strategy | 59% | 89% | +30% |
| gemini-api | 61% | 90% | +29% |
| session-isolation | 64% | 93% | +29% |
| error-handling | 56% | 83% | +27% |
| bunjs-architecture | 61% | 86% | +25% |
| cross-plugin-detective | 63% | 88% | +25% |
| ui-implement | 57% | 81% | +24% |
| multi-agent-coordination | 65% | 89% | +24% |
| dependency-check | 55% | 78% | +23% |
| quality-gates | 70% | 93% | +23% |
| serp-analysis | 66% | 89% | +23% |
| yaml-agent-format | 66% | 89% | +23% |
| design-references | 59% | 81% | +22% |
| investigate | 54% | 76% | +22% |
| performance-tracking | 64% | 86% | +22% |
| context-detection | 54% | 74% | +20% |
| style-format | 61% | 81% | +20% |
| debugging-strategies | 61% | 79% | +18% |
| testing-strategies | 61% | 79% | +18% |
| ui-design-review | 66% | 83% | +17% |
| claudish-usage (bun) | 66% | 81% | +15% |
| auth-patterns | 65% | 79% | +14% |
| css-modules | 70% | 83% | +13% |
| documentation-standards | 57% | 71% | +14% |
| python | 65% | 79% | +14% |
| rust | 69% | 83% | +14% |
| testing-frontend | 69% | 83% | +14% |
| universal-patterns | 46% | 60% | +14% |
| optimize | 55% | 64% | +9% |
| xml-standards | 74% | 83% | +9% |
| claudish-usage (shared) | 76% | 81% | +5% |
| test-coverage | 59% | 64% | +5% |
| ui-style-format | 63% | 68% | +5% |
| database-patterns | 69% | 71% | +2% |

65 skills improved across 10 plugins, average improvement +30% (55% → 85%).

Changes made:
- Added missing YAML frontmatter to claudish-integration
- Fixed invalid frontmatter in claudemem-orchestration and claudemem-search
- Converted description formats to quoted strings with "Use when..." triggers
- Added natural language trigger terms for better skill discovery
- Removed unknown frontmatter keys (version, tags, keywords, plugin, updated)
- Added numbered workflow steps with validation checkpoints
- Added copy-paste ready code examples
- Improved progressive disclosure
- Reduced verbosity

Honest disclosure — I work at @tesslio where we build tooling around skills like these. Not a pitch - just saw room for improvement and wanted to contribute.

Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices and ask it to optimize your skill. Ping me - @rohan-tessl - if you hit any snags. Thanks in advance 🙏
---
 plugins/agentdev/skills/debug-mode/SKILL.md   |    7 +-
 .../agentdev/skills/xml-standards/SKILL.md    |    4 +-
 .../skills/yaml-agent-format/SKILL.md         |    7 +-
 .../skills/linear-integration/SKILL.md        |  253 +-
 .../autopilot/skills/proof-of-work/SKILL.md   |  251 +-
 .../autopilot/skills/state-machine/SKILL.md   |  221 +-
 .../skills/tag-command-mapping/SKILL.md       |  187 +-
 plugins/bun/skills/claudish-usage/SKILL.md    |    4 +-
 .../skills/architect-detective/SKILL.md       |  497 +----
 .../skills/claudemem-orchestration/SKILL.md   |    7 +-
 .../skills/claudemem-search/SKILL.md          |    4 +-
 .../skills/code-search-selector/SKILL.md      |  297 +--
 .../skills/cross-plugin-detective/SKILL.md    |  263 +--
 .../skills/deep-analysis/SKILL.md             |  385 +---
 .../skills/developer-detective/SKILL.md       |  506 +----
 .../code-analysis/skills/investigate/SKILL.md |  343 +--
 .../skills/search-interceptor/SKILL.md        |  209 +-
 .../skills/ultrathink-detective/SKILL.md      |  919 +-------
 plugins/conductor/skills/help/SKILL.md        |  104 +-
 plugins/conductor/skills/implement/SKILL.md   |  504 +----
 plugins/conductor/skills/new-track/SKILL.md   |  292 +--
 plugins/conductor/skills/revert/SKILL.md      |  324 +--
 plugins/conductor/skills/setup/SKILL.md       |  324 +--
 plugins/conductor/skills/status/SKILL.md      |  253 +--
 .../dev/skills/backend/auth-patterns/SKILL.md |   17 +-
 .../backend/bunjs-architecture/SKILL.md       |   14 +-
 .../skills/backend/database-patterns/SKILL.md |   16 +-
 .../skills/backend/error-handling/SKILL.md    |   14 +-
 plugins/dev/skills/backend/python/SKILL.md    |   14 +-
 plugins/dev/skills/backend/rust/SKILL.md      |   14 +-
 plugins/dev/skills/context-detection/SKILL.md |   15 +-
 .../skills/core/debugging-strategies/SKILL.md |   14 +-
 .../skills/core/testing-strategies/SKILL.md   |   15 +-
 .../skills/core/universal-patterns/SKILL.md   |   13 +-
 .../skills/design/design-references/SKILL.md  |    6 +-
 plugins/dev/skills/design/ui-analyse/SKILL.md |    6 +-
 .../skills/design/ui-design-review/SKILL.md   |    5 +-
 .../dev/skills/design/ui-implement/SKILL.md   |    6 +-
 .../skills/design/ui-style-format/SKILL.md    |    6 +-
 .../skills/documentation-standards/SKILL.md   |   14 +-
 .../dev/skills/frontend/css-modules/SKILL.md  |    5 +-
 .../skills/frontend/testing-frontend/SKILL.md |   14 +-
 plugins/dev/skills/optimize/SKILL.md          |    7 +-
 .../skills/planning/brainstorming/SKILL.md    |  117 +-
 plugins/dev/skills/test-coverage/SKILL.md     |    7 +-
 .../frontend/skills/dependency-check/SKILL.md |  305 +--
 .../skills/ab-testing-patterns/SKILL.md       |  162 +-
 .../skills/campaign-metrics/SKILL.md          |  165 +-
 .../skills/email-deliverability/SKILL.md      |  201 +-
 .../skills/sequence-best-practices/SKILL.md   |  187 +-
 .../multimodel/skills/error-recovery/SKILL.md | 1132 +---------
 .../skills/multi-agent-coordination/SKILL.md  |  764 +------
 .../skills/performance-tracking/SKILL.md      | 1880 +----------------
 .../multimodel/skills/quality-gates/SKILL.md  | 1028 +--------
 .../skills/session-isolation/SKILL.md         |  173 +-
 .../skills/task-complexity-router/SKILL.md    |  988 +--------
 plugins/nanobanana/skills/gemini-api/SKILL.md |   38 +-
 .../nanobanana/skills/style-format/SKILL.md   |   40 +-
 .../skills/analytics-interpretation/SKILL.md  |  200 +-
 .../skills/data-extraction-patterns/SKILL.md  |  368 +--
 plugins/seo/skills/link-strategy/SKILL.md     |  152 +-
 .../skills/performance-correlation/SKILL.md   |  279 +--
 plugins/seo/skills/serp-analysis/SKILL.md     |  141 +-
 .../skills/statusline-customization/SKILL.md  |  120 +-
 pr_description.md                             |  110 +
 shared/skills/claudish-usage/SKILL.md         |    2 +-
 skills/claudish-integration/SKILL.md          |    5 +
 skills/openrouter-trending-models/SKILL.md    |    2 +-
 68 files changed, 2027 insertions(+), 12919 deletions(-)
 create mode 100644 pr_description.md

diff --git a/plugins/agentdev/skills/debug-mode/SKILL.md b/plugins/agentdev/skills/debug-mode/SKILL.md
index 0d70994..c2b11f5 100644
--- a/plugins/agentdev/skills/debug-mode/SKILL.md
+++ b/plugins/agentdev/skills/debug-mode/SKILL.md
@@ -1,12 +1,7 @@
 ---
 name: debug-mode
-description: |
-  Enable, disable, and manage debug mode for agentdev sessions.
-  Records all tool invocations, skill activations, hook triggers, and agent delegations to JSONL.
-  Use when debugging agent behavior, optimizing workflows, or analyzing session performance.
+description: "Enables debug mode that records tool invocations, skill activations, hook triggers, and agent delegations to JSONL files. Manages per-project debug configuration and provides session analysis. Use when debugging agent behavior, optimizing workflow performance, or analyzing session event logs."
 ---
-plugin: agentdev
-updated: 2026-01-20
 
 # AgentDev Debug Mode

diff --git a/plugins/agentdev/skills/xml-standards/SKILL.md b/plugins/agentdev/skills/xml-standards/SKILL.md
index 4771b34..64117ee 100644
--- a/plugins/agentdev/skills/xml-standards/SKILL.md
+++ b/plugins/agentdev/skills/xml-standards/SKILL.md
@@ -1,9 +1,7 @@
 ---
 name: xml-standards
-description: XML tag structure patterns for Claude Code agents and commands. Use when designing or implementing agents to ensure proper XML structure following Anthropic best practices.
+description: "Specifies XML tag structure patterns (role, expertise, constraints, workflow) for Claude Code agents and commands following Anthropic best practices. Provides required and optional tag schemas with nesting rules. Use when designing agent prompts, implementing XML-structured commands, or validating agent tag compliance."
 ---
-plugin: agentdev
-updated: 2026-01-20
 
 # XML Tag Standards

diff --git a/plugins/agentdev/skills/yaml-agent-format/SKILL.md b/plugins/agentdev/skills/yaml-agent-format/SKILL.md
index 49d1626..1546bf2 100644
--- a/plugins/agentdev/skills/yaml-agent-format/SKILL.md
+++ b/plugins/agentdev/skills/yaml-agent-format/SKILL.md
@@ -1,11 +1,6 @@
 ---
 name: yaml-agent-format
-description: YAML format for Claude Code agent definitions as alternative to markdown. Use when creating agents with YAML, converting markdown agents to YAML, or validating YAML agent schemas. Trigger keywords - "YAML agent", "agent YAML", "YAML format", "agent schema", "YAML definition", "convert to YAML".
-version: 0.1.0
-tags: [agentdev, yaml, agent, format, schema, definition]
-keywords: [yaml, agent, format, schema, definition, conversion, validation, frontmatter]
-plugin: agentdev
-updated: 2026-01-28
+description: "Defines the YAML schema for Claude Code agent definitions as an alternative to markdown. Covers required fields, validation rules, and markdown-to-YAML conversion patterns. Use when creating agents in YAML format, converting existing markdown agents to YAML, or validating YAML agent schemas."
 ---
 
 # YAML Agent Format

diff --git a/plugins/autopilot/skills/linear-integration/SKILL.md b/plugins/autopilot/skills/linear-integration/SKILL.md
index 6a2846d..dbe6d17 100644
--- a/plugins/autopilot/skills/linear-integration/SKILL.md
+++ b/plugins/autopilot/skills/linear-integration/SKILL.md
@@ -1,278 +1,137 @@
 ---
 name: linear-integration
-description: Linear API patterns and examples for autopilot. Includes authentication, webhooks, issue CRUD, state transitions, file attachments, and comment handling.
-version: 0.1.0
-tags: [linear, api, webhook, integration]
-keywords: [linear, api, webhook, issue, comment, state, attachment]
+description: "Provides Linear API patterns for authentication, webhook handling, issue CRUD, state transitions, file attachments, and comment posting. Includes signature verification and SDK usage examples. Use when integrating with Linear API or handling Linear webhook events."
 ---
-plugin: autopilot
-updated: 2026-01-20
 
 # Linear Integration
 
-**Version:** 0.1.0
-**Purpose:** Patterns for Linear API integration in autopilot workflows
-**Status:** Phase 1
+Provides patterns and examples for integrating with the Linear API in autopilot workflows, covering authentication, webhooks, issue management, and state transitions.
 
-## When to Use
+## Workflow
 
-Use this skill when you need to:
-- Authenticate with Linear API
-- Set up webhook handlers for Linear events
-- Create, read, update, or delete Linear issues
-- Transition issue states in Linear workflows
-- Attach files to Linear issues
-- Add comments to Linear issues
-
-## Overview
-
-This skill provides patterns for:
-- Linear API authentication
-- Webhook handler setup
-- Issue CRUD operations
-- State transitions
-- File attachments
-- Comment handling
+1. **Authenticate** - initialize LinearClient with API key
+2. **Set up webhook handler** - configure HTTP server with signature verification
+3. **Route events** - process incoming webhook payloads by action type
+4. **Execute operations** - create/read/update issues, transition states, attach files, post comments
 
 ## Core Patterns
 
 ### Pattern 1: Authentication
 
-**Personal API Key (MVP):**
 ```typescript
 import { LinearClient } from '@linear/sdk';
 
-const linear = new LinearClient({
-  apiKey: process.env.LINEAR_API_KEY
-});
-```
+const linear = new LinearClient({ apiKey: process.env.LINEAR_API_KEY });
 
-**Verification:**
-```typescript
-async function verifyConnection(): Promise<boolean> {
-  try {
-    const me = await linear.viewer;
-    console.log(`Connected as: ${me.name}`);
-    return true;
-  } catch (error) {
-    console.error('Linear connection failed:', error);
-    return false;
-  }
-}
+// Verify connection
+const me = await linear.viewer;
+console.log(`Connected as: ${me.name}`);
 ```
 
-### Pattern 2: Webhook Handler
+### Pattern 2: Webhook Handler with Signature Verification
 
-**Bun HTTP Server:**
 ```typescript
 import { serve } from 'bun';
 import { createHmac } from 'crypto';
 
-interface LinearWebhookPayload {
-  action: 'created' | 'updated' | 'deleted';
-  type: 'Issue' | 'Comment' | 'Label';
-  data: {
-    id: string;
-    title?: string;
-    description?: string;
-    state: { id: string; name: string };
-    labels: Array<{ id: string; name: string }>;
-  };
-}
-
 serve({
   port: process.env.AUTOPILOT_WEBHOOK_PORT || 3001,
   async fetch(req: Request): Promise<Response> {
-    if (req.method !== 'POST') {
-      return new Response('Method not allowed', { status: 405 });
-    }
+    if (req.method !== 'POST') return new Response('Method not allowed', { status: 405 });
 
-    // Verify signature
     const signature = req.headers.get('Linear-Signature');
     const body = await req.text();
 
-    if (!verifySignature(body, signature)) {
+    // Verify HMAC-SHA256 signature
+    const hmac = createHmac('sha256', process.env.LINEAR_WEBHOOK_SECRET!);
+    if (signature !== hmac.update(body).digest('hex')) {
       return new Response('Unauthorized', { status: 401 });
     }
 
-    const payload: LinearWebhookPayload = JSON.parse(body);
-
-    // Route to handler
-    await routeWebhook(payload);
-
+    await routeWebhook(JSON.parse(body));
     return new Response('OK', { status: 200 });
   }
 });
-
-function verifySignature(body: string, signature: string | null): boolean {
-  if (!signature) return false;
-
-  const hmac = createHmac('sha256', process.env.LINEAR_WEBHOOK_SECRET!);
-  const expectedSignature = hmac.update(body).digest('hex');
-
-  return signature === expectedSignature;
-}
 ```
 
-### Pattern 3: Issue Operations
+### Pattern 3: Issue CRUD
 
-**Create Issue:**
 ```typescript
-async function createIssue(
-  teamId: string,
-  title: string,
-  description: string,
-  labels: string[]
-): Promise<string> {
-  // Note: Linear SDK uses linear.createIssue() method
-  const result = await linear.createIssue({
-    teamId,
-    title,
-    description,
-    labelIds: await resolveLabelIds(labels),
-    assigneeId: process.env.AUTOPILOT_BOT_USER_ID,
-    priority: 2,
-  });
-
-  const issue = await result.issue;
-  return issue!.id;
-}
-```
-
-**Query Issues:**
-```typescript
-async function getAutopilotTasks(teamId: string) {
-  const issues = await linear.issues({
-    filter: {
-      team: { id: { eq: teamId } },
-      assignee: { id: { eq: process.env.AUTOPILOT_BOT_USER_ID } },
-      state: { name: { in: ['Todo', 'In Progress'] } },
-    },
-  });
-
-  return issues.nodes;
-}
+// Create issue
+const result = await linear.createIssue({
+  teamId, title, description,
+  labelIds: await resolveLabelIds(labels),
+  assigneeId: process.env.AUTOPILOT_BOT_USER_ID,
+  priority: 2,
+});
+const issueId = (await result.issue)!.id;
+
+// Query autopilot tasks
+const issues = await linear.issues({
+  filter: {
+    team: { id: { eq: teamId } },
+    assignee: { id: { eq: process.env.AUTOPILOT_BOT_USER_ID } },
+    state: { name: { in: ['Todo', 'In Progress'] } },
+  },
+});
 ```
 
 ### Pattern 4: State Transitions
 
-**Transition State:**
 ```typescript
-async function transitionState(
-  issueId: string,
-  newStateName: string
-): Promise<void> {
-  // Get workflow states for the issue's team
+async function transitionState(issueId: string, newStateName: string): Promise<void> {
   const issue = await linear.issue(issueId);
-  const team = await issue.team;
-  const states = await team.states();
-
+  const states = await (await issue.team).states();
   const targetState = states.nodes.find(s => s.name === newStateName);
-
-  if (!targetState) {
-    throw new Error(`State "${newStateName}" not found`);
-  }
-
-  // Note: Linear SDK uses linear.updateIssue() method
-  await linear.updateIssue(issueId, {
-    stateId: targetState.id,
-  });
+  if (!targetState) throw new Error(`State "${newStateName}" not found`);
+  await linear.updateIssue(issueId, { stateId: targetState.id });
 }
 ```
 
 ### Pattern 5: File Attachments
 
-**Upload and Attach:**
 ```typescript
-async function attachFile(
-  issueId: string,
-  filePath: string,
-  fileName: string
-): Promise<void> {
-  // Request upload URL
-  const uploadPayload = await linear.fileUpload(
-    getMimeType(filePath),
-    fileName,
-    getFileSize(filePath)
-  );
-
-  // Upload to storage
-  const fileContent = await Bun.file(filePath).arrayBuffer();
+async function attachFile(issueId: string, filePath: string, fileName: string): Promise<void> {
+  const uploadPayload = await linear.fileUpload(getMimeType(filePath), fileName, getFileSize(filePath));
   await fetch(uploadPayload.uploadUrl, {
     method: 'PUT',
-    body: fileContent,
+    body: await Bun.file(filePath).arrayBuffer(),
     headers: { 'Content-Type': getMimeType(filePath) },
   });
-
-  // Attach to issue
-  await linear.attachmentCreate({
-    issueId,
-    url: uploadPayload.assetUrl,
-    title: fileName,
-  });
+  await linear.attachmentCreate({ issueId, url: uploadPayload.assetUrl, title: fileName });
 }
 ```
 
 ### Pattern 6: Comments
 
-**Add Comment:**
 ```typescript
-async function addComment(
-  issueId: string,
-  body: string
-): Promise<void> {
-  // Note: Linear SDK uses linear.createComment() method
-  await linear.createComment({
-    issueId,
-    body,
-  });
-}
+await linear.createComment({ issueId, body: "Implementation complete. See attached proof." });
 ```
 
-## Best Practices
-
-- Always verify webhook signatures
-- Use exponential backoff for API rate limits
-- Cache team/state/label IDs to reduce API calls
-- Handle webhook delivery failures gracefully
-- Log all state transitions for audit
-
 ## Examples
 
-### Example 1: Full Issue Lifecycle
-
+### Full issue lifecycle
 ```typescript
-// Create issue
-const issueId = await createIssue(
-  teamId,
-  "Add user profile page",
-  "Implement user profile with avatar upload",
-  ["frontend", "feature"]
-);
-
-// Transition to In Progress
+const issueId = await createIssue(teamId, "Add user profile page", "...", ["frontend"]);
 await transitionState(issueId, "In Progress");
-
 // ... work happens ...
-
-// Attach proof artifacts
 await attachFile(issueId, "screenshot.png", "Desktop Screenshot");
-
-// Add completion comment
-await addComment(issueId, "Implementation complete. See attached proof.");
-
-// Transition to In Review
+await addComment(issueId, "Implementation complete.");
 await transitionState(issueId, "In Review");
 ```
 
-### Example 2: Query Autopilot Queue
-
+### Query autopilot queue
 ```typescript
 const tasks = await getAutopilotTasks(teamId);
-
-console.log(`Autopilot queue: ${tasks.length} tasks`);
 for (const task of tasks) {
   console.log(`- ${task.identifier}: ${task.title} (${task.state.name})`);
 }
 ```
+
+## Verification
+
+When using these patterns, confirm:
+- [ ] Webhook signatures are verified before processing payloads
+- [ ] API rate limits are handled with exponential backoff
+- [ ] Team/state/label IDs are cached to reduce API calls
+- [ ] All state transitions are logged for audit

diff --git a/plugins/autopilot/skills/proof-of-work/SKILL.md b/plugins/autopilot/skills/proof-of-work/SKILL.md
index e5f91af..2814e75 100644
--- a/plugins/autopilot/skills/proof-of-work/SKILL.md
+++ b/plugins/autopilot/skills/proof-of-work/SKILL.md
@@ -1,200 +1,90 @@
 ---
 name: proof-of-work
-description: Proof artifact generation patterns for task validation. Covers screenshots, test results, deployments, and confidence scoring.
-version: 0.1.0
-tags: [proof, validation, screenshots, tests, deployment]
-keywords: [proof, artifact, screenshot, test, deployment, confidence, validation]
+description: "Generates validation artifacts (screenshots, test results, coverage reports) after task completion and calculates confidence scores for auto-approval decisions. Supports bug fix, feature, and UI change proof types. Use when validating task completion or determining if work can be auto-approved."
 ---
-plugin: autopilot
-updated: 2026-01-20
 
 # Proof-of-Work
 
-**Version:** 0.1.0
-**Purpose:** Generate validation artifacts for autonomous task completion
-**Status:** Phase 1
+Generates verifiable artifacts that demonstrate task completion and calculates confidence scores to determine whether tasks can be auto-approved or need manual review.
 
-## When to Use
+## Workflow
 
-Use this skill when you need to:
-- Generate proof artifacts after task completion
-- Capture screenshots for UI verification
-- Parse and report test results
-- Calculate confidence scores for task validation
-- Determine if a task can be auto-approved
+1. **Determine proof type** based on task classification (bug fix, feature, or UI change)
+2. **Collect artifacts** - run tests, capture screenshots, check coverage, verify build
+3. **Calculate confidence score** using the weighted scoring algorithm
+4. **Generate proof summary** in markdown format for Linear comments
+5. **Return approval decision** based on confidence thresholds
 
-## Overview
+## Required Artifacts by Task Type
 
-Proof-of-work is the mechanism that validates task completion. Every finished task must include verifiable artifacts that demonstrate the work was done correctly.
+| Artifact | Bug Fix | Feature | UI Change |
+|----------|---------|---------|-----------|
+| Git diff | Required | - | - |
+| Test results | Required | Required | - |
+| Regression test | Required | - | - |
+| Coverage report | - | Required (>=80%) | - |
+| Build output | - | Required | - |
+| Desktop screenshot (1920x1080) | - | - | Required |
+| Mobile screenshot (375x667) | - | - | Required |
+| Tablet screenshot (768x1024) | - | - | Required |
+| Accessibility score | - | - | Required (>=80) |
+| Deployment URL | Optional | Optional | Optional |
 
-## Proof Types by Task
-
-### Bug Fix Proof
-
-| Artifact | Required | Purpose |
-|----------|----------|---------|
-| Git diff | Yes | Show minimal, focused changes |
-| Test results | Yes | All tests passing |
-| Regression test | Yes | Specific test for the bug |
-| Error log (before/after) | Optional | Visual evidence |
-
-### Feature Proof
-
-| Artifact | Required | Purpose |
-|----------|----------|---------|
-| Screenshots | Yes | Visual verification |
-| Test results | Yes | Functionality works |
-| Coverage report | Yes | >= 80% coverage |
-| Build output | Yes | Builds successfully |
-| Deployment URL | Optional | Live demo |
-
-### UI Change Proof
-
-| Artifact | Required | Purpose |
-|----------|----------|---------|
-| Desktop screenshot | Yes | 1920x1080 view |
-| Mobile screenshot | Yes | 375x667 view |
-| Tablet screenshot | Yes | 768x1024 view |
-| Accessibility score | Yes | >= 80 Lighthouse |
-| Visual regression | Optional | BackstopJS diff |
-
-## Screenshot Capture
-
-**Playwright Pattern:**
-
-```typescript
-import { chromium } from 'playwright';
-
-async function captureScreenshots(url: string, outputDir: string) {
-  const browser = await chromium.launch({ headless: true });
-  const context = await browser.newContext();
-  const page = await context.newPage();
-
-  // Desktop
-  await page.setViewportSize({ width: 1920, height: 1080 });
-  await page.goto(url);
-  await page.waitForLoadState('networkidle');
-  await page.screenshot({
-    path: `${outputDir}/desktop.png`,
-    fullPage: true,
-  });
-
-  // Mobile
-  await page.setViewportSize({ width: 375, height: 667 });
-  await page.goto(url);
-  await page.waitForLoadState('networkidle');
-  await page.screenshot({
-    path: `${outputDir}/mobile.png`,
-    fullPage: true,
-  });
-
-  // Tablet
-  await page.setViewportSize({ width: 768, height: 1024 });
-  await page.goto(url);
-  await page.waitForLoadState('networkidle');
-  await page.screenshot({
-    path: `${outputDir}/tablet.png`,
-    fullPage: true,
-  });
-
-  await browser.close();
-}
-```
-
-## Confidence Scoring
-
-**Algorithm:**
+## Confidence Scoring Algorithm
 
 ```typescript
-interface ProofArtifacts {
-  testResults?: { passed: number; total: number };
-  buildSuccessful?: boolean;
-  lintErrors?: number;
-  screenshots?: string[];
-  testCoverage?: number;
-  performanceScore?: number;
-}
-
 function calculateConfidence(artifacts: ProofArtifacts): number {
   let score = 0;
-
-  // Tests (40 points)
-  if (artifacts.testResults) {
-    if (artifacts.testResults.passed === artifacts.testResults.total) {
-      score += 40;
-    }
-  }
-
-  // Build (20 points)
-  if (artifacts.buildSuccessful) {
-    score += 20;
-  }
-
-  // Coverage (20 points)
-  if (artifacts.testCoverage) {
-    if (artifacts.testCoverage >= 80) score += 20;
-    else if (artifacts.testCoverage >= 60) score += 15;
-    else if (artifacts.testCoverage >= 40) score += 10;
-    else score += 5;
-  }
-
-  // Screenshots (10 points)
-  if (artifacts.screenshots) {
-    if (artifacts.screenshots.length >= 3) score += 10;
-    else if (artifacts.screenshots.length >= 1) score += 5;
-  }
-
-  // Lint (10 points)
-  if (artifacts.lintErrors === 0) {
-    score += 10;
-  }
-
+  // Tests: 40 points (all must pass)
+  if (artifacts.testResults?.passed === artifacts.testResults?.total) score += 40;
+  // Build: 20 points
+  if (artifacts.buildSuccessful) score += 20;
+  // Coverage: 20 points (>=80% full marks, >=60% partial)
+  if (artifacts.testCoverage >= 80) score += 20;
+  else if (artifacts.testCoverage >= 60) score += 15;
+  // Screenshots: 10 points (3+ full marks)
+  if (artifacts.screenshots?.length >= 3) score += 10;
+  else if (artifacts.screenshots?.length >= 1) score += 5;
+  // Lint: 10 points (zero errors)
+  if (artifacts.lintErrors === 0) score += 10;
   return score;
 }
 ```
 
 ## Confidence Thresholds
 
-| Confidence | Action |
-|------------|--------|
+| Score | Action |
+|-------|--------|
 | >= 95% | Auto-approve (In Review -> Done) |
 | 80-94% | Manual review required |
-| < 80% | Validation failed, iterate |
-
-## Proof Summary Template
-
-```markdown
-# Proof of Work
-
-**Task**: {issue_id}
-**Type**: {task_type}
-**Confidence**: {score}%
+| < 80% | Validation failed, must iterate |
 
-## Test Results
-- Total: {total}
-- Passed: {passed}
-- Failed: {failed}
-- Coverage: {coverage}%
+## Screenshot Capture Pattern
 
-## Build
-- Status: {status}
-- Duration: {duration}
-
-## Screenshots
-- Desktop: proof/desktop.png
-- Mobile: proof/mobile.png
-- Tablet: proof/tablet.png
+```typescript
+import { chromium } from 'playwright';
 
-## Artifacts
-- test-results.txt
-- coverage.json
-- build-output.txt
+async function captureScreenshots(url: string, outputDir: string) {
+  const browser = await chromium.launch({ headless: true });
+  const page = await browser.newContext().then(ctx => ctx.newPage());
+
+  for (const [name, width, height] of [
+    ['desktop', 1920, 1080],
+    ['mobile', 375, 667],
+    ['tablet', 768, 1024],
+  ]) {
+    await page.setViewportSize({ width, height });
+    await page.goto(url);
+    await page.waitForLoadState('networkidle');
+    await page.screenshot({ path: `${outputDir}/${name}.png`, fullPage: true });
+  }
+  await browser.close();
+}
 ```
 
 ## Examples
 
-### Example 1: Feature Proof Generation
-
+### Full confidence proof
 ```typescript
 const proof = {
   testResults: { passed: 15, total: 15 },
@@ -203,32 +93,25 @@ const proof = {
   screenshots: ['desktop.png', 'mobile.png', 'tablet.png'],
   testCoverage: 85,
 };
-
-const confidence = calculateConfidence(proof);
-// 40 (tests) + 20 (build) + 20 (coverage) + 10 (screenshots) + 10 (lint) = 100%
+calculateConfidence(proof); // 40+20+20+10+10 = 100% → auto-approve
 ```
 
-### Example 2: Partial Proof
-
+### Partial proof requiring iteration
 ```typescript
 const proof = {
-  testResults: { passed: 12, total: 15 }, // Some failing
+  testResults: { passed: 12, total: 15 }, // some failing
   buildSuccessful: true,
   lintErrors: 2,
   screenshots: ['desktop.png'],
   testCoverage: 65,
 };
-
-const confidence = calculateConfidence(proof);
-// 0 (tests fail) + 20 (build) + 15 (coverage) + 5 (1 screenshot) + 0 (lint errors) = 40%
-// Result: Validation failed, must iterate
+calculateConfidence(proof); // 0+20+15+5+0 = 40% → validation failed, iterate
 ```
 
-## Best Practices
+## Verification
 
-- Always capture screenshots for UI work
-- Run full test suite, not just affected tests
-- Include coverage report for features
-- Build must pass before any proof is valid
-- Store proofs in session directory for debugging
-- Generate proof summary in markdown for Linear comments
+After generating proof, confirm:
+- [ ] All required artifacts for the task type are present
+- [ ] Confidence score calculated correctly
+- [ ] Proof summary generated in markdown format
+- [ ] Approval decision matches threshold table

diff --git a/plugins/autopilot/skills/state-machine/SKILL.md b/plugins/autopilot/skills/state-machine/SKILL.md
index dc173e9..e7d2a83 100644
--- a/plugins/autopilot/skills/state-machine/SKILL.md
+++ b/plugins/autopilot/skills/state-machine/SKILL.md
@@ -1,241 +1,108 @@
 ---
 name: state-machine
-description: Task lifecycle state transitions with validation gates. Defines states, triggers, and required proofs.
-version: 0.1.0 -tags: [state-machine, workflow, transitions, gates] -keywords: [state, transition, gate, validation, workflow, lifecycle] +description: "Manages task lifecycle state transitions (Todo, In Progress, In Review, Done, Blocked) with validation gates and iteration limits. Enforces entry conditions, confidence thresholds, and escalation rules. Use when transitioning task states or implementing validation gates." --- -plugin: autopilot -updated: 2026-01-20 # Task Lifecycle State Machine -**Version:** 0.1.0 -**Purpose:** Manage task state transitions with validation gates -**Status:** Phase 1 +Manages task state transitions with validation gates, ensuring tasks move through the lifecycle correctly with proper checks at each boundary. -## When to Use - -Use this skill when you need to: -- Understand valid state transitions for tasks -- Implement validation gates before state changes -- Handle iteration loops (In Review -> In Progress) -- Manage escalation to blocked state -- Enforce iteration limits - -## States +## States and Transitions ``` Todo ──→ In Progress ──→ In Review ──→ Done ↑ │ - └───────────┘ - (iteration) + └───────────┘ (iteration) In Progress ──→ Blocked (escalation) ``` +## Workflow + +1. **Determine current state** - read the task's current state from Linear +2. **Validate transition** - check that the target state is reachable and all gate conditions are met +3. **Execute transition** - update the state in Linear via API +4. 
**Log transition** - record the state change for audit trail + ## State Definitions -| State | Description | Entry Condition | -|-------|-------------|-----------------| -| Todo | Task queued for execution | Created with @autopilot label | -| In Progress | Task being executed | Passed start gate | -| In Review | Awaiting validation | Proof generated | -| Done | Task completed | Auto-approved or user approved | -| Blocked | Cannot proceed | Dependency issue or escalation | - -## Transition Triggers - -| From | To | Trigger | Gate | -|------|----|---------|------| -| Todo | In Progress | Label @autopilot added | Has acceptance criteria | -| In Progress | In Review | Work complete | Proof >= 80% confidence | -| In Review | Done | Confidence >= 95% | Auto-approval | -| In Review | Done | User approves | User feedback = APPROVAL | -| In Review | In Progress | Confidence < 80% | Validation failed | -| In Review | In Progress | User requests changes | Feedback = REQUESTED_CHANGES | -| In Progress | Blocked | Max iterations | Escalation | -| * | Blocked | Unresolvable blocker | Manual trigger | +| State | Entry Condition | +|-------|----------------| +| Todo | Created with `@autopilot` label | +| In Progress | Has acceptance criteria, no blocking dependencies, assigned to autopilot | +| In Review | All tests pass, build successful, no lint errors, proof artifacts exist | +| Done | Confidence >= 95% (auto-approve) OR user explicitly approves | +| Blocked | Max iterations reached, unresolvable dependency, or manual trigger | ## Validation Gates ### Gate 1: Start Work (Todo -> In Progress) - ```typescript async function canStartWork(issue: Issue): Promise { - const checks = [ - // Has acceptance criteria + return [ extractAcceptanceCriteria(issue.description).length > 0, - - // No blocking dependencies (await getBlockingIssues(issue)).length === 0, - - // Assigned to autopilot issue.assignee?.id === AUTOPILOT_BOT_USER_ID, - ]; - - return checks.every(c => c); + ].every(c => 
c); } ``` ### Gate 2: Submit for Review (In Progress -> In Review) - ```typescript async function canSubmitForReview(proof: Proof): Promise<boolean> { - const checks = [ - // All tests pass + return [ proof.testResults.passed === proof.testResults.total, - - // Build successful proof.buildSuccessful, - - // No lint errors proof.lintErrors === 0, - - // Has proof artifacts proof.screenshots.length > 0 || proof.deploymentUrl, - ]; - - return checks.every(c => c); + ].every(c => c); } ``` ### Gate 3: Complete (In Review -> Done) - -```typescript -async function canComplete(proof: Proof): Promise<{ - canProceed: boolean; - autoApproved: boolean; -}> { - if (proof.confidence >= 95) { - return { canProceed: true, autoApproved: true }; - } - - if (proof.confidence >= 80) { - return { canProceed: false, autoApproved: false }; - // Wait for user approval - } - - return { canProceed: false, autoApproved: false }; - // Validation failed, should iterate -} -``` +- Confidence >= 95%: auto-approve, transition to Done +- Confidence 80-94%: wait for user approval +- Confidence < 80%: validation failed, iterate back to In Progress ## Iteration Limits -| Loop Type | Max Iterations | Escalation | -|-----------|----------------|------------| +| Loop Type | Max Iterations | Escalation Action | +|-----------|----------------|-------------------| | Execution retry | 2 | Block task | | Feedback rounds | 5 | Manual intervention | | Quality check fixes | 2 | Report to user | -## Implementation - -```typescript -class StateMachine { - async transition( - issueId: string, - targetState: string, - proof?: Proof - ): Promise<void> { - const issue = await linear.issue(issueId); - const currentState = issue.state.name; - - // Validate transition - const isValid = this.validateTransition(currentState, targetState, proof); - - if (!isValid) { - throw new Error(`Invalid transition: ${currentState} -> ${targetState}`); - } - - // Execute transition - await linear.issueUpdate(issueId, { - stateId: await 
this.getStateId(issue.team.id, targetState), - }); - - // Log transition - await this.logTransition(issueId, currentState, targetState, proof); - } - - private validateTransition( - from: string, - to: string, - proof?: Proof - ): boolean { - const validTransitions: Record<string, string[]> = { - 'Todo': ['In Progress', 'Blocked'], - 'In Progress': ['In Review', 'Blocked'], - 'In Review': ['Done', 'In Progress'], - 'Blocked': ['Todo', 'In Progress'], - }; - - return validTransitions[from]?.includes(to) ?? false; - } -} -``` - -## State Transition Diagram - -``` - ┌─────────────────────────────┐ - │ │ - ▼ │ -┌──────┐ ┌─────────────┐ ┌───────────┴───┐ ┌──────┐ -│ Todo │ ────► │ In Progress │ ────► │ In Review │ ────► │ Done │ -└──────┘ └─────────────┘ └───────────────┘ └──────┘ - │ │ │ - │ │ │ - │ ▼ │ - │ ┌─────────┐ │ - └────────► │ Blocked │ ◄─────────────────┘ - └─────────┘ -``` - ## Examples -### Example 1: Happy Path - +### Happy path ```typescript -// Task created -await transitionState(issueId, 'In Progress'); // Gate: Has acceptance criteria - -// Work complete, proof generated -await transitionState(issueId, 'In Review'); // Gate: Proof >= 80% - -// High confidence auto-approval -await transitionState(issueId, 'Done'); // Gate: Confidence >= 95% +await transitionState(issueId, 'In Progress'); // Gate: has acceptance criteria +await transitionState(issueId, 'In Review'); // Gate: proof >= 80% +await transitionState(issueId, 'Done'); // Gate: confidence >= 95% ``` -### Example 2: Iteration Loop - +### Iteration loop ```typescript -// First attempt await transitionState(issueId, 'In Progress'); -await transitionState(issueId, 'In Review'); // Confidence: 85% - -// User requests changes -await transitionState(issueId, 'In Progress'); // Feedback: REQUESTED_CHANGES - -// Second attempt -await transitionState(issueId, 'In Review'); // Confidence: 97% -await transitionState(issueId, 'Done'); // Auto-approved +await transitionState(issueId, 'In Review'); // Confidence: 85% +// User 
requests changes → back to In Progress +await transitionState(issueId, 'In Progress'); +await transitionState(issueId, 'In Review'); // Confidence: 97% → auto-approved +await transitionState(issueId, 'Done'); ``` -### Example 3: Escalation - +### Escalation ```typescript -// After 5 feedback rounds if (iterationCount >= MAX_FEEDBACK_ROUNDS) { await transitionState(issueId, 'Blocked'); await addComment(issueId, "Escalated: Max iterations reached"); } ``` -## Best Practices +## Verification -- Always validate before transitioning -- Log all transitions for audit trail -- Include proof artifacts when transitioning to In Review -- Enforce iteration limits to prevent infinite loops -- Escalate gracefully rather than failing silently -- Comment on Linear when state changes for visibility +After each transition, confirm: +- [ ] State was updated in Linear +- [ ] Transition was logged for audit trail +- [ ] Gate conditions were checked before transition +- [ ] Iteration count was incremented (if applicable) diff --git a/plugins/autopilot/skills/tag-command-mapping/SKILL.md b/plugins/autopilot/skills/tag-command-mapping/SKILL.md index 7d5d3c6..f44b30f 100644 --- a/plugins/autopilot/skills/tag-command-mapping/SKILL.md +++ b/plugins/autopilot/skills/tag-command-mapping/SKILL.md @@ -1,184 +1,109 @@ --- name: tag-command-mapping -description: How tag-to-command routing works in autopilot. Defines default mappings, precedence rules, and customization patterns. -version: 0.1.0 -tags: [routing, tags, commands, mapping] -keywords: [tag, command, mapping, routing, classification, precedence] +description: "Routes Linear tasks to Claude Code commands based on tag labels, applying precedence rules when multiple tags exist and falling back to text classification. Supports custom mappings via autopilot.local.md. Use when resolving which agent handles a Linear task." 
--- -plugin: autopilot -updated: 2026-01-20 # Tag-to-Command Mapping -**Version:** 0.1.0 -**Purpose:** Route Linear tasks to appropriate Claude Code commands based on tags -**Status:** Phase 1 +Routes incoming Linear tasks to the appropriate Claude Code command/agent based on tag labels, precedence rules, and text classification fallback. -## When to Use +## Workflow -Use this skill when you need to: -- Understand how Linear tags map to Claude Code commands -- Customize tag-to-command mappings for a project -- Handle tasks with multiple tags (precedence rules) -- Classify tasks based on title/description text -- Resolve the correct agent/command for a task +1. **Extract tags** - filter labels starting with `@` from the Linear issue +2. **Apply precedence** - if multiple tags, select highest-priority tag +3. **Resolve mapping** - look up command, agent, and skills for the selected tag +4. **Fallback to classification** - if no tags, classify from title/description text +5. **Return routing** - provide the resolved command, agent, and skills to the caller -## Overview +## Default Tag Mappings -Tag-to-command mapping is the core routing mechanism in autopilot. When a task arrives from Linear, its labels determine which Claude Code command/agent handles execution. 
+| Tag | Command | Agent | Skills | +|-----|---------|-------|--------| +| `@debug` | `/dev:debug` | debugger | debugging-strategies | +| `@test` | `/dev:test-architect` | test-architect | testing-strategies | +| `@ui` | `/dev:ui` | ui | ui-design-review | +| `@frontend` | `/dev:feature` | developer | react-typescript | +| `@backend` | `/dev:implement` | developer | golang, api-design | +| `@review` | `/commit-commands:commit-push-pr` | reviewer | universal-patterns | +| `@refactor` | `/dev:implement` | developer | universal-patterns | +| `@research` | `/dev:deep-research` | researcher | n/a | -## Default Mappings +## Precedence Order (highest to lowest) -| Linear Tag | Command | Agent | Skills | -|------------|---------|-------|--------| -| @frontend | /dev:feature | developer | react-typescript | -| @backend | /dev:implement | developer | golang, api-design | -| @debug | /dev:debug | debugger | debugging-strategies | -| @test | /dev:test-architect | test-architect | testing-strategies | -| @review | /commit-commands:commit-push-pr | reviewer | universal-patterns | -| @refactor | /dev:implement | developer | universal-patterns | -| @research | /dev:deep-research | researcher | n/a | -| @ui | /dev:ui | ui | ui-design-review | - -## Precedence Rules - -When multiple tags are present, apply precedence order: +`@debug` > `@test` > `@ui` > `@frontend` > `@backend` > `@review` > `@refactor` > `@research` ```typescript -const PRECEDENCE = [ - '@debug', // Bug fixing takes priority - '@test', // Tests before implementation - '@ui', // UI before generic frontend - '@frontend', // Frontend before generic - '@backend', // Backend before generic - '@review', // Review after implementation - '@refactor', // Refactoring is lower priority - '@research' // Research is lowest -]; +const PRECEDENCE = ['@debug', '@test', '@ui', '@frontend', '@backend', '@review', '@refactor', '@research']; function selectTag(labels: string[]): string { const agentTags = labels.filter(l => 
l.startsWith('@')); - - if (agentTags.length === 0) return 'default'; - if (agentTags.length === 1) return agentTags[0]; - - // Multiple tags - apply precedence + if (agentTags.length <= 1) return agentTags[0] || 'default'; for (const tag of PRECEDENCE) { if (agentTags.includes(tag)) return tag; } - return 'default'; } ``` +## Text Classification Fallback + +When no tags are present, classify from task text: + +```typescript +function classifyTask(title: string, description: string): string { + const text = `${title} ${description}`.toLowerCase(); + if (/\b(fix|bug|error|crash|broken)\b/.test(text)) return 'BUG_FIX'; // → @debug + if (/\b(add|implement|create|new|feature)\b/.test(text)) return 'FEATURE'; // → @frontend + if (/\b(refactor|clean|optimize|improve)\b/.test(text)) return 'REFACTOR'; // → @refactor + if (/\b(ui|design|component|style|visual)\b/.test(text)) return 'UI_CHANGE'; // → @ui + if (/\b(test|coverage|e2e|spec)\b/.test(text)) return 'TEST'; // → @test + return 'UNKNOWN'; // → @frontend (default) +} +``` + ## Custom Mappings -Users can define custom mappings in `.claude/autopilot.local.md`: +Define project-specific mappings in `.claude/autopilot.local.md`: ```yaml ---- tag_mappings: "@database": command: "/dev:implement" agent: "developer" skills: ["database-patterns"] - systemPrompt: "You are a database specialist." - "@performance": command: "/dev:implement" agent: "developer" skills: ["universal-patterns"] - systemPrompt: "You are a performance optimization expert." 
---- -``` - -## Task Classification - -Beyond explicit tags, classify tasks from text: - -```typescript -function classifyTask(title: string, description: string): string { - const text = `${title} ${description}`.toLowerCase(); - - // Keyword patterns - if (/\b(fix|bug|error|crash|broken)\b/.test(text)) return 'BUG_FIX'; - if (/\b(add|implement|create|new|feature)\b/.test(text)) return 'FEATURE'; - if (/\b(refactor|clean|optimize|improve)\b/.test(text)) return 'REFACTOR'; - if (/\b(ui|design|component|style|visual)\b/.test(text)) return 'UI_CHANGE'; - if (/\b(test|coverage|e2e|spec)\b/.test(text)) return 'TEST'; - if (/\b(doc|documentation|readme)\b/.test(text)) return 'DOCUMENTATION'; - - return 'UNKNOWN'; -} -``` - -## Mapping Resolution - -Complete resolution algorithm: - -```typescript -function resolveMapping(labels: string[], title: string, desc: string) { - // 1. Check explicit tags - const tag = selectTag(labels); - - if (tag !== 'default') { - return getMappingForTag(tag); - } - - // 2. Classify from text - const taskType = classifyTask(title, desc); - - // 3. 
Map task type to default tag - const typeToTag = { - 'BUG_FIX': '@debug', - 'FEATURE': '@frontend', - 'UI_CHANGE': '@ui', - 'TEST': '@test', - 'REFACTOR': '@refactor', - 'DOCUMENTATION': '@research', - }; - - return getMappingForTag(typeToTag[taskType] || '@frontend'); -} ``` ## Examples -### Example 1: Single Tag Resolution - +### Single tag resolution ```typescript -// Task with @frontend label const labels = ['@frontend', 'feature']; -const tag = selectTag(labels); // '@frontend' -const mapping = getMappingForTag(tag); -// Result: { command: '/dev:feature', agent: 'developer', skills: ['react-typescript'] } +selectTag(labels); // '@frontend' +// → { command: '/dev:feature', agent: 'developer', skills: ['react-typescript'] } ``` -### Example 2: Multiple Tag Precedence - +### Multiple tag precedence ```typescript -// Task with both @frontend and @debug const labels = ['@frontend', '@debug']; -const tag = selectTag(labels); // '@debug' (higher precedence) -const mapping = getMappingForTag(tag); -// Result: { command: '/dev:debug', agent: 'debugger', skills: ['debugging-strategies'] } +selectTag(labels); // '@debug' (higher precedence) +// → { command: '/dev:debug', agent: 'debugger', skills: ['debugging-strategies'] } ``` -### Example 3: Text Classification Fallback - +### Text classification fallback ```typescript -// Task without tags -const labels = []; -const title = "Fix login button not working"; -const mapping = resolveMapping(labels, title, ""); -// Classifies as BUG_FIX -> @debug -// Result: { command: '/dev:debug', agent: 'debugger', skills: ['debugging-strategies'] } +resolveMapping([], "Fix login button not working", ""); +// Classifies as BUG_FIX → @debug +// → { command: '/dev:debug', agent: 'debugger', skills: ['debugging-strategies'] } ``` -## Best Practices +## Verification -- Use explicit tags over relying on classification -- Create custom mappings for project-specific workflows -- Debug > Test > UI > Frontend precedence makes sense -- Review 
mapping effectiveness periodically -- Keep tag names short and descriptive (start with @) +When resolving a mapping, confirm: +- [ ] Correct tag selected based on precedence +- [ ] Mapping resolves to valid command, agent, and skills +- [ ] Custom mappings in autopilot.local.md are checked before defaults +- [ ] Fallback classification produces reasonable routing diff --git a/plugins/bun/skills/claudish-usage/SKILL.md b/plugins/bun/skills/claudish-usage/SKILL.md index 9431a88..4d67ede 100644 --- a/plugins/bun/skills/claudish-usage/SKILL.md +++ b/plugins/bun/skills/claudish-usage/SKILL.md @@ -1,9 +1,7 @@ --- name: claudish-usage -description: CRITICAL - Guide for using Claudish CLI ONLY through sub-agents to run Claude Code with OpenRouter models (Grok, GPT-5, Gemini, MiniMax). NEVER run Claudish directly in main context unless user explicitly requests it. Use when user mentions external AI models, Claudish, OpenRouter, or alternative models. Includes mandatory sub-agent delegation patterns, agent selection guide, file-based instructions, and strict rules to prevent context window pollution. +description: "Delegates Claudish CLI calls to sub-agents for running Claude Code with OpenRouter models (Grok, GPT-5, Gemini, MiniMax). Enforces sub-agent-only execution to prevent context window pollution, provides agent selection guides and file-based instruction patterns. Use when invoking external AI models, running Claudish, or setting up OpenRouter-based multi-model workflows." --- -plugin: bun -updated: 2026-01-20 # Claudish Usage Skill diff --git a/plugins/code-analysis/skills/architect-detective/SKILL.md b/plugins/code-analysis/skills/architect-detective/SKILL.md index 41e942b..025946b 100644 --- a/plugins/code-analysis/skills/architect-detective/SKILL.md +++ b/plugins/code-analysis/skills/architect-detective/SKILL.md @@ -1,484 +1,65 @@ --- name: architect-detective -description: Use when analyzing architecture and system design. 
Find design patterns, map layers, identify core abstractions via PageRank. Uses claudemem AST structural analysis for efficient architecture investigation. -updated: 2026-01-20 -keywords: architecture, design-patterns, system-design, claudemem, pagerank, layers -allowed-tools: Bash, Task, Read, AskUserQuestion +description: "Analyzes codebase architecture using claudemem AST structural analysis with PageRank. Maps layers, identifies core abstractions, traces dependency flow, and detects dead code. Use when analyzing system design, finding design patterns, mapping architectural boundaries, or auditing code structure." --- -# Architect Detective Skill +# Architect Detective -This skill uses claudemem's AST structural analysis for architecture investigation. +Software Architect perspective for deep architectural investigation using claudemem's `map`, `symbol`, `callers`, `callees`, and `dead-code` commands with PageRank centrality analysis. -## Why Claudemem Works Better for Architecture +## Workflow -| Task | claudemem | Native Tools | -|------|-----------|--------------| -| Find core abstractions | `map` with PageRank ranking | Read all files | -| Identify design patterns | Structural symbol graph | Grep patterns | -| Map dependencies | `callers`/`callees` chains | Manual tracing | -| Find architectural pillars | High-PageRank symbols | Unknown | +1. **Verify claudemem** — confirm v0.3.0+ installed and indexed. Check freshness; reindex if stale. +2. **Map the landscape** — run `claudemem --agent map` and identify high-PageRank symbols (> 0.05) as architectural pillars. +3. **Identify layers** — map presentation, business, and data layers with targeted queries. +4. **Trace dependencies** — for each pillar, run `callers` (who depends on it) and `callees` (what it depends on). +5. **Find boundaries** — search for interfaces, contracts, and dependency injection points. +6. 
**Detect dead code** (v0.4.0+) — run `dead-code`, categorize by PageRank (high = broken, low = cleanup candidate). +7. **Validate results** — confirm PageRank data is present and symbols match expected architectural patterns. -**Primary commands:** -- `claudemem --agent map "query"` - Architecture overview with PageRank -- `claudemem --agent symbol ` - Exact file:line locations - -# Architect Detective Skill - -**Version:** 3.3.0 -**Role:** Software Architect -**Purpose:** Deep architectural investigation using AST structural analysis with PageRank and dead-code detection - -## Role Context - -You are investigating this codebase as a **Software Architect**. Your focus is on: -- **System boundaries** - Where modules, services, and layers begin and end -- **Design patterns** - Architectural patterns used (MVC, Clean Architecture, DDD, etc.) -- **Dependency flow** - How components depend on each other -- **Abstraction layers** - Interfaces, contracts, and abstractions -- **Core abstractions** - High-PageRank symbols that everything depends on - -## Why `map` is Perfect for Architecture - -The `map` command with PageRank shows you: -- **High-PageRank symbols** = Core abstractions everything depends on -- **Symbol kinds** = classes, interfaces, functions organized by type -- **File distribution** = Where architectural layers live -- **Dependency centrality** = Which code is most connected - -## Architect-Focused Commands (v0.3.0) - -### Architecture Discovery (use `map`) - -```bash -# Get high-level architecture overview -claudemem --agent map "architecture layers" -# Find core abstractions (highest PageRank) -claudemem --agent map # Full map, sorted by importance - -# Map specific architectural concerns -claudemem --agent map "service layer business logic"claudemem --agent map "repository data access"claudemem --agent map "controller API endpoints"claudemem --agent map "middleware request handling"``` - -### Layer Boundary Discovery - -```bash -# Find 
interfaces/contracts (architectural boundaries) -claudemem --agent map "interface contract abstract" -# Find dependency injection points -claudemem --agent map "inject provider module" -# Find configuration/bootstrap -claudemem --agent map "config bootstrap initialize"``` - -### Pattern Discovery - -```bash -# Find factory patterns -claudemem --agent map "factory create builder" -# Find repository patterns -claudemem --agent map "repository persist query" -# Find event-driven patterns -claudemem --agent map "event emit subscribe handler"``` - -### Dependency Analysis - -```bash -# For a core abstraction, see what depends on it -claudemem --agent callers CoreService -# See what the abstraction depends on -claudemem --agent callees CoreService -# Get full dependency context -claudemem --agent context CoreService``` - -### Dead Code Detection (v0.4.0+ Required) - -```bash -# Find unused symbols for cleanup -claudemem --agent dead-code -# Only truly dead code (very low PageRank) -claudemem --agent dead-code --max-pagerank 0.005``` - -**Architectural insight**: Dead code indicates: -- Failed features that were never removed -- Over-engineering (abstractions nobody uses) -- Potential tech debt cleanup opportunities - -High PageRank + dead = Something broke recently (investigate!) 
-Low PageRank + dead = Safe to remove - -**Handling Results:** -```bash -DEAD_CODE=$(claudemem --agent dead-code) -if [ -z "$DEAD_CODE" ]; then - echo "No dead code found - architecture is well-maintained" -else - # Categorize by risk - HIGH_PAGERANK=$(echo "$DEAD_CODE" | awk '$5 > 0.01') - LOW_PAGERANK=$(echo "$DEAD_CODE" | awk '$5 <= 0.01') - - if [ -n "$HIGH_PAGERANK" ]; then - echo "WARNING: High-PageRank dead code found (possible broken references)" - echo "$HIGH_PAGERANK" - fi - - if [ -n "$LOW_PAGERANK" ]; then - echo "Cleanup candidates (low PageRank):" - echo "$LOW_PAGERANK" - fi -fi -``` - -**Limitations Note:** -Results labeled "Potentially Dead" require manual verification for: -- Dynamically imported modules -- Reflection-accessed code -- External API consumers - -## PHASE 0: MANDATORY SETUP - -### Step 1: Verify claudemem v0.3.0 - -```bash -which claudemem && claudemem --version -# Must be 0.3.0+ -``` - -### Step 2: If Not Installed → STOP - -Use AskUserQuestion (see ultrathink-detective for template) - -### Step 3: Check Index Status - -```bash -# Check claudemem installation and index -claudemem --version && ls -la .claudemem/index.db 2>/dev/null -``` - -### Step 3.5: Check Index Freshness - -Before proceeding with investigation, verify the index is current: - -```bash -# First check if index exists -if [ ! -d ".claudemem" ] || [ ! -f ".claudemem/index.db" ]; then - # Use AskUserQuestion to prompt for index creation - # Options: [1] Create index now (Recommended), [2] Cancel investigation - exit 1 -fi - -# Count files modified since last index -STALE_COUNT=$(find . 
-type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" -o -name "*.py" -o -name "*.go" -o -name "*.rs" \) \ - -newer .claudemem/index.db 2>/dev/null | grep -v "node_modules" | grep -v ".git" | grep -v "dist" | grep -v "build" | wc -l) -STALE_COUNT=$((STALE_COUNT + 0)) # Normalize to integer - -if [ "$STALE_COUNT" -gt 0 ]; then - # Get index time with explicit platform detection - if [[ "$OSTYPE" == "darwin"* ]]; then - INDEX_TIME=$(stat -f "%Sm" -t "%Y-%m-%d %H:%M" .claudemem/index.db 2>/dev/null) - else - INDEX_TIME=$(stat -c "%y" .claudemem/index.db 2>/dev/null | cut -d'.' -f1) - fi - INDEX_TIME=${INDEX_TIME:-"unknown time"} - - # Get sample of stale files - STALE_SAMPLE=$(find . -type f \( -name "*.ts" -o -name "*.tsx" \) \ - -newer .claudemem/index.db 2>/dev/null | grep -v "node_modules" | grep -v ".git" | head -5) - - # Use AskUserQuestion (see template in ultrathink-detective) -fi -``` - -### Step 4: Index if Needed +## Example: Architecture Analysis ```bash -claudemem index -``` - ---- - -## Workflow: Architecture Analysis (v0.3.0) - -### Phase 1: Map the Landscape - -```bash -# Get structural overview with PageRank +# Step 1: Structural overview claudemem --agent map -# Focus on high-PageRank symbols (> 0.01) - these are architectural pillars -``` -### Phase 2: Identify Layers +# Step 2: Layer identification +claudemem --agent map "controller handler endpoint" # Presentation +claudemem --agent map "service business logic" # Business +claudemem --agent map "repository database query" # Data -```bash -# Map each layer -claudemem --agent map "controller handler endpoint" # Presentation -claudemem --agent map "service business logic" # Business -claudemem --agent map "repository database query" # Data -``` - -### Phase 3: Trace Dependencies - -```bash -# For each high-PageRank symbol, understand its role -claudemem --agent symbol UserServiceclaudemem --agent callers UserService # Who depends on it? 
-claudemem --agent callees UserService # What does it depend on? -``` - -### Phase 4: Identify Boundaries - -```bash -# Find interfaces (architectural contracts) -claudemem --agent map "interface abstract" -# Check how implementations connect -claudemem --agent callers IUserRepository``` +# Step 3: Dependency tracing for a core abstraction +claudemem --agent symbol UserService +claudemem --agent callers UserService # What depends on it +claudemem --agent callees UserService # What it depends on -### Phase 5: Cleanup Opportunities (v0.4.0+ Required) - -```bash -# Find dead code -DEAD_CODE=$(claudemem --agent dead-code) - -if [ -z "$DEAD_CODE" ]; then - echo "No cleanup needed - codebase is well-maintained" -else - # For each dead symbol: - # - Check PageRank (low = utility, high = broken) - # - Verify not used externally (see limitations) - # - Add to cleanup backlog - - echo "Review each item for static analysis limitations:" - echo "- Dynamic imports may hide real usage" - echo "- External callers not visible to static analysis" -fi -``` - -## Output Format: Architecture Report - -### 1. Architecture Overview - -``` -┌─────────────────────────────────────────────────────────┐ -│ ARCHITECTURE ANALYSIS │ -├─────────────────────────────────────────────────────────┤ -│ Pattern: Clean Architecture / Layered │ -│ Core Abstractions (PageRank > 0.05): │ -│ - UserService (0.092) - Central business logic │ -│ - Database (0.078) - Data access foundation │ -│ - AuthMiddleware (0.056) - Security boundary │ -│ Search Method: claudemem v0.3.0 (AST + PageRank) │ -└─────────────────────────────────────────────────────────┘ +# Step 4: Dead code detection (v0.4.0+) +claudemem --agent dead-code ``` -### 2. Layer Map +**Verification:** Confirm `map` output contains PageRank values. If empty or missing PageRank, diagnose and reindex. 
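The dead-code triage from step 6 of the workflow can be sketched as a short script. The two-column `symbol pagerank` output format used for the sample data here is an assumption for illustration, not claudemem's documented output; in real use, `DEAD_CODE=$(claudemem --agent dead-code)` would replace the inline sample.

```shell
#!/bin/sh
# Sample dead-code output in a hypothetical "symbol pagerank" format
DEAD_CODE="LegacyAuthService 0.042
formatDateOld 0.0004
unusedHelper 0.0001"

# High PageRank + dead = likely a recently broken reference; low = cleanup candidate
HIGH=$(echo "$DEAD_CODE" | awk '$2 > 0.01 {print $1}')
LOW=$(echo "$DEAD_CODE" | awk '$2 <= 0.01 {print $1}')

[ -n "$HIGH" ] && echo "investigate (possibly broken): $HIGH"
[ -n "$LOW" ] && printf 'cleanup candidates:\n%s\n' "$LOW"
```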
-``` -┌─────────────────────────────────────────────────────────┐ -│ LAYER STRUCTURE │ -├─────────────────────────────────────────────────────────┤ -│ │ -│ PRESENTATION (src/controllers/, src/routes/) │ -│ └── UserController (0.034) │ -│ └── AuthController (0.028) │ -│ ↓ │ -│ BUSINESS (src/services/) │ -│ └── UserService (0.092) ⭐HIGH PAGERANK │ -│ └── AuthService (0.067) │ -│ ↓ │ -│ DATA (src/repositories/) │ -│ └── UserRepository (0.045) │ -│ └── Database (0.078) ⭐HIGH PAGERANK │ -│ │ -└─────────────────────────────────────────────────────────┘ -``` - -### 3. Dependency Flow - -``` -Entry → Controller → Service → Repository → Database - ↘ Middleware (cross-cutting) -``` - -## PageRank for Architecture +## PageRank Interpretation | PageRank | Architectural Role | Action | |----------|-------------------|--------| -| > 0.05 | Core abstraction | This IS the architecture - understand first | -| 0.01-0.05 | Important component | Key building block, affects many things | -| 0.001-0.01 | Standard component | Normal code, not architecturally significant | -| < 0.001 | Leaf/utility | Implementation detail, skip for arch analysis | - -## Result Validation Pattern - -After EVERY claudemem command, validate results: - -### Map Command Validation - -After `map` commands, validate architectural symbols were found: - -```bash -RESULTS=$(claudemem --agent map "service layer business logic") -EXIT_CODE=$? 
- -# Check for failure -if [ "$EXIT_CODE" -ne 0 ]; then - DIAGNOSIS=$(claudemem status 2>&1) - # Use AskUserQuestion -fi - -# Check for empty results -if [ -z "$RESULTS" ]; then - echo "WARNING: No symbols found - may be wrong query or index issue" - # Use AskUserQuestion: Reindex, Different query, or Cancel -fi - -# Check for high-PageRank symbols (> 0.01) -HIGH_PR=$(echo "$RESULTS" | grep "pagerank:" | awk -F': ' '{if ($2 > 0.01) print}' | wc -l) - -if [ "$HIGH_PR" -eq 0 ]; then - # No architectural symbols found - may be wrong query or index issue - # Use AskUserQuestion: Reindex, Broaden query, or Cancel -fi -``` - -### Symbol Validation - -```bash -SYMBOL=$(claudemem --agent symbol ArchitecturalComponent) - -if [ -z "$SYMBOL" ] || echo "$SYMBOL" | grep -qi "not found\|error"; then - # Component doesn't exist or index issue - # Use AskUserQuestion -fi -``` - ---- +| > 0.05 | Core abstraction | Analyze first — this IS the architecture | +| 0.01-0.05 | Important component | Key building block | +| 0.001-0.01 | Standard component | Not architecturally significant | +| < 0.001 | Leaf/utility | Skip for architecture analysis | -## FALLBACK PROTOCOL +## Fallback Protocol -**CRITICAL: Never use grep/find/Glob without explicit user approval.** +Never use grep/find/Glob without explicit user approval. If claudemem fails: -If claudemem fails or returns irrelevant results: - -1. **STOP** - Do not silently switch tools -2. **DIAGNOSE** - Run `claudemem status` -3. **REPORT** - Tell user what happened -4. **ASK** - Use AskUserQuestion for next steps - -```typescript -// Fallback options (in order of preference) -AskUserQuestion({ - questions: [{ - question: "claudemem map returned no architectural symbols or failed. 
How should I proceed?", - header: "Architecture Discovery Issue", - multiSelect: false, - options: [ - { label: "Reindex codebase", description: "Run claudemem index (~1-2 min)" }, - { label: "Try broader query", description: "Use different architectural terms" }, - { label: "Use grep (not recommended)", description: "Traditional search - loses PageRank ranking" }, - { label: "Cancel", description: "Stop investigation" } - ] - }] -}) -``` - -**See ultrathink-detective skill for complete Fallback Protocol documentation.** - ---- - -## Anti-Patterns - -| Anti-Pattern | Why Wrong | Correct Approach | -|--------------|-----------|------------------| -| `grep -r "class"` | No ranking, no structure | `claudemem --agent map` | -| Read all files | Token waste | Focus on high-PageRank symbols | -| Skip `map` command | Miss architecture | ALWAYS start with `map` | -| Ignore PageRank | Miss core abstractions | High PageRank = important | -| `cmd \| head/tail` | Hides high-PageRank symbols | Use full output or `--tokens` | - -### Output Truncation Warning - -╔══════════════════════════════════════════════════════════════════════════════╗ -║ ║ -║ ❌ Anti-Pattern 7: Truncating Claudemem Output ║ -║ ║ -║ FORBIDDEN (any form of output truncation): ║ -║ → BAD: claudemem --agent map "query" | head -80 ║ -║ → BAD: claudemem --agent callers X | tail -50 ║ -║ → BAD: claudemem --agent search "x" | grep -m 10 "y" ║ -║ → BAD: claudemem --agent map "q" | awk 'NR <= 50' ║ -║ → BAD: claudemem --agent callers X | sed '50q' ║ -║ → BAD: claudemem --agent search "x" | sort | head -20 ║ -║ → BAD: claudemem --agent map "q" | grep "pattern" | head -20 ║ -║ ║ -║ CORRECT (use full output or built-in limits): ║ -║ → GOOD: claudemem --agent map "query" ║ -║ → GOOD: claudemem --agent search "x" -n 10 ║ -║ → GOOD: claudemem --agent map "q" --tokens 2000 ║ -║ → GOOD: claudemem --agent search "x" --page-size 20 --page 1 ║ -║ → GOOD: claudemem --agent context Func --max-depth 3 ║ -║ ║ -║ WHY: Output is 
pre-optimized; truncation hides critical results ║ -║ ║ -╚══════════════════════════════════════════════════════════════════════════════╝ - ---- - -## Feedback Reporting (v0.8.0+) - -After completing investigation, report search feedback to improve future results. - -### When to Report - -Report feedback ONLY if you used the `search` command during investigation: - -| Result Type | Mark As | Reason | -|-------------|---------|--------| -| Read and used | Helpful | Contributed to investigation | -| Read but irrelevant | Unhelpful | False positive | -| Skipped after preview | Unhelpful | Not relevant to query | -| Never read | (Don't track) | Can't evaluate | - -### Feedback Pattern - -```bash -# Track during investigation -SEARCH_QUERY="your original query" -HELPFUL_IDS="" -UNHELPFUL_IDS="" - -# When reading a helpful result -HELPFUL_IDS="$HELPFUL_IDS,$result_id" - -# When reading an unhelpful result -UNHELPFUL_IDS="$UNHELPFUL_IDS,$result_id" - -# Report at end of investigation (v0.8.0+ only) -if claudemem feedback --help 2>&1 | grep -qi "feedback"; then - timeout 5 claudemem feedback \ - --query "$SEARCH_QUERY" \ - --helpful "${HELPFUL_IDS#,}" \ - --unhelpful "${UNHELPFUL_IDS#,}" 2>/dev/null || true -fi -``` - -### Output Update - -Include in investigation report: - -``` -Search Feedback: [X helpful, Y unhelpful] - Submitted (v0.8.0+) -``` - ---- +1. Stop — do not silently switch tools. +2. Diagnose — run `claudemem status`. +3. Ask user via AskUserQuestion (reindex, broaden query, or cancel). 
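The decision in step 3 can be sketched as a small helper that maps claudemem's exit code and captured output to the question to ask; the function name and prompt wordings are illustrative, not part of claudemem.

```shell
#!/bin/sh
# Choose a fallback question from claudemem's exit code and captured output
fallback_prompt() {
  exit_code=$1
  results=$2
  if [ "$exit_code" -ne 0 ]; then
    echo "command failed: offer Reindex or Cancel"
  elif [ -z "$results" ]; then
    echo "no symbols found: offer Broader query, Reindex, or Cancel"
  else
    echo "results ok: continue investigation"
  fi
}

fallback_prompt 1 ""                  # claudemem itself failed
fallback_prompt 0 ""                  # ran fine but found nothing
fallback_prompt 0 "UserService 0.092" # normal case
```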
## Notes -- **`map` is your primary tool** - It shows architecture through PageRank -- High-PageRank symbols ARE the architecture - they're what everything depends on -- Use `callers` to see what depends on a component (impact of changes) -- Use `callees` to see what a component depends on (its requirements) -- Works best with TypeScript, Go, Python, Rust codebases - ---- - -**Maintained by:** MadAppGang -**Plugin:** code-analysis v2.7.0 -**Last Updated:** December 2025 (v3.3.0 - Cross-platform compatibility, inline templates, improved validation) +- `map` is the primary tool — it reveals architecture through PageRank centrality +- High PageRank + dead = something recently broke (investigate immediately) +- Low PageRank + dead = safe cleanup candidate +- Never truncate claudemem output — use `--tokens` or `-n` flags for size control +- Submit search feedback via `claudemem feedback` (v0.8.0+) diff --git a/plugins/code-analysis/skills/claudemem-orchestration/SKILL.md b/plugins/code-analysis/skills/claudemem-orchestration/SKILL.md index 1d6954e..6b86cc3 100644 --- a/plugins/code-analysis/skills/claudemem-orchestration/SKILL.md +++ b/plugins/code-analysis/skills/claudemem-orchestration/SKILL.md @@ -1,10 +1,7 @@ --- -name: claudemem-orchestration -description: Use when orchestrating multi-agent code analysis with claudemem. Run claudemem once, share output across parallel agents. Enables parallel investigation, consensus analysis, and role-based command mapping. -updated: 2026-01-20 -keywords: claudemem, orchestration, multi-agent, parallel-execution, consensus +name: mem-orchestration +description: "Coordinates multiple agents using shared claudemem output for code analysis. Use when orchestrating multi-agent code analysis, running parallel investigations, performing consensus analysis, or mapping role-based commands across agents." 
allowed-tools: Bash, Task, Read, Write, AskUserQuestion -skills: orchestration:multi-model-validation --- # Claudemem Multi-Agent Orchestration diff --git a/plugins/code-analysis/skills/claudemem-search/SKILL.md b/plugins/code-analysis/skills/claudemem-search/SKILL.md index 0c812c8..4cdaed0 100644 --- a/plugins/code-analysis/skills/claudemem-search/SKILL.md +++ b/plugins/code-analysis/skills/claudemem-search/SKILL.md @@ -1,6 +1,6 @@ --- -name: claudemem-search -description: "⚡ PRIMARY TOOL for semantic code search AND structural analysis. NEW: AST tree navigation with map, symbol, callers, callees, context commands. PageRank ranking. Recommended workflow: Map structure first, then search semantically, analyze callers before modifying." +name: mem-search +description: "Provides semantic code search and AST-based structural analysis via claudemem. Use when searching codebases semantically, navigating symbol graphs with PageRank ranking, or analyzing callers and callees before modifying code." allowed-tools: Bash, Task, AskUserQuestion --- diff --git a/plugins/code-analysis/skills/code-search-selector/SKILL.md b/plugins/code-analysis/skills/code-search-selector/SKILL.md index 2c61bfc..5e4e1f6 100644 --- a/plugins/code-analysis/skills/code-search-selector/SKILL.md +++ b/plugins/code-analysis/skills/code-search-selector/SKILL.md @@ -1,16 +1,20 @@ --- name: code-search-selector -description: "💡 Tool selector for code search tasks. Helps choose between semantic search (claudemem) and native tools (Grep/Glob) based on query type. Semantic search recommended for: 'how does X work', 'find all', 'audit', 'investigate', 'architecture'." -allowed-tools: Bash, Read, AskUserQuestion +description: "Selects between semantic search (claudemem) and native tools (Grep/Glob) based on query type. Checks claudemem index status and classifies tasks as semantic or exact-match. 
Use when choosing a code search tool, before investigation tasks, or when deciding between claudemem search and grep." --- # Code Search Tool Selector -This skill helps choose the most effective search tool for your task. +Chooses the most effective search tool for each task by classifying queries and checking claudemem availability. -## When Semantic Search Works Better +## Workflow -Claudemem provides better results for conceptual queries: +1. **Check claudemem status** — run `claudemem status` (mandatory before any semantic search). +2. **Classify the task** — determine if the query is semantic (conceptual) or exact-match (literal string). +3. **Select the tool** based on classification and claudemem availability. +4. **Execute the search** using the recommended approach. + +## Classification Guide | Query Type | Example | Recommended Tool | |------------|---------|------------------| @@ -18,285 +22,52 @@ Claudemem provides better results for conceptual queries: | Find implementations | "Find all API endpoints" | `claudemem search` | | Architecture questions | "Map the service layer" | `claudemem --agent map` | | Trace data flow | "How does user data flow?" | `claudemem search` | -| Audit integrations | "Audit Prime API usage" | `claudemem search` | - -## When Native Tools Work Better - -| Query Type | Example | Recommended Tool | -|------------|---------|------------------| | Exact string match | "Find 'DEPRECATED_FLAG'" | `Grep` | | Count occurrences | "How many TODO comments?" | `Grep -c` | -| Specific symbol | "Find class UserService" | `Grep` | | File patterns | "Find all *.config.ts" | `Glob` | -## Why Semantic Search is Often More Efficient - -**Token Efficiency**: Reading 5 files costs ~5000 tokens; claudemem search costs ~500 tokens with ranked results. - -**Context Discovery**: Claudemem finds related code you didn't know to ask for. - -**Ranking**: Results sorted by relevance and PageRank, so important code comes first. 
- ## Example: Semantic Query -**User asks:** "How does authentication work?" - -**Less effective approach:** -```bash -grep -r "auth" src/ -# Result: 500 lines of noise, hard to understand -``` - -**More effective approach:** ```bash -claudemem status # Check if indexed -claudemem search "authentication login flow JWT" -# Result: Top 10 semantically relevant code chunks, ranked -``` - -## Quick Decision Guide - -### Classify the Task - -| User Request | Category | Recommended Tool | -|--------------|----------|------------------| -| "Find all X", "How does X work" | Semantic | claudemem search | -| "Audit X integration", "Map data flow" | Semantic | claudemem search | -| "Understand architecture", "Trace X" | Semantic | claudemem map | -| "Find exact string 'foo'" | Exact Match | Grep | -| "Count occurrences of X" | Exact Match | Grep | -| "Find symbol UserService" | Exact Match | Grep | - -### Step 2: Check claudemem Status (MANDATORY for Semantic) - -```bash -# ALWAYS run this before semantic search +# Step 1: Check index claudemem status -``` - -**Interpret the output:** +# Output shows chunk count (e.g., "938 chunks") → indexed -| Status | What It Means | Next Action | -|--------|---------------|-------------| -| Shows chunk count (e.g., "938 chunks") | ✅ Indexed | **USE CLAUDEMEM** (Step 3) | -| "No index found" | ❌ Not indexed | Offer to index (Step 2b) | -| "command not found" | ❌ Not installed | Fall back to Detective agent | - -### Step 2b: If Not Indexed, Offer to Index - -```typescript -AskUserQuestion({ - questions: [{ - question: "Claudemem is not indexed. 
Index now for better semantic search results?", - header: "Index?", - multiSelect: false, - options: [ - { label: "Yes, index now (Recommended)", description: "Takes 1-2 minutes, enables semantic search" }, - { label: "No, use grep instead", description: "Faster but less accurate for semantic queries" } - ] - }] -}) -``` - -If user says yes: -```bash -claudemem index -y +# Step 2: Search semantically +claudemem search "authentication login flow JWT" -n 15 +# Result: Top 15 ranked code chunks vs. grep's 500 lines of noise ``` -### Step 3: Execute the Search +**Verification:** Confirm `claudemem status` shows chunk count before proceeding with semantic search. -**IF CLAUDEMEM IS INDEXED (from Step 2):** +## Example: Exact Match ```bash -# Get role-specific guidance first -claudemem ai developer # or architect, tester, debugger - -# Then search semantically -claudemem search "authentication login JWT token validation" -n 15 -``` - -**IF CLAUDEMEM IS NOT AVAILABLE:** - -Use the detective agent: -```typescript -Task({ - subagent_type: "code-analysis:detective", - description: "Investigate [topic]", - prompt: "Use semantic search to find..." -}) -``` - -### Tool Recommendations by Use Case - -| Use Case | Less Efficient | More Efficient | -|----------|----------------|----------------| -| Semantic queries | `grep -r "pattern" src/` | `claudemem search "concept"` | -| Find implementations | `Glob → Read all` | `claudemem search "feature"` | -| Understand flow | `find . -name "*.ts" \| xargs...` | `claudemem --agent map` | - -Native tools (Grep, Glob, find) work well for exact matches but provide no semantic ranking. - ---- - -## When Hooks Redirect to Claudemem - -If a hook provides claudemem results instead of native tool output: - -1. **Use the provided results** - They're ranked by relevance -2. **For more data** - Run additional claudemem queries -3. 
**Bypass available** - Use `_bypass_claudemem: true` for native tools when needed - -The hook system provides claudemem results proactively when the index is available. - ---- - -## Task-to-Tool Mapping Reference - -| User Request | Native Approach | Semantic Approach (Recommended) | -|--------------|-----------------|--------------------------------| -| "Audit all API endpoints" | `grep -r "router\|endpoint"` | `claudemem search "API endpoint route handler"` | -| "How does auth work?" | `grep -r "auth\|login"` | `claudemem search "authentication login flow"` | -| "Find all database queries" | `grep -r "prisma\|query"` | `claudemem search "database query SQL prisma"` | -| "Map the data flow" | `grep -r "transform\|map"` | `claudemem search "data transformation pipeline"` | -| "What's the architecture?" | `ls -la src/` | `claudemem --agent map "architecture"` | -| "Find error handling" | `grep -r "catch\|error"` | `claudemem search "error handling exception"` | -| "Trace user creation" | `grep -r "createUser"` | `claudemem search "user creation registration"` | - -## When Grep IS Appropriate - -✅ **Use Grep for:** -- Finding exact string: `grep -r "DEPRECATED_FLAG" src/` -- Counting occurrences: `grep -c "import React" src/**/*.tsx` -- Finding specific symbol: `grep -r "class UserService" src/` -- Regex patterns: `grep -r "TODO:\|FIXME:" src/` - -❌ **Never use Grep for:** -- Understanding how something works -- Finding implementations by concept -- Architecture analysis -- Tracing data flow -- Auditing integrations - -## Integration with Detective Skills - -After using this skill's decision tree, invoke the appropriate detective: - -| Investigation Type | Detective Skill | -|-------------------|-----------------| -| Architecture patterns | `code-analysis:architect-detective` | -| Implementation details | `code-analysis:developer-detective` | -| Test coverage | `code-analysis:tester-detective` | -| Bug root cause | `code-analysis:debugger-detective` | -| 
Comprehensive audit | `code-analysis:ultrathink-detective` | - -## Quick Reference Card - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ CODE SEARCH QUICK REFERENCE │ -├─────────────────────────────────────────────────────────────────┤ -│ │ -│ 1. ALWAYS check first: claudemem status │ -│ │ -│ 2. If indexed: claudemem search "semantic query" │ -│ │ -│ 3. For exact matches: Grep tool (only this case!) │ -│ │ -│ 4. For deep analysis: Task(code-analysis:detective) │ -│ │ -│ ⚠️ GREP IS FOR EXACT MATCHES, NOT SEMANTIC UNDERSTANDING │ -│ │ -└─────────────────────────────────────────────────────────────────┘ +# Exact string — use native tools +grep -r "DEPRECATED_FLAG" src/ ``` -## Pre-Investigation Checklist +## Bulk Read Optimization -Before ANY code investigation task, verify: - -- [ ] Ran `claudemem status` to check index -- [ ] Classified task as SEMANTIC or EXACT MATCH -- [ ] Selected appropriate tool based on classification -- [ ] NOT using grep for semantic queries when claudemem is indexed - ---- - -## Multi-File Read Optimization - -When reading multiple files, consider if a semantic search would be more efficient: - -| Scenario | Optimization | -|----------|-------------| -| Read 3+ files in same directory | Try `claudemem search` first | -| Glob with broad patterns | Try `claudemem --agent map` | -| Sequential reads to "understand" | One semantic query may suffice | - -**Quick check before bulk reads:** -1. Is claudemem indexed? (`claudemem status`) -2. Can this be one semantic query instead of N file reads? 
- -### Interception Examples - -**❌ About to do:** -``` -Read src/services/auth/login.ts -Read src/services/auth/session.ts -Read src/services/auth/jwt.ts -Read src/services/auth/middleware.ts -Read src/services/auth/types.ts -Read src/services/auth/utils.ts -``` +Before reading 3+ files, check if one semantic query is more efficient: -**✅ Do instead:** -```bash -claudemem search "authentication login session JWT middleware" -n 15 -``` +| Planned Operation | Better Alternative | +|-------------------|-------------------| +| Read 5 auth files individually | `claudemem search "authentication login session" -n 15` | +| Glob all services then read | `claudemem search "service layer business logic"` | +| Sequential reads to understand | One semantic query (~500 vs ~5000 tokens) | -**❌ About to do:** -``` -Glob pattern: src/services/prime/**/*.ts -Then read all 12 matches sequentially -``` +## If Claudemem Not Indexed -**✅ Do instead:** ```bash -claudemem search "Prime API integration service endpoints" -n 20 -``` - -**❌ Parallelization trap:** -``` -"Let me Read these 5 files while the detective agent works..." -``` - -**✅ Do instead:** -``` -Trust the detective agent to use claudemem. -Don't duplicate work with inferior Read/Glob. +# Offer to index +claudemem index -y +# Takes 1-2 minutes, enables semantic search ``` ---- - -## Efficiency Comparison - -| Approach | Token Cost | Result Quality | -|----------|------------|----------------| -| Read 5+ files sequentially | ~5000 tokens | No ranking | -| Glob → Read all matches | ~3000+ tokens | No semantic understanding | -| `claudemem search` once | ~500 tokens | Ranked by relevance | - -**Tip:** Claudemem results include context around matches, so you often don't need to read full files. - ---- - -## Recommended Workflow - -1. **Check index**: `claudemem status` -2. **Search semantically**: `claudemem search "concept query" -n 15` -3. 
**Read specific code**: Use results to target file:line reads - -This workflow finds relevant code faster than reading files sequentially. - ---- +## Notes -**Maintained by:** MadAppGang -**Plugin:** code-analysis v2.16.0 -**Purpose:** Help choose the most efficient search tool for each task +- Semantic search finds related code even with different terminology +- Results sorted by relevance and PageRank — important code comes first +- Hook system may provide claudemem results proactively when index is available +- Use `_bypass_claudemem: true` when native tool behavior is specifically needed diff --git a/plugins/code-analysis/skills/cross-plugin-detective/SKILL.md b/plugins/code-analysis/skills/cross-plugin-detective/SKILL.md index 6fa274a..90fd9e8 100644 --- a/plugins/code-analysis/skills/cross-plugin-detective/SKILL.md +++ b/plugins/code-analysis/skills/cross-plugin-detective/SKILL.md @@ -1,266 +1,61 @@ --- name: cross-plugin-detective -description: Use when integrating detective skills across plugins. Maps agent roles to appropriate detective skills (developer → developer-detective, architect → architect-detective). Reference this to connect agents with claudemem investigation capabilities. -updated: 2026-01-20 -keywords: cross-plugin, detective, agent-mapping, claudemem, integration -allowed-tools: Bash, Task, Read, AskUserQuestion +description: "Maps agent roles to appropriate detective skills (developer-detective, architect-detective, tester-detective, debugger-detective, ultrathink-detective) for cross-plugin integration. Use when connecting agents from other plugins to claudemem investigation capabilities or deciding which detective to reference in agent frontmatter." 
--- # Cross-Plugin Detective Integration -**Version:** 1.0.0 -**Purpose:** Connect ANY agent to the appropriate detective skill based on role +Connects any agent across plugins to the appropriate detective skill based on its role, ensuring all code investigation uses claudemem indexed memory exclusively. -## ⛔ CORE PRINCIPLE: INDEXED MEMORY ONLY +## Workflow -``` -╔══════════════════════════════════════════════════════════════════════════════╗ -║ ║ -║ ALL DETECTIVE SKILLS USE claudemem (INDEXED MEMORY) EXCLUSIVELY ║ -║ ║ -║ When ANY agent references a detective skill, they MUST: ║ -║ ❌ NEVER use grep, find, rg, Glob tool, Grep tool ║ -║ ✅ ALWAYS use claudemem search "query" ║ -║ ║ -╚══════════════════════════════════════════════════════════════════════════════╝ -``` - ---- - -## Agent-to-Skill Mapping - -### Frontend Plugin Agents - -| Agent | Should Use Skill | Purpose | -|-------|-----------------|---------| -| `typescript-frontend-dev` | `code-analysis:developer-detective` | Find implementations, trace data flow | -| `frontend-architect` | `code-analysis:architect-detective` | Analyze architecture, design patterns | -| `test-architect` | `code-analysis:tester-detective` | Coverage analysis, test quality | -| `senior-code-reviewer` | `code-analysis:ultrathink-detective` | Comprehensive code review | -| `ui-developer` | `code-analysis:developer-detective` | Find UI implementations | -| `designer` | `code-analysis:architect-detective` | Understand component structure | -| `plan-reviewer` | `code-analysis:architect-detective` | Review architecture plans | - -### Bun Backend Plugin Agents - -| Agent | Should Use Skill | Purpose | -|-------|-----------------|---------| -| `backend-developer` | `code-analysis:developer-detective` | Find implementations, trace data flow | -| `api-architect` | `code-analysis:architect-detective` | API architecture analysis | -| `apidog` | `code-analysis:developer-detective` | Find API implementations | - -### Code Analysis Plugin Agents 
+1. **Identify the agent's primary role** — implementing, designing, testing, debugging, or reviewing. +2. **Select the matching detective** from the mapping table below. +3. **Add the skill reference** to the agent's frontmatter: `skills: code-analysis:<detective-name>`. +4. **Verify the agent uses claudemem** — never grep, find, Glob, or Grep tools for code discovery. -| Agent | Should Use Skill | Purpose | -|-------|-----------------|---------| -| `codebase-detective` | All detective skills | Full investigation capability | +## Agent-to-Detective Mapping -### Any Other Plugin +| Agent Role | Detective Skill | Primary Focus | +|------------|----------------|---------------| +| Developer agents | `code-analysis:developer-detective` | Implementation, data flow, callers/callees | +| Architect agents | `code-analysis:architect-detective` | System design, layers, PageRank analysis | +| Tester agents | `code-analysis:tester-detective` | Coverage analysis, test quality | +| Debugger agents | `code-analysis:debugger-detective` | Root cause analysis, call chains | +| Reviewer agents | `code-analysis:ultrathink-detective` | Comprehensive multi-perspective audit | -| Agent Role | Should Use Skill | -|------------|-----------------| -| Any "developer" agent | `code-analysis:developer-detective` | -| Any "architect" agent | `code-analysis:architect-detective` | -| Any "tester" agent | `code-analysis:tester-detective` | -| Any "reviewer" agent | `code-analysis:ultrathink-detective` | -| Any "debugger" agent | `code-analysis:debugger-detective` | +## Example: Adding Detective to a Frontend Agent ---- - -## How to Reference Skills in Agent Frontmatter - -### Example: Developer Agent ```yaml --- -name: my-developer-agent +name: typescript-frontend-dev description: Implements features skills: code-analysis:developer-detective --- -# My Developer Agent - -When investigating code, use the developer-detective skill. -This gives you access to indexed memory search via claudemem.
- -## Investigation Pattern - -Before implementing: -1. Check claudemem status: `claudemem status` -2. Search for related code: `claudemem search "feature I'm implementing"` -3. Read specific files from results -4. NEVER use grep or find for discovery -``` - -### Example: Architect Agent -```yaml ---- -name: my-architect-agent -description: Designs architecture -skills: code-analysis:architect-detective ---- - -# My Architect Agent - -When analyzing architecture, use the architect-detective skill. - -## Architecture Discovery - -1. Check claudemem status: `claudemem status` -2. Search for patterns: `claudemem search "service layer architecture"` -3. Map dependencies: `claudemem search "import dependency injection"` -4. NEVER use grep or find for discovery -``` - -### Example: Multi-Skill Agent -```yaml ---- -name: comprehensive-reviewer -description: Reviews all aspects -skills: code-analysis:ultrathink-detective, code-analysis:tester-detective ---- -``` - ---- - -## Skill Selection Decision Tree - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ WHICH DETECTIVE SKILL TO USE? │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ What is the agent's PRIMARY focus? 
│ -│ │ -│ ├── IMPLEMENTING code / Finding where to change │ -│ │ └── Use: developer-detective │ -│ │ │ -│ ├── DESIGNING architecture / Understanding patterns │ -│ │ └── Use: architect-detective │ -│ │ │ -│ ├── TESTING / Coverage analysis / Quality │ -│ │ └── Use: tester-detective │ -│ │ │ -│ ├── DEBUGGING / Finding root cause │ -│ │ └── Use: debugger-detective │ -│ │ │ -│ └── COMPREHENSIVE analysis / Technical debt / Audit │ -│ └── Use: ultrathink-detective │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## Integration Examples - -### Example 1: Frontend Developer Agent Needing to Find Code - -```typescript -// In frontend plugin's typescript-frontend-dev agent: - -// ❌ WRONG - Never do this -Grep({ pattern: "UserService", type: "ts" }); -Glob({ pattern: "**/user*.ts" }); - -// ✅ CORRECT - Use indexed memory via developer-detective skill -// The skill teaches the agent to use: -claudemem search "UserService implementation methods" +# Investigation Pattern +# 1. claudemem status +# 2. claudemem search "feature I'm implementing" +# 3. Read specific file:line from results +# 4. NEVER use grep or find for discovery ``` -### Example 2: Backend Architect Analyzing API Structure - -```typescript -// In bun plugin's api-architect agent: - -// ❌ WRONG - Never do this -find . -name "*.controller.ts" -grep -r "router\." . 
--include="*.ts" - -// ✅ CORRECT - Use indexed memory via architect-detective skill -claudemem search "API controller endpoint handler" -claudemem search "router pattern REST GraphQL" -``` - -### Example 3: Test Architect Finding Coverage Gaps - -```typescript -// In frontend plugin's test-architect agent: - -// ❌ WRONG - Never do this -Glob({ pattern: "**/*.test.ts" }); -Grep({ pattern: "describe" }); - -// ✅ CORRECT - Use indexed memory via tester-detective skill -claudemem search "test coverage describe spec" -claudemem search "mock stub test assertion" -``` - ---- - -## Skill Inheritance Pattern +**Verification:** Confirm the agent's frontmatter includes the correct `skills:` reference and that investigation code uses `claudemem` commands. -When an agent needs code investigation, it should: +## Plugin Dependency -1. **Reference the appropriate detective skill in frontmatter** -2. **Follow the skill's INDEXED MEMORY ONLY requirement** -3. **Use claudemem for ALL code discovery** -4. **NEVER fall back to grep/find/Glob/Grep tools** - -```yaml ---- -name: any-agent-that-needs-investigation -skills: code-analysis:developer-detective # or architect/tester/debugger/ultrathink ---- - -# This agent inherits: -# - INDEXED MEMORY requirement (claudemem only) -# - Role-specific search patterns -# - Output format guidance -# - FORBIDDEN: grep, find, Glob, Grep tools -``` - ---- - -## Plugin Dependencies - -If your plugin has agents that need code investigation, add this dependency: +Add to your plugin's `plugin.json` to ensure detective skills are available: ```json { - "name": "your-plugin", "dependencies": { "code-analysis@mag-claude-plugins": "^1.6.0" } } ``` -This ensures: -- claudemem skills are available -- Detective skills are accessible via `code-analysis:*` prefix -- Agents can reference skills in frontmatter - ---- - -## Summary: The Golden Rule - -``` -╔══════════════════════════════════════════════════════════════════════════════╗ -║ ║ -║ ANY AGENT + CODE 
INVESTIGATION = claudemem ONLY ║ -║ ║ -║ Developer agents → code-analysis:developer-detective ║ -║ Architect agents → code-analysis:architect-detective ║ -║ Tester agents → code-analysis:tester-detective ║ -║ Debugger agents → code-analysis:debugger-detective ║ -║ Reviewer agents → code-analysis:ultrathink-detective ║ -║ ║ -║ grep/find/Glob/Grep = FORBIDDEN (always, everywhere, no exceptions) ║ -║ ║ -╚══════════════════════════════════════════════════════════════════════════════╝ -``` - ---- +## Notes -**Maintained by:** MadAppGang -**Plugin:** code-analysis -**Last Updated:** December 2025 +- All detective skills require claudemem indexed memory — grep/find/Glob/Grep are forbidden for semantic queries +- Multi-skill agents can reference multiple detectives: `skills: code-analysis:ultrathink-detective, code-analysis:tester-detective` +- Direct detective usage and /analyze command remain unchanged diff --git a/plugins/code-analysis/skills/deep-analysis/SKILL.md b/plugins/code-analysis/skills/deep-analysis/SKILL.md index da6597a..d7a915c 100644 --- a/plugins/code-analysis/skills/deep-analysis/SKILL.md +++ b/plugins/code-analysis/skills/deep-analysis/SKILL.md @@ -1,368 +1,69 @@ --- name: deep-analysis -description: "⚡ PRIMARY SKILL for: 'how does X work', 'investigate', 'analyze architecture', 'trace flow', 'find implementations'. PREREQUISITE: code-search-selector must validate tool choice. Launches codebase-detective with claudemem INDEXED MEMORY." -allowed-tools: Task -prerequisites: - - code-search-selector # Must run before this skill -dependencies: - - claudemem must be indexed (claudemem status) +description: "Launches codebase-detective with claudemem indexed memory for comprehensive code investigation. Traces code flow, maps architecture, and locates implementations with semantic search. Use when asked 'how does X work', 'investigate', 'analyze architecture', 'trace flow', or 'find implementations'." 
--- # Deep Code Analysis -This Skill provides comprehensive codebase investigation capabilities using the codebase-detective agent with semantic search and pattern matching. +Provides comprehensive codebase investigation by launching the codebase-detective agent with claudemem semantic search and AST-aware pattern matching. -## Prerequisites (MANDATORY) +## Prerequisites -``` -╔══════════════════════════════════════════════════════════════════════════════╗ -║ BEFORE INVOKING THIS SKILL ║ -╠══════════════════════════════════════════════════════════════════════════════╣ -║ ║ -║ 1. INVOKE code-search-selector skill FIRST ║ -║ → Validates tool selection (claudemem vs grep) ║ -║ → Checks if claudemem is indexed ║ -║ → Prevents tool familiarity bias ║ -║ ║ -║ 2. VERIFY claudemem status ║ -║ → Run: claudemem status ║ -║ → If not indexed: claudemem index -y ║ -║ ║ -║ 3. DO NOT start with Read/Glob ║ -║ → Even if file paths are mentioned in the prompt ║ -║ → Semantic search first, Read specific lines after ║ -║ ║ -╚══════════════════════════════════════════════════════════════════════════════╝ -``` - -## When to use this Skill - -Claude should invoke this Skill when: - -- User asks "how does [feature] work?" -- User wants to understand code architecture or patterns -- User is debugging and needs to trace code flow -- User asks "where is [functionality] implemented?" -- User needs to find all usages of a component/service -- User wants to understand dependencies between files -- User mentions: "investigate", "analyze", "find", "trace", "understand" -- User is exploring an unfamiliar codebase -- User needs to understand complex multi-file functionality - -## Instructions - -### Phase 1: Determine Investigation Scope - -Understand what the user wants to investigate: - -1. **Specific Feature**: "How does user authentication work?" -2. **Find Implementation**: "Where is the payment processing logic?" -3. **Trace Flow**: "What happens when I click the submit button?" -4. 
**Debug Issue**: "Why is the profile page showing undefined?" -5. **Find Patterns**: "Where are all the API calls made?" -6. **Analyze Architecture**: "What's the structure of the data layer?" - -### Phase 2: Invoke codebase-detective Agent - -Use the Task tool to launch the codebase-detective agent with comprehensive instructions: - -``` -Use Task tool with: -- subagent_type: "code-analysis:detective" -- description: "Investigate [brief summary]" -- prompt: [Detailed investigation instructions] -``` - -**Prompt structure for codebase-detective**: - -```markdown -# Code Investigation Task - -## Investigation Target -[What needs to be investigated - be specific] - -## Context -- Working Directory: [current working directory] -- Purpose: [debugging/learning/refactoring/etc] -- User's Question: [original user question] - -## Investigation Steps - -1. **Initial Search** (CLAUDEMEM REQUIRED): - - FIRST: Check `claudemem status` - is index available? - - ALWAYS: Use `claudemem search "semantic query"` for investigation - - NEVER: Use grep/glob for semantic understanding tasks - - Search for: [concepts, functionality, patterns by meaning] - -2. **Code Location**: - - Find exact file paths and line numbers - - Identify entry points and main implementations - - Note related files and dependencies - -3. **Code Flow Analysis**: - - Trace how data/control flows through the code - - Identify key functions and their roles - - Map out component/service relationships - -4. **Pattern Recognition**: - - Identify architectural patterns used - - Note code conventions and styles - - Find similar implementations for reference - -## Deliverables - -Provide a comprehensive report including: - -1. **📍 Primary Locations**: - - Main implementation files with line numbers - - Entry points and key functions - - Configuration and setup files - -2. **🔍 Code Flow**: - - Step-by-step flow explanation - - How components interact - - Data transformation points - -3. 
**🗺️ Architecture Map**: - - High-level structure diagram - - Component relationships - - Dependency graph - -4. **📝 Code Snippets**: - - Key implementations (show important code) - - Patterns and conventions used - - Notable details or gotchas - -5. **🚀 Navigation Guide**: - - How to explore the code further - - Related files to examine - - Commands to run for testing - -6. **💡 Insights**: - - Why the code is structured this way - - Potential issues or improvements - - Best practices observed - -## Search Strategy - -### ⚠️ CRITICAL: Tool Selection - -**BEFORE ANY SEARCH, CHECK CLAUDEMEM STATUS:** -```bash -claudemem status -``` - -### ✅ PRIMARY METHOD: claudemem (Indexed Memory) - -```bash -# Index if needed -claudemem index -y - -# Semantic search (ALWAYS use this for investigation) -claudemem search "authentication login session" -n 15 -claudemem search "API endpoint handler route" -n 20 -claudemem search "data transformation pipeline" -n 10 -``` - -**Why claudemem is REQUIRED for investigation:** -- Understands code MEANING, not just text patterns -- Finds related code even with different terminology -- Returns ranked, relevant results -- AST-aware (understands code structure) - -### ❌ WHEN NOT TO USE GREP +1. Run `code-search-selector` skill first to validate tool choice. +2. Verify claudemem is indexed: `claudemem status` +3. If not indexed: `claudemem index -y` +4. Do NOT start with Read/Glob — use semantic search first, then read specific lines. -| User Request | ❌ DON'T | ✅ DO | -|-------------|----------|-------| -| "How does auth work?" | `grep -r "auth" src/` | `claudemem search "authentication flow"` | -| "Find API endpoints" | `grep -r "router" src/` | `claudemem search "API endpoint handler"` | -| "Trace data flow" | `grep -r "transform" src/` | `claudemem search "data transformation"` | -| "Audit architecture" | `ls -la src/` | `claudemem search "architecture layers"` | +## Workflow -### ⚠️ DEGRADED FALLBACK (Only if claudemem unavailable) +1. 
**Determine investigation scope** — classify the query (feature analysis, implementation search, flow tracing, debugging, pattern finding, or architecture audit). +2. **Launch codebase-detective** via the Task tool with detailed investigation instructions. +3. **Present analysis results** — executive summary, file locations with line numbers, code flow, architecture overview. +4. **Offer follow-up** — suggest deeper dives into specific areas. -**Only use grep/find if:** -1. claudemem is NOT installed, AND -2. User explicitly accepts degraded mode +## Example: Understanding Authentication -```bash -# DEGRADED MODE - inferior results expected -grep -r "pattern" src/ # Text match only, no semantic understanding -find . -name "*.ts" # File discovery only +```typescript +Task({ + subagent_type: "code-analysis:detective", + description: "Investigate user authentication and login flow", + prompt: ` + 1. Check claudemem status, index if needed + 2. claudemem search "authentication login session JWT" -n 15 + 3. Trace: login endpoint → auth service → token generation → middleware + 4. Report file locations with line numbers and data flow diagram + ` +}) ``` -**Always warn user**: "Using grep fallback - results will be less accurate than semantic search." - -## Output Format - -Structure your findings clearly with: -- File paths using backticks: `src/auth/login.ts:45` -- Code blocks for snippets -- Clear headings and sections +**Expected output:** +- Primary file locations with line numbers +- Step-by-step code flow explanation +- Architecture map showing component relationships - Actionable next steps -``` - -### Phase 3: Present Analysis Results - -After the agent completes, present results to the user: - -1. **Executive Summary** (2-3 sentences): - - What was found - - Where it's located - - Key insight - -2. **Detailed Findings**: - - Primary file locations with line numbers - - Code flow explanation - - Architecture overview - -3. 
**Visual Structure** (if complex): - ``` - EntryPoint (file:line) - ├── Validator (file:line) - ├── BusinessLogic (file:line) - │ └── DataAccess (file:line) - └── ResponseHandler (file:line) - ``` - -4. **Code Examples**: - - Show key code snippets inline - - Highlight important patterns - -5. **Next Steps**: - - Suggest follow-up investigations - - Offer to dive deeper into specific parts - - Provide commands to test/run the code - -### Phase 4: Offer Follow-up -Ask the user: -- "Would you like me to investigate any specific part in more detail?" -- "Do you want to see how [related feature] works?" -- "Should I trace [specific function] further?" - -## Example Scenarios - -### Example 1: Understanding Authentication - -``` -User: "How does login work in this app?" - -Skill invokes codebase-detective agent with: -"Investigate user authentication and login flow: -1. Find login API endpoint or form handler -2. Trace authentication logic -3. Identify token generation/storage -4. Find session management -5. Locate authentication middleware" - -Agent provides: -- src/api/auth/login.ts:34-78 (login endpoint) -- src/services/authService.ts:12-45 (JWT generation) -- src/middleware/authMiddleware.ts:23 (token validation) -- Flow: Form → API → Service → Middleware → Protected Routes -``` - -### Example 2: Debugging Undefined Error - -``` -User: "The dashboard shows 'undefined' for user name" - -Skill invokes codebase-detective agent with: -"Debug undefined user name in dashboard: -1. Find Dashboard component -2. Locate where user name is rendered -3. Trace user data fetching -4. Check data transformation/mapping -5. Identify where undefined is introduced" - -Agent provides: -- src/components/Dashboard.tsx:156 renders user.name -- src/hooks/useUser.ts:45 fetches user data -- Issue: API returns 'full_name' but code expects 'name' -- Fix: Map 'full_name' to 'name' in useUser hook -``` - -### Example 3: Finding All API Calls - -``` -User: "Where are all the API calls made?" 
- -Skill invokes codebase-detective agent with: -"Find all API call locations: -1. Search for fetch, axios, http client usage -2. Identify API client/service files -3. List all endpoints used -4. Note patterns (REST, GraphQL, etc) -5. Find error handling approach" - -Agent provides: -- 23 API calls across 8 files -- Centralized in src/services/* -- Using axios with interceptors -- Base URL in src/config/api.ts -- Error handling in src/utils/errorHandler.ts -``` - -## Success Criteria - -The Skill is successful when: - -1. ✅ User's question is comprehensively answered -2. ✅ Exact code locations provided with line numbers -3. ✅ Code relationships and flow clearly explained -4. ✅ User can navigate to code and understand it -5. ✅ Architecture patterns identified and explained -6. ✅ Follow-up questions anticipated +## Search Strategy -## Tips for Optimal Results +| Task | Tool | Example | +|------|------|---------| +| Understand how X works | `claudemem search` | `claudemem search "authentication flow"` | +| Find API endpoints | `claudemem search` | `claudemem search "API endpoint handler"` | +| Trace data flow | `claudemem search` | `claudemem search "data transformation"` | +| Exact string match only | `grep` (fallback) | `grep -r "DEPRECATED_FLAG" src/` | -1. **Be Comprehensive**: Don't just find one file, map the entire flow -2. **Provide Context**: Explain why code is structured this way -3. **Show Examples**: Include actual code snippets -4. **Think Holistically**: Connect related pieces across files -5. **Anticipate Questions**: Answer follow-up questions proactively +## Fallback Protocol -## Integration with Other Tools +If claudemem is unavailable or returns no results: -This Skill works well with: +1. **Stop** — do not silently switch to grep. +2. **Diagnose** — run `claudemem status`. +3. **Ask user** — offer reindex, different query, grep fallback (with quality warning), or cancel. 
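+The fallback steps above can be sketched as a shell guard. This is a minimal sketch, not part of the plugin: it assumes `claudemem status` exits non-zero when the index is unavailable, which this document implies but does not state.
+
+```bash
+# Guard before any semantic search: verify claudemem is usable and
+# stop for user input instead of silently degrading to grep.
+claudemem_ready() {
+  command -v claudemem >/dev/null 2>&1 || { echo "claudemem not installed"; return 1; }
+  claudemem status >/dev/null 2>&1 || { echo "index unavailable - try: claudemem index -y"; return 1; }
+}
+
+if claudemem_ready; then
+  claudemem search "authentication flow" -n 15
+else
+  echo "STOP: ask user - reindex, rephrase query, grep fallback (degraded), or cancel"
+fi
+```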
-- **claudemem CLI**: For local semantic code search with Tree-sitter parsing -- **MCP gopls**: For Go-specific analysis -- **Standard CLI tools**: grep, ripgrep, find, git -- **Project-specific tools**: Use project's search/navigation tools +**Verification:** Confirm results include exact file paths with line numbers before presenting to user. ## Notes - The codebase-detective agent uses extended thinking for complex analysis -- **claudemem is REQUIRED** - grep/find produce inferior results -- Fallback to grep ONLY if claudemem unavailable AND user accepts degraded mode - claudemem requires OpenRouter API key (https://openrouter.ai) -- Default model: `voyage/voyage-code-3` (best code understanding) -- Run `claudemem --models` to see all options and pricing -- Results are actionable and navigable -- Great for onboarding to new codebases -- Helps prevent incorrect assumptions about code - -## Tool Selection Quick Reference - -``` -┌─────────────────────────────────────────────────────────────────────┐ -│ BEFORE ANY CODE INVESTIGATION: │ -│ │ -│ 1. INVOKE code-search-selector skill │ -│ 2. Run: claudemem status │ -│ 3. If indexed → USE claudemem search │ -│ 4. If not indexed → Index first OR ask user │ -│ 5. NEVER default to grep when claudemem available │ -│ 6. 
NEVER start with Read/Glob for semantic questions │ -│ │ -│ grep is for EXACT STRING MATCHES only, NOT semantic understanding │ -└─────────────────────────────────────────────────────────────────────┘ -``` - ---- - -**Maintained by:** MadAppGang -**Plugin:** code-analysis v2.2.0 -**Last Updated:** December 2025 +- Default model: `voyage/voyage-code-3` for code understanding +- grep is for exact string matches only, not semantic understanding diff --git a/plugins/code-analysis/skills/developer-detective/SKILL.md b/plugins/code-analysis/skills/developer-detective/SKILL.md index f94477c..6ff1452 100644 --- a/plugins/code-analysis/skills/developer-detective/SKILL.md +++ b/plugins/code-analysis/skills/developer-detective/SKILL.md @@ -1,495 +1,63 @@ --- name: developer-detective -description: "⚡ Implementation analysis skill. Best for: 'how does X work', 'find implementation of', 'trace data flow', 'where is X defined', 'find all usages'. Uses claudemem AST with callers/callees for efficient code tracing." -allowed-tools: Bash, Task, Read, AskUserQuestion +description: "Traces implementation details using claudemem AST callers/callees analysis. Locates function definitions, maps data flow, finds all usages, and assesses change impact. Use when asked 'how does X work', 'find implementation of', 'trace data flow', 'where is X defined', or 'find all usages'." --- -# Developer Detective Skill +# Developer Detective -This skill uses claudemem's callers/callees analysis for implementation investigation. +Software Developer perspective for implementation investigation using claudemem's `callers`, `callees`, `context`, `symbol`, and `impact` commands for precise code tracing. 
-## Why Claudemem Works Better for Development
+## Workflow

-| Task | claudemem | Native Tools |
-|------|-----------|--------------|
-| Find usages | `callers` shows all call sites | Grep (text match) |
-| Trace dependencies | `callees` shows called functions | Manual reading |
-| Understand context | `context` gives full picture | Multiple reads |
-| Impact analysis | Caller chain reveals risk | Unknown |
+1. **Verify claudemem** — confirm v0.3.0+ installed and indexed. Check freshness; reindex if stale.
+2. **Map the area** — run `claudemem --agent map "feature area"` to get an overview.
+3. **Find the entry point** — run `claudemem --agent symbol <name>` to locate the highest-PageRank symbol.
+4. **Trace the flow** — run `callees` to see what the function calls (data flows out), then follow the chain.
+5. **Understand usage** — run `callers` to see every place that calls the function (impact of changes).
+6. **Check impact** (v0.4.0+) — before modifying code, run `claudemem --agent impact <name>` for full transitive caller tree.
+7. **Read specific code** — use Read tool on exact file:line ranges from results, never whole files.

-**Primary commands:**
-- `claudemem --agent callers <symbol>` - What calls this code
-- `claudemem --agent callees <symbol>` - What this code calls
-- `claudemem --agent context <symbol>` - Full understanding
-
-# Developer Detective Skill
-
-**Version:** 3.3.0
-**Role:** Software Developer
-**Purpose:** Implementation investigation using AST callers/callees and impact analysis
-
-## Role Context
-
-You are investigating this codebase as a **Software Developer**.
Your focus is on:
-- **Implementation details** - How code actually works
-- **Data flow** - How data moves through the system (via callees)
-- **Usage patterns** - How code is used (via callers)
-- **Dependencies** - What a function needs to work
-- **Impact analysis** - What breaks if you change something
-
-## Why callers/callees is Perfect for Development
-
-The `callers` and `callees` commands show you:
-- **callers** = Every place that calls this code (impact of changes)
-- **callees** = Every function this code calls (its dependencies)
-- **Exact file:line** = Precise locations for reading/editing
-- **Call kinds** = call, import, extends, implements
-
-## Developer-Focused Commands (v0.3.0)
-
-### Find Implementation
+## Example: Tracing Payment Flow

```bash
-# Find where a function is defined
+# Step 1: Find the function
claudemem --agent symbol processPayment
-
-# Get full context with callers and callees
-claudemem --agent context processPayment
-```
-
-### Trace Data Flow
-
-```bash
-# What does this function call? (data flows OUT)
+# Step 2: What does it call? (data flow)
claudemem --agent callees processPayment
-# Follow the chain
-claudemem --agent callees validateCard
-claudemem --agent callees chargeStripe
-```
+# Output: validateCard, getCustomer, chargeStripe, saveTransaction
-
-### Find All Usages
-
-```bash
-# Who calls this function? (usage patterns)
+# Step 3: Who calls it?
(usage/impact) claudemem --agent callers processPayment -# This shows EVERY place that uses this code -``` - -### Impact Analysis (v0.4.0+ Required) - -```bash -# Before modifying ANY code, check full impact -claudemem --agent impact functionToChange -# Output shows ALL transitive callers: -# direct_callers: -# - LoginController.authenticate:34 -# - SessionMiddleware.validate:12 -# transitive_callers (depth 2): -# - AppRouter.handleRequest:45 -# - TestSuite.runAuth:89 -``` - -**Why impact matters**: -- `callers` shows only direct callers (1 level) -- `impact` shows ALL transitive callers (full tree) -- Critical for refactoring decisions - -**Handling Empty Results:** -```bash -IMPACT=$(claudemem --agent impact functionToChange) -if echo "$IMPACT" | grep -q "No callers"; then - echo "No callers found. This is either:" - echo " 1. An entry point (API handler, main function) - expected" - echo " 2. Dead code - verify with: claudemem dead-code" - echo " 3. Dynamically called - check for import(), reflection" -fi -``` - -### Impact Analysis (BEFORE Modifying) - -```bash -# Quick check - direct callers only (v0.3.0) -claudemem --agent callers functionToChange -# Deep check - ALL transitive callers (v0.4.0+ Required) -IMPACT=$(claudemem --agent impact functionToChange) - -# Handle results -if [ -z "$IMPACT" ] || echo "$IMPACT" | grep -q "No callers"; then - echo "No static callers found - verify dynamic usage patterns" -else - echo "$IMPACT" - echo "" - echo "This tells you:" - echo "- Direct callers (immediate impact)" - echo "- Transitive callers (ripple effects)" - echo "- Grouped by file (for systematic updates)" -fi -``` - -### Understanding Complex Code - -```bash -# Get full picture: definition + callers + callees -claudemem --agent context complexFunction``` - -## PHASE 0: MANDATORY SETUP - -### Step 1: Verify claudemem v0.3.0 - -```bash -which claudemem && claudemem --version -# Must be 0.3.0+ -``` - -### Step 2: If Not Installed → STOP - -Use AskUserQuestion 
(see ultrathink-detective for template) - -### Step 3: Check Index Status - -```bash -# Check claudemem installation and index -claudemem --version && ls -la .claudemem/index.db 2>/dev/null -``` - -### Step 3.5: Check Index Freshness - -Before proceeding with investigation, verify the index is current: - -```bash -# First check if index exists -if [ ! -d ".claudemem" ] || [ ! -f ".claudemem/index.db" ]; then - # Use AskUserQuestion to prompt for index creation - # Options: [1] Create index now (Recommended), [2] Cancel investigation - exit 1 -fi - -# Count files modified since last index -STALE_COUNT=$(find . -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" -o -name "*.py" -o -name "*.go" -o -name "*.rs" \) \ - -newer .claudemem/index.db 2>/dev/null | grep -v "node_modules" | grep -v ".git" | grep -v "dist" | grep -v "build" | wc -l) -STALE_COUNT=$((STALE_COUNT + 0)) # Normalize to integer - -if [ "$STALE_COUNT" -gt 0 ]; then - # Get index time with explicit platform detection - if [[ "$OSTYPE" == "darwin"* ]]; then - INDEX_TIME=$(stat -f "%Sm" -t "%Y-%m-%d %H:%M" .claudemem/index.db 2>/dev/null) - else - INDEX_TIME=$(stat -c "%y" .claudemem/index.db 2>/dev/null | cut -d'.' -f1) - fi - INDEX_TIME=${INDEX_TIME:-"unknown time"} - - # Get sample of stale files - STALE_SAMPLE=$(find . 
-type f \( -name "*.ts" -o -name "*.tsx" \) \ - -newer .claudemem/index.db 2>/dev/null | grep -v "node_modules" | grep -v ".git" | head -5) - - # Use AskUserQuestion (see template in ultrathink-detective) -fi -``` - -### Step 4: Index if Needed - -```bash -claudemem index -``` - ---- - -## Workflow: Implementation Investigation (v0.3.0) - -### Phase 1: Map the Area - -```bash -# Get overview of the feature area -claudemem --agent map "payment processing"``` - -### Phase 2: Find the Entry Point - -```bash -# Locate the main function (highest PageRank in area) -claudemem --agent symbol PaymentService``` - -### Phase 3: Trace the Flow - -```bash -# What does PaymentService call? -claudemem --agent callees PaymentService -# For each major callee, trace further -claudemem --agent callees validatePaymentclaudemem --agent callees processChargeclaudemem --agent callees saveTransaction``` - -### Phase 4: Understand Usage - -```bash -# Who uses PaymentService? -claudemem --agent callers PaymentService -# This shows the entry points -``` - -### Phase 5: Read Specific Code - -```bash -# Now read ONLY the relevant file:line ranges from results -# DON'T read whole files -``` - -## Output Format: Implementation Report +# Output: CheckoutController.submit:45, SubscriptionService.renew:89 -### 1. Symbol Overview - -``` -┌─────────────────────────────────────────────────────────┐ -│ IMPLEMENTATION ANALYSIS │ -├─────────────────────────────────────────────────────────┤ -│ Symbol: processPayment │ -│ Location: src/services/payment.ts:45-89 │ -│ Kind: function │ -│ PageRank: 0.034 │ -│ Search Method: claudemem v0.3.0 (AST analysis) │ -└─────────────────────────────────────────────────────────┘ -``` - -### 2. 
Data Flow (Callees) - -``` -processPayment - ├── validateCard (src/validators/card.ts:12) - ├── getCustomer (src/services/customer.ts:34) - ├── chargeStripe (src/integrations/stripe.ts:56) - │ └── stripe.charges.create (external) - └── saveTransaction (src/repositories/transaction.ts:78) - └── database.insert (src/db/index.ts:23) -``` - -### 3. Usage (Callers) - -``` -processPayment is called by: - ├── CheckoutController.submit (src/controllers/checkout.ts:45) - ├── SubscriptionService.renew (src/services/subscription.ts:89) - └── RetryQueue.processPayment (src/workers/retry.ts:23) -``` - -### 4. Impact Analysis - -``` -⚠️ IMPACT: Changing processPayment will affect: - - 3 direct callers (shown above) - - Checkout flow (user-facing) - - Subscription renewals (automated) - - Payment retry logic (background) -``` - -## Scenarios - -### Scenario: "How does X work?" - -```bash -# Step 1: Find X -claudemem --agent symbol X -# Step 2: See what X does -claudemem --agent callees X -# Step 3: See how X is used -claudemem --agent callers X -# Step 4: Read the specific code -# Use Read tool on exact file:line from results -``` - -### Scenario: Refactoring - -```bash -# Step 1: Find ALL usages (callers) -claudemem --agent callers oldFunction -# Step 2: Document each caller location -# Step 3: Update each caller systematically -``` - -### Scenario: Adding to Existing Code - -```bash -# Step 1: Find where to add -claudemem --agent symbol targetModule -# Step 2: Understand dependencies -claudemem --agent callees targetModule -# Step 3: Check existing patterns -claudemem --agent callers targetModule``` - -## Result Validation Pattern - -After EVERY claudemem command, validate results: - -### Symbol/Callers Validation - -When tracing implementation: - -```bash -# Find symbol -SYMBOL=$(claudemem --agent symbol PaymentService) -EXIT_CODE=$? 
- -if [ "$EXIT_CODE" -ne 0 ] || [ -z "$SYMBOL" ] || echo "$SYMBOL" | grep -qi "not found\|error"; then - # Symbol doesn't exist, typo, or index issue - # Diagnose index health - DIAGNOSIS=$(claudemem --version && ls -la .claudemem/index.db 2>&1) - # Use AskUserQuestion with suggestions: - # [1] Reindex, [2] Try different name, [3] Cancel -fi - -# Check callers -CALLERS=$(claudemem --agent callers PaymentService) -# 0 callers is valid (entry point or unused) -# But error message is not -if echo "$CALLERS" | grep -qi "error\|failed"; then - # Use AskUserQuestion -fi -``` - -### Empty/Irrelevant Results - -```bash -RESULTS=$(claudemem --agent callees FunctionName) - -# Validate relevance -# Extract keywords from the user's investigation query -# Example: QUERY="how does auth work" → KEYWORDS="auth work authentication" -# The orchestrating agent must populate KEYWORDS before this check -MATCH_COUNT=0 -for kw in $KEYWORDS; do - if echo "$RESULTS" | grep -qi "$kw"; then - MATCH_COUNT=$((MATCH_COUNT + 1)) - fi -done - -if [ "$MATCH_COUNT" -eq 0 ]; then - # Results don't match expected dependencies - # Use AskUserQuestion: Reindex, Different query, or Cancel -fi +# Step 4: Full impact before refactoring (v0.4.0+) +claudemem --agent impact processPayment ``` ---- - -## FALLBACK PROTOCOL - -**CRITICAL: Never use grep/find/Glob without explicit user approval.** - -If claudemem fails or returns irrelevant results: +**Verification:** Confirm callers/callees output includes file:line references. If symbol returns "not found", check spelling or reindex. -1. **STOP** - Do not silently switch tools -2. **DIAGNOSE** - Run `claudemem status` -3. **REPORT** - Tell user what happened -4. **ASK** - Use AskUserQuestion for next steps - -```typescript -// Fallback options (in order of preference) -AskUserQuestion({ - questions: [{ - question: "claudemem [command] failed or returned no relevant results. 
How should I proceed?", - header: "Investigation Issue", - multiSelect: false, - options: [ - { label: "Reindex codebase", description: "Run claudemem index (~1-2 min)" }, - { label: "Try different query", description: "Rephrase the search" }, - { label: "Use grep (not recommended)", description: "Traditional search - loses call graph analysis" }, - { label: "Cancel", description: "Stop investigation" } - ] - }] -}) -``` - -**See ultrathink-detective skill for complete Fallback Protocol documentation.** - ---- - -## Anti-Patterns - -| Anti-Pattern | Why Wrong | Correct Approach | -|--------------|-----------|------------------| -| `grep -r "function"` | No call relationships | `claudemem --agent callees func` | -| Modify without callers | Breaking changes | ALWAYS check `callers` first | -| Read whole files | Token waste | Read specific file:line from results | -| Guess dependencies | Miss connections | Use `callees` for exact deps | -| `cmd \| head/tail` | Hides callers/callees | Use full output or `--tokens` | - -### Output Truncation Warning - -╔══════════════════════════════════════════════════════════════════════════════╗ -║ ║ -║ ❌ Anti-Pattern 7: Truncating Claudemem Output ║ -║ ║ -║ FORBIDDEN (any form of output truncation): ║ -║ → BAD: claudemem --agent map "query" | head -80 ║ -║ → BAD: claudemem --agent callers X | tail -50 ║ -║ → BAD: claudemem --agent search "x" | grep -m 10 "y" ║ -║ → BAD: claudemem --agent map "q" | awk 'NR <= 50' ║ -║ → BAD: claudemem --agent callers X | sed '50q' ║ -║ → BAD: claudemem --agent search "x" | sort | head -20 ║ -║ → BAD: claudemem --agent map "q" | grep "pattern" | head -20 ║ -║ ║ -║ CORRECT (use full output or built-in limits): ║ -║ → GOOD: claudemem --agent map "query" ║ -║ → GOOD: claudemem --agent search "x" -n 10 ║ -║ → GOOD: claudemem --agent map "q" --tokens 2000 ║ -║ → GOOD: claudemem --agent search "x" --page-size 20 --page 1 ║ -║ → GOOD: claudemem --agent context Func --max-depth 3 ║ -║ ║ -║ WHY: Output is 
pre-optimized; truncation hides critical results ║ -║ ║ -╚══════════════════════════════════════════════════════════════════════════════╝ - ---- +## Common Scenarios -## Feedback Reporting (v0.8.0+) +| Scenario | Commands | +|----------|----------| +| "How does X work?" | `symbol X` → `callees X` → `callers X` | +| Refactoring | `callers oldFunction` → document each location → update systematically | +| Adding to existing code | `symbol targetModule` → `callees` (deps) → `callers` (patterns) | +| Impact assessment | `impact functionToChange` → review all transitive callers | -After completing investigation, report search feedback to improve future results. +## Fallback Protocol -### When to Report +Never use grep/find/Glob without explicit user approval. If claudemem fails: -Report feedback ONLY if you used the `search` command during investigation: - -| Result Type | Mark As | Reason | -|-------------|---------|--------| -| Read and used | Helpful | Contributed to investigation | -| Read but irrelevant | Unhelpful | False positive | -| Skipped after preview | Unhelpful | Not relevant to query | -| Never read | (Don't track) | Can't evaluate | - -### Feedback Pattern - -```bash -# Track during investigation -SEARCH_QUERY="your original query" -HELPFUL_IDS="" -UNHELPFUL_IDS="" - -# When reading a helpful result -HELPFUL_IDS="$HELPFUL_IDS,$result_id" - -# When reading an unhelpful result -UNHELPFUL_IDS="$UNHELPFUL_IDS,$result_id" - -# Report at end of investigation (v0.8.0+ only) -if claudemem feedback --help 2>&1 | grep -qi "feedback"; then - timeout 5 claudemem feedback \ - --query "$SEARCH_QUERY" \ - --helpful "${HELPFUL_IDS#,}" \ - --unhelpful "${UNHELPFUL_IDS#,}" 2>/dev/null || true -fi -``` - -### Output Update - -Include in investigation report: - -``` -Search Feedback: [X helpful, Y unhelpful] - Submitted (v0.8.0+) -``` - ---- +1. Stop — do not silently switch tools. +2. Diagnose — run `claudemem status`. +3. 
Ask user via AskUserQuestion (reindex, different query, grep fallback with warning, or cancel). ## Notes -- **`callers` is essential before any modification** - Know your impact -- **`callees` traces data flow** - Follow the execution path -- **`context` gives complete picture** - Symbol + callers + callees -- Always read specific file:line ranges, not whole files -- Works best with TypeScript, Go, Python, Rust codebases - ---- - -**Maintained by:** MadAppGang -**Plugin:** code-analysis v2.7.0 -**Last Updated:** December 2025 (v3.3.0 - Cross-platform compatibility, inline templates, improved validation) +- Always check `callers` before modifying any code — know the impact +- `callees` traces data flow — follow the execution path +- `context` gives complete picture — symbol + callers + callees combined +- Read specific file:line ranges, never whole files +- Never truncate claudemem output — use `--tokens` or `-n` flags for size control diff --git a/plugins/code-analysis/skills/investigate/SKILL.md b/plugins/code-analysis/skills/investigate/SKILL.md index cc9deed..c724c73 100644 --- a/plugins/code-analysis/skills/investigate/SKILL.md +++ b/plugins/code-analysis/skills/investigate/SKILL.md @@ -1,346 +1,67 @@ --- name: investigate -description: "Unified entry point for code investigation. Auto-routes to specialized detective based on query keywords. Use when investigation type is unclear or for general exploration." -allowed-tools: Bash, Task, AskUserQuestion +description: "Routes code investigation queries to specialized detectives (debugger, tester, architect, developer) via priority-based keyword matching and Task delegation. Use when the investigation type is unclear, for general code exploration, or to auto-select the right detective skill." 
--- # Investigate Skill -**Version:** 1.0.0 -**Purpose:** Keyword-based routing to specialized detective skills -**Pattern:** Smart delegation via Task tool - -## Overview - -This skill analyzes your investigation query and routes to the appropriate detective specialist: -- **debugger-detective** (errors, bugs, crashes) -- **tester-detective** (tests, coverage, edge cases) -- **architect-detective** (architecture, design, patterns) -- **developer-detective** (implementation, data flow - default) - -## Routing Logic - -### Priority System (Highest First) - -1. **Error/Debug** (Priority 1) - Time-critical bug fixes - - Keywords: "debug", "error", "broken", "failing", "crash" - - Route to: `debugger-detective` - -2. **Testing** (Priority 2) - Specialized test analysis - - Keywords: "test", "coverage", "edge case", "mock" - - Route to: `tester-detective` - -3. **Architecture** (Priority 3) - High-level understanding - - Keywords: "architecture", "design", "structure", "layer" - - Route to: `architect-detective` - -4. **Implementation** (Default, Priority 4) - Most common - - Keywords: "implementation", "how does", "code flow" - - Route to: `developer-detective` - -### Conflict Resolution - -When multiple keywords from different categories are detected: -- **Highest priority wins** (Priority 1 beats Priority 2, etc.) -- **No matches**: Default to developer-detective +Unified entry point for code investigation. Analyzes query keywords, selects the best detective specialist, and delegates via the Task tool. ## Workflow -### Phase 1: Extract Query - -The investigation query should be available from the task description or user input. +1. **Extract and normalize the query** from the task description or user input. +2. **Detect keywords** using the priority system below and select a detective. +3. **Show the routing decision** to the user before delegating. +4. **Delegate via the Task tool** to the chosen detective. +5. 
**Offer override** if the user disagrees with the auto-routing. -```bash -# Query comes from the Task description or user request -INVESTIGATION_QUERY="${TASK_DESCRIPTION:-$USER_QUERY}" - -# Normalize to lowercase for case-insensitive matching -QUERY_LOWER=$(echo "$INVESTIGATION_QUERY" | tr '[:upper:]' '[:lower:]') -``` - -### Phase 2: Keyword Detection +## Priority Routing -```bash -# Priority 1: Error/Debug keywords -if echo "$QUERY_LOWER" | grep -qE "debug|error|broken|failing|crash"; then - DETECTIVE="debugger-detective" - KEYWORDS="debug/error keywords" - PRIORITY=1 - RATIONALE="Bug fixes are time-critical and require call chain tracing" +| Priority | Category | Keywords | Detective | +|----------|----------|----------|-----------| +| 1 | Error/Debug | debug, error, broken, failing, crash | `debugger-detective` | +| 2 | Testing | test, coverage, edge case, mock | `tester-detective` | +| 3 | Architecture | architecture, design, structure, layer | `architect-detective` | +| 4 | Implementation (default) | implementation, how does, code flow | `developer-detective` | -# Priority 2: Testing keywords -elif echo "$QUERY_LOWER" | grep -qE "test|coverage|edge case|mock"; then - DETECTIVE="tester-detective" - KEYWORDS="test/coverage keywords" - PRIORITY=2 - RATIONALE="Test analysis is specialized and requires callers analysis" +When multiple categories match, the highest priority wins. No matches default to `developer-detective`. 
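+The priority table above can be sketched as a small shell routine. A sketch only: the keyword regexes mirror the table and the detective names come from this plugin; the real skill also reports its rationale before delegating.
+
+```bash
+# Priority-ordered keyword routing: the first matching category wins,
+# and developer-detective is the default when nothing matches.
+route_query() {
+  local q
+  q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
+  if printf '%s' "$q" | grep -qE 'debug|error|broken|failing|crash'; then
+    echo "debugger-detective"
+  elif printf '%s' "$q" | grep -qE 'test|coverage|edge case|mock'; then
+    echo "tester-detective"
+  elif printf '%s' "$q" | grep -qE 'architecture|design|structure|layer'; then
+    echo "architect-detective"
+  else
+    echo "developer-detective"
+  fi
+}
+
+route_query "Why is login broken?"     # debugger-detective
+route_query "Debug the test coverage"  # debugger-detective (priority 1 beats 2)
+route_query "How does payment work?"   # developer-detective
+```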
-# Priority 3: Architecture keywords -elif echo "$QUERY_LOWER" | grep -qE "architecture|design|structure|layer"; then - DETECTIVE="architect-detective" - KEYWORDS="architecture/design keywords" - PRIORITY=3 - RATIONALE="High-level understanding requires PageRank analysis" - -# Priority 4: Implementation (default) -else - DETECTIVE="developer-detective" - KEYWORDS="implementation (default)" - PRIORITY=4 - RATIONALE="Most common investigation type - data flow via callers/callees" -fi -``` - -### Phase 3: User Feedback - -Before delegating, inform the user of the routing decision: - -```bash -echo "" -echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" -echo "🔍 Investigation Routing" -echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" -echo "" -echo "Query: $INVESTIGATION_QUERY" -echo "" -echo "Detected: $KEYWORDS (Priority $PRIORITY)" -echo "Routing to: $DETECTIVE" -echo "Reason: $RATIONALE" -echo "" -echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" -echo "" -``` - -### Phase 4: Delegation via Task Tool - -Use the Task tool to delegate to the selected detective: - -```typescript -Task({ - description: INVESTIGATION_QUERY, - agent: DETECTIVE, - context: { - routing_reason: `Auto-routed based on ${KEYWORDS}`, - original_query: INVESTIGATION_QUERY, - priority: PRIORITY - } -}) -``` - -## Examples - -### Example 1: Debug Keywords +## Example **Input:** "Why is login broken?" -**Detection:** -- Keyword matched: "broken" -- Priority: 1 (Error/Debug) -- Route to: debugger-detective - -**Feedback:** -``` -🔍 Investigation Routing -Query: Why is login broken? -Detected: debug/error keywords (Priority 1) -Routing to: debugger-detective -Reason: Bug fixes are time-critical and require call chain tracing -``` - -### Example 2: Test Keywords - -**Input:** "What's the test coverage for payment?" 
- -**Detection:** -- Keywords matched: "test", "coverage" -- Priority: 2 (Testing) -- Route to: tester-detective - -**Feedback:** -``` -🔍 Investigation Routing -Query: What's the test coverage for payment? -Detected: test/coverage keywords (Priority 2) -Routing to: tester-detective -Reason: Test analysis is specialized and requires callers analysis -``` - -### Example 3: Architecture Keywords - -**Input:** "What's the architecture of the auth layer?" - -**Detection:** -- Keywords matched: "architecture", "layer" -- Priority: 3 (Architecture) -- Route to: architect-detective - -**Feedback:** -``` -🔍 Investigation Routing -Query: What's the architecture of the auth layer? -Detected: architecture/design keywords (Priority 3) -Routing to: architect-detective -Reason: High-level understanding requires PageRank analysis -``` - -### Example 4: No Keywords (Default) - -**Input:** "How does payment work?" - -**Detection:** -- No keywords matched -- Priority: 4 (Default) -- Route to: developer-detective - -**Feedback:** -``` -🔍 Investigation Routing -Query: How does payment work? 
-Detected: implementation (default) (Priority 4) -Routing to: developer-detective -Reason: Most common investigation type - data flow via callers/callees -``` - -### Example 5: Multi-Keyword Conflict - -**Input:** "Debug the test coverage" - -**Detection:** -- Keywords matched: "debug" (Priority 1) AND "test" (Priority 2) -- Priority 1 wins -- Route to: debugger-detective - -**Feedback:** -``` -🔍 Investigation Routing -Query: Debug the test coverage -Detected: debug/error keywords (Priority 1) -Routing to: debugger-detective -Reason: Bug fixes are time-critical and require call chain tracing -(Note: Also detected test keywords, but debug takes priority) -``` - -## Complete Implementation - -Here's the full workflow: - ```bash -#!/bin/bash - -# Get investigation query from task description -INVESTIGATION_QUERY="${TASK_DESCRIPTION}" - -# Normalize to lowercase +# Normalize query QUERY_LOWER=$(echo "$INVESTIGATION_QUERY" | tr '[:upper:]' '[:lower:]') -# Keyword detection with priority routing -if echo "$QUERY_LOWER" | grep -qE "debug|error|broken|failing|crash"; then - DETECTIVE="debugger-detective" - KEYWORDS="debug/error keywords" - PRIORITY=1 - RATIONALE="Bug fixes are time-critical and require call chain tracing" - -elif echo "$QUERY_LOWER" | grep -qE "test|coverage|edge case|mock"; then - DETECTIVE="tester-detective" - KEYWORDS="test/coverage keywords" - PRIORITY=2 - RATIONALE="Test analysis is specialized and requires callers analysis" - -elif echo "$QUERY_LOWER" | grep -qE "architecture|design|structure|layer"; then - DETECTIVE="architect-detective" - KEYWORDS="architecture/design keywords" - PRIORITY=3 - RATIONALE="High-level understanding requires PageRank analysis" - -else - DETECTIVE="developer-detective" - KEYWORDS="implementation (default)" - PRIORITY=4 - RATIONALE="Most common investigation type - data flow via callers/callees" -fi - -# Show routing decision -echo "" -echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" -echo "🔍 Investigation 
Routing" -echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" -echo "" -echo "Query: $INVESTIGATION_QUERY" -echo "" -echo "Detected: $KEYWORDS (Priority $PRIORITY)" -echo "Routing to: $DETECTIVE" -echo "Reason: $RATIONALE" -echo "" -echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" -echo "" +# Keyword detection — "broken" matches Priority 1 +DETECTIVE="debugger-detective" ``` -Then use the Task tool to delegate: - ```typescript +// Delegate to selected detective Task({ - description: INVESTIGATION_QUERY, - agent: DETECTIVE + description: "Why is login broken?", + agent: "debugger-detective" }) ``` -## Fallback Protocol - -If routing produces unexpected results: +**Verification:** Confirm the routing decision is shown to the user before delegation proceeds. -1. **Show routing decision** to user -2. **Ask for override** if needed via AskUserQuestion -3. **Default to developer-detective** if ambiguous +## Conflict Resolution Example -### Override Pattern - -```typescript -// If user wants to override the routing -AskUserQuestion({ - questions: [{ - question: `Auto-routing selected ${DETECTIVE}. 
Override?`, - header: "Investigation Routing", - multiSelect: false, - options: [ - { label: "Continue with auto-routing", description: `Use ${DETECTIVE}` }, - { label: "debugger-detective", description: "Root cause analysis" }, - { label: "tester-detective", description: "Test coverage analysis" }, - { label: "architect-detective", description: "Architecture patterns" }, - { label: "developer-detective", description: "Implementation details" } - ] - }] -}) -``` - -## Integration with Existing Workflow - -This skill is **additive only** and does not change existing behavior: - -- **Direct detective usage** still works (Task → specific detective) -- **/analyze command** unchanged (launches codebase-detective) -- **Parallel orchestration** patterns unchanged -- **All claudemem hooks** preserved +**Input:** "Debug the test coverage" +- "debug" matches Priority 1, "test" matches Priority 2 +- Priority 1 wins → routes to `debugger-detective` -## Use Cases +## Fallback Protocol -| When to Use Investigate Skill | When to Use Direct Detective | -|-------------------------------|------------------------------| -| Investigation type unclear | You know which specialist you need | -| General exploration | Parallel orchestration (multimodel plugin) | -| Quick routing decision | Specific workflow requirements | -| Learning/experimenting | Production automation | +1. Show the routing decision to the user. +2. Offer override via AskUserQuestion if needed. +3. Default to `developer-detective` when ambiguous. 
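The priority routing and the ambiguity default above can be sketched as one function (a minimal sketch; the keyword sets are copied from this skill's detection rules, and the first matching branch wins, which is what makes conflict resolution work without explicit tie-breaking):

```shell
# Minimal routing sketch: branch order encodes priority, so
# "debug" keywords always beat "test" keywords on conflict.
route_query() {
  local q
  q=$(echo "$1" | tr '[:upper:]' '[:lower:]')
  if echo "$q" | grep -qE "debug|error|broken|failing|crash"; then
    echo "debugger-detective"      # Priority 1
  elif echo "$q" | grep -qE "test|coverage|edge case|mock"; then
    echo "tester-detective"        # Priority 2
  elif echo "$q" | grep -qE "architecture|design|structure|layer"; then
    echo "architect-detective"     # Priority 3
  else
    echo "developer-detective"     # Priority 4 (ambiguity default)
  fi
}

route_query "Debug the test coverage"   # prints debugger-detective
route_query "How does payment work?"    # prints developer-detective
```

Because the branches are ordered, "Debug the test coverage" never reaches the test branch, matching the conflict-resolution example.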
## Notes - Case-insensitive keyword matching -- Priority system resolves conflicts -- User sees routing decision before delegation -- Original query preserved in Task context -- Default to developer-detective when no keywords match +- Additive only — direct detective usage and /analyze command remain unchanged - Works with all claudemem versions (v0.3.0+) - ---- - -**Maintained by:** MadAppGang -**Plugin:** code-analysis v3.1.0 -**Last Updated:** January 2026 (v1.0.0 - Initial release) diff --git a/plugins/code-analysis/skills/search-interceptor/SKILL.md b/plugins/code-analysis/skills/search-interceptor/SKILL.md index 01e2ef7..f45609a 100644 --- a/plugins/code-analysis/skills/search-interceptor/SKILL.md +++ b/plugins/code-analysis/skills/search-interceptor/SKILL.md @@ -1,213 +1,68 @@ --- name: search-interceptor -description: "💡 Bulk file read optimizer. Suggests semantic search alternatives when reading multiple files. Helps reduce token usage by using claudemem's ranked results instead of sequential file reads." -allowed-tools: Bash, AskUserQuestion +description: "Intercepts bulk file read and glob operations, suggests semantic search alternatives via claudemem to reduce token usage by up to 90%. Use when planning to read 3+ files, using broad glob patterns, or investigating code across multiple files." --- # Search Interceptor -This skill helps optimize bulk file operations by suggesting semantic search alternatives when they would be more efficient. +Optimizes bulk file operations by redirecting to semantic search when more efficient. Checks claudemem status, evaluates planned operations, and suggests ranked alternatives. -## When Semantic Search is More Efficient +## Workflow -| Scenario | Token Cost | Alternative | -|----------|------------|-------------| -| Read 5+ files | ~5000 tokens | `claudemem search` (~500 tokens) | -| Glob all *.ts files | ~3000+ tokens | `claudemem --agent map` | -| Sequential reads to understand | Variable | One semantic query | +1. 
**Pause before bulk execution** — when about to read 3+ files or use broad globs, stop. +2. **Check claudemem status** — run `claudemem status` to verify the index is available. +3. **Evaluate the operation** against the decision matrix below. +4. **Execute the better alternative** — use one semantic query instead of N file reads. +5. **Read specific lines** from ranked results only after the semantic search. -## When to Consider Alternatives - -### Multiple File Reads - -If planning to read several files, consider: -```bash -# Instead of reading 5 files individually -claudemem search "concept from those files" -n 15 -# Gets ranked results with context -``` - -### Broad Glob Patterns - -If using patterns like `src/**/*.ts`: -```bash -# Instead of globbing and reading all matches -claudemem --agent map "what you're looking for" -# Gets structural overview with PageRank ranking -``` - -### File Paths Mentioned in Task - -Even when specific paths are mentioned, semantic search often finds additional relevant code: -```bash -claudemem search "concept related to mentioned files" -``` - ---- - -## Interception Protocol - -### Step 1: Pause Before Execution - -When you're about to execute bulk file operations, STOP and run: - -```bash -claudemem status -``` - -### Step 2: Evaluate - -**If claudemem is indexed:** - -| Your Plan | Better Alternative | -|-----------|-------------------| -| Read 5 auth files | `claudemem search "authentication login session"` | -| Glob all services | `claudemem search "service layer business logic"` | -| Read mentioned paths | `claudemem search "[concept from those paths]"` | - -**If claudemem is NOT indexed:** - -```bash -claudemem index -y -``` -Then proceed with semantic search. 
- -### Step 3: Execute Better Alternative - -```bash -# Instead of reading N files, run ONE semantic query -claudemem search "concept describing what you need" -n 15 - -# ONLY THEN read specific lines from results -``` - ---- - -## Interception Decision Matrix +## Decision Matrix | Situation | Intercept? | Action | |-----------|-----------|--------| | Read 1-2 specific files | No | Proceed with Read | -| Read 3+ files in investigation | **YES** | Convert to claudemem search | +| Read 3+ files in investigation | **Yes** | `claudemem search "concept" -n 15` | | Glob for exact filename | No | Proceed with Glob | -| Glob for pattern discovery | **YES** | Convert to claudemem search | +| Glob for pattern discovery | **Yes** | `claudemem search "concept"` | | Grep for exact string | No | Proceed with Grep | -| Grep for semantic concept | **YES** | Convert to claudemem search | -| Files mentioned in prompt | **YES** | Search semantically first | +| Grep for semantic concept | **Yes** | `claudemem search "concept"` | ---- - -## Examples of Interception - -### Example 1: Auth Investigation +## Example: Auth Investigation -**❌ Original plan:** -``` -I see the task mentions auth, let me read: -- src/services/auth/login.ts -- src/services/auth/session.ts -- src/services/auth/jwt.ts -- src/services/auth/middleware.ts -- src/services/auth/utils.ts -``` - -**✅ After interception:** ```bash -claudemem status # Check if indexed +# Instead of reading 5 auth files individually (~5000 tokens): +claudemem status claudemem search "authentication login session JWT token validation" -n 15 -# Now I have ranked, relevant chunks instead of 5 full files -``` - -### Example 2: API Integration Audit - -**❌ Original plan:** -``` -Audit mentions Prime API files: -- src/services/prime/internal_api/client.ts -- src/services/prime/api.ts -Let me just Read these directly... 
+# Result: ~500 tokens with ranked, relevant chunks ``` -**✅ After interception:** -```bash -claudemem search "Prime API integration endpoints HTTP client" -n 20 -# This finds ALL Prime-related code, ranked by relevance -# Not just the 2 files mentioned -``` +**Verification:** Compare token cost of planned reads vs. semantic search result size. -### Example 3: Pattern Discovery +## Example: Pattern Discovery -**❌ Original plan:** -``` -Glob("src/**/*.controller.ts") -Then read all 15 controllers to understand routing -``` - -**✅ After interception:** ```bash +# Instead of: Glob("src/**/*.controller.ts") then reading 15 files claudemem search "HTTP controller endpoint route handler" -n 20 -# Gets the most relevant routing code, not all controllers +# Gets the most relevant routing code, ranked by PageRank ``` ---- - -## Why Semantic Search Often Works Better +## Token Cost Comparison -| Native Tools | Semantic Search | -|--------------|-----------------| -| No ranking | Ranked by relevance + PageRank | -| No relationships | Shows code connections | -| ~5000 tokens for 5 files | ~500 tokens for ranked results | -| Only explicitly requested code | Discovers related code | +| Approach | Token Cost | Ranking | +|----------|------------|---------| +| Read 5+ files | ~5000 tokens | None | +| Glob + read all matches | ~3000+ tokens | None | +| `claudemem search` once | ~500 tokens | By relevance + PageRank | -**Tip:** For investigation tasks, try `claudemem search` first to get a ranked view of relevant code. 
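As a sanity check on the savings in the comparison above, the arithmetic works out as follows (the per-file and per-search token counts are illustrative averages, not measurements):

```shell
# Assumed averages: ~1000 tokens per full file read, ~500 per ranked search.
FILES=5
TOKENS_PER_FILE=1000
SEARCH_TOKENS=500

READ_COST=$((FILES * TOKENS_PER_FILE))
SAVED_PCT=$(( (READ_COST - SEARCH_TOKENS) * 100 / READ_COST ))

echo "bulk read: ${READ_COST} tokens, search: ${SEARCH_TOKENS} tokens, saved: ${SAVED_PCT}%"
# prints: bulk read: 5000 tokens, search: 500 tokens, saved: 90%
```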
+## Bypass Flag ---- - -## Integration with Other Skills - -This skill works with: - -| Skill | Relationship | -|-------|-------------| -| `code-search-selector` | Selector determines WHAT tool; Interceptor validates BEFORE execution | -| `claudemem-search` | Interceptor redirects to claudemem; this skill shows HOW to search | -| `deep-analysis` | Interceptor prevents bad patterns; deep-analysis uses good patterns | -| Detective skills | Interceptor prevents duplicate work by trusting detective agents | - ---- - -## Hook System Integration - -The hook system may provide claudemem results proactively when the index is available: - -- **Grep queries** → May receive claudemem search results instead -- **Bulk reads** → May receive suggestion to use semantic search -- **Broad globs** → May receive map results +When native tool behavior is specifically needed: -### Using the Bypass Flag - -When you specifically need native tool behavior: ```json { "pattern": "exact string", "_bypass_claudemem": true } ``` -This tells hooks you intentionally want native tool output. - ---- - -## Quick Reference - -Before bulk Read/Glob operations, consider: - -1. **Is claudemem indexed?** → `claudemem status` -2. **Can this be one semantic query?** → Often yes -3. **Do you need exact matches?** → Use native tools with bypass flag - -**General guideline:** For understanding/investigation, try semantic search first. For exact matches, use native tools. 
- ---- +## Notes -**Maintained by:** MadAppGang -**Plugin:** code-analysis v2.16.0 -**Purpose:** Help optimize bulk file operations with semantic search alternatives +- Works with code-search-selector (determines tool), deep-analysis (uses good patterns), and detective skills +- If claudemem is not indexed, run `claudemem index -y` first +- Hook system may proactively provide claudemem results for grep and bulk read operations diff --git a/plugins/code-analysis/skills/ultrathink-detective/SKILL.md b/plugins/code-analysis/skills/ultrathink-detective/SKILL.md index 29ce3e0..34f9a75 100644 --- a/plugins/code-analysis/skills/ultrathink-detective/SKILL.md +++ b/plugins/code-analysis/skills/ultrathink-detective/SKILL.md @@ -1,892 +1,91 @@ --- name: ultrathink-detective -description: "⚡ Comprehensive analysis skill. Best for: 'comprehensive audit', 'deep analysis', 'full codebase review', 'multi-perspective investigation', 'complex questions'. Combines all perspectives (architect+developer+tester+debugger). Uses Opus model with full claudemem AST analysis." -allowed-tools: Bash, Task, Read, AskUserQuestion -model: opus +description: "Runs comprehensive multi-perspective codebase analysis using all claudemem AST commands (map, symbol, callers, callees, context, search). Covers architecture, implementation, testing, reliability, security, performance, and code health dimensions. Use when asked for a comprehensive audit, deep analysis, full codebase review, or multi-perspective investigation." --- -# Ultrathink Detective Skill +# Ultrathink Detective -This skill uses ALL claudemem commands for comprehensive multi-perspective investigation. +Senior Principal Engineer analysis combining all detective perspectives (architect, developer, tester, debugger) using Opus model with full claudemem AST analysis across seven dimensions. 
-## Combines All Detective Perspectives +## Workflow -| Perspective | Focus | Commands Used | -|-------------|-------|---------------| -| Architect | System design, layers | `map`, `symbol` | -| Developer | Implementation, flow | `callers`, `callees` | -| Tester | Coverage, gaps | `callers` for tests | -| Debugger | Root cause, chains | `context` | +1. **Verify setup** — confirm claudemem v0.3.0+ is installed and indexed. Check index freshness; reindex if stale files detected. +2. **Architecture mapping** — run `claudemem --agent map` to identify high-PageRank symbols (> 0.05) as architectural pillars. +3. **Critical path analysis** — for each pillar, run `symbol`, `callers`, `callees`, and `context` to trace dependencies and usage. +4. **Test coverage assessment** — check callers of critical functions for test file references. High PageRank + 0 test callers = critical gap. +5. **Risk identification** — analyze security symbols, error handling chains, and external integrations. +6. **Technical debt inventory** — search for TODO/FIXME, identify god classes (> 20 callees), find orphaned code. +7. **Code health check** (v0.4.0+) — run `dead-code` and `test-gaps` commands, categorize by PageRank impact. +8. **Generate report** — produce executive summary with per-dimension scores and prioritized action items. 
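Step 1's freshness check can be sketched like this (the `.claudemem/index.db` path and the file-extension list are assumptions taken from this plugin's conventions; adjust both for the target repo):

```shell
# Count source files modified after the index; a nonzero count means stale.
count_stale() {
  local index_db="$1"
  if [ ! -f "$index_db" ]; then
    echo "-1"   # no index at all: run `claudemem index` first
    return
  fi
  find . -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.py" \) \
    -newer "$index_db" 2>/dev/null \
    | grep -vE "node_modules|\.git/|dist/|build/" \
    | wc -l | tr -d ' '
}

STALE=$(count_stale ".claudemem/index.db")
case "$STALE" in
  -1) echo "No index found: run claudemem index" ;;
  0)  echo "Index is fresh" ;;
  *)  echo "$STALE files changed since last index: consider reindexing" ;;
esac
```

Returning a sentinel (`-1`) for the missing-index case keeps the caller's decision (create vs. reindex vs. proceed) separate from the counting logic.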
-**Full command set:** -- `claudemem --agent map "query"` - Architecture overview -- `claudemem --agent symbol ` - Exact locations -- `claudemem --agent callers ` - Impact analysis -- `claudemem --agent callees ` - Dependency tracing -- `claudemem --agent context ` - Full call chain -- `claudemem --agent search "query"` - Semantic search +## Seven Analysis Dimensions -# Ultrathink Detective Skill +| Dimension | Primary Command | Focus | +|-----------|----------------|-------| +| Architecture | `map` | Layers, core abstractions, PageRank | +| Implementation | `callers`/`callees` | Data flow, dependencies | +| Testing | `callers` (test files) | Coverage gaps | +| Reliability | `context` | Error handling chains | +| Security | `symbol` + `callers` | Auth flow, sensitive data | +| Performance | `search` | Database patterns, async, caching | +| Code Health | `dead-code`, `test-gaps` | Cleanup candidates, coverage | -**Version:** 3.3.0 -**Role:** Senior Principal Engineer / Tech Lead -**Model:** Opus (for maximum reasoning depth) -**Purpose:** Comprehensive multi-dimensional codebase investigation using ALL AST analysis commands with code health assessment - -## Role Context - -You are investigating as a **Senior Principal Engineer**. 
Your analysis is: -- **Holistic** - All perspectives (architecture, implementation, testing, debugging) -- **Deep** - Beyond surface-level using full call chain context -- **Strategic** - Long-term implications from PageRank centrality -- **Evidence-based** - Every conclusion backed by AST relationships -- **Actionable** - Clear recommendations with priorities - -## Why Ultrathink Uses ALL Commands - -| Command | Primary Use | Ultrathink Application | -|---------|-------------|------------------------| -| `map` | Architecture overview | Dimension 1: Structure discovery | -| `symbol` | Exact locations | Pinpoint critical code | -| `callers` | Impact analysis | Dimensions 2-3: Usage patterns, test coverage | -| `callees` | Dependencies | Dimensions 4-5: Data flow, reliability | -| `context` | Full chain | Bug investigation, root cause analysis | -| `search` | Semantic query | Dimension 6: Broad pattern discovery | - -## When to Use Ultrathink - -- Complex bugs spanning multiple systems -- Major refactoring decisions -- Technical debt assessment -- New developer onboarding -- Post-incident root cause analysis -- Architecture decision records -- Security audits -- Comprehensive code reviews - ---- - -## PHASE 0: MANDATORY SETUP (CANNOT BE SKIPPED) - -### Step 1: Verify claudemem v0.3.0 - -```bash -which claudemem && claudemem --version -# Must be 0.3.0+ -``` - -### Step 2: If Not Installed → STOP - -**DO NOT FALL BACK TO GREP.** Use AskUserQuestion: - -```typescript -AskUserQuestion({ - questions: [{ - question: "claudemem v0.3.0 (AST structural analysis) is required. Grep/find are NOT acceptable alternatives. 
How proceed?", - header: "Required", - multiSelect: false, - options: [ - { label: "Install via npm (Recommended)", description: "npm install -g claude-codemem" }, - { label: "Install via Homebrew", description: "brew tap MadAppGang/claude-mem && brew install --cask claudemem" }, - { label: "Cancel", description: "I'll install manually" } - ] - }] -}) -``` - -### Step 3: Check Index Status - -```bash -# Check claudemem installation and index -claudemem --version && ls -la .claudemem/index.db 2>/dev/null -``` - -### Step 3.5: Check Index Freshness - -Before proceeding with investigation, verify the index is current: - -```bash -# First check if index exists -if [ ! -d ".claudemem" ] || [ ! -f ".claudemem/index.db" ]; then - # Use AskUserQuestion to prompt for index creation - # Options: [1] Create index now (Recommended), [2] Cancel investigation - exit 1 -fi - -# Count files modified since last index -STALE_COUNT=$(find . -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" -o -name "*.py" -o -name "*.go" -o -name "*.rs" \) \ - -newer .claudemem/index.db 2>/dev/null | grep -v "node_modules" | grep -v ".git" | grep -v "dist" | grep -v "build" | wc -l) -STALE_COUNT=$((STALE_COUNT + 0)) # Normalize to integer - -if [ "$STALE_COUNT" -gt 0 ]; then - # Get index time with explicit platform detection - if [[ "$OSTYPE" == "darwin"* ]]; then - INDEX_TIME=$(stat -f "%Sm" -t "%Y-%m-%d %H:%M" .claudemem/index.db 2>/dev/null) - else - INDEX_TIME=$(stat -c "%y" .claudemem/index.db 2>/dev/null | cut -d'.' -f1) - fi - INDEX_TIME=${INDEX_TIME:-"unknown time"} - - # Get sample of stale files - STALE_SAMPLE=$(find . 
-type f \( -name "*.ts" -o -name "*.tsx" \) \ - -newer .claudemem/index.db 2>/dev/null | grep -v "node_modules" | grep -v ".git" | head -5) - - # Use AskUserQuestion to ask user how to proceed - # Options: [1] Reindex now (Recommended), [2] Proceed with stale index, [3] Cancel -fi -``` - -**AskUserQuestion Template for Stale Index:** - -```typescript -AskUserQuestion({ - questions: [{ - question: `${STALE_COUNT} files have been modified since the last index (${INDEX_TIME}). The claudemem index may be outdated, which could cause missing or incorrect results. How would you like to proceed?`, - header: "Index Freshness Warning", - multiSelect: false, - options: [ - { - label: "Reindex now (Recommended)", - description: `Run claudemem index to update. Takes ~1-2 minutes. Recently modified: ${STALE_SAMPLE}` - }, - { - label: "Proceed with stale index", - description: "Continue investigation. May miss recent code changes." - }, - { - label: "Cancel investigation", - description: "I'll handle this manually." - } - ] - }] -}) -``` - -**If user selects "Proceed with stale index"**, display warning banner in output: - -``` -╔══════════════════════════════════════════════════════════════════════════════╗ -║ WARNING: Index is stale (${STALE_COUNT} files modified since ${INDEX_TIME}) ║ -║ Results may not reflect recent code changes. 
║ -╚══════════════════════════════════════════════════════════════════════════════╝ -``` - -### Step 4: Index if Needed - -```bash -claudemem index -``` - ---- - -## Multi-Dimensional Analysis Framework (v0.3.0) - -### Dimension 1: Architecture (map command) - -```bash -# Get overall structure with PageRank -claudemem --agent map -# Focus on high-PageRank symbols (> 0.05) - these ARE the architecture - -# Layer identification -claudemem --agent map "controller handler endpoint" # Presentation -claudemem --agent map "service business logic" # Business -claudemem --agent map "repository database query" # Data - -# Pattern detection -claudemem --agent map "factory create builder"claudemem --agent map "interface abstract contract"claudemem --agent map "event emit subscribe"``` - -### Dimension 2: Implementation (callers/callees) - -```bash -# For high-PageRank symbols, trace dependencies -claudemem --agent callees PaymentService -# What calls critical code? -claudemem --agent callers processPayment -# Full dependency chain -claudemem --agent context OrderController``` - -### Dimension 3: Test Coverage (callers analysis) - -```bash -# Find tests for critical functions -claudemem --agent callers authenticateUser# Look for callers from *.test.ts or *.spec.ts - -# Map test infrastructure -claudemem --agent map "test spec describe it"claudemem --agent map "mock stub spy helper" -# Coverage gaps = functions with 0 test callers -claudemem --agent callers criticalFunction# If no test file callers → coverage gap -``` - -### Dimension 4: Reliability (context command) +## Example: Full Audit ```bash -# Error handling chains -claudemem --agent context handleError -# Exception flow -claudemem --agent map "throw error exception"claudemem --agent callers CustomError -# Recovery patterns -claudemem --agent map "retry fallback circuit"``` - -### Dimension 5: Security (symbol + callers) - -```bash -# Authentication -claudemem --agent symbol authenticateclaudemem --agent callees 
authenticateclaudemem --agent callers authenticate -# Authorization -claudemem --agent map "permission role check guard" -# Sensitive data -claudemem --agent map "password hash token secret"claudemem --agent callers encrypt``` - -### Dimension 6: Performance (semantic search) - -```bash -# Database patterns -claudemem --agent search "query database batch" -# Async patterns -claudemem --agent map "async await promise parallel" -# Caching -claudemem --agent map "cache memoize store"``` - -### Dimension 6: Performance Feedback Tracking (v0.8.0+) - -Ultrathink uses `search` in the Performance dimension. Track feedback for these searches: - -```bash -# Dimension 6: Performance (semantic search) -PERF_QUERY="query database batch" -PERF_RESULTS=$(claudemem --agent search "$PERF_QUERY") - -# Initialize tracking strings (POSIX-compatible) -PERF_HELPFUL="" -PERF_UNHELPFUL="" - -# During analysis, track results: -# When you read a result and it's useful for performance analysis: -PERF_HELPFUL="$PERF_HELPFUL,abc123" - -# When you read a result and it's not relevant: -PERF_UNHELPFUL="$PERF_UNHELPFUL,def456" - -# At end of investigation, report (v0.8.0+ only): -if claudemem feedback --help 2>&1 | grep -qi "feedback"; then - timeout 5 claudemem feedback \ - --query "$PERF_QUERY" \ - --helpful "${PERF_HELPFUL#,}" \ - --unhelpful "${PERF_UNHELPFUL#,}" \ - 2>/dev/null || true -fi -``` - -### Dimension 7: Code Health (v0.4.0+ Required) - -```bash -# Dead code detection -DEAD=$(claudemem --agent dead-code) - -if [ -n "$DEAD" ]; then - # Categorize: - # - High PageRank dead = Something broke (investigate) - # - Low PageRank dead = Cleanup candidate - echo "Dead Code Analysis:" - echo "$DEAD" -else - echo "No dead code found - excellent hygiene!" 
-fi - -# Test coverage gaps -GAPS=$(claudemem --agent test-gaps) - -if [ -n "$GAPS" ]; then - # Impact analysis for high-PageRank gaps - echo "Test Gap Analysis:" - echo "$GAPS" - - # For critical gaps, show full impact - for symbol in $(echo "$GAPS" | grep "pagerank: 0.0[5-9]" | awk '{print $4}'); do - echo "Impact for critical untested: $symbol" - claudemem --agent impact "$symbol" done -else - echo "No test gaps found - excellent coverage!" -fi -``` - ---- - -## Comprehensive Analysis Workflow (v0.3.0) - -### Phase 1: Architecture Mapping (10 min) - -```bash -# Get structural overview with PageRank +# Phase 1: Architecture claudemem --agent map -# Document high-PageRank symbols (> 0.05) -# These are architectural pillars - understand first - -# Map each layer -claudemem --agent map "controller route endpoint"claudemem --agent map "service business domain"claudemem --agent map "repository data persist"``` +claudemem --agent map "controller handler endpoint" +claudemem --agent map "service business logic" -### Phase 2: Critical Path Analysis (15 min) - -```bash -# For each high-PageRank symbol: - -# 1. Get exact location +# Phase 2: Critical paths claudemem --agent symbol PaymentService -# 2. Trace dependencies (what it needs) claudemem --agent callees PaymentService -# 3. Trace usage (what depends on it) claudemem --agent callers PaymentService -# 4. 
Full context for complex ones -claudemem --agent context PaymentService``` - -### Phase 3: Test Coverage Assessment (10 min) - -```bash -# For each critical function, check callers -claudemem --agent callers processPaymentclaudemem --agent callers authenticateUserclaudemem --agent callers updateProfile -# Count: -# - Test callers (from *.test.ts, *.spec.ts) -# - Production callers - -# High PageRank + 0 test callers = CRITICAL GAP -``` - -### Phase 4: Risk Identification (10 min) - -```bash -# Security symbols -claudemem --agent map "auth session token"claudemem --agent callers validateToken -# Error handling -claudemem --agent map "error exception throw"claudemem --agent context handleFailure -# External integrations -claudemem --agent map "API external webhook"claudemem --agent callers stripeClient``` - -### Phase 5: Technical Debt Inventory (10 min) - -```bash -# Deprecated patterns -claudemem --agent search "TODO FIXME deprecated" -# Complexity indicators (high PageRank but many callees) -claudemem --agent callees LargeService# > 20 callees = potential god class - -# Orphaned code (low PageRank, 0 callers) -claudemem --agent callers unusedFunction``` - ---- - -## Output Format: Comprehensive Report (v0.3.0) - -### Executive Summary - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ CODEBASE COMPREHENSIVE ANALYSIS (v0.3.0) │ -├─────────────────────────────────────────────────────────────────┤ -│ Overall Health: 🟡 MODERATE (7.2/10) │ -│ Search Method: claudemem v0.3.0 (AST + PageRank) │ -│ │ -│ Dimensions: │ -│ ├── Architecture: 🟢 GOOD (8/10) [map analysis] │ -│ ├── Implementation: 🟡 MODERATE (7/10) [callers/callees] │ -│ ├── Testing: 🔴 POOR (5/10) [test-gaps] │ -│ ├── Reliability: 🟢 GOOD (8/10) [context tracing] │ -│ ├── Security: 🟡 MODERATE (7/10) [auth callers] │ -│ ├── Performance: 🟢 GOOD (8/10) [async patterns] │ -│ └── Code Health: 🟡 MODERATE (6/10) [dead-code + impact] │ -│ │ -│ Critical: 3 | Major: 7 | Minor: 15 │ -│ │ -│ 
Search Feedback: │ -│ └── Performance queries: 2 submitted │ -│ └── Helpful results: 5 │ -│ └── Unhelpful results: 3 │ -└─────────────────────────────────────────────────────────────────┘ -``` - -### Dimension 1: Architecture (from map) - -``` -Core Abstractions (PageRank > 0.05): -├── UserService (0.092) - Central business logic -├── Database (0.078) - Data access foundation -├── AuthMiddleware (0.056) - Security boundary -└── EventBus (0.051) - Cross-cutting concerns - -Layer Structure: -┌─────────────────────────────────────────────────────────┐ -│ PRESENTATION (src/controllers/) │ -│ └── UserController (0.034) │ -│ └── AuthController (0.028) │ -│ ↓ │ -│ BUSINESS (src/services/) │ -│ └── UserService (0.092) ⭐HIGH PAGERANK │ -│ └── AuthService (0.067) │ -│ ↓ │ -│ DATA (src/repositories/) │ -│ └── UserRepository (0.045) │ -│ └── Database (0.078) ⭐HIGH PAGERANK │ -└─────────────────────────────────────────────────────────┘ -``` - -### Dimension 2: Implementation (from callers/callees) - -``` -Critical Data Flows: -processPayment (PageRank: 0.045) -├── CALLEES (dependencies): -│ ├── validateCard → stripeClient.validateCard -│ ├── getCustomer → Database.query -│ ├── chargeStripe → stripeClient.charge -│ └── saveTransaction → TransactionRepository.save -│ -└── CALLERS (usage): - ├── CheckoutController.submit:45 - ├── SubscriptionService.renew:89 - └── RetryQueue.processPayment:23 -``` - -### Dimension 3: Test Coverage (from callers) - -``` -| Function | Test Callers | Prod Callers | Coverage | -|---------------------|--------------|--------------|----------| -| authenticateUser | 5 | 12 | ✅ Good | -| processPayment | 3 | 8 | ✅ Good | -| calculateDiscount | 0 | 4 | ❌ None | -| sendEmail | 1 | 6 | ⚠️ Low | -| updateUserProfile | 0 | 3 | ❌ None | - -🔴 CRITICAL GAPS (high PageRank + 0 test callers): - └── calculateDiscount (PageRank: 0.034) - └── callers: 4 production, 0 tests -``` - -### Dimension 4: Reliability (from context) - -``` -Error Handling Chain: - 
-handleAuthError (context analysis): -├── Defined: src/middleware/auth.ts:45 -├── CALLERS (error sources): -│ ├── validateToken:23 → throws on invalid -│ ├── refreshSession:67 → throws on expired -│ └── checkPermission:89 → throws on denied -└── CALLEES (error handling): - ├── logError → Logger.error - ├── notifyAdmin → AlertService.send (if critical) - └── formatResponse → ErrorFormatter.toJSON -``` - -### Dimension 5: Security (from symbol + callers) - -``` -Authentication Flow: - -authenticate (PageRank: 0.067) -├── Location: src/services/auth.ts:23-67 -├── CALLEES: -│ ├── bcrypt.compare (password verification) -│ ├── jwt.sign (token generation) -│ └── SessionStore.create (session persistence) -└── CALLERS (entry points): - ├── LoginController.login:12 ✅ - ├── OAuthController.callback:45 ✅ - └── APIMiddleware.verify:23 ⚠️ (rate limiting?) -``` - -### Dimension 6: Performance (from map + callees) - -``` -Database Access Patterns: - -UserRepository.findWithRelations (PageRank: 0.028) -├── CALLEES: -│ ├── Database.query (1 call) -│ ├── RelationLoader.load (per relation) ⚠️ N+1? -│ └── Cache.get (optimization) -└── CALLERS: 8 locations - └── 3 in loops ⚠️ Potential N+1 - -Recommendation: Batch relation loading or use joins -``` - ---- - -## Action Items (Prioritized by PageRank Impact) - -``` -🔴 IMMEDIATE (This Sprint) - Affects High-PageRank Code - - 1. Add tests for calculateDiscount (PageRank: 0.034) - └── callers show: 4 production uses, 0 tests - - 2. Fix N+1 query in UserRepository.findWithRelations - └── callees show: RelationLoader called per item - - 3. Add rate limiting to APIMiddleware.verify - └── callers show: All API endpoints exposed - -🟠 SHORT-TERM (Next 2 Sprints) - - 4. Add error recovery to PaymentService - └── context shows: No retry on Stripe failures - - 5. Increase test coverage for AuthService - └── callers show: Only 2 test files cover critical code - -🟡 MEDIUM-TERM (This Quarter) - - 6. 
Refactor UserService (PageRank: 0.092) - └── callees show: 23 dependencies (god class pattern) - - 7. Add observability to EventBus - └── callers show: 15 publishers, no monitoring -``` - ---- - -## Result Validation Pattern - -After EVERY claudemem command, validate results to ensure quality: - -### Validation Per Dimension - -Each dimension MUST validate its claudemem results before proceeding: - -**Dimension 1: Architecture (map)** - -```bash -RESULTS=$(claudemem --agent map) -EXIT_CODE=$? - -# Check for command failure -if [ "$EXIT_CODE" -ne 0 ]; then - echo "ERROR: claudemem map failed" - # Diagnose and ask user (see Fallback Protocol below) - exit 1 -fi - -# Check for empty results -if [ -z "$RESULTS" ]; then - echo "WARNING: No architectural symbols found - index may be empty" - # Ask user to reindex or cancel -fi - -# Validate PageRank values present -if ! echo "$RESULTS" | grep -q "pagerank:"; then - echo "WARNING: No PageRank data - index may be corrupted or outdated" - # Ask user to reindex -fi -``` - -**Dimension 2-6: All Other Commands** - -```bash -RESULTS=$(claudemem --agent [command] [args]) -EXIT_CODE=$? 
- -# Check exit code -if [ "$EXIT_CODE" -ne 0 ]; then - # Diagnose index health - DIAGNOSIS=$(claudemem --version && ls -la .claudemem/index.db 2>&1) - # Use AskUserQuestion for recovery options -fi - -# Check for empty/irrelevant results -# Extract keywords from the user's investigation query -# Example: QUERY="how does auth work" → KEYWORDS="auth work authentication" -# The orchestrating agent must populate KEYWORDS before this check -MATCH_COUNT=0 -for kw in $KEYWORDS; do - if echo "$RESULTS" | grep -qi "$kw"; then - MATCH_COUNT=$((MATCH_COUNT + 1)) - fi -done - -if [ "$MATCH_COUNT" -eq 0 ]; then - # Results don't match query - potentially irrelevant - # Use AskUserQuestion (see Fallback Protocol) -fi -``` - -**Dimension 3: Test Coverage (callers)** - -```bash -RESULTS=$(claudemem --agent callers $FUNCTION) +# Phase 3: Test coverage +claudemem --agent callers authenticateUser +# Check: are any callers from *.test.ts or *.spec.ts? -# Even 0 callers is valid - but validate it's not an error -if echo "$RESULTS" | grep -qi "error\|not found"; then - # Actual error vs no callers - # Use AskUserQuestion -fi +# Phase 4: Code health (v0.4.0+) +claudemem --agent dead-code +claudemem --agent test-gaps ``` ---- - -## FALLBACK PROTOCOL - -**CRITICAL: Never use grep/find/Glob without explicit user approval.** - -``` -╔══════════════════════════════════════════════════════════════════════════════╗ -║ ║ -║ FALLBACK PROTOCOL (NEVER SILENT) ║ -║ ║ -║ If claudemem fails OR returns irrelevant results: ║ -║ ║ -║ 1. STOP - Do not silently switch to grep/find ║ -║ 2. DIAGNOSE - Run claudemem status to check index health ║ -║ 3. COMMUNICATE - Tell user what happened ║ -║ 4. ASK - Get explicit user permission via AskUserQuestion ║ -║ ║ -║ grep/find/Glob ARE FORBIDDEN without explicit user approval ║ -║ ║ -╚══════════════════════════════════════════════════════════════════════════════╝ -``` - -### Fallback Decision Tree - -If claudemem fails or returns unexpected results: - -1. 
**STOP** - Do not silently switch tools -2. **DIAGNOSE** - Run `claudemem status` -3. **REPORT** - Tell user what happened -4. **ASK** - Use AskUserQuestion for next steps - -```typescript -// Fallback AskUserQuestion Template -AskUserQuestion({ - questions: [{ - question: "claudemem [command] failed or returned irrelevant results. How should I proceed?", - header: "Investigation Issue", - multiSelect: false, - options: [ - { label: "Reindex codebase", description: "Run claudemem index (~1-2 min)" }, - { label: "Try different query", description: "Rephrase the search" }, - { label: "Use grep (not recommended)", description: "Traditional search - loses semantic understanding" }, - { label: "Cancel", description: "Stop investigation" } - ] - }] -}) -``` - -### Grep Fallback Warning - -If user explicitly chooses grep fallback, display this warning: - -```markdown -## WARNING: Using Fallback Search (grep) - -You have chosen to use grep as a fallback. Please understand the limitations: - -| Feature | claudemem | grep | -|---------|-----------|------| -| Semantic understanding | Yes | No | -| Call graph analysis | Yes | No | -| Symbol relationships | Yes | No | -| PageRank ranking | Yes | No | -| False positives | Low | High | - -**Recommendation:** After completing this task, run `claudemem index` to rebuild -the index for future investigations. - -Proceeding with grep... -``` - ---- - -## 🚫 FORBIDDEN: DO NOT USE - -```bash -# ❌ ALL OF THESE ARE FORBIDDEN -grep -r "pattern" . -rg "pattern" -find . 
-name "*.ts" -git grep "term" -Glob({ pattern: "**/*.ts" }) -Grep({ pattern: "function" }) -``` - -## ✅ REQUIRED: ALWAYS USE - -```bash -# ✅ claudemem v0.3.0 AST Commands -claudemem --agent map "query" # Architecture -claudemem --agent symbol # Location -claudemem --agent callers # Impact -claudemem --agent callees # Dependencies -claudemem --agent context # Full chain -claudemem --agent search "query" # Semantic -``` - ---- - -## CRITICAL: NEVER TRUNCATE CLAUDEMEM OUTPUT - -╔══════════════════════════════════════════════════════════════════════════════╗ -║ ║ -║ ⛔ OUTPUT TRUNCATION IS FORBIDDEN ║ -║ ║ -║ claudemem output is ALREADY OPTIMIZED for LLM context windows. ║ -║ Truncating it may hide the most critical results. ║ -║ ║ -║ ❌ NEVER DO THIS (any form of output truncation): ║ -║ claudemem --agent map "query" | head -80 ║ -║ claudemem --agent callers UserService | head -100 ║ -║ claudemem --agent callees Func | tail -50 ║ -║ claudemem --agent impact Svc | head -N ║ -║ claudemem --agent search "auth" | grep -m 10 "pattern" ║ -║ claudemem --agent map "q" | awk 'NR <= 50' ║ -║ claudemem --agent callers X | sed '50q' ║ -║ claudemem --agent search "x" | sort | head -20 ║ -║ claudemem --agent map "q" | grep "pattern" | head -20 ║ -║ ║ -║ WHY `tail` IS EQUALLY PROBLEMATIC: ║ -║ `tail` skips the BEGINNING of output, which often contains: ║ -║ • Summary headers showing total counts ║ -║ • Highest-ranked results (PageRank, relevance score) ║ -║ • Context that explains what follows ║ -║ ║ -║ ✅ ALWAYS DO THIS: ║ -║ claudemem --agent map "query" ║ -║ claudemem --agent callers UserService ║ -║ claudemem --agent callees Func ║ -║ claudemem --agent impact Svc ║ -║ claudemem --agent search "auth" -n 10 # Use built-in limit ║ -║ ║ -║ WHY THIS MATTERS: ║ -║ • search results are sorted by relevance - truncating loses best matches ║ -║ • map results are sorted by PageRank - truncating loses core architecture ║ -║ • callers/callees show ALL dependencies - truncating causes missed 
changes ║ -║ • impact shows full blast radius - truncating underestimates risk ║ -║ ║ -║ ═══════════════════════════════════════════════════════════════════════ ║ -║ IF OUTPUT IS TOO LARGE, USE BUILT-IN FLAGS: ║ -║ ═══════════════════════════════════════════════════════════════════════ ║ -║ ║ -║ --tokens N Token-limited output (respects LLM context) ║ -║ Example: claudemem --agent map "query" --tokens 2000 ║ -║ ║ -║ --page-size N Pagination with N results per page ║ -║ --page N Fetch specific page number ║ -║ Example: claudemem --agent search "x" --page-size 20 --page 1║ -║ ║ -║ -n N Limit result count at query level (not post-hoc) ║ -║ Example: claudemem --agent search "auth" -n 10 ║ -║ ║ -║ --max-depth N Limit traversal depth (for context, callers, impact) ║ -║ Example: claudemem --agent context Func --max-depth 3 ║ -║ ║ -║ ACCEPTABLE: Piping to file for later analysis ║ -║ claudemem --agent map "query" > /tmp/full-map.txt ║ -║ (Full output preserved, can be processed separately) ║ -║ ║ -╚══════════════════════════════════════════════════════════════════════════════╝ - -NOTE: The freshness check pattern `head -5` for sampling stale files remains valid. - This prohibition applies only to truncating claudemem COMMAND OUTPUT. - ---- - -## Feedback Reporting (v0.8.0+) - -After completing investigation, report search feedback to improve future results. 
- -### When to Report - -Report feedback ONLY if you used the `search` command during investigation: - -| Result Type | Mark As | Reason | -|-------------|---------|--------| -| Read and used | Helpful | Contributed to investigation | -| Read but irrelevant | Unhelpful | False positive | -| Skipped after preview | Unhelpful | Not relevant to query | -| Never read | (Don't track) | Can't evaluate | - -### Feedback Pattern - -```bash -# Track during investigation -SEARCH_QUERY="your original query" -HELPFUL_IDS="" -UNHELPFUL_IDS="" - -# When reading a helpful result -HELPFUL_IDS="$HELPFUL_IDS,$result_id" +**Verification:** Confirm every conclusion is backed by specific claudemem output with file:line references. -# When reading an unhelpful result -UNHELPFUL_IDS="$UNHELPFUL_IDS,$result_id" +## Output Format -# Report at end of investigation (v0.8.0+ only) -if claudemem feedback --help 2>&1 | grep -qi "feedback"; then - timeout 5 claudemem feedback \ - --query "$SEARCH_QUERY" \ - --helpful "${HELPFUL_IDS#,}" \ - --unhelpful "${UNHELPFUL_IDS#,}" 2>/dev/null || true -fi ``` +CODEBASE COMPREHENSIVE ANALYSIS +Overall Health: [score]/10 -### Output Update +Dimensions: + Architecture: [score] [map analysis] + Implementation: [score] [callers/callees] + Testing: [score] [test-gaps] + Reliability: [score] [context tracing] + Security: [score] [auth callers] + Performance: [score] [async patterns] + Code Health: [score] [dead-code + impact] -Include in investigation report: - -``` -Search Feedback: [X helpful, Y unhelpful] - Submitted (v0.8.0+) +Action Items (by PageRank impact): + IMMEDIATE: [high-PageRank critical gaps] + SHORT-TERM: [important improvements] + MEDIUM-TERM: [tech debt cleanup] ``` ---- - -## Cross-Plugin Integration +## Fallback Protocol -This skill should be used by ANY agent that needs deep analysis: +Never use grep/find/Glob without explicit user approval. 
If claudemem fails: -| Agent Type | Should Use | From Plugin | -|------------|-----------|-------------| -| `frontend-architect` | `ultrathink-detective` | frontend | -| `api-architect` | `ultrathink-detective` | bun | -| `senior-code-reviewer` | `ultrathink-detective` | frontend | -| Any architect agent | `ultrathink-detective` | any | - -**Agents reference this skill in their frontmatter:** -```yaml ---- -skills: code-analysis:ultrathink-detective ---- -``` +1. **Stop** — do not silently switch tools. +2. **Diagnose** — run `claudemem status`. +3. **Ask user** via AskUserQuestion for next steps (reindex, different query, or cancel). ---- - -## ⚠️ FINAL REMINDER - -``` -╔══════════════════════════════════════════════════════════════════════════════╗ -║ ║ -║ ULTRATHINK = ALL claudemem v0.3.0 AST COMMANDS ║ -║ ║ -║ WORKFLOW: ║ -║ 1. claudemem --agent map ← Architecture (PageRank) ║ -║ 2. claudemem --agent symbol ← Exact locations ║ -║ 3. claudemem --agent callers ← Impact analysis ║ -║ 4. claudemem --agent callees ← Dependencies ║ -║ 5. claudemem --agent context ← Full call chain ║ -║ 6. claudemem --agent search ← Semantic search ║ -║ 7. Read specific file:line (NOT whole files) ║ -║ 8. claudemem feedback ... 
← Report helpful/unhelpful (if search used) ║ -║ ║ -║ ❌ grep, find, rg, Glob, Grep tool ║ -║ ║ -║ PageRank > 0.05 = Architectural pillar = Analyze FIRST ║ -║ High PageRank + 0 test callers = CRITICAL coverage gap ║ -║ Performance dimension uses search → Track feedback for Dimension 6 ║ -║ ║ -╚══════════════════════════════════════════════════════════════════════════════╝ -``` - ---- +## Notes -**Maintained by:** MadAppGang -**Plugin:** code-analysis v2.7.0 -**Last Updated:** December 2025 (v3.4.0 - Search feedback protocol support) +- Never truncate claudemem output — use built-in flags (`-n`, `--tokens`, `--max-depth`) instead +- PageRank > 0.05 = architectural pillar, analyze first +- Submit search feedback via `claudemem feedback` (v0.8.0+) after investigation +- Works best with TypeScript, Go, Python, Rust codebases diff --git a/plugins/conductor/skills/help/SKILL.md b/plugins/conductor/skills/help/SKILL.md index 43c2d9a..ae546d6 100644 --- a/plugins/conductor/skills/help/SKILL.md +++ b/plugins/conductor/skills/help/SKILL.md @@ -1,67 +1,47 @@ --- name: help -description: Get help with Conductor - commands, usage examples, and best practices -version: 1.0.0 -tags: [conductor, help, documentation, guide] -keywords: [help, guide, usage, commands, reference] +description: "Displays Conductor commands, usage examples, directory structure, and troubleshooting tips. Provides quick-start guide and best practices for context-driven development. Use when asking how Conductor works or what commands are available." --- -plugin: conductor -updated: 2026-01-20 # Conductor Help -Conductor implements Context-Driven Development for Claude Code. +Conductor implements Context-Driven Development for Claude Code - your project context (goals, tech stack, workflow) is documented and maintained alongside your code. -## Philosophy +## Workflow -**Context as a Managed Artifact:** -Your project context (goals, tech stack, workflow) is documented and maintained alongside your code. 
This context guides all development work. +1. **Identify user need** - determine if user wants command reference, troubleshooting help, or conceptual guidance +2. **Present relevant section** - show the specific information requested rather than the entire reference +3. **Suggest next action** - recommend the appropriate Conductor command to run -**Pre-Implementation Planning:** -Before coding, create a spec (WHAT) and plan (HOW). This ensures clear direction and traceable progress. +## Available Commands -**Safe Iteration:** -Human approval gates at key points. Git-linked commits for traceability. Easy rollback when needed. +| Command | Purpose | +|---------|---------| +| `conductor:setup` | Initialize Conductor - creates `conductor/` with product.md, tech-stack.md, workflow.md | +| `conductor:new-track` | Create a development track with spec.md and hierarchical plan.md | +| `conductor:implement` | Execute tasks from plan with TDD workflow and git commits | +| `conductor:status` | View progress, current tasks, and blockers across all tracks | +| `conductor:revert` | Git-aware logical undo at track, phase, or task level | +| `conductor:help` | Show this reference | -## Available Skills - -### conductor:setup -Initialize Conductor for your project. -- Creates conductor/ directory structure -- Generates product.md, tech-stack.md, workflow.md -- Interactive Q&A with resume capability +## Quick Start -### conductor:new-track -Create a new development track. -- Generates spec.md with requirements -- Creates hierarchical plan.md (phases -> tasks) -- Updates tracks.md index +```bash +# 1. Initialize project context +conductor:setup -### conductor:implement -Execute tasks from your plan. -- Status progression: [ ] -> [~] -> [x] -- Git commits linked to track/task -- Follows workflow.md procedures +# 2. Plan your first feature +conductor:new-track -### conductor:status -View project progress. 
-- Overall completion percentage -- Current task and blockers -- Multi-track overview +# 3. Start implementing +conductor:implement -### conductor:revert -Git-aware logical undo. -- Revert at Track, Phase, or Task level -- Preview before executing -- State validation after revert +# 4. Check progress anytime +conductor:status -## Quick Start - -1. **Initialize:** Run `conductor:setup` to create context files -2. **Plan:** Run `conductor:new-track` to create your first track -3. **Implement:** Run `conductor:implement` to start working -4. **Check:** Run `conductor:status` to see progress -5. **Undo:** Run `conductor:revert` if you need to roll back +# 5. Roll back if needed +conductor:revert +``` ## Directory Structure @@ -80,24 +60,16 @@ conductor/ ## Best Practices -1. **Keep Context Updated:** Review product.md and tech-stack.md periodically -2. **One Task at a Time:** Focus on completing tasks fully before moving on -3. **Commit Often:** Each task should result in at least one commit -4. **Use Blockers:** Mark tasks as [!] blocked rather than skipping silently -5. **Review Before Proceeding:** Use phase gates to verify quality +1. **Keep context updated** - review product.md and tech-stack.md periodically +2. **One task at a time** - complete tasks fully before moving on +3. **Commit often** - each task should produce at least one commit +4. **Use blockers** - mark tasks as `[!]` blocked rather than skipping silently +5. **Review at phase gates** - verify quality before proceeding to next phase ## Troubleshooting -**"Conductor not initialized"** -Run `conductor:setup` to initialize the conductor/ directory. - -**"Track not found"** -Check tracks.md for available tracks. Track IDs are case-sensitive. - -**"Revert failed"** -Check for uncommitted changes. Commit or stash before reverting. - -## Getting Help - -Use `conductor:help` anytime for this reference. -For issues, check the project documentation or file an issue. 
+| Error | Solution | +|-------|----------| +| "Conductor not initialized" | Run `conductor:setup` to create the conductor/ directory | +| "Track not found" | Check `tracks.md` for available tracks (IDs are case-sensitive) | +| "Revert failed" | Commit or stash uncommitted changes before reverting | diff --git a/plugins/conductor/skills/implement/SKILL.md b/plugins/conductor/skills/implement/SKILL.md index 2aec3f7..4ff46f5 100644 --- a/plugins/conductor/skills/implement/SKILL.md +++ b/plugins/conductor/skills/implement/SKILL.md @@ -1,267 +1,51 @@ --- name: implement -description: Execute tasks from track plan with TDD workflow and git commit integration -version: 1.1.0 -tags: [conductor, implement, execute, tasks, git, tdd] -keywords: [implement, execute, task, commit, progress, workflow, tdd, phase] +description: "Executes tasks from a track plan using TDD red/green/refactor workflow with git commit integration. Manages task status progression, creates traceable commits with git notes, and runs phase completion verification. Use when ready to start coding tasks from a plan." --- -plugin: conductor -updated: 2026-01-20 - - Implementation Guide & Progress Tracker - - - Task execution and status management - - TDD workflow (Red/Green/Refactor) - - Git commit integration with track references - - Git Notes for audit trail - - Workflow.md procedure following - - Phase Completion Verification Protocol - - Progress tracking and reporting - - - Guide systematic implementation of track tasks using TDD methodology, - maintaining clear status visibility, creating traceable git commits - with notes, following established workflow procedures, and executing - the Phase Completion Protocol at phase boundaries. - - +# Conductor Implement - - - - Use Tasks to mirror plan.md tasks. - Keep Tasks and plan.md in sync. - Mark tasks in BOTH when status changes. 
- 
+Guides systematic implementation of track tasks using TDD methodology, maintaining clear status visibility, creating traceable git commits, and executing phase completion verification at phase boundaries.
 
- 
- Task status MUST follow this progression:
- - [ ] (pending) - Not started
- - [~] (in_progress) - Currently working
- - [x] (complete) - Finished
- - [!] (blocked) - Blocked by issue
+## Workflow
 
- Only ONE task can be [~] at a time.
- 
+1. **Load context**
+   - Verify `conductor/` exists with required files
+   - Ask which track to work on (if multiple active)
+   - Load track's `spec.md`, `plan.md`, and `conductor/workflow.md`
 
- 
- Follow Test-Driven Development for each task:
+2. **Select task**
+   - Find first pending `[ ]` task (or ask user preference)
+   - Mark task as `[~]` in progress in plan.md
+   - Only ONE task can be `[~]` at a time
 
- **Red Phase:**
- 1. Create test file for the feature
- 2. Write tests defining expected behavior
- 3. Run tests - confirm they FAIL
- 4. Do NOT proceed until tests fail
+3. **TDD implementation cycle**
+   - **Red:** Write failing tests for the task, run tests to confirm they FAIL
+   - **Green:** Write minimum code to pass tests, run tests to confirm they PASS
+   - **Refactor:** Improve code quality, run tests to confirm they still PASS
+   - Verify coverage meets >80% requirement
 
- **Green Phase:**
- 1. Write MINIMUM code to pass tests
- 2. Run tests - confirm they PASS
- 3. No refactoring yet
+4. **Commit and update**
+   - Run quality checks (lint, typecheck, test)
+   - Stage changes and commit with format: `{type}({scope}): {description}`
+   - Add git note: `git notes add -m "Task: {phase}.{task} - {title}" -m "Summary: ..." -m "Files Changed: ..."`
+   - Mark task as `[x]` in plan.md, update `metadata.json` with commit SHA
 
- **Refactor Phase:**
- 1. Improve code clarity and performance
- 2. Remove duplication
- 3. Run tests - confirm they still PASS
- 
+5. 
**Phase transition check** + - If phase incomplete, continue to next pending task + - If phase complete, execute Phase Completion Protocol (see below) - - After completing each task: - 1. Stage relevant changes - 2. Commit with proper format: - ``` - (): +## Task Status Progression - - Detail 1 - - Detail 2 +| Symbol | Status | Meaning | +|--------|--------|---------| +| `[ ]` | pending | Not started | +| `[~]` | in_progress | Currently working (only one at a time) | +| `[x]` | complete | Finished and committed | +| `[!]` | blocked | Blocked by issue (add note with reason) | - Task: {phase}.{task} - ``` - 3. Attach git note with task summary: - ```bash - git notes add -m "Task: {phase}.{task} - {title} +## Commit Message Format - Summary: {what was accomplished} - - Files Changed: - - {file1}: {description} - - Why: {business reason}" $(git log -1 --format="%H") - ``` - 4. Update metadata.json with commit SHA - - - - | Type | Use For | - |------|---------| - | feat | New feature | - | fix | Bug fix | - | docs | Documentation | - | style | Formatting | - | refactor | Code restructuring | - | test | Adding tests | - | chore | Maintenance | - | perf | Performance | - - - - ALWAYS follow procedures in conductor/workflow.md: - - TDD Red/Green/Refactor cycle - - Quality gates (>80% coverage, linting) - - Document deviations in tech-stack.md - - Phase Completion Protocol at phase end - - - - Pause and ask for user approval: - - Before starting each new phase - - When encountering blockers - - Before marking phase complete - - During Phase Completion Protocol Step 5 - - - - - - Focus on exactly one task. - Complete it fully before moving to next. - No partial implementations. - - - - Write failing tests BEFORE implementation. - This is the Red phase of TDD. - Never skip this step. - - - - Update plan.md status immediately when: - - Starting a task ([~]) - - Completing a task ([x]) - - Encountering a blocker ([!] with note) - - - - Every commit links to track/task. 
- Commit messages follow type convention. - Git notes provide audit trail. - - - - - - Check conductor/ exists with required files - Ask which track to work on (if multiple active) - Load track's spec.md and plan.md - Load conductor/workflow.md for procedures - Initialize Tasks from plan.md tasks - - - - Find first pending task (or ask user) - Mark task as [~] in_progress in plan.md - TaskUpdate to match - Read task requirements and context - - - - **Red Phase:** Write failing tests for the task - Run tests, confirm they FAIL - **Green Phase:** Write minimum code to pass - Run tests, confirm they PASS - **Refactor Phase:** Improve code quality - Run tests, confirm they still PASS - Verify coverage meets >80% requirement - - - - Run all quality checks (lint, typecheck, test) - If checks fail, fix before proceeding - Stage relevant file changes - Create commit with proper type and message - Add git note with task summary - Mark task as [x] complete in plan.md - Commit plan.md update separately - Update metadata.json with commit info - TaskUpdate to match - - - - Check if phase is complete (all tasks [x]) - If NOT complete, continue to next pending task - If phase IS complete, execute Phase Completion Protocol - - - - - **Execute when all tasks in a phase are [x]:** - - 1. **Announce Protocol Start** - Inform user: "Phase {N} complete. Starting verification protocol." - - 2. **Ensure Test Coverage** - ```bash - # Find files changed in this phase - PREV_SHA=$(grep -o '\[checkpoint: [a-f0-9]*\]' plan.md | tail -1 | grep -o '[a-f0-9]*') - git diff --name-only $PREV_SHA HEAD - # Verify tests exist for each code file - # Create missing tests if needed - ``` - - 3. **Execute Automated Tests** - ```bash - echo "Running: CI=true npm test" - CI=true npm test - # If fail: attempt fix (max 2 times), then ask user - ``` - - 4. **Propose Manual Verification Plan** - Provide step-by-step manual testing instructions. - Include specific commands and expected outcomes. - - 5. 
**Await User Confirmation** - Ask: "Does this meet your expectations? Confirm with 'yes' or provide feedback." - **PAUSE** - do not proceed without explicit yes. - - 6. **Create Checkpoint Commit** - ```bash - git add -A - git commit -m "conductor(checkpoint): End of Phase {N} - {Phase Name}" - ``` - - 7. **Attach Verification Report** - ```bash - git notes add -m "Phase Verification Report - Phase: {N} - {Phase Name} - Automated Tests: PASSED - Manual Verification: User confirmed - Coverage: {X}%" $(git log -1 --format="%H") - ``` - - 8. **Update Plan with Checkpoint** - Add `[checkpoint: abc1234]` to phase heading in plan.md. - - 9. **Commit Plan Update** - ```bash - git commit -m "conductor(plan): Mark phase '{Phase Name}' complete" - ``` - - 10. **Announce Completion** - Inform user phase is complete with checkpoint and verification report. - - - - - - | Symbol | Status | Meaning | - |--------|--------|---------| - | [ ] | pending | Not started | - | [~] | in_progress | Currently working | - | [x] | complete | Finished | - | [!] | blocked | Blocked by issue | - - - ``` (): @@ -271,194 +55,60 @@ updated: 2026-01-20 Task: {phase}.{task} ({task_title}) ``` - Example: -``` -feat(auth): Implement password hashing +Valid types: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`, `perf` -- Added bcrypt dependency -- Created hashPassword utility function -- Added unit tests for hashing +## Phase Completion Protocol -Task: 2.1 (Implement password hashing) -``` - - - -``` -Task: {phase}.{task} - {task_title} - -Summary: {what was accomplished} - -Files Changed: -- {file1}: {description} -- {file2}: {description} - -Why: {business reason for this change} -``` - +Execute when all tasks in a phase are `[x]`: - - When encountering a blocker: - 1. Mark task as [!] blocked in plan.md - 2. Add note describing blocker: - ```markdown - - [!] 2.3 Implement OAuth login - > BLOCKED: Waiting for API credentials from team lead - ``` - 3. Ask user for guidance - 4. 
Either resolve or skip to different task - 5. Track blocker in metadata.json - +1. Announce: "Phase {N} complete. Starting verification protocol." +2. Check test coverage for all files changed in this phase +3. Run full test suite: `CI=true npm test` +4. Present manual verification steps to user +5. **PAUSE** - await explicit user confirmation before proceeding +6. Create checkpoint commit: `conductor(checkpoint): End of Phase {N}` +7. Attach verification report via git notes +8. Add `[checkpoint: {sha}]` to phase heading in plan.md - - If implementation differs from tech-stack.md: - 1. STOP implementation - 2. Update tech-stack.md with new design - 3. Add dated note explaining the change: - ```markdown - ## Changes Log - - 2026-01-05: Changed from SQLite to PostgreSQL for better concurrency - ``` - 4. Resume implementation - - +## Human Approval Gates - - - Start implementing the auth feature - - 1. Load feature_auth_20260105 track - 2. Read plan.md - find first pending task: 1.1 Create user table - 3. Mark 1.1 as [~] in plan.md - 4. Initialize Tasks with plan tasks +Pause and ask for user approval: +- Before starting each new phase +- When encountering blockers +- Before marking phase complete (step 5 of Protocol) - **Red Phase:** - 5. Create test file: tests/user-table.test.ts - 6. Write tests for user table schema - 7. Run tests - confirm they FAIL +## Examples - **Green Phase:** - 8. Create migration file - 9. Define user table schema - 10. Run tests - confirm they PASS - - **Refactor Phase:** - 11. Clean up migration code - 12. Run tests - confirm still PASS - - **Commit:** - 13. Run quality checks (all pass) - 14. Commit: "feat(db): Create user table schema" - 15. Add git note with task summary - 16. Commit plan.md update - 17. Mark 1.1 as [x] in plan.md - 18. Update metadata.json - 19. Move to next task 1.2 - - - - - Continue implementing auth - - 1. Complete final task of Phase 1 - 2. All Phase 1 tasks now [x] - 3. Announce: "Phase 1 complete. 
Starting verification protocol." - - **Phase Completion Protocol:** - 4. Check test coverage for all Phase 1 files - 5. Run: CI=true npm test (PASSED) - 6. Present manual verification steps to user - 7. Ask: "Does this meet your expectations?" - 8. User confirms: "yes" - 9. Create checkpoint commit - 10. Add verification report via git notes - 11. Update plan.md with [checkpoint: abc1234] - 12. Commit plan update - 13. Announce: "Phase 1 checkpoint created. Proceeding to Phase 2." - 14. Ask approval before starting Phase 2 - - - - - Continue implementing auth - - 1. Load track, find current task 2.1 - 2. Start Red Phase, encounter issue - 3. Issue: Missing database credentials - 4. Mark 2.1 as [!] blocked - 5. Add note: "> BLOCKED: Need database credentials configured" - 6. Ask user: "Task 2.1 is blocked. Options: - (A) Provide credentials to continue - (B) Skip to task 2.2 - (C) Pause implementation" - 7. User provides credentials - 8. Remove blocker, mark [~] in_progress - 9. Continue TDD cycle - - - - - - +### Complete a task with TDD ``` -## Implementation Progress - -Track: feature_auth_20260105 -Phase: 2/4 - Core Authentication -Task: 2.1/2.5 - Implement password hashing - -[==========-----] 40% complete - -Recent: -- [x] 1.1 Create user table schema (abc1234) -- [x] 1.2 Add migration scripts (def5678) -- [x] 1.3 Set up database connection (ghi9012) -- [x] 2.1 Implement password hashing (just completed) - -Next: -- [ ] 2.2 Create login endpoint +User: Start implementing the auth feature + +Agent: +1. Loads feature_auth track, finds first pending task: 1.1 Create user table +2. Marks 1.1 as [~] in plan.md +3. Red: Creates test file, writes schema tests, runs → FAIL +4. Green: Creates migration, defines schema, runs → PASS +5. Refactor: Cleans up, runs → PASS +6. Commits: "feat(db): Create user table schema" +7. 
Adds git note, marks 1.1 as [x], moves to 1.2 ``` - - - -## Task Complete - -**Track:** {track_id} -**Task:** {phase}.{task} - {task_title} -**Commit:** {short_sha} -**Type:** {feat/fix/refactor/etc.} - -**TDD Cycle:** -- Red: Tests written and failing -- Green: Implementation complete, tests passing -- Refactor: Code cleaned up - -**Quality Checks:** -- Lint: PASS -- Tests: PASS ({N} tests, {X}% coverage) -- TypeCheck: PASS -**Next Task:** {next_task_id} - {next_task_title} - -Continue to next task? [Yes/No] - - - -## Phase Complete - -**Track:** {track_id} -**Phase:** {N} - {phase_name} -**Checkpoint:** {checkpoint_sha} - -**Verification Report:** -- Automated Tests: PASSED -- Coverage: {X}% -- Manual Verification: User confirmed +### Handle a blocker +``` +User: Continue implementing auth -**Git Note:** Attached to checkpoint commit +Agent: +1. Starts task 2.1, encounters missing database credentials +2. Marks 2.1 as [!] with note: "BLOCKED: Need database credentials" +3. Asks user: "(A) Provide credentials (B) Skip to 2.2 (C) Pause" +4. User provides credentials → removes blocker, continues TDD cycle +``` -**Next Phase:** {N+1} - {next_phase_name} +## Verification -Proceed to next phase? 
[Yes/No] - - +After each task completion, confirm: +- [ ] Tests pass with >80% coverage +- [ ] Commit follows type convention with task reference +- [ ] Git note attached with task summary +- [ ] plan.md status updated to `[x]` +- [ ] metadata.json updated with commit SHA diff --git a/plugins/conductor/skills/new-track/SKILL.md b/plugins/conductor/skills/new-track/SKILL.md index 483eb95..8d37493 100644 --- a/plugins/conductor/skills/new-track/SKILL.md +++ b/plugins/conductor/skills/new-track/SKILL.md @@ -1,259 +1,113 @@ --- name: new-track -description: Create development track with spec and hierarchical plan through interactive Q&A -version: 1.0.0 -tags: [conductor, track, planning, spec, phases] -keywords: [new track, feature, bugfix, plan, spec, phases, tasks] +description: "Creates a development track by generating spec.md and hierarchical plan.md through interactive Q&A. Reads project context from product.md and tech-stack.md to inform planning. Use when planning a new feature, bugfix, or refactor." --- -plugin: conductor -updated: 2026-01-20 - - Development Planner & Spec Writer - - - Requirements elicitation and specification - - Hierarchical plan creation (phases/tasks/subtasks) - - Track lifecycle management - - Context-aware planning (reads product.md, tech-stack.md) - - - Transform user requirements into structured, actionable development - plans with clear phases, tasks, and subtasks that enable systematic - implementation. - - +# Conductor New Track - - - - You MUST use Tasks to track planning progress: - 1. Validate conductor setup exists - 2. Gather track requirements - 3. Generate track ID - 4. Create spec.md - 5. Create plan.md with phases - 6. Create metadata.json - 7. Update tracks.md index - +Transforms user requirements into structured, actionable development plans with clear phases, tasks, and subtasks that enable systematic implementation. - - FIRST check if conductor/ directory exists with required files. 
- If not, HALT and guide user to run conductor:setup first. - +## Workflow - - ALWAYS read these files before planning: - - conductor/product.md (understand project goals) - - conductor/tech-stack.md (know technical constraints) - - conductor/workflow.md (follow team processes) - +1. **Validate conductor setup** + - Check `conductor/` directory exists with `product.md`, `tech-stack.md`, `workflow.md` + - If missing, halt and guide user to run `conductor:setup` first - - Format: {shortname}_{YYYYMMDD} - Examples: - - feature_auth_20260105 - - bugfix_login_20260105 - - refactor_api_20260105 - - +2. **Load project context** + - Read `conductor/product.md` (project goals) + - Read `conductor/tech-stack.md` (technical constraints) + - Read `conductor/tracks.md` (existing tracks) - - - Always create spec.md BEFORE plan.md. - Spec defines WHAT, Plan defines HOW. - +3. **Define track type and ID** + - Ask: What type of work? (Feature / Bugfix / Refactor / Task) + - Ask: Short name for this track? (3-10 chars, lowercase) + - Generate track ID: `{type}_{shortname}_{YYYYMMDD}` - - Plans must have: - - 2-6 Phases (major milestones) - - 2-5 Tasks per phase - - 0-3 Subtasks per task (optional) - +4. **Generate specification** + - Ask: What is the goal? (1-2 sentences) + - Ask: Acceptance criteria? (3-5 items) + - Ask: Technical constraints or dependencies? + - Ask: Edge cases or error scenarios? + - Create `conductor/tracks/{track_id}/spec.md` - - Each task must be: - - Specific (clear outcome) - - Estimable (roughly 1-4 hours) - - Independent (minimal dependencies) - - +5. **Generate plan** + - Propose 2-6 phases based on spec + - Ask user to confirm or modify phases + - Generate 2-5 tasks per phase with optional subtasks + - Create `conductor/tracks/{track_id}/plan.md` - - - Check conductor/ directory exists - Check required files: product.md, tech-stack.md, workflow.md - If missing, HALT with guidance to run setup - Initialize Tasks - +6. 
**Finalize** + - Create `conductor/tracks/{track_id}/metadata.json` + - Update `conductor/tracks.md` index + - Present summary with phase/task counts - - Read conductor/product.md - Read conductor/tech-stack.md - Read conductor/tracks.md for existing tracks - +## Plan Structure - - Ask: What type of work? [Feature, Bugfix, Refactor, Task, Other] - Ask: Short name for this track? (3-10 chars, lowercase) - Generate track ID: {type}_{shortname}_{YYYYMMDD} - - - - Ask: What is the goal of this work? (1-2 sentences) - Ask: What are the acceptance criteria? (list 3-5) - Ask: Any technical constraints or dependencies? - Ask: Any edge cases or error scenarios to handle? - Generate conductor/tracks/{track_id}/spec.md - - - - Based on spec, propose 2-6 phases - Ask user to confirm or modify phases - For each phase, generate 2-5 tasks - Add subtasks where complexity warrants - Generate conductor/tracks/{track_id}/plan.md - - - - Create conductor/tracks/{track_id}/metadata.json - Update conductor/tracks.md with new track - Present summary to user - - - - - - - **Feature:** New functionality - - Larger scope, 4-6 phases typical - - Includes testing and documentation phases - - **Bugfix:** Fix existing issue - - Smaller scope, 2-3 phases typical - - Includes reproduction and verification phases - - **Refactor:** Code improvement - - Medium scope, 3-4 phases typical - - Includes before/after comparison phase - - **Task:** General work item - - Variable scope - - Flexible structure - - - ```markdown # Plan: {Track Title} Track ID: {track_id} -Type: {Feature/Bugfix/Refactor/Task} -Created: {YYYY-MM-DD} +Type: Feature +Created: 2026-01-05 Status: Active ## Phase 1: {Phase Name} - [ ] 1.1 {Task description} - [ ] 1.2 {Task description} - [ ] 1.2.1 {Subtask} - - [ ] 1.2.2 {Subtask} - [ ] 1.3 {Task description} ## Phase 2: {Phase Name} - [ ] 2.1 {Task description} -- [ ] 2.2 {Task description} - -## Phase 3: Testing & Documentation -- [ ] 3.1 Write unit tests -- [ ] 3.2 Update 
documentation ``` - - -```markdown -# Spec: {Track Title} +## Constraints -Track ID: {track_id} -Type: {Feature/Bugfix/Refactor/Task} -Created: {YYYY-MM-DD} - -## Goal -{1-2 sentence description of what this achieves} - -## Background -{Context from product.md relevant to this work} +- Always create spec.md BEFORE plan.md (spec defines WHAT, plan defines HOW) +- Plans must have 2-6 phases, 2-5 tasks per phase, 0-3 subtasks per task +- Each task should be specific, estimable (1-4 hours), and minimally dependent +- Conductor setup MUST exist before creating tracks -## Acceptance Criteria -- [ ] {Criterion 1} -- [ ] {Criterion 2} -- [ ] {Criterion 3} +## Track Types -## Technical Constraints -- {Constraint 1 from tech-stack.md} -- {Constraint 2} +| Type | Typical Phases | Scope | +|------|---------------|-------| +| Feature | 4-6 phases (includes testing/docs) | New functionality | +| Bugfix | 2-3 phases (reproduce, fix, verify) | Fix existing issue | +| Refactor | 3-4 phases (includes before/after comparison) | Code improvement | +| Task | Variable | General work item | -## Edge Cases -- {Edge case 1} -- {Edge case 2} +## Examples -## Out of Scope -- {What this track does NOT include} +### New feature track +``` +User: I want to add user authentication + +Agent: +1. Validates conductor/ exists, loads context +2. Track type: Feature, short name: "auth" +3. Generates ID: feature_auth_20260105 +4. Gathers spec: goal, criteria, constraints, edge cases +5. Proposes phases: Database, Core Auth, Sessions, Testing +6. User confirms → generates plan.md with tasks +7. Updates tracks.md index, presents summary ``` - - - - - - I want to add user authentication - - 1. Validate conductor/ exists with required files - 2. Load product.md, tech-stack.md context - 3. Ask track type - "Feature" - 4. Ask short name - "auth" - 5. Generate ID: feature_auth_20260105 - 6. Ask spec questions (goal, criteria, constraints) - 7. Generate spec.md - 8. 
Propose phases: Database, Core Auth, Sessions, Testing - 9. User confirms phases - 10. Generate plan.md with tasks - 11. Update tracks.md index - - - - - Login page keeps redirecting in a loop - - 1. Validate conductor/ exists - 2. Load context - 3. Track type: "Bugfix" - 4. Short name: "login-loop" - 5. Generate ID: bugfix_login-loop_20260105 - 6. Spec: Reproduce, root cause, fix approach - 7. Plan phases: Reproduce (1), Fix (2), Verify (3) - 8. Generate files and update index - - - - - - -## Track Created Successfully -**Track ID:** {track_id} -**Type:** {type} +### Bugfix track +``` +User: Login page keeps redirecting in a loop -**Files Created:** -- conductor/tracks/{track_id}/spec.md -- conductor/tracks/{track_id}/plan.md -- conductor/tracks/{track_id}/metadata.json +Agent: +1. Track type: Bugfix, short name: "login-loop" +2. ID: bugfix_login-loop_20260105 +3. Spec: Reproduction steps, root cause hypothesis +4. Plan: Phase 1 (Reproduce), Phase 2 (Fix), Phase 3 (Verify) +``` -**Plan Summary:** -- Phase 1: {name} ({N} tasks) -- Phase 2: {name} ({N} tasks) -- Phase 3: {name} ({N} tasks) -- Total: {X} phases, {Y} tasks +## Verification -**Next Steps:** -1. Review spec.md and plan.md -2. Adjust if needed -3. 
Run `conductor:implement` to start executing - - +After track creation, confirm: +- [ ] `conductor/tracks/{track_id}/spec.md` exists with acceptance criteria +- [ ] `conductor/tracks/{track_id}/plan.md` exists with phases and tasks +- [ ] `conductor/tracks/{track_id}/metadata.json` exists +- [ ] `conductor/tracks.md` updated with new track entry diff --git a/plugins/conductor/skills/revert/SKILL.md b/plugins/conductor/skills/revert/SKILL.md index 47c8993..9d62c6f 100644 --- a/plugins/conductor/skills/revert/SKILL.md +++ b/plugins/conductor/skills/revert/SKILL.md @@ -1,238 +1,92 @@ --- name: revert -description: Git-aware logical undo at track, phase, or task level with confirmation gates -version: 1.0.0 -tags: [conductor, revert, undo, git, rollback] -keywords: [revert, undo, rollback, git, track, phase, task] +description: "Performs git-aware logical undo at track, phase, or task level with impact preview and confirmation gates. Creates revert commits to preserve history and validates state consistency after rollback. Use when needing to undo completed work or roll back a phase." --- -plugin: conductor -updated: 2026-01-20 - - - Safe Revert Specialist - - - Git history analysis and reversal - - Logical grouping of commits by track/phase/task - - State validation after reversal - - Safe rollback with confirmation gates - - - Enable safe, logical rollback of development work at meaningful - granularity (track/phase/task) while maintaining git history integrity - and project consistency. - - - - - - - You MUST use Tasks to track the revert workflow. - - **Before starting**, create todo list with these 5 phases: - 1. Scope Selection - Identify what to revert (track/phase/task) - 2. Impact Analysis - Find commits, files, status changes - 3. User Confirmation - Present impact and get approval - 4. Execution - Create revert commits and update files - 5. 
Validation - Verify consistency and report results - - **Update continuously**: - - Mark "in_progress" when starting each phase - - Mark "completed" immediately after finishing - - Keep only ONE phase "in_progress" at a time - - - - ALWAYS require explicit user confirmation before: - - Reverting any commits - - Modifying plan.md status - - Deleting track files - - Show exactly what will be changed BEFORE doing it. - - - - Default to creating revert commits, not force-pushing. - Preserve git history unless user explicitly requests otherwise. - - - - After any revert: - 1. Verify plan.md matches git state - 2. Verify metadata.json is consistent - 3. Run project quality checks - 4. Report any inconsistencies - - - - - - Revert by logical units (track/phase/task), not raw commits. - A task might have multiple commits - revert them together. - - - - Show user exactly what will be reverted before doing it. - List commits, files, status changes. - - - - If full revert fails, offer partial revert options. - Never leave project in inconsistent state. - - - - - - Ask: What to revert? 
[Track, Phase, Task] - If Track: Ask which track - If Phase: Ask which track, which phase - If Task: Ask which track, which task - - - - Read metadata.json to find related commits - List all commits that will be reverted - List all files that will be affected - List status changes in plan.md - - - - Present impact analysis to user - Ask for explicit confirmation - If declined, abort with no changes - - - - Create revert commits for each original commit - Update plan.md statuses back to [ ] - Update metadata.json to reflect revert - Remove completed tasks from history - - - - Verify git state matches plan.md - Run project quality checks - Report final state to user - - - - - - - **Task Level:** - - Reverts single task's commits - - Updates task status to [ ] - - Preserves other tasks in phase - - **Phase Level:** - - Reverts all tasks in phase - - Updates all task statuses to [ ] - - Preserves other phases - - **Track Level:** - - Reverts entire track - - Optionally deletes track files - - Updates tracks.md index - - - - Find commits for a task using: - 1. metadata.json commit array - 2. Git log searching for "[{track_id}]" pattern - 3. Git notes with task references - - - - **Safe Revert (Default):** - - Create revert commits - - Preserves full history - - Can be undone - - **Hard Reset (Requires explicit request):** - - Reset branch to before commits - - Loses history (unless pushed) - - Cannot be easily undone - - - - - - Undo task 2.3 - - 1. Identify track with task 2.3 - 2. Find commits for task 2.3 from metadata.json - 3. Show impact: - "Will revert 2 commits: - - abc123: [feature_auth] Implement login form - - def456: [feature_auth] Add login validation - Files affected: src/login.tsx, src/auth.ts" - 4. Ask confirmation - 5. Create revert commits - 6. Update plan.md: 2.3 [x] -> [ ] - 7. Update metadata.json - 8. Validate state - - - - - Roll back Phase 2 of the auth feature - - 1. Find all tasks in Phase 2 - 2. Find all commits for those tasks - 3. 
Show impact: - "Will revert 8 commits affecting Phase 2 (5 tasks): - - 2.1 Implement password hashing (2 commits) - - 2.2 Create login endpoint (3 commits) - - 2.3 Create registration endpoint (3 commits) - Files affected: 12 files" - 4. Ask confirmation: "This will undo significant work. Proceed?" - 5. Create revert commits in reverse order - 6. Update all Phase 2 task statuses to [ ] - 7. Update metadata.json - 8. Validate state - - - - - - -## Revert Impact Analysis - -**Scope:** {Task/Phase/Track} {identifier} - -**Commits to Revert:** {N} -{#each commit} -- {short_sha}: {message} -{/each} - -**Files Affected:** {N} -{#each file} -- {filepath} -{/each} - -**Status Changes in plan.md:** -{#each task} -- {task_id}: [x] -> [ ] -{/each} - -**WARNING:** This action will create {N} revert commits. -Git history will be preserved. - -Proceed with revert? [Yes/No] - - - -## Revert Complete - -**Reverted:** {scope} {identifier} -**Commits Created:** {N} revert commits -**Tasks Reset:** {N} tasks now pending - -**Validation:** -- Plan.md: Consistent -- Git State: Clean -- Quality Checks: PASS - -The {scope} has been reverted. You can re-implement or abandon this work. - - + +# Conductor Revert + +Enables safe, logical rollback of development work at meaningful granularity (track/phase/task) while maintaining git history integrity and project consistency. + +## Workflow + +1. **Select revert scope** + - Ask user: What to revert? (Track / Phase / Task) + - For Track: ask which track + - For Phase: ask which track and which phase + - For Task: ask which track and which task + +2. **Analyze impact** + - Read `metadata.json` to find related commits + - Search git log for `[{track_id}]` pattern + - List all commits, affected files, and plan.md status changes + +3. **Present impact and confirm** + - Show exact commits to revert, files affected, and status changes + - Ask for explicit user confirmation before proceeding + - If declined, abort with no changes + +4. 
**Execute revert** + - Create revert commits for each original commit (preserves history) + - Update plan.md task statuses back to `[ ]` + - Update `metadata.json` to reflect revert + +5. **Validate consistency** + - Verify plan.md matches git state + - Verify `metadata.json` is consistent + - Run project quality checks + - Report final state to user + +## Revert Levels + +| Level | Scope | Effect | +|-------|-------|--------| +| Task | Single task's commits | Reverts task commits, sets status to `[ ]`, preserves other tasks | +| Phase | All tasks in a phase | Reverts all phase commits, resets all task statuses | +| Track | Entire track | Reverts all commits, optionally deletes track files, updates `tracks.md` | + +## Constraints + +- ALWAYS require explicit user confirmation before reverting any commits +- Default to creating revert commits (non-destructive); only hard-reset if user explicitly requests +- Never leave project in inconsistent state - if full revert fails, offer partial options +- Show exact changes BEFORE executing them + +## Examples + +### Revert a single task +``` +User: Undo task 2.3 + +Agent: +1. Finds commits for task 2.3 from metadata.json +2. Shows impact: + "Will revert 2 commits: + - abc123: [feature_auth] Implement login form + - def456: [feature_auth] Add login validation + Files affected: src/login.tsx, src/auth.ts" +3. Asks confirmation → user confirms +4. Creates revert commits +5. Updates plan.md: 2.3 [x] → [ ] +6. Validates state consistency +``` + +### Revert an entire phase +``` +User: Roll back Phase 2 of the auth feature + +Agent: +1. Finds all tasks in Phase 2 (5 tasks, 8 commits) +2. Shows impact with file count and task list +3. Asks: "This will undo significant work. Proceed?" +4. Creates revert commits in reverse order +5. Resets all Phase 2 task statuses to [ ] +6. 
Validates state consistency +``` + +## Verification + +After every revert, confirm: +- [ ] Plan.md statuses match git state +- [ ] metadata.json is consistent +- [ ] Project quality checks pass +- [ ] No uncommitted changes remain diff --git a/plugins/conductor/skills/setup/SKILL.md b/plugins/conductor/skills/setup/SKILL.md index 356028d..1bcba06 100644 --- a/plugins/conductor/skills/setup/SKILL.md +++ b/plugins/conductor/skills/setup/SKILL.md @@ -1,240 +1,106 @@ --- name: setup -description: Initialize Conductor with product.md, tech-stack.md, and workflow.md -version: 1.1.0 -tags: [conductor, setup, initialization, context, project] -keywords: [conductor setup, initialize, project context, greenfield, brownfield] +description: "Initializes Conductor by creating product.md, tech-stack.md, and workflow.md through interactive Q&A. Detects greenfield vs brownfield projects and supports resume from interruption. Use when starting a new project or onboarding an existing codebase to Conductor." --- -plugin: conductor -updated: 2026-01-20 - - - Project Context Architect - - - Project initialization and context gathering - - Interactive Q&A for requirements elicitation - - State management and resume capability - - Greenfield vs Brownfield project handling - - - Guide users through structured project initialization, creating - comprehensive context artifacts that serve as the foundation for - all future development work. - - - - - - - You MUST use Tasks to track setup progress: - 1. Check for existing conductor/ directory - 2. Determine project type (Greenfield/Brownfield) - 3. Create product.md through Q&A - 4. Create product-guidelines.md - 5. Create tech-stack.md through Q&A - 6. Create code styleguides - 7. Copy workflow.md template - 8. Finalize setup - - - - Check for conductor/setup_state.json FIRST. - If exists with status != "complete": - 1. Load saved answers - 2. Resume from last incomplete section - 3. 
Show user what was already collected - - - - - Ask questions SEQUENTIALLY (one at a time) - - Maximum 5 questions per section - - Always include "Type your own answer" option - - Use AskUserQuestion with appropriate question types - - Save state after EACH answer (for resume) - - - - Before any operation: - 1. Check if conductor/ already exists - 2. If complete setup exists, ask: "Re-initialize or abort?" - 3. Respect .gitignore patterns - - - - - - Never ask multiple questions at once. - Wait for answer before asking next question. - - - - Save progress after each answer. - Enable resume from any interruption point. - - - - Gather enough context to be useful. - Don't overwhelm with excessive questions. - - - - - - Check if conductor/ directory exists - If exists, check setup_state.json for resume - If complete setup exists, confirm re-initialization - Initialize Tasks with setup phases - - - - Check for existing code files (src/, package.json, etc.) - Ask user: Greenfield (new) or Brownfield (existing)? - For Brownfield: Scan existing code for context - - - - Ask: What is this project about? (1-2 sentences) - Ask: Who is the target audience? - Ask: What are the 3 main goals? - Ask: Any constraints or requirements? - Generate product.md from answers - - - - Ask: Primary programming language(s)? - Ask: Key frameworks/libraries? - Ask: Database/storage preferences? - Ask: Deployment target? - Generate tech-stack.md from answers - - - - Ask: Any specific coding conventions? - Ask: Testing requirements? 
- Generate product-guidelines.md - Generate code_styleguides/general.md (always) - Generate language-specific styleguides based on tech stack: - - TypeScript/JavaScript → typescript.md, javascript.md - - Web projects → html-css.md - - Python → python.md - - Go → go.md - - - - - Copy workflow.md template - Create empty tracks.md - Mark setup_state.json as complete - Present summary to user - - - - - - - **Greenfield (New Project):** - - No existing code to analyze - - More questions needed about vision - - Focus on future architecture - - **Brownfield (Existing Project):** - - Scan existing files for context - - Infer tech stack from package.json, requirements.txt, etc. - - Focus on documenting current state - - - - **Additive (Multi-Select):** - - "Which frameworks are you using?" [React, Vue, Angular, Other] - - User can select multiple - - **Exclusive (Single-Select):** - - "Primary language?" [TypeScript, Python, Go, Other] - - User picks one - - **Open-Ended:** - - "Describe your project in 1-2 sentences" - - Free text response - - - + +# Conductor Setup + +Guides users through structured project initialization, creating context artifacts that serve as the foundation for all future development work. + +## Workflow + +1. **Validate existing state** + - Check if `conductor/` directory exists + - If `conductor/setup_state.json` exists with `status != "complete"`, load saved answers and resume from last incomplete section + - If complete setup exists, ask user: "Re-initialize or abort?" + +2. **Detect project type** + - Scan for existing code files (`src/`, `package.json`, `requirements.txt`) + - Ask user: Greenfield (new) or Brownfield (existing)? + - For Brownfield: scan existing code to infer context + +3. **Gather product context** (one question at a time, max 5 per section) + - What is this project about? (1-2 sentences) + - Who is the target audience? + - What are the 3 main goals? + - Any constraints or requirements? 
+ - Generate `conductor/product.md` from answers + +4. **Gather technical context** + - Primary programming language(s)? + - Key frameworks/libraries? + - Database/storage preferences? + - Deployment target? + - Generate `conductor/tech-stack.md` from answers + +5. **Generate guidelines** + - Ask about coding conventions and testing requirements + - Generate `conductor/product-guidelines.md` + - Generate `conductor/code_styleguides/general.md` + - Generate language-specific styleguides (TypeScript, Python, Go, etc.) + +6. **Finalize setup** + - Copy `workflow.md` template + - Create empty `tracks.md` + - Mark `setup_state.json` as complete + - Present summary to user + +## Constraints + +- Ask questions SEQUENTIALLY (one at a time, never multiple) +- Save state to `setup_state.json` after EACH answer for resume capability +- Always include "Type your own answer" option in question prompts +- Respect `.gitignore` patterns when creating files + +## State File Schema + ```json { - "status": "in_progress" | "complete", + "status": "in_progress", "startedAt": "ISO-8601", "lastUpdated": "ISO-8601", - "projectType": "greenfield" | "brownfield", - "currentSection": "product" | "tech" | "guidelines", + "projectType": "greenfield", + "currentSection": "product", "answers": { - "product": { - "description": "...", - "audience": "...", - "goals": ["...", "...", "..."] - }, - "tech": { - "languages": ["TypeScript"], - "frameworks": ["React", "Node.js"] - } + "product": { "description": "...", "audience": "...", "goals": ["..."] }, + "tech": { "languages": ["TypeScript"], "frameworks": ["React"] } } } ``` - - - - - - I want to set up Conductor for my new project - - 1. Check for existing conductor/ - not found - 2. Ask: "Is this a new project (Greenfield) or existing codebase (Brownfield)?" - 3. User: "New project" - 4. Begin product context questions (one at a time) - 5. Save each answer to setup_state.json - 6. After all sections, generate artifacts - 7. 
Present summary with next steps - - - - - Continue setting up Conductor - - 1. Check conductor/setup_state.json - found, status: "in_progress" - 2. Load previous answers from state - 3. Show: "Resuming setup. You've completed: Product Context" - 4. Continue from Technical Context section - 5. Complete remaining sections - - - - - - - - Friendly, guiding tone - - Clear progress indicators - - Explain why each question matters - - Confirm understanding before proceeding - - - -## Conductor Setup Complete - -**Project:** {project_name} -**Type:** {Greenfield/Brownfield} - -**Created Artifacts:** -- conductor/product.md - Project vision and goals -- conductor/product-guidelines.md - Standards and conventions -- conductor/tech-stack.md - Technical preferences -- conductor/workflow.md - Development workflow (comprehensive) -- conductor/tracks.md - Track index (empty) -- conductor/code_styleguides/general.md - General coding principles -- conductor/code_styleguides/{language}.md - Language-specific guides - -**Next Steps:** -1. Review generated artifacts and adjust as needed -2. Use `conductor:new-track` to plan your first feature -3. Use `conductor:implement` to execute the plan - -Your project is now ready for Context-Driven Development! - - + +## Examples + +### New project setup +``` +User: I want to set up Conductor for my new project + +Agent: +1. Checks for existing conductor/ → not found +2. Asks: "Is this a new project or existing codebase?" +3. User: "New project" +4. Begins product context questions (one at a time) +5. Saves each answer to setup_state.json +6. Generates all artifacts +7. Presents summary with next steps +``` + +### Resume interrupted setup +``` +User: Continue setting up Conductor + +Agent: +1. Finds conductor/setup_state.json with status: "in_progress" +2. Loads previous answers +3. Shows: "Resuming setup. You've completed: Product Context" +4. 
Continues from Technical Context section +``` + +## Verification + +After setup completes, confirm these files exist: +- `conductor/product.md` +- `conductor/product-guidelines.md` +- `conductor/tech-stack.md` +- `conductor/workflow.md` +- `conductor/tracks.md` +- `conductor/code_styleguides/general.md` diff --git a/plugins/conductor/skills/status/SKILL.md b/plugins/conductor/skills/status/SKILL.md index d2a728b..1cfa530 100644 --- a/plugins/conductor/skills/status/SKILL.md +++ b/plugins/conductor/skills/status/SKILL.md @@ -1,184 +1,87 @@ --- name: status -description: Show active tracks, progress, current tasks, and blockers -version: 1.0.0 -tags: [conductor, status, progress, overview] -keywords: [status, progress, tracks, overview, blockers, current] +description: "Reads conductor tracks, calculates completion percentages, identifies blockers, and recommends next actions. Parses plan.md and metadata.json across all active tracks. Use when checking project progress or asking what to work on next." --- -plugin: conductor -updated: 2026-01-20 - - - Progress Reporter & Status Analyzer - - - Plan.md parsing and analysis - - Progress calculation and visualization - - Blocker identification - - Multi-track overview - - - Provide clear, actionable status reports that help users understand - their project progress, identify next actions, and spot blockers. - - - - - - - This is a read-only skill that only displays status. - Tasks are NOT required because there are no implementation phases. - The skill performs a single atomic operation: read and present status. - - - - This skill ONLY reads files. - It does NOT modify any conductor/ files. - For modifications, use other skills. - - - - Parse ALL of: - - conductor/tracks.md (index) - - conductor/tracks/*/plan.md (all plans) - - conductor/tracks/*/metadata.json (state) - - - - - - Always end with clear "Next Action" recommendation. - Don't just report status, guide next step. - - - - Prominently display any blocked tasks. 
- Blockers need attention. - - - - - - Check conductor/ exists - Read conductor/tracks.md for track list - For each track, read plan.md and metadata.json - - - - Count tasks by status: pending, in_progress, complete, blocked - Calculate completion percentage per track - Identify current phase and task - Find any blocked tasks - - - - Display overview summary - Show per-track details - Highlight blockers if any - Recommend next action - - - - - - + +# Conductor Status + +Provides clear, actionable status reports showing project progress, blockers, and recommended next actions. This is a read-only skill that never modifies conductor files. + +## Workflow + +1. **Collect data** + - Verify `conductor/` directory exists + - Read `conductor/tracks.md` for track list + - For each track, read `plan.md` and `metadata.json` + +2. **Analyze progress** + - Count tasks by status: `[ ]` pending, `[~]` in progress, `[x]` complete, `[!]` blocked + - Calculate completion percentage per track: `(completed / total) * 100` + - Identify current phase and active task + - Find any blocked tasks + +3. **Present status report** + - Display overview summary (active tracks, total progress, blocker count) + - Show per-track details with progress bars + - Highlight blockers prominently at the top + - Recommend specific next action + +## Display Priority + +1. BLOCKED tasks (need immediate attention) +2. IN_PROGRESS tasks (current work) +3. Active tracks summary +4. Completed tracks (brief mention) + +## Constraints + +- Read-only: never modify any conductor files +- Always end with a clear "Next Action" recommendation +- Parse ALL tracks, not just the first one found + +## Examples + +### Single active track +``` +User: What's my status? 
+ +Agent displays: +## Conductor Status + +### Active Tracks: 1 + +**feature_auth_20260105** - User Authentication +- Progress: [========------] 53% (8/15 tasks) +- Phase: 2/4 - Core Authentication +- Current: 2.3 Create login endpoint +- Blockers: None + +### Next Action +Continue task 2.3 (Create login endpoint). Run: `conductor:implement` ``` -Completion % = (completed_tasks / total_tasks) * 100 - -Example: -- Total tasks: 15 -- Completed: 6 -- In Progress: 1 -- Pending: 8 -- Completion: 40% + +### Multiple tracks with blocker ``` - - - - Display order by priority: - 1. BLOCKED tasks (need attention) - 2. IN_PROGRESS tasks (current work) - 3. Active tracks summary - 4. Completed tracks (brief) - - - - - - What's my status? - - ## Conductor Status - - ### Active Tracks: 1 - - **feature_auth_20260105** - User Authentication - - Progress: [========------] 53% (8/15 tasks) - - Phase: 2/4 - Core Authentication - - Current: 2.3 Create login endpoint - - Blockers: None - - ### Next Action - Continue task 2.3 (Create login endpoint) in feature_auth track. - Run: `conductor:implement` - - - - - Show me the status - - ## Conductor Status - - ### BLOCKERS (1) - - **feature_auth_20260105** Task 2.4: "Waiting for API key from team lead" - - ### Active Tracks: 2 - - **feature_auth_20260105** - User Authentication - - Progress: [=======-------] 47% (7/15 tasks) - - Phase: 2/4 - Core Authentication - - Status: BLOCKED at 2.4 - - **bugfix_login_20260104** - Login redirect fix - - Progress: [==============] 100% (5/5 tasks) - - Status: Ready for review - - ### Next Action - Resolve blocker in feature_auth (Task 2.4) or complete review of bugfix_login. 
- - - - - - +User: Show me the status + +Agent displays: ## Conductor Status -### Overview -- Active Tracks: {N} -- Total Progress: {X}% ({completed}/{total} tasks) -- Blockers: {N} - -{#if blockers} -### BLOCKERS -{#each blocker} -- **{track_id}** Task {task_id}: "{blocker_description}" -{/each} -{/if} - -### Active Tracks -{#each active_track} -**{track_id}** - {title} -- Progress: [{progress_bar}] {percent}% ({completed}/{total}) -- Phase: {current_phase}/{total_phases} - {phase_name} -- Current: {current_task_id} {current_task_title} -{/each} - -{#if completed_tracks} -### Completed Tracks -{#each completed_track} -- {track_id} - Completed {date} -{/each} -{/if} +### BLOCKERS (1) +- **feature_auth** Task 2.4: "Waiting for API key from team lead" + +### Active Tracks: 2 + +**feature_auth_20260105** - Progress: 47% (7/15) - BLOCKED at 2.4 +**bugfix_login_20260104** - Progress: 100% (5/5) - Ready for review ### Next Action -{recommendation} - - +Resolve blocker in feature_auth (Task 2.4) or review bugfix_login. +``` + +## Verification + +Confirm the status report includes: +- [ ] Completion percentages for all tracks +- [ ] Any blocked tasks highlighted +- [ ] Clear next action recommendation diff --git a/plugins/dev/skills/backend/auth-patterns/SKILL.md b/plugins/dev/skills/backend/auth-patterns/SKILL.md index bbf8a95..5e81759 100644 --- a/plugins/dev/skills/backend/auth-patterns/SKILL.md +++ b/plugins/dev/skills/backend/auth-patterns/SKILL.md @@ -1,21 +1,6 @@ --- name: auth-patterns -version: 1.0.0 -description: Use when implementing authentication (JWT, sessions, OAuth), authorization (RBAC, ABAC), password hashing, MFA, or security best practices for backend services. 
-keywords:
-  - authentication
-  - authorization
-  - JWT
-  - sessions
-  - OAuth
-  - RBAC
-  - ABAC
-  - password hashing
-  - bcrypt
-  - MFA
-  - security
-plugin: dev
-updated: 2026-01-20
+description: "Implements JWT token generation with refresh rotation, session-based auth with Redis, OAuth 2.0 flows, RBAC/ABAC authorization, bcrypt password hashing, and TOTP multi-factor authentication. Use when adding authentication, authorization, password security, MFA, or rate limiting to backend services."
 ---
 # Authentication Patterns
diff --git a/plugins/dev/skills/backend/bunjs-architecture/SKILL.md b/plugins/dev/skills/backend/bunjs-architecture/SKILL.md
index 05a3dd4..0aaa2ae 100644
--- a/plugins/dev/skills/backend/bunjs-architecture/SKILL.md
+++ b/plugins/dev/skills/backend/bunjs-architecture/SKILL.md
@@ -1,18 +1,6 @@
 ---
 name: bunjs-architecture
-version: 1.0.0
-description: Use when implementing clean architecture (routes/controllers/services/repositories), establishing camelCase conventions, designing Prisma schemas, or planning structured workflows for Bun.js applications. See bunjs for basics, bunjs-production for deployment.
-keywords:
-  - clean architecture
-  - layered architecture
-  - camelCase
-  - naming conventions
-  - Prisma schema
-  - repository pattern
-  - separation of concerns
-  - code organization
-plugin: dev
-updated: 2026-01-20
+description: "Implements clean layered architecture (routes/controllers/services/repositories) for Bun.js TypeScript backends, enforces camelCase conventions end-to-end, and provides Prisma schema templates with Zod validation. Use when structuring Bun.js applications, designing database schemas, or establishing coding conventions."
 ---
 # Bun.js Clean Architecture Patterns
diff --git a/plugins/dev/skills/backend/database-patterns/SKILL.md b/plugins/dev/skills/backend/database-patterns/SKILL.md
index 0198804..c24feab 100644
--- a/plugins/dev/skills/backend/database-patterns/SKILL.md
+++ b/plugins/dev/skills/backend/database-patterns/SKILL.md
@@ -1,20 +1,6 @@
 ---
 name: database-patterns
-version: 1.0.0
-description: Use when designing database schemas, implementing repository patterns, writing optimized queries, managing migrations, or working with indexes and transactions for SQL/NoSQL databases.
-keywords:
-  - database design
-  - schema design
-  - repository pattern
-  - SQL queries
-  - PostgreSQL
-  - MySQL
-  - MongoDB
-  - indexes
-  - migrations
-  - transactions
-plugin: dev
-updated: 2026-01-20
+description: "Designs normalized schemas with indexing strategies, implements repository patterns with typed queries, manages migrations safely, and handles transactions with proper isolation levels for SQL and NoSQL databases. Use when designing schemas, optimizing queries, implementing pagination, or managing database migrations."
 ---
 # Database Patterns
diff --git a/plugins/dev/skills/backend/error-handling/SKILL.md b/plugins/dev/skills/backend/error-handling/SKILL.md
index 989900b..d2d6952 100644
--- a/plugins/dev/skills/backend/error-handling/SKILL.md
+++ b/plugins/dev/skills/backend/error-handling/SKILL.md
@@ -1,18 +1,6 @@
 ---
 name: error-handling
-version: 1.0.0
-description: Use when implementing custom error classes, error middleware, structured logging, retry logic, or graceful shutdown patterns in backend applications.
-keywords:
-  - error handling
-  - custom errors
-  - error middleware
-  - logging
-  - retry logic
-  - graceful shutdown
-  - error responses
-  - debugging
-plugin: dev
-updated: 2026-01-20
+description: "Implements custom error class hierarchies, Express error middleware with structured JSON responses, retry logic with exponential backoff, and graceful shutdown handlers. Use when building error handling for backend services, adding structured logging, or implementing retry patterns."
 ---
 # Error Handling Patterns
diff --git a/plugins/dev/skills/backend/python/SKILL.md b/plugins/dev/skills/backend/python/SKILL.md
index c49796e..9476978 100644
--- a/plugins/dev/skills/backend/python/SKILL.md
+++ b/plugins/dev/skills/backend/python/SKILL.md
@@ -1,18 +1,6 @@
 ---
 name: python
-version: 1.0.0
-description: Use when building FastAPI applications, implementing async endpoints, setting up Pydantic schemas, working with SQLAlchemy, or writing pytest tests for Python backend services.
-keywords:
-  - Python
-  - FastAPI
-  - Pydantic
-  - SQLAlchemy
-  - async
-  - pytest
-  - backend
-  - API
-plugin: dev
-updated: 2026-01-20
+description: "Provides FastAPI application templates with async SQLAlchemy repositories, Pydantic schema validation, dependency injection patterns, and pytest integration test fixtures. Use when building Python backends, implementing async API endpoints, designing Pydantic models, or writing pytest tests."
 ---
 # Python Backend Patterns
diff --git a/plugins/dev/skills/backend/rust/SKILL.md b/plugins/dev/skills/backend/rust/SKILL.md
index 9f2e02b..98b44bc 100644
--- a/plugins/dev/skills/backend/rust/SKILL.md
+++ b/plugins/dev/skills/backend/rust/SKILL.md
@@ -1,18 +1,6 @@
 ---
 name: rust
-version: 1.0.0
-description: Use when building Axum applications, implementing type-safe handlers, working with SQLx, setting up error handling with thiserror, or writing Rust backend services.
-keywords:
-  - Rust
-  - Axum
-  - SQLx
-  - tokio
-  - async
-  - type safety
-  - backend
-  - thiserror
-plugin: dev
-updated: 2026-01-20
+description: "Provides Axum application templates with SQLx type-safe queries, thiserror-based error handling that implements IntoResponse, repository patterns with async Tokio runtime, and integration test setup. Use when building Rust backends, implementing Axum handlers, designing SQLx models, or writing Rust API tests."
 ---
 # Rust Backend Patterns
diff --git a/plugins/dev/skills/context-detection/SKILL.md b/plugins/dev/skills/context-detection/SKILL.md
index 564ae52..b94ba8e 100644
--- a/plugins/dev/skills/context-detection/SKILL.md
+++ b/plugins/dev/skills/context-detection/SKILL.md
@@ -1,19 +1,6 @@
 ---
 name: context-detection
-version: 1.1.0
-description: Use when detecting project technology stack from files/configs/directory structure, auto-loading framework-specific skills, or analyzing multi-stack fullstack projects (e.g., React + Go).
-keywords:
-  - context detection
-  - stack detection
-  - technology stack
-  - project analysis
-  - auto-detection
-  - framework detection
-  - skill discovery
-plugin: dev
-updated: 2026-02-03
-used_by: stack-detector agent, all dev commands
-allowed-tools: Bash(node *)
+description: "Detects project technology stacks from config files, directory structure, and file extensions, then auto-loads matching framework-specific skills. Use when analyzing project structure, detecting multi-stack fullstack setups (e.g., React + Go), or mapping stacks to quality checks."
 ---
 # Context Detection Skill
diff --git a/plugins/dev/skills/core/debugging-strategies/SKILL.md b/plugins/dev/skills/core/debugging-strategies/SKILL.md
index 48ce920..1279853 100644
--- a/plugins/dev/skills/core/debugging-strategies/SKILL.md
+++ b/plugins/dev/skills/core/debugging-strategies/SKILL.md
@@ -1,18 +1,6 @@
 ---
 name: debugging-strategies
-version: 1.0.0
-description: Use when troubleshooting bugs, analyzing stack traces, using debugging tools (breakpoints, loggers), or applying systematic debugging methodology across any technology stack.
-keywords:
-  - debugging
-  - troubleshooting
-  - stack traces
-  - breakpoints
-  - logging
-  - error analysis
-  - bug fixing
-  - debugging tools
-plugin: dev
-updated: 2026-01-20
+description: "Applies systematic debugging methodology (scientific method, binary search, wolf fence) with structured logging strategies, breakpoint placement patterns, and common bug pattern detection. Use when troubleshooting bugs, analyzing stack traces, tracing data flow, or diagnosing race conditions across any stack."
 ---
 # Universal Debugging Strategies
diff --git a/plugins/dev/skills/core/testing-strategies/SKILL.md b/plugins/dev/skills/core/testing-strategies/SKILL.md
index bc525a3..1b26113 100644
--- a/plugins/dev/skills/core/testing-strategies/SKILL.md
+++ b/plugins/dev/skills/core/testing-strategies/SKILL.md
@@ -1,19 +1,6 @@
 ---
 name: testing-strategies
-version: 1.0.0
-description: Use when writing tests, setting up test frameworks, implementing mocking strategies, or establishing testing best practices (unit, integration, E2E) across any technology stack.
-keywords:
-  - testing
-  - unit tests
-  - integration tests
-  - E2E testing
-  - mocking
-  - test coverage
-  - test-driven development
-  - TDD
-  - test pyramid
-plugin: dev
-updated: 2026-01-20
+description: "Implements the test pyramid (unit/integration/E2E) with AAA pattern, provides mocking strategies (stubs, spies, fakes), and establishes coverage targets by code criticality. Use when writing tests, setting up test frameworks, implementing mocks, or defining testing best practices across any technology stack."
 ---
 # Universal Testing Strategies
diff --git a/plugins/dev/skills/core/universal-patterns/SKILL.md b/plugins/dev/skills/core/universal-patterns/SKILL.md
index 9292c46..8d269c7 100644
--- a/plugins/dev/skills/core/universal-patterns/SKILL.md
+++ b/plugins/dev/skills/core/universal-patterns/SKILL.md
@@ -1,17 +1,6 @@
 ---
 name: universal-patterns
-version: 1.0.0
-description: Use when implementing language-agnostic patterns like layered architecture, dependency injection, error handling, or code organization principles across any technology stack.
-keywords:
-  - architecture
-  - design patterns
-  - clean code
-  - SOLID principles
-  - error handling
-  - best practices
-  - code organization
-plugin: dev
-updated: 2026-01-20
+description: "Applies language-agnostic architecture patterns including layered architecture, dependency injection, SOLID principles, and error handling strategies. Use when organizing code structure, implementing design patterns, or establishing best practices across any technology stack."
 ---
 # Universal Development Patterns
diff --git a/plugins/dev/skills/design/design-references/SKILL.md b/plugins/dev/skills/design/design-references/SKILL.md
index 11edce4..65d4843 100644
--- a/plugins/dev/skills/design/design-references/SKILL.md
+++ b/plugins/dev/skills/design/design-references/SKILL.md
@@ -1,10 +1,6 @@
 ---
 name: design-references
-version: 1.0.0
-description: |
-  Predefined design system references for UI reviews. Includes Material Design 3,
-  Apple Human Interface Guidelines, Tailwind UI, Ant Design, and Shadcn/ui.
-  Use when conducting design reviews against established design systems.
+description: "Provides predefined design system references (Material Design 3, Apple HIG, Tailwind UI, Ant Design, Shadcn/ui) with color palettes, typography scales, spacing tokens, and review checklists. Use when conducting design reviews against established design systems or validating component consistency."
 ---
 # Design References Skill
diff --git a/plugins/dev/skills/design/ui-analyse/SKILL.md b/plugins/dev/skills/design/ui-analyse/SKILL.md
index 548d2b3..83fd8c5 100644
--- a/plugins/dev/skills/design/ui-analyse/SKILL.md
+++ b/plugins/dev/skills/design/ui-analyse/SKILL.md
@@ -1,10 +1,6 @@
 ---
 name: ui-analyse
-version: 2.0.0
-description: |
-  UI visual analysis patterns using Gemini 3 Pro Preview multimodal capabilities.
-  Analysis-only - no code changes. Use dev:ui-implement for applying improvements.
-  Includes provider detection, prompting patterns, and severity guidelines.
+description: "Performs visual UI analysis using Gemini multimodal capabilities, detects usability issues with severity scoring, and audits WCAG accessibility compliance. Use when reviewing screenshots, conducting accessibility audits, or comparing implementation against design references."
 ---
 # UI Analysis Skill
diff --git a/plugins/dev/skills/design/ui-design-review/SKILL.md b/plugins/dev/skills/design/ui-design-review/SKILL.md
index 3e15e2b..1417bc5 100644
--- a/plugins/dev/skills/design/ui-design-review/SKILL.md
+++ b/plugins/dev/skills/design/ui-design-review/SKILL.md
@@ -1,9 +1,6 @@
 ---
 name: ui-design-review
-version: 1.0.0
-description: |
-  Prompting patterns and review templates for UI design analysis with Gemini multimodal capabilities.
-  Use when conducting design reviews, accessibility audits, or design system validation.
+description: "Provides structured prompting patterns for Gemini-powered UI analysis including usability reviews, WCAG accessibility audits, design system consistency checks, and comparative design reviews with severity-ranked output. Use when reviewing screenshots, auditing accessibility, or validating implementation against design specs."
 ---
 # UI Design Review Skill
diff --git a/plugins/dev/skills/design/ui-implement/SKILL.md b/plugins/dev/skills/design/ui-implement/SKILL.md
index ad9caa9..b2f9bbf 100644
--- a/plugins/dev/skills/design/ui-implement/SKILL.md
+++ b/plugins/dev/skills/design/ui-implement/SKILL.md
@@ -1,10 +1,6 @@
 ---
 name: ui-implement
-version: 1.0.0
-description: |
-  Patterns for implementing UI improvements based on design analysis.
-  Works with review documents from dev:ui-analyse or /dev:ui command.
-  Includes Anti-AI design rules and visual verification.
+description: "Transforms design review findings into code changes using 5 Anti-AI design rules (break symmetry, add texture, dramatic typography, micro-interactions, bespoke colors) with optional Gemini visual verification. Use when applying UI improvements from analysis, implementing design system changes, or converting generic layouts into distinctive designs."
 ---
 # UI Implementation Skill
diff --git a/plugins/dev/skills/design/ui-style-format/SKILL.md b/plugins/dev/skills/design/ui-style-format/SKILL.md
index 04b06e8..11ec341 100644
--- a/plugins/dev/skills/design/ui-style-format/SKILL.md
+++ b/plugins/dev/skills/design/ui-style-format/SKILL.md
@@ -1,10 +1,6 @@
 ---
 name: ui-style-format
-version: 1.0.0
-description: |
-  UI design style file format specification with reference image support.
-  Defines the schema for .claude/design-style.md and .claude/design-references/.
-  Use when creating, validating, or parsing project design styles.
+description: "Defines the schema for .claude/design-style.md and .claude/design-references/ including brand colors, typography, spacing tokens, and reference image management with validation checklists. Use when creating project design style files, validating design tokens, or configuring style-aware UI reviews."
 ---
 # UI Style Format Specification
diff --git a/plugins/dev/skills/documentation-standards/SKILL.md b/plugins/dev/skills/documentation-standards/SKILL.md
index ba71aa4..246ebff 100644
--- a/plugins/dev/skills/documentation-standards/SKILL.md
+++ b/plugins/dev/skills/documentation-standards/SKILL.md
@@ -1,18 +1,6 @@
 ---
 name: documentation-standards
-version: 1.0.0
-description: Use when writing README files, API documentation, user guides, or technical documentation following industry standards from Google, Microsoft, and GitLab style guides.
-keywords:
-  - documentation
-  - README
-  - technical writing
-  - API docs
-  - style guides
-  - Markdown
-  - documentation best practices
-plugin: dev
-updated: 2026-01-20
-research_source: 73+ authoritative sources with 98% factual integrity
+description: "Applies 15 ranked documentation best practices from Google, Microsoft, and GitLab style guides, provides 7 ready-to-use templates (README, TSDoc, ADR, changelog), and detects 42 anti-patterns. Use when writing READMEs, API docs, troubleshooting guides, or validating documentation quality."
 ---
 # Documentation Standards
diff --git a/plugins/dev/skills/frontend/css-modules/SKILL.md b/plugins/dev/skills/frontend/css-modules/SKILL.md
index 7a53195..81d6820 100644
--- a/plugins/dev/skills/frontend/css-modules/SKILL.md
+++ b/plugins/dev/skills/frontend/css-modules/SKILL.md
@@ -1,9 +1,6 @@
 ---
 name: css-modules
-description: |
-  CSS Modules with Lightning CSS and PostCSS for component-scoped styling.
-  Covers *.module.css patterns, TypeScript integration, Vite configuration, and composition.
-  Use when building complex animations, styling third-party components, or migrating legacy CSS.
+description: "Configures CSS Modules with Lightning CSS and Vite for component-scoped styling, provides TypeScript type generation options, and implements composition patterns with hybrid Tailwind approaches. Use when building complex animations, styling third-party components, configuring CSS Module TypeScript support, or migrating legacy CSS."
 ---
 # CSS Modules
diff --git a/plugins/dev/skills/frontend/testing-frontend/SKILL.md b/plugins/dev/skills/frontend/testing-frontend/SKILL.md
index eb88310..efa58b5 100644
--- a/plugins/dev/skills/frontend/testing-frontend/SKILL.md
+++ b/plugins/dev/skills/frontend/testing-frontend/SKILL.md
@@ -1,18 +1,6 @@
 ---
 name: testing-frontend
-version: 1.0.0
-description: Use when writing component tests, testing user interactions, mocking APIs, or setting up Vitest/React Testing Library/Vue Test Utils for frontend applications.
-keywords:
-  - frontend testing
-  - Vitest
-  - React Testing Library
-  - Vue Test Utils
-  - component testing
-  - user event testing
-  - mocking
-  - accessibility testing
-plugin: dev
-updated: 2026-01-20
+description: "Implements user-centric component tests with React Testing Library and Vue Test Utils, provides API mocking patterns with Vitest, and includes accessibility testing with jest-axe. Use when writing component tests, testing async data loading, mocking APIs, or validating form interactions in frontend applications."
 ---
 # Frontend Testing Patterns
diff --git a/plugins/dev/skills/optimize/SKILL.md b/plugins/dev/skills/optimize/SKILL.md
index 1ca715e..54afeb8 100644
--- a/plugins/dev/skills/optimize/SKILL.md
+++ b/plugins/dev/skills/optimize/SKILL.md
@@ -1,11 +1,6 @@
 ---
 name: optimize
-description: On-demand performance and optimization analysis. Use when identifying bottlenecks, improving build times, reducing bundle size, or optimizing code performance. Trigger keywords - "optimize", "performance", "bottleneck", "bundle size", "build time", "speed up".
-version: 0.1.0
-tags: [dev, optimize, performance, bottleneck, bundle]
-keywords: [optimize, performance, bottleneck, bundle-size, build-time, speed, profiling]
-plugin: dev
-updated: 2026-01-28
+description: "Profiles build times, analyzes bundle sizes, identifies runtime bottlenecks, and generates prioritized optimization reports with before/after metrics. Use when improving performance, reducing bundle size, optimizing database queries, or diagnosing slow API endpoints."
 ---
 # Optimize Skill
diff --git a/plugins/dev/skills/planning/brainstorming/SKILL.md b/plugins/dev/skills/planning/brainstorming/SKILL.md
index bd2cd09..7fc8d6c 100644
--- a/plugins/dev/skills/planning/brainstorming/SKILL.md
+++ b/plugins/dev/skills/planning/brainstorming/SKILL.md
@@ -1,122 +1,11 @@
 ---
 name: brainstorming
-version: 2.0.0
-description: "Collaborative ideation and planning with resilient multi-model exploration, consensus scoring, and adaptive confidence-based validation"
-author: "MAG Claude Plugins"
-tags:
-  - planning
-  - ideation
-  - collaboration
-  - multi-model
-  - resilient
-dependencies:
-  skills:
-    - superpowers:using-git-worktrees
-    - superpowers:writing-plans
-  tools:
-    - Task
-    - TaskCreate
-    - TaskUpdate
-    - TaskList
-    - TaskGet
-    - Read
-    - Write
-    - Edit
-    - Glob
-  models:
-    primary:
-      - anthropic/claude-opus-4-20250514
-      - anthropic/claude-sonnet-4-20250514
-      - anthropic/claude-haiku-3-20250514
-    explorers:
-      fallback_chain:
-        - x-ai/grok-code-fast-1
-        - google/gemini-2-5-pro
-        - deepseek/deepseek-coder
-        - anthropic/claude-sonnet-4-20250514
-        - anthropic/claude-haiku-3-20250514
-parameters:
-  exploration_models: 3
-  chunk_size: 250
-  confidence_threshold_auto: 95
-  confidence_threshold_confirm: 60
-  retry_attempts: 2
-  timeout_per_model_ms: 120000
-gates:
-  - phase: 0
-    type: USER_GATE
-    trigger: "Problem understanding validated"
-  - phase: 1
-    type: AUTO_GATE
-    trigger: "Parallel exploration consolidated"
-  - phase: 2
-    type: AUTO_GATE
-    trigger: "Consensus scores calculated"
-  - phase: 3
-    type: USER_GATE
-    trigger: "User selects approach"
-  - phase: 4
-    type: MIXED_GATE
-    trigger: "Section-by-section validation"
-  - phase: 5
-    type: USER_GATE
-    trigger: "Final plan approval"
+description: "Generates diverse solutions via parallel multi-model exploration, scores consensus across approaches, and produces validated implementation plans with confidence-based gating. Use when planning features, brainstorming architecture, or evaluating multiple design approaches before implementation."
 ---
-# Brainstorming v2.0: Resilient Multi-Model Planning
+# Brainstorming: Resilient Multi-Model Planning
-Turn ideas into validated designs through collaborative AI dialogue with resilient model execution and confidence-based validation.
-
-## Overview
-
-This skill improves upon v1.0 by addressing critical reliability gaps:
-
-**Key v2.0 Improvements:**
-- **No AskUserQuestion dependency**: Uses Task + Tasks for structured interaction
-- **Fallback chains**: 3+ models per role ensures completion even if some fail
-- **Explicit parallelism**: Documented Task call patterns for parallel execution
-- **Defined algorithms**: Consensus matrix and confidence scoring are mathematically specified
-
-## When to Use
-
-Use this skill BEFORE implementing any feature:
-- "Design a user authentication system"
-- "Brainstorm approaches for API rate limiting"
-- "Plan architecture for a new dashboard feature"
-- "Evaluate options for real-time data synchronization"
-
-## Prerequisites
-
-### Required Setup
-
-```bash
-# 1. Install required skills
-/plugin marketplace add MadAppGang/claude-code
-skill install superpowers:using-git-worktrees
-skill install superpowers:writing-plans
-
-# 2. Verify OpenRouter access (for multi-model)
-export OPENROUTER_API_KEY=your-key
-
-# 3. Configure models in ~/.claude/settings.json
-{
-  "brainstorming": {
-    "primary_model": "anthropic/claude-opus-4-20250514",
-    "explorer_models": [
-      "x-ai/grok-code-fast-1",
-      "google/gemini-2-5-pro",
-      "anthropic/claude-sonnet-4-20250514"
-    ]
-  }
-}
-```
-
-### Model Requirements
-
-| Role | Min Context | Capabilities |
-|------|-------------|--------------|
-| Primary | 200K tokens | Complex reasoning, orchestration |
-| Explorer | 100K tokens | Code generation, analysis |
+Turn ideas into validated implementation plans through parallel multi-model exploration with consensus scoring and confidence-based gating.
 ## Workflow
diff --git a/plugins/dev/skills/test-coverage/SKILL.md b/plugins/dev/skills/test-coverage/SKILL.md
index 7eae6f7..a8877e4 100644
--- a/plugins/dev/skills/test-coverage/SKILL.md
+++ b/plugins/dev/skills/test-coverage/SKILL.md
@@ -1,11 +1,6 @@
 ---
 name: test-coverage
-description: On-demand test coverage analysis. Use when identifying untested code, finding test gaps, measuring coverage metrics, or improving test quality. Trigger keywords - "test coverage", "coverage report", "untested code", "test gaps", "missing tests", "coverage metrics".
-version: 0.1.0
-tags: [dev, testing, coverage, quality, gaps]
-keywords: [test-coverage, coverage, gaps, untested, metrics, testing, quality]
-plugin: dev
-updated: 2026-01-28
+description: "Measures line, branch, and function coverage metrics, identifies untested critical code paths prioritized by risk, and generates gap analysis reports with specific test recommendations. Use when measuring coverage, finding test gaps before deployment, or identifying untested security-critical code."
 ---
 # Test Coverage Skill
diff --git a/plugins/frontend/skills/dependency-check/SKILL.md b/plugins/frontend/skills/dependency-check/SKILL.md
index 743ca6b..096fe74 100644
--- a/plugins/frontend/skills/dependency-check/SKILL.md
+++ b/plugins/frontend/skills/dependency-check/SKILL.md
@@ -1,190 +1,25 @@
 ---
 name: dependency-check
-description: Check for required dependencies (Chrome DevTools MCP, OpenRouter API) before running commands that need them. Use at the start of /implement, /review, /validate-ui commands to provide helpful setup guidance.
-allowed-tools: Bash, AskUserQuestion
+description: "Checks for Chrome DevTools MCP and OpenRouter API key before running frontend commands. Provides setup guidance, offers graceful degradation, and caches results per session. Use at the start of /implement, /review, or /validate-ui commands to verify required external dependencies."
 ---
-# Dependency Check Skill
+# Dependency Check
-This skill provides standardized dependency checking for frontend plugin commands that require external tools and services.
+Standardized dependency verification for frontend plugin commands that require external tools and services.
-## When to Use This Skill
+## Workflow
-Claude should invoke this skill at the **start** of commands that require:
+1. **Check Chrome DevTools MCP** — attempt `mcp__chrome-devtools__list_pages` to verify browser automation is available.
+2. **Check OpenRouter API Key** — verify `OPENROUTER_API_KEY` environment variable is set.
+3. **Check Claudish CLI** — verify `npx claudish --version` succeeds.
+4. **Report status** — show which dependencies are available and which are missing.
+5. **Offer options** — if dependencies are missing, present setup instructions and ask user to continue or install first.
+6. **Cache results** — store dependency status in session metadata to avoid repeated checks.
-1. **Chrome DevTools MCP** - For automated UI verification, screenshot capture, DOM inspection
-   - Commands: `/implement` (UI validation), `/validate-ui`, browser-debugger skill
-
-2. **OpenRouter API Key** - For multi-model orchestration with external AI models
-   - Commands: `/implement` (multi-model code review), `/review`
-
-## Dependency Check Protocol
-
-### Phase 1: Check Chrome DevTools MCP
-
-**When to check:** Before any command that needs browser automation (screenshots, UI testing, DOM inspection)
-
-**How to check:**
-
-```bash
-# Check if chrome-devtools MCP tools are available
-# Try to list pages - if MCP is available, this will work
-mcp__chrome-devtools__list_pages 2>/dev/null
-```
-
-**If MCP is NOT available, show this message:**
-
-```markdown
-## Chrome DevTools MCP Not Available
-
-For automated UI verification (screenshots, DOM inspection, visual regression testing),
-this command requires the **chrome-devtools-mcp** server.
-
-### Why You Need It
-- Capture implementation screenshots for design comparison
-- Inspect DOM structure and computed CSS values
-- Run automated UI tests in real browser
-- Debug responsive layout issues
-- Monitor console errors and network requests
-
-### Easy Installation (Recommended)
-
-Install `claudeup` - a CLI tool for managing Claude Code plugins and MCP servers:
-
-\`\`\`bash
-npm install -g claudeup@latest
-claudeup mcp add chrome-devtools
-\`\`\`
-
-### Manual Installation
-
-Add to your `.claude.json` or `.claude/settings.json`:
-
-\`\`\`json
-{
-  "mcpServers": {
-    "chrome-devtools": {
-      "command": "npx",
-      "args": ["-y", "chrome-devtools-mcp@latest"]
-    }
-  }
-}
-\`\`\`
-
-### Continue Without It?
-
-You can continue, but:
-- UI validation will be **skipped** (no design fidelity checks)
-- Browser testing will be **unavailable**
-- Manual verification will be required for UI changes
-
-Do you want to continue without Chrome DevTools MCP?
-```
-
-**Use AskUserQuestion:**
-```
-Chrome DevTools MCP is not available. What would you like to do?
-
-Options:
-- "Continue without UI verification" - Skip automated UI checks, proceed with implementation
-- "Cancel and install MCP first" - I'll install the MCP and restart
-```
-
-### Phase 2: Check OpenRouter API Key
-
-**When to check:** Before any command that uses external AI models via Claudish
-
-**How to check:**
-
-```bash
-# Check if OPENROUTER_API_KEY is set
-if [[ -z "${OPENROUTER_API_KEY}" ]]; then
-  echo "OPENROUTER_API_KEY not set"
-else
-  echo "OPENROUTER_API_KEY available"
-fi
-
-# Also check if Claudish is available
-npx claudish --version 2>/dev/null || echo "Claudish not installed"
-```
-
-**If OpenRouter API key is NOT set, show this message:**
-
-```markdown
-## OpenRouter API Key Not Configured
-
-For multi-model AI orchestration (parallel code reviews, multi-expert validation),
-this command uses external AI models via OpenRouter.
-
-### Why You Need It
-- Run multiple AI models in parallel for 3-5x faster reviews
-- Get diverse perspectives from different AI experts (Grok, Gemini, GPT-5, DeepSeek)
-- Consensus analysis highlights issues flagged by multiple models
-- Catch more bugs through AI diversity
-
-### Getting Started with OpenRouter
-
-1. **Sign up** at [https://openrouter.ai](https://openrouter.ai)
-2. **Get your API key** from the dashboard
-3. **Set the environment variable:**
-
-\`\`\`bash
-# Add to your shell profile (.bashrc, .zshrc, etc.)
-export OPENROUTER_API_KEY="your-api-key-here"
-\`\`\`
-
-### Cost Information
-
-OpenRouter is **affordable** and even has **free models**:
-
-| Model | Cost | Notes |
-|-------|------|-------|
-| openrouter/polaris-alpha | **FREE** | Good for testing |
-| x-ai/grok-code-fast-1 | ~$0.10/review | Fast coding specialist |
-| google/gemini-2.5-flash | ~$0.05/review | Fast and affordable |
-| deepseek/deepseek-chat | ~$0.05/review | Reasoning specialist |
-
-Typical code review session: **$0.20 - $0.80** for 3-4 external models
-
-### Easy Setup (Recommended)
-
-Install `claudeup` for easy API key management:
-
-\`\`\`bash
-npm install -g claudeup@latest
-claudeup config set OPENROUTER_API_KEY your-api-key
-\`\`\`
-
-### Continue Without It?
-
-You can continue, but:
-- Only **embedded Claude Sonnet** will be used for reviews
-- No parallel multi-model validation
-- Fewer diverse perspectives on code quality
-- Still functional, just less comprehensive
-
-Do you want to continue without external AI models?
-```
-
-**Use AskUserQuestion:**
-```
-OpenRouter API key is not configured. What would you like to do?
-
-Options:
-- "Continue with embedded Claude only" - Use only Claude Sonnet for reviews (still good!)
-- "Cancel and configure API key first" - I'll set up OpenRouter and restart
-```
-
-## Implementation Patterns
-
-### Pattern 1: Check Both Dependencies (for /implement command)
+## Example: Checking Both Dependencies
 ```bash
-# At the start of /implement command
-
-echo "Checking required dependencies..."
-
-# Check 1: Chrome DevTools MCP
+# Check Chrome DevTools MCP
 CHROME_MCP_AVAILABLE=false
 if mcp__chrome-devtools__list_pages 2>/dev/null; then
   CHROME_MCP_AVAILABLE=true
@@ -193,7 +28,7 @@ else
   echo "✗ Chrome DevTools MCP: Not available"
 fi
-# Check 2: OpenRouter API Key
+# Check OpenRouter API Key
 OPENROUTER_AVAILABLE=false
 if [[ -n "${OPENROUTER_API_KEY}" ]]; then
   OPENROUTER_AVAILABLE=true
@@ -201,73 +36,24 @@ if [[ -n "${OPENROUTER_API_KEY}" ]]; then
 else
   echo "✗ OpenRouter API Key: Not configured"
 fi
-
-# Check 3: Claudish CLI (for external models)
-CLAUDISH_AVAILABLE=false
-if npx claudish --version 2>/dev/null; then
-  CLAUDISH_AVAILABLE=true
-  echo "✓ Claudish CLI: Available"
-else
-  echo "✗ Claudish CLI: Not available"
-fi
 ```
-### Pattern 2: Conditional Workflow Adaptation
+**Verification:** Confirm both checks complete before proceeding with the command workflow.
-Based on dependency availability, adapt the workflow:
+## Graceful Degradation
-```markdown
-## Workflow Adaptation Based on Dependencies
+| Missing Dependency | Impact | Fallback |
+|--------------------|--------|----------|
-| Dependency | Available | Workflow Impact |
-|------------|-----------|-----------------|
-| Chrome DevTools MCP | ✓ | Full UI validation with screenshots |
-| Chrome DevTools MCP | ✗ | Skip PHASE 2.5 (Design Fidelity Validation) |
-| OpenRouter + Claudish | ✓ | Multi-model parallel code review (3-5x faster) |
-| OpenRouter + Claudish | ✗ | Single-model embedded Claude review only |
+| Chrome DevTools MCP | Skip UI validation, screenshots | Manual visual verification required |
+| OpenRouter + Claudish | No multi-model reviews | Embedded Claude Sonnet review only |
+| Both | Minimal mode | Core implementation still functional |
+Commands always complete, even with missing dependencies.
-### Graceful Degradation
-
-Commands should ALWAYS complete, even with missing dependencies:
-
-1. **Missing Chrome DevTools MCP:**
-   - Skip: Design fidelity validation, browser testing
-   - Keep: Code implementation, code review, testing
-   - Message: "UI validation skipped - please manually verify visual changes"
-
-2. **Missing OpenRouter API:**
-   - Skip: External multi-model reviews
-   - Keep: Embedded Claude Sonnet review (still comprehensive!)
-   - Message: "Using embedded Claude Sonnet reviewer only"
-
-3. **Missing Both:**
-   - Still functional for core implementation
-   - Skip: UI validation, multi-model review
-   - Message: "Running in minimal mode - core functionality preserved"
-```
-
-### Pattern 3: One-Time Check with Session Cache
-
-Store dependency status in session metadata to avoid repeated checks:
-
-```bash
-# In session-meta.json
-{
-  "dependencies": {
-    "chromeDevToolsMcp": true,
-    "openrouterApiKey": false,
-    "claudishCli": true,
-    "checkedAt": "2025-12-10T10:30:00Z"
-  }
-}
-```
-
-## Quick Reference Messages
-
-### claudeup Installation (Copy-Paste Ready)
+## Quick Setup (Copy-Paste Ready)
 ```bash
-# Install claudeup globally
+# Install claudeup for easy management
 npm install -g claudeup@latest
 # Add Chrome DevTools MCP
@@ -277,46 +63,9 @@ claudeup mcp add chrome-devtools
 claudeup config set OPENROUTER_API_KEY your-api-key
 ```
-### OpenRouter Quick Start
-
-1. Visit: https://openrouter.ai
-2. Sign up (free account)
-3. Get API key from dashboard
-4. Set in terminal: `export OPENROUTER_API_KEY="sk-or-..."`
-
-### Why Multi-Model Matters
-
-| Single Model | Multi-Model |
-|--------------|-------------|
-| 1 perspective | 4-5 perspectives |
-| ~5 min review | ~5 min (parallel!) |
-| May miss issues | Consensus catches more |
-| Good | Better |
-
-## Integration Example
-
-Here's how to integrate this skill at the start of a command:
-
-```markdown
-## STEP 0.5: Dependency Check (Before Session Init)
-
-**Check required dependencies and inform user of any limitations.**
-
-1. Run dependency checks using dependency-check skill patterns
-2. Store results in workflow state
-3. If critical dependencies missing:
-   - Show helpful setup instructions
-   - Ask user if they want to continue with reduced functionality
-4. Adapt workflow based on available dependencies:
-   - chromeDevToolsMcp=false → Skip UI validation phases
-   - openrouterApiKey=false → Use embedded-only review
-5. Continue to STEP 0 (Session Init) with dependency status known
-```
-
 ## Notes
-- **Non-blocking by default**: Always allow users to continue with reduced functionality
-- **Clear messaging**: Explain what will be skipped and why
-- **Easy setup paths**: Recommend claudeup for simplified management
-- **Cost transparency**: Be clear about OpenRouter costs (affordable/free options exist)
-- **One-time per session**: Cache dependency status to avoid repeated checks
+- Non-blocking by default — always allow users to continue with reduced functionality
+- Cache dependency status in `session-meta.json` to check once per session
+- OpenRouter is affordable with free model options available
+- Recommend claudeup for simplified dependency management
diff --git a/plugins/instantly/skills/ab-testing-patterns/SKILL.md b/plugins/instantly/skills/ab-testing-patterns/SKILL.md
index 51fac4d..cbc5eb4 100644
--- a/plugins/instantly/skills/ab-testing-patterns/SKILL.md
+++ b/plugins/instantly/skills/ab-testing-patterns/SKILL.md
@@ -1,18 +1,20 @@
 ---
 name: ab-testing-patterns
-version: 1.0.0
-description: A/B testing methodology for cold email optimization
+description: "Design and evaluate A/B tests for cold email campaigns, including sample size calculation, statistical significance checks, and winner declaration. Use when setting up split tests or analyzing test results in Instantly."
 ---
-plugin: instantly
-updated: 2026-01-20
 # A/B Testing Patterns
-## Testing Fundamentals
+## Workflow
-### One Variable at a Time
+1. **Define hypothesis** -- identify one variable to test (subject, body, CTA, timing)
+2.
**Calculate sample size** -- determine minimum sends per variant for target confidence +3. **Configure test** -- split leads or use sequential method in Instantly +4. **Run test** -- send for minimum 3 days to account for daily patterns +5. **Evaluate results** -- check statistical significance before declaring a winner +6. **Document learnings** -- log results for future campaign decisions -**CRITICAL:** Only change one element per test for clear attribution. +## Step 1: Isolate One Variable | Test Type | Variable | Keep Same | |-----------|----------|-----------| @@ -21,144 +23,78 @@ updated: 2026-01-20 | CTA | Call to action | Subject, body intro | | Send Time | Delivery time | All copy elements | -### Sample Size Requirements +## Step 2: Calculate Sample Size -| Confidence Level | Minimum Sample per Variant | -|------------------|----------------------------| +| Confidence Level | Minimum per Variant | +|------------------|---------------------| | 90% | 100 | | 95% (standard) | 150 | | 99% | 200 | -**Formula:** -``` -sample_size = (Z^2 * p * (1-p)) / E^2 +Formula: `sample = (Z^2 * p * (1-p)) / E^2` where Z=1.96 for 95%, p=0.5 if unknown, E=0.05. -Where: - Z = 1.96 for 95% confidence - p = expected conversion rate (use 0.5 if unknown) - E = margin of error (typically 0.05) -``` +## Step 3: Configure the Test in Instantly + +**Method A -- Split Leads (recommended):** -## Subject Line Testing +1. Export lead list +2. Randomly split into Variant A and Variant B groups +3. Create two identical campaigns with one variable different +4. Use `move_leads_to_campaign` to assign leads + +**Method B -- Sequential Testing (low volume only):** + +1. Run Control for X days, collect metrics +2. Update campaign with Variant via `update_campaign_sequence` +3. 
Run Variant for X days, collect metrics -### Test Categories +## Step 4: Subject Line Test Examples -| Category | Control Example | Variant Example | -|----------|-----------------|-----------------| +| Category | Control | Variant | +|----------|---------|---------| | Curiosity vs Specific | "Quick question" | "2 min about {{company}}'s pipeline" | | Personal vs Generic | "{{first_name}}, saw this" | "Your team might like this" | | Question vs Statement | "Struggling with X?" | "How we fixed X for [Company]" | | Short vs Medium | "Quick win?" | "{{first_name}}, 2 ideas for {{company}}" | -### Best Practices - -1. **Test 2-3 variants maximum** - More variants require more sample -2. **Run for minimum 3 days** - Account for daily patterns -3. **Test during stable periods** - Avoid holidays, major events -4. **Document everything** - Record hypothesis, results, learnings - -## Body Copy Testing - -### Elements to Test - -| Element | Low-Lift | High-Lift | -|---------|----------|-----------| -| Opening hook | Different pain point | Different approach entirely | -| Social proof | Different company name | No social proof | -| Value proposition | Reframe benefit | Different benefit | -| CTA | Soft vs hard ask | Different action | - -### Copy Frameworks to Test - -**PAS vs AIDA:** -- PAS: Problem-Agitate-Solution (emotional) -- AIDA: Attention-Interest-Desire-Action (logical) - -**Test Hypothesis:** PAS performs better for pain-point-heavy ICPs, AIDA for solution-seekers. 
- -## Timing Tests - -### Variables to Test - -| Variable | Options to Test | -|----------|-----------------| -| Day of week | Tue vs Thu (typically best) | -| Time of day | 8-10am vs 2-4pm | -| Timezone | Send in prospect's local time vs batch send | -| Sequence gaps | 2-day vs 3-day follow-up gaps | - -### Default Schedule (Starting Point) - -``` -Optimal Sending Windows: - Primary: Tuesday-Thursday, 9-11am local time - Secondary: Tuesday-Thursday, 2-4pm local time - Avoid: Monday morning, Friday afternoon -``` - -## Statistical Significance - -### Quick Significance Check +## Step 5: Evaluate Statistical Significance | Total Sample | Lift Needed for 95% Confidence | |--------------|--------------------------------| -| 200 (100 per variant) | 15%+ lift | -| 500 (250 per variant) | 10%+ lift | -| 1000 (500 per variant) | 7%+ lift | +| 200 (100/variant) | 15%+ lift | +| 500 (250/variant) | 10%+ lift | +| 1000 (500/variant) | 7%+ lift | ### Decision Framework ``` -IF lift >= 15% AND sample >= 100/variant: - Declare winner with medium confidence - -IF lift >= 10% AND sample >= 250/variant: - Declare winner with high confidence - -IF lift < 10% OR sample < 100/variant: - Continue test or call it inconclusive +IF lift >= 15% AND sample >= 100/variant → winner (medium confidence) +IF lift >= 10% AND sample >= 250/variant → winner (high confidence) +IF lift < 10% OR sample < 100/variant → inconclusive, continue test ``` -## Implementing A/B Tests in Instantly - -### Method 1: Split Leads - -1. Export lead list -2. Randomly split into Variant A and Variant B groups -3. Create two identical campaigns with one variable different -4. Use `move_leads_to_campaign` to assign leads - -### Method 2: Sequential Testing - -1. Run Control for X days, collect metrics -2. Update campaign with Variant (`update_campaign_sequence`) -3. Run Variant for X days, collect metrics -4. 
Compare (less rigorous, use only if lead volume is limited) - -### Tracking Results +## Step 6: Log Results ```markdown -## A/B Test Log - **Test ID**: {uuid} **Campaign**: {campaign_name} **Variable**: {what_was_tested} **Hypothesis**: {expected_outcome} -**Control**: -- Version: {control_description} -- Sample: {n} -- Open Rate: {x}% -- Reply Rate: {y}% - -**Variant**: -- Version: {variant_description} -- Sample: {n} -- Open Rate: {x}% -- Reply Rate: {y}% +| Metric | Control | Variant | Lift | +|--------|---------|---------|------| +| Open Rate | {x}% | {x}% | {z}% | +| Reply Rate | {y}% | {y}% | {z}% | **Result**: {Winner|Inconclusive} -**Lift**: {z}% **Confidence**: {confidence}% **Learning**: {what_we_learned} ``` + +## Validation Checklist + +- [ ] Only one variable changed between variants +- [ ] Sample size meets minimum for target confidence +- [ ] Test ran for at least 3 days +- [ ] No holidays or major events during test window +- [ ] Lift exceeds significance threshold before declaring winner diff --git a/plugins/instantly/skills/campaign-metrics/SKILL.md b/plugins/instantly/skills/campaign-metrics/SKILL.md index 6ad2e86..75d3e6c 100644 --- a/plugins/instantly/skills/campaign-metrics/SKILL.md +++ b/plugins/instantly/skills/campaign-metrics/SKILL.md @@ -1,76 +1,48 @@ --- name: campaign-metrics -version: 1.0.0 -description: Cold email campaign KPIs, benchmarks, and diagnostic patterns +description: "Evaluate cold email campaign performance by calculating health scores, diagnosing metric patterns, and recommending fixes. Use when reviewing campaign KPIs or benchmarking open/reply rates in Instantly." --- -plugin: instantly -updated: 2026-01-20 # Campaign Metrics -## Core KPIs +## Workflow -### Primary Metrics +1. **Collect metrics** -- gather open rate, reply rate, bounce rate, and unsubscribe rate +2. **Compare to benchmarks** -- evaluate performance tier by vertical +3. **Diagnose patterns** -- match metric combinations to known issues +4. 
**Calculate health score** -- compute weighted campaign score (0-100) +5. **Recommend actions** -- prescribe fixes based on diagnosis -| Metric | Formula | Benchmark (Cold Email) | -|--------|---------|------------------------| -| Open Rate | (Opened / Sent) * 100 | 40-50% (good), 25-40% (average) | -| Reply Rate | (Replied / Sent) * 100 | 5-10% (good), 2-5% (average) | -| Positive Reply Rate | (Positive / Replied) * 100 | 25-40% (good) | -| Bounce Rate | (Bounced / Sent) * 100 | <2% (healthy) | -| Unsubscribe Rate | (Unsubscribed / Sent) * 100 | <0.5% (healthy) | +## Step 1: Collect Primary Metrics -### Secondary Metrics +| Metric | Formula | Good | Average | +|--------|---------|------|---------| +| Open Rate | (Opened / Sent) * 100 | 40-50% | 25-40% | +| Reply Rate | (Replied / Sent) * 100 | 5-10% | 2-5% | +| Positive Reply Rate | (Positive / Replied) * 100 | 25-40% | -- | +| Bounce Rate | (Bounced / Sent) * 100 | <2% | 2-5% | +| Unsubscribe Rate | (Unsubscribed / Sent) * 100 | <0.5% | -- | -| Metric | Formula | Use Case | -|--------|---------|----------| -| Emails per Lead | Total Sent / Unique Leads | Sequence effectiveness | -| Reply by Step | Replies per step / Sent per step | Identify best-performing emails | -| Time to Reply | Avg time between send and reply | Timing optimization | +## Step 2: Benchmark by Vertical -## Benchmark Reference - -### Industry Benchmarks by Vertical - -| Vertical | Open Rate | Reply Rate | Notes | -|----------|-----------|------------|-------| -| SaaS | 45-55% | 5-12% | Higher engagement | -| Agency | 35-45% | 3-7% | Competitive space | -| E-commerce | 30-40% | 2-5% | Volume-focused | -| Financial Services | 25-35% | 2-4% | Compliance-heavy | +| Vertical | Open Rate | Reply Rate | +|----------|-----------|------------| +| SaaS | 45-55% | 5-12% | +| Agency | 35-45% | 3-7% | +| E-commerce | 30-40% | 2-5% | +| Financial Services | 25-35% | 2-4% | ### Performance Tiers -``` -EXCELLENT (Top 10%) - Open Rate: >50% - Reply Rate: >10% - 
Bounce Rate: <1% - -GOOD (Top 25%) - Open Rate: 40-50% - Reply Rate: 5-10% - Bounce Rate: 1-2% - -AVERAGE (Middle 50%) - Open Rate: 25-40% - Reply Rate: 2-5% - Bounce Rate: 2-5% - -POOR (Bottom 25%) - Open Rate: 15-25% - Reply Rate: 1-2% - Bounce Rate: 5-10% - -CRITICAL (Bottom 10%) - Open Rate: <15% - Reply Rate: <1% - Bounce Rate: >10% -``` - -## Diagnostic Patterns +| Tier | Open Rate | Reply Rate | Bounce Rate | +|------|-----------|------------|-------------| +| Excellent (Top 10%) | >50% | >10% | <1% | +| Good (Top 25%) | 40-50% | 5-10% | 1-2% | +| Average (Middle 50%) | 25-40% | 2-5% | 2-5% | +| Poor (Bottom 25%) | 15-25% | 1-2% | 5-10% | +| Critical (Bottom 10%) | <15% | <1% | >10% | -### Pattern Matrix +## Step 3: Diagnose Patterns | Open Rate | Reply Rate | Diagnosis | Action | |-----------|------------|-----------|--------| @@ -80,60 +52,45 @@ CRITICAL (Bottom 10%) | Declining | Stable | Fatigue setting in | Refresh creative | | Any | Any + High Bounce | List quality issue | Verify emails | -### Time-Based Analysis - -| Pattern | Meaning | Action | -|---------|---------|--------| -| Monday spike | Inbox cleared over weekend | Send Sun night or Mon early | -| Friday drop | Weekend mindset | Avoid Fri afternoon sends | -| Steady decline | Audience exhaustion | Rotate lists or refresh copy | -| Random spikes | External event correlation | Analyze and replicate | - -## Score Calculation - -### Campaign Health Score (0-100) +## Step 4: Calculate Health Score ``` -health_score = ( - open_score * 0.25 + - reply_score * 0.35 + - deliverability_score * 0.25 + - trend_score * 0.15 -) +health_score = open_score * 0.25 + reply_score * 0.35 + deliverability_score * 0.25 + trend_score * 0.15 ``` -**Component Calculations:** - -``` -open_score = normalize(open_rate, min=0, max=60) - 60%+ open = 100 points - 40% open = 67 points - 20% open = 33 points - 0% open = 0 points - -reply_score = normalize(reply_rate, min=0, max=15) - 15%+ reply = 100 points - 10% reply = 67 
points
- 5% reply = 33 points
- 0% reply = 0 points
-
-deliverability_score = 100 - (bounce_rate * 10)
- 0% bounce = 100 points
- 5% bounce = 50 points
- 10% bounce = 0 points
-
-trend_score = based on week-over-week change
- +10% improvement = 100 points
- Stable = 50 points
- -10% decline = 0 points
-```
+| Component | Calculation |
+|-----------|-------------|
+| open_score | normalize(open_rate, 0-60%) → 0-100 |
+| reply_score | normalize(reply_rate, 0-15%) → 0-100 |
+| deliverability_score | 100 - (bounce_rate * 10) |
+| trend_score | WoW change: +10%=100, stable=50, -10%=0 |

### Score Interpretation

-| Score | Rating | Action Required |
-|-------|--------|-----------------|
-| 90-100 | Excellent | Maintain, scale if possible |
+| Score | Rating | Action |
+|-------|--------|--------|
+| 90-100 | Excellent | Maintain and scale |
| 75-89 | Good | Minor optimizations |
| 60-74 | Average | Address weak areas |
| 40-59 | Poor | Major revision needed |
| 0-39 | Critical | Pause and fix immediately |
+
+## Example: Diagnosing a Struggling Campaign
+
+```
+Campaign: Q1 SaaS Outreach
+Open Rate: 22% → POOR (below 25% threshold)
+Reply Rate: 1.5% → POOR
+Bounce Rate: 3% → WARNING
+
+Diagnosis: Low opens suggest subject line issue
+Action: A/B test 2-3 new subject line variants with 150+ leads per variant
+Health Score: 37*0.25 + 10*0.35 + 70*0.25 + 50*0.15 ≈ 38 → CRITICAL (scores normalized per the component table: open 22/60*100 ≈ 37, reply 1.5/15*100 = 10)
+```
+
+## Validation Checklist
+
+- [ ] Metrics collected from at least 200 sends
+- [ ] Bounce rate checked before diagnosing other metrics
+- [ ] Performance compared against correct vertical benchmark
+- [ ] Health score calculated with all four components
diff --git a/plugins/instantly/skills/email-deliverability/SKILL.md b/plugins/instantly/skills/email-deliverability/SKILL.md
index da42133..d82e46f 100644
--- a/plugins/instantly/skills/email-deliverability/SKILL.md
+++ b/plugins/instantly/skills/email-deliverability/SKILL.md
@@ -1,16 +1,20 @@
---
name: email-deliverability
-version: 1.0.0
-description: Email deliverability best practices and troubleshooting +description: "Diagnose and fix email deliverability issues by auditing DNS records, sender reputation, content quality, and list hygiene. Use when bounce rates spike, emails land in spam, or warming up a new sending domain." --- -plugin: instantly -updated: 2026-01-20 # Email Deliverability -## Deliverability Fundamentals +## Workflow -### Key Metrics +1. **Check health metrics** -- assess bounce rate, spam complaints, inbox placement, sender score +2. **Audit technical setup** -- verify SPF, DKIM, and DMARC records +3. **Evaluate content quality** -- scan for spam triggers in copy and formatting +4. **Assess list quality** -- verify email addresses and remove bad contacts +5. **Troubleshoot issues** -- match symptoms to known patterns and remediate +6. **Verify fixes** -- confirm metrics return to healthy thresholds + +## Step 1: Check Health Metrics | Metric | Healthy | Warning | Critical | |--------|---------|---------|----------| @@ -19,59 +23,27 @@ updated: 2026-01-20 | Inbox Placement | >95% | 80-95% | <80% | | Sender Score | >80 | 60-80 | <60 | -### Deliverability Components - -``` -DELIVERABILITY = - Sender Reputation (40%) - + Content Quality (30%) - + Technical Setup (20%) - + List Quality (10%) -``` - -## Sender Reputation +Deliverability formula: Sender Reputation (40%) + Content Quality (30%) + Technical Setup (20%) + List Quality (10%). 
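The Step 1 thresholds lend themselves to a quick automated triage pass. A minimal sketch (the threshold values mirror the table above; the classifier functions are hypothetical helpers, not part of Instantly):

```python
# Triage deliverability metrics against the Step 1 thresholds (illustrative sketch).

def classify_bounce_rate(pct):
    # Healthy <2%, warning 2-5%, critical >5%
    if pct < 2:
        return "healthy"
    return "warning" if pct <= 5 else "critical"

def classify_inbox_placement(pct):
    # Healthy >95%, warning 80-95%, critical <80%
    if pct > 95:
        return "healthy"
    return "warning" if pct >= 80 else "critical"

def classify_sender_score(score):
    # Healthy >80, warning 60-80, critical <60
    if score > 80:
        return "healthy"
    return "warning" if score >= 60 else "critical"

metrics = {"bounce_rate": 3.0, "inbox_placement": 91.0, "sender_score": 72}
triage = {
    "bounce_rate": classify_bounce_rate(metrics["bounce_rate"]),
    "inbox_placement": classify_inbox_placement(metrics["inbox_placement"]),
    "sender_score": classify_sender_score(metrics["sender_score"]),
}
print(triage)  # all three metrics land in the "warning" band for this example
```

Running this on each sending domain before scaling volume surfaces warning-band metrics early, before they decay into the critical band.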
-### Warm-Up Schedule +## Step 2: Audit Technical Setup -| Day | Emails/Day | Total Sent | -|-----|------------|------------| -| 1-7 | 10-20 | 70-140 | -| 8-14 | 30-50 | 280-490 | -| 15-21 | 75-100 | 805-1190 | -| 22-28 | 150-200 | 1855-2590 | -| 29+ | Scale gradually | - | - -### Reputation Signals - -| Positive Signals | Negative Signals | -|------------------|------------------| -| Opens | Spam complaints | -| Replies | Hard bounces | -| Clicks | Low engagement | -| Forwards | Unsubscribes | -| Non-spam marking | Spam trap hits | - -## Content Quality +| Record | Purpose | How to Check | +|--------|---------|--------------| +| SPF | Authorize sending servers | `nslookup -type=TXT domain` | +| DKIM | Email signature verification | Check in email headers | +| DMARC | Policy for failed checks | `nslookup -type=TXT _dmarc.domain` | -### Spam Filter Triggers +Required DNS settings for Instantly: -**High-Risk Words:** ``` -FREE, GUARANTEE, WINNER, CASH, PRIZE -URGENT, ACT NOW, LIMITED TIME -Click here, Click below, Don't miss -Make money, Extra income, Work from home +SPF: v=spf1 include:_spf.instantly.ai ~all +DKIM: Configure via Instantly dashboard +DMARC: v=DMARC1; p=none; rua=mailto:dmarc@yourdomain.com ``` -**Formatting Red Flags:** -- ALL CAPS in subject or body -- Multiple exclamation marks!!! -- Colored fonts -- Excessive links (>1) -- Images (especially in cold email) -- Attachments +## Step 3: Evaluate Content Quality -### Safe Practices +**Spam trigger words to avoid:** FREE, GUARANTEE, WINNER, URGENT, ACT NOW, Click here, LIMITED TIME. | Do | Don't | |----|-------| @@ -79,103 +51,66 @@ Make money, Extra income, Work from home | Single link (if any) | Multiple CTAs | | Conversational tone | Salesy language | | Short sentences | Long paragraphs | -| Proper grammar | Typos and errors | +| Proper grammar | ALL CAPS or !!! 
| -## Technical Setup +## Step 4: Assess List Quality -### Required DNS Records +| Verification Level | Bounce Rate Reduction | +|--------------------|----------------------| +| Syntax check | 5-10% | +| Domain check | 10-20% | +| Mailbox check | 20-40% | +| Engagement check | 5-15% | -| Record | Purpose | Status Check | -|--------|---------|--------------| -| SPF | Authorize sending servers | `nslookup -type=TXT domain` | -| DKIM | Email signature verification | Check in email headers | -| DMARC | Policy for failed checks | `nslookup -type=TXT _dmarc.domain` | +Hygiene rules: Remove hard bounces and unsubscribes immediately. Remove soft bounces after 3 attempts. Re-verify full list every 3 months. -### Recommended Settings +## Step 5: Troubleshoot by Symptom -``` -SPF: v=spf1 include:_spf.instantly.ai ~all -DKIM: Configure via Instantly dashboard -DMARC: v=DMARC1; p=none; rua=mailto:dmarc@yourdomain.com -``` - -## List Quality - -### Email Verification - -| Verification Level | Description | Bounce Rate | -|--------------------|-------------|-------------| -| Syntax check | Valid format | Reduces 5-10% | -| Domain check | Valid domain | Reduces 10-20% | -| Mailbox check | Exists | Reduces 20-40% | -| Engagement check | Active | Reduces 5-15% | - -### List Hygiene - -| Practice | Frequency | Impact | -|----------|-----------|--------| -| Remove hard bounces | Immediately | Critical | -| Remove soft bounces | After 3 attempts | High | -| Remove unsubscribes | Immediately | Critical | -| Re-verify list | Every 3 months | Medium | - -## Troubleshooting - -### High Bounce Rate (>5%) - -**Diagnosis Steps:** -1. Check bounce types (hard vs soft) -2. Identify source (specific list segment?) -3. Verify emails before adding to campaign - -**Remediation:** +**High Bounce Rate (>5%):** 1. Pause campaign immediately -2. Remove all hard bounces -3. Re-verify remaining list -4. Resume with verified emails only - -### Low Open Rate (<15%) - -**Possible Causes:** -1. 
Poor sender reputation -2. Landing in spam/promotions -3. Bad subject lines -4. Wrong send time - -**Diagnosis:** -1. Check sender score -2. Send test emails to Gmail/Outlook -3. Review recent changes to sending - -### Spam Complaints (>0.1%) - -**Immediate Actions:** +2. Separate hard vs soft bounces +3. Remove all hard bounces +4. Re-verify remaining list +5. Resume with verified emails only + +**Low Open Rate (<15%):** +1. Check sender score at senderscore.org +2. Send test emails to Gmail/Outlook -- check spam folder +3. If landing in spam: start reputation recovery (see below) +4. If inbox: A/B test subject lines + +**Spam Complaints (>0.1%):** 1. Pause campaign -2. Review targeting (wrong ICP?) -3. Check email frequency -4. Review unsubscribe visibility +2. Review targeting -- wrong ICP? +3. Add clear unsubscribe link +4. Improve list sourcing and qualification -**Long-term:** -1. Improve list sourcing -2. Better qualification -3. Add clear opt-out +## Warm-Up Schedule for New Domains -## Recovery Playbook +| Day | Emails/Day | Cumulative | +|-----|------------|------------| +| 1-7 | 10-20 | ~140 | +| 8-14 | 30-50 | ~490 | +| 15-21 | 75-100 | ~1,190 | +| 22-28 | 150-200 | ~2,590 | +| 29+ | Scale gradually | -- | -### Reputation Recovery +## Reputation Recovery Playbook | Day | Action | Expected Outcome | |-----|--------|------------------| -| 1-3 | Pause all sending | Stop damage | +| 1-3 | Pause all sending | Stop further damage | | 4-7 | Remove problem addresses | Clean list | -| 8-14 | Warm up from scratch | Rebuild slowly | +| 8-14 | Warm up from scratch | Rebuild reputation | | 15-21 | Monitor metrics closely | Catch issues early | | 22+ | Gradually scale | Sustainable growth | -### Blacklist Removal +For blacklist removal: check MXToolbox, fix root cause, request removal, wait 24-72 hours, re-check. + +## Validation Checklist -1. Identify which blacklists (MXToolbox) -2. Fix underlying issue first -3. Request removal from each list -4. 
Wait 24-72 hours -5. Re-check and repeat if needed +- [ ] SPF, DKIM, and DMARC records all pass verification +- [ ] Bounce rate below 2% +- [ ] Spam complaint rate below 0.1% +- [ ] Test email reaches inbox (not spam) on Gmail and Outlook +- [ ] Warm-up completed before scaling volume diff --git a/plugins/instantly/skills/sequence-best-practices/SKILL.md b/plugins/instantly/skills/sequence-best-practices/SKILL.md index 9c1bffc..e775d74 100644 --- a/plugins/instantly/skills/sequence-best-practices/SKILL.md +++ b/plugins/instantly/skills/sequence-best-practices/SKILL.md @@ -1,14 +1,20 @@ --- name: sequence-best-practices -version: 1.0.0 -description: Email sequence design and optimization best practices +description: "Build and optimize cold email sequences with step-by-step timing, copy structure, personalization tokens, and CTA progression. Use when creating a new outreach sequence or improving reply rates on existing campaigns." --- -plugin: instantly -updated: 2026-01-20 # Sequence Best Practices -## Sequence Structure +## Workflow + +1. **Choose sequence length** -- select 5-step or 7-step based on lead volume +2. **Set timing** -- configure delays between steps +3. **Write copy** -- follow length and structure guidelines per step +4. **Add personalization** -- apply tokens and custom opening lines +5. **Review for spam triggers** -- remove risky words and formatting +6. **Launch and monitor** -- track reply rates by step to identify weak emails + +## Step 1: Choose Sequence Structure ### Standard 5-Step Sequence @@ -20,7 +26,7 @@ updated: 2026-01-20 | 4 | 10 | Soft breakup | "If not a fit..." 
| | 5 | 14 | Breakup | Last attempt, value-add | -### Extended 7-Step Sequence +### Extended 7-Step Sequence (high volume) | Step | Day | Purpose | Email Type | |------|-----|---------|------------| @@ -32,151 +38,102 @@ updated: 2026-01-20 | 6 | 16 | Soft breakup | Check-in | | 7 | 21 | Breakup | Final attempt | -## Timing Guidelines +## Step 2: Set Timing -### Optimal Delays - -| Between Emails | Recommended | Reasoning | -|----------------|-------------|-----------| -| Email 1 -> 2 | 2-3 days | Stay top of mind | -| Email 2 -> 3 | 3-4 days | Let social proof sink in | -| Email 3 -> 4 | 3-4 days | Give time to consider | -| Email 4 -> 5 | 4-5 days | Breakup needs space | +| Gap | Delay | Reasoning | +|-----|-------|-----------| +| Email 1 → 2 | 2-3 days | Stay top of mind | +| Email 2 → 3 | 3-4 days | Let social proof sink in | +| Email 3 → 4 | 3-4 days | Give time to consider | +| Email 4 → 5 | 4-5 days | Breakup needs space | | After breakup | 30+ days | Cool-off period | -### Send Time Optimization - -``` -Best Days: Tuesday, Wednesday, Thursday -Best Times: - - 8-10am recipient local time (start of day) - - 2-4pm recipient local time (post-lunch) - -Worst Days: Monday morning, Friday afternoon -Worst Times: After 6pm, before 7am -``` +Best send windows: Tue-Thu, 8-10am or 2-4pm recipient local time. Avoid Monday morning and Friday afternoon. 
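The delay table and send-window guidance above can be combined into a concrete schedule. A rough sketch (the delay values are one pick from the recommended ranges, and the weekend/Monday shifting is an illustration of the guidance, not an Instantly feature):

```python
from datetime import date, timedelta

# Day gaps for a 5-step sequence, picked from the recommended ranges above
STEP_DELAYS = [0, 3, 4, 3, 5]

def schedule(start):
    """Return send dates, shifting weekend and Monday sends forward."""
    dates, current = [], start
    for delay in STEP_DELAYS:
        current += timedelta(days=delay)
        while current.weekday() >= 5:  # 5=Saturday, 6=Sunday
            current += timedelta(days=1)
        if current.weekday() == 0:  # avoid Monday morning sends
            current += timedelta(days=1)
        dates.append(current)
    return dates

for step, d in enumerate(schedule(date(2026, 3, 3)), 1):  # Tuesday start
    print(f"Step {step}: {d:%a %Y-%m-%d}")
```

Starting on a Tuesday keeps most sends inside the Tue-Thu window; the shift rules catch any gap that would otherwise land on a weekend or Monday.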
-## Email Copy Guidelines +## Step 3: Write Copy by Step -### Length Guidelines +### Copy Length Guidelines -| Email Type | Word Count | Lines | Reason | -|------------|------------|-------|--------| -| Cold outreach | 50-100 | 5-8 | Scannable | -| Follow-up | 30-60 | 3-5 | Quick bump | -| Case study | 100-150 | 8-12 | Tell story | -| Breakup | 40-80 | 4-6 | Clear finale | +| Email Type | Words | Lines | +|------------|-------|-------| +| Cold outreach | 50-100 | 5-8 | +| Follow-up | 30-60 | 3-5 | +| Case study | 100-150 | 8-12 | +| Breakup | 40-80 | 4-6 | -### Structure Template +### Email Structure Template ``` [Personalized opening - 1 line] - [Problem statement or insight - 2-3 lines] - [Brief value proposition - 1-2 lines] - -[CTA - 1 line] - +[Single CTA - 1 line] [Signature] ``` -### Subject Line Best Practices - -| Do | Don't | -|----|-------| -| Keep 3-7 words | Write novel-length subjects | -| Use {{first_name}} or {{company}} | Over-personalize ("Saw you on LinkedIn...") | -| Create curiosity | Mislead with clickbait | -| Be specific when possible | Use generic templates | -| Test lowercase | Use ALL CAPS | - -**Examples:** -- Good: "{{first_name}}, quick question" -- Good: "idea for {{company}}'s Q2" -- Bad: "RE: Our meeting" (fake reply) -- Bad: "FREE TRIAL INSIDE!!!" (spam trigger) +### CTA Progression -### CTA Best Practices +| Step | CTA Type | Example | +|------|----------|---------| +| Early | Soft | "Thoughts?" | +| Middle | Medium | "Worth a quick chat?" | +| After engagement | Direct | "15 min this week?" | +| Final | Breakup | "Should I close the loop?" | -| CTA Type | Example | When to Use | -|----------|---------|-------------| -| Soft | "Thoughts?" | Early in sequence | -| Medium | "Worth a quick chat?" | Middle of sequence | -| Direct | "15 min this week?" | After engagement signals | -| Breakup | "Should I close the loop?" | Final email | +One CTA per email. Never give multiple options. -**One CTA Per Email:** Never give multiple options. 
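The subject and body templates in this skill rely on tokens like `{{first_name}}`; a quick render pass against a sample lead catches broken tokens before launch. A minimal sketch (Instantly performs token substitution server-side — this regex-based `render` helper is purely illustrative):

```python
import re

def render(template, lead):
    """Replace {{token}} markers with lead fields; unknown tokens are left as-is."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(lead.get(m.group(1), m.group(0))),
        template,
    )

lead = {"first_name": "Sarah", "company": "Acme"}
print(render("{{first_name}}, quick question about {{company}}", lead))
# Sarah, quick question about Acme

# A token with no matching lead field stays intact, which makes gaps easy to spot:
print(render("idea for {{company}}'s Q2", {}))
# idea for {{company}}'s Q2
```

Leaving unknown tokens untouched (rather than substituting an empty string) makes a capitalization mismatch such as `{{Company}}` visible in review instead of silently shipping a blank.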
+### Subject Line Examples -## Personalization Strategy +- Good: `{{first_name}}, quick question` (short, personal, curiosity) +- Good: `idea for {{company}}'s Q2` (specific, relevant) +- Bad: `RE: Our meeting` (deceptive fake reply) +- Bad: `FREE TRIAL INSIDE!!!` (spam trigger) -### Personalization Tokens +## Step 4: Add Personalization | Token | Example | Priority | |-------|---------|----------| -| {{first_name}} | "Hey Sarah," | Required | -| {{company}} | "saw {{company}}'s recent..." | Recommended | -| {{title}} | "As a {{title}}, you..." | If relevant | -| {{industry}} | "In {{industry}}, we see..." | For segmented | +| `{{first_name}}` | "Hey Sarah," | Required | +| `{{company}}` | "saw {{company}}'s recent..." | Recommended | +| `{{title}}` | "As a {{title}}, you..." | If relevant | +| `{{industry}}` | "In {{industry}}, we see..." | For segmented lists | -### Custom Personalization +Rule: Always personalize the opening line. The rest can be templated. -| Level | Effort | Impact | -|-------|--------|--------| -| Basic | Low | {{first_name}}, {{company}} | -| Medium | Medium | + recent company news, role-specific | -| High | High | + mutual connections, specific insight | - -**Rule:** Personalize opening line at minimum. Rest can be templated. - -## Avoiding Spam Triggers - -### Words to Avoid - -``` -FREE, GUARANTEE, WINNER, CASH, URGENT -Click here, Act now, Limited time, Don't miss -!!!!, ALL CAPS, $$$$ -``` - -### Technical Best Practices +## Step 5: Spam Trigger Checklist | Practice | Reason | |----------|--------| -| No images in cold emails | Trigger spam filters | +| No images in cold emails | Triggers spam filters | | Max 1 link per email | Multiple links = spam | | No HTML formatting | Plain text performs better | -| Short sentences | Looks more personal | | No attachments | Major spam trigger | +| Avoid: FREE, GUARANTEE, URGENT, ALL CAPS, !!!! 
| Known spam keywords | -## Sequence Psychology - -### Email 1: The Opening - -**Goal:** Establish relevance, create curiosity -**Approach:** Problem-focused, not product-focused -**Length:** 50-80 words +## Example: 5-Step SaaS Outreach Sequence -### Email 2: The Follow-Up - -**Goal:** Add credibility via social proof -**Approach:** "Here's what happened when..." -**Length:** 60-100 words +``` +Step 1 (Day 0): "{{first_name}}, quick question about {{company}}" + → 75 words, problem-focused opening, soft CTA "Thoughts?" -### Email 3: The Pivot +Step 2 (Day 3): "re: {{company}}" + → 60 words, case study: "We helped [Similar Co] increase X by 30%" -**Goal:** Try different angle -**Approach:** Different pain point or benefit -**Length:** 50-80 words +Step 3 (Day 7): "different approach for {{company}}" + → 70 words, alternative pain point, medium CTA -### Email 4: The Soft Breakup +Step 4 (Day 10): "not sure if this is a fit" + → 45 words, soft breakup, permission-based close -**Goal:** Create scarcity without desperation -**Approach:** "If this isn't a priority..." 
-**Length:** 40-60 words
+Step 5 (Day 14): "closing the loop"
+  → 50 words, free resource linked (never attached), final CTA
+```
 
-### Email 5: The Breakup
+## Validation Checklist
 
-**Goal:** Last attempt, leave door open
-**Approach:** Value-add (resource) + clear close
-**Length:** 50-80 words
+- [ ] Each email has exactly one CTA
+- [ ] Opening line is personalized with at least `{{first_name}}`
+- [ ] No spam trigger words in subject or body
+- [ ] Delays between emails follow recommended ranges
+- [ ] Copy stays within word count guidelines per email type
+- [ ] Reply rates tracked per step to identify underperformers
diff --git a/plugins/multimodel/skills/error-recovery/SKILL.md b/plugins/multimodel/skills/error-recovery/SKILL.md
index c781526..4caa908 100644
--- a/plugins/multimodel/skills/error-recovery/SKILL.md
+++ b/plugins/multimodel/skills/error-recovery/SKILL.md
@@ -1,1109 +1,77 @@
 ---
 name: error-recovery
-description: Handle errors, timeouts, and failures in multi-agent workflows. Use when dealing with external model timeouts, API failures, partial success, user cancellation, or graceful degradation. Trigger keywords - "error", "failure", "timeout", "retry", "fallback", "cancelled", "graceful degradation", "recovery", "partial success".
-version: 0.1.0
-tags: [orchestration, error-handling, retry, fallback, timeout, recovery]
-keywords: [error, failure, timeout, retry, fallback, graceful-degradation, cancellation, recovery, partial-success, resilience]
-plugin: multimodel
-updated: 2026-01-20
+description: "Handles errors, timeouts, and partial failures in multi-agent workflows with retry logic, fallback strategies, and graceful degradation. Implements exponential backoff, model failover, and partial-success preservation. Use when dealing with API failures, external model timeouts, user cancellation, or building resilient multi-agent pipelines."
 ---
 
 # Error Recovery
 
-**Version:** 1.0.0
-**Purpose:** Patterns for handling failures in multi-agent workflows
-**Status:** Production Ready
+Patterns for handling failures gracefully in multi-agent workflows so temporary errors do not derail entire pipelines.
 
-## Overview
+## Workflow
 
-Error recovery is the practice of handling failures gracefully in multi-agent workflows, ensuring that temporary errors, timeouts, or partial failures don't derail entire workflows. In production systems with external dependencies (AI models, APIs, network calls), failures are inevitable. The question is not "will it fail?" but "how will we handle it when it does?"
+1. **Wrap external calls** in timeout-aware execution with configurable thresholds (default 30s).
+2. **Detect failure type** — timeout, API error (401/500), network error, user cancellation, or out-of-credits.
+3. **Apply recovery strategy** from the pattern table below.
+4. **Preserve partial results** — if 3 of 4 agents succeeded, use those results and note the gap.
+5. **Report outcome** — log the failure, recovery action, and final status.
 
-This skill provides battle-tested patterns for:
-- **Timeout handling** (external models taking >30s)
-- **API failure recovery** (401, 500, network errors)
-- **Partial success strategies** (some agents succeed, others fail)
-- **User cancellation** (graceful Ctrl+C handling)
-- **Missing tools** (claudish not installed)
-- **Out of credits** (payment/quota errors)
-- **Retry strategies** (exponential backoff, max retries)
+## Recovery Patterns
 
-With proper error recovery, workflows become **resilient** and **production-ready**.
+| Failure Type | Detection | Recovery Strategy | +|--------------|-----------|-------------------| +| Timeout (>30s) | Elapsed time exceeds threshold | Retry once with shorter prompt, then skip with warning | +| API 401/403 | HTTP status code | Check API key, inform user, skip external model | +| API 500/502/503 | HTTP status code | Retry with exponential backoff (1s, 2s, 4s), max 3 attempts | +| Network error | Connection refused/timeout | Retry once, then fall back to embedded Claude | +| User cancellation (Ctrl+C) | Signal handler | Save partial results, report what completed | +| Out of credits | 402/quota error | Inform user, fall back to embedded Claude | +| Tool not installed | Command not found | Inform user with install instructions, continue without | -## Core Patterns - -### Pattern 1: Timeout Handling - -**Scenario: External Model Takes >30s** - -External AI models via Claudish may take >30s due to: -- Model service overloaded (high demand) -- Network latency (slow connection) -- Complex task (large input, detailed analysis) -- Model thinking time (GPT-5, Grok reasoning models) - -**Detection:** - -``` -Monitor execution time and set timeout limits: - -const TIMEOUT_THRESHOLD = 30000; // 30 seconds - -startTime = Date.now(); -executeClaudish(model, prompt); - -setInterval(() => { - elapsedTime = Date.now() - startTime; - if (elapsedTime > TIMEOUT_THRESHOLD && !modelResponded) { - handleTimeout(); - } -}, 1000); -``` - -**Recovery Strategy:** - -``` -Step 1: Detect Timeout - Log: "Timeout: x-ai/grok-code-fast-1 after 30s with no response" - -Step 2: Notify User - Present options: - "Model 'Grok' timed out after 30 seconds. - Options: - 1. Retry with 60s timeout - 2. Skip this model and continue with others - 3. Cancel entire workflow - - What would you like to do? 
(1/2/3)" - -Step 3a: User selects RETRY - Increase timeout to 60s - Re-execute claudish with longer timeout - If still times out: Offer skip or cancel - -Step 3b: User selects SKIP - Log: "Skipping Grok review due to timeout" - Mark this model as failed - Continue with remaining models - (Graceful degradation pattern) - -Step 3c: User selects CANCEL - Exit workflow gracefully - Save partial results (if any) - Log cancellation reason -``` - -**Graceful Degradation:** - -``` -Multi-Model Review Example: - -Requested: 5 models (Claude, Grok, Gemini, GPT-5, DeepSeek) -Timeout: Grok after 30s - -Result: - - Claude: Success ✓ - - Grok: Timeout ✗ (skipped) - - Gemini: Success ✓ - - GPT-5: Success ✓ - - DeepSeek: Success ✓ - -Successful: 4/5 models (80%) -Threshold: N ≥ 2 for consolidation ✓ - -Action: - Proceed with consolidation using 4 reviews - Notify user: "4/5 models completed (Grok timeout). Proceeding with 4-model consensus." - -Benefits: - - Workflow completes despite failure - - User gets results (4 models better than 1) - - Timeout doesn't derail entire workflow -``` - -**Example Implementation:** +## Example: Retry with Exponential Backoff ```bash -# In codex-code-reviewer agent (proxy mode) - -MODEL="x-ai/grok-code-fast-1" -TIMEOUT=30 - -# Execute with timeout -RESULT=$(timeout ${TIMEOUT}s bash -c " - printf '%s' '$PROMPT' | claudish --model $MODEL --stdin --quiet --auto-approve -" 2>&1) - -# Check exit code -if [ $? -eq 124 ]; then - # Timeout occurred (exit code 124 from timeout command) - echo "⚠️ Timeout: Model $MODEL exceeded ${TIMEOUT}s" >&2 - echo "TIMEOUT_ERROR: Model did not respond within ${TIMEOUT}s" - exit 1 -fi - -# Success - write results -echo "$RESULT" > ai-docs/grok-review.md -echo "Grok review complete. 
See ai-docs/grok-review.md" -``` - ---- - -### Pattern 2: API Failure Recovery - -**Common API Failure Scenarios:** - -``` -401 Unauthorized: - - Invalid API key (OPENROUTER_API_KEY incorrect) - - Expired API key - - API key not set in environment - -500 Internal Server Error: - - Model service temporarily down - - Server overload - - Model deployment issue - -Network Errors: - - Connection timeout (network slow/unstable) - - DNS resolution failure - - Firewall blocking request - -429 Too Many Requests: - - Rate limit exceeded - - Too many concurrent requests - - Quota exhausted for time window -``` - -**Recovery Strategies by Error Type:** - -**401 Unauthorized:** - -``` -Detection: - API returns 401 status code - -Recovery: - 1. Log: "API authentication failed (401)" - 2. Check if OPENROUTER_API_KEY is set: - if [ -z "$OPENROUTER_API_KEY" ]; then - notifyUser("OpenRouter API key not found. Set OPENROUTER_API_KEY in .env") - else - notifyUser("Invalid OpenRouter API key. Check .env file") - fi - 3. Skip all external models - 4. Fallback to embedded Claude only - 5. Notify user: - "⚠️ API authentication failed. Falling back to embedded Claude. - To fix: Add valid OPENROUTER_API_KEY to .env file." - -No retry (authentication won't fix itself) -``` - -**500 Internal Server Error:** - -``` -Detection: - API returns 500 status code +MAX_RETRIES=3 +RETRY_DELAY=1 -Recovery: - 1. Log: "Model service error (500): x-ai/grok-code-fast-1" - 2. Wait 5 seconds (give service time to recover) - 3. Retry ONCE - 4. If retry succeeds: Continue normally - 5. If retry fails: Skip this model, continue with others +for attempt in $(seq 1 $MAX_RETRIES); do + result=$(timeout 30 npx claudish --model "$MODEL" --stdin --quiet <<< "$PROMPT" 2>&1) + exit_code=$? 
-Example: - try { - result = await claudish(model, prompt); - } catch (error) { - if (error.status === 500) { - log("500 error, waiting 5s before retry..."); - await sleep(5000); - - try { - result = await claudish(model, prompt); // Retry - log("Retry succeeded"); - } catch (retryError) { - log("Retry failed, skipping model"); - skipModel(model); - continueWithRemaining(); - } - } - } - -Max retries: 1 (avoid long delays) -``` - -**Network Errors:** + if [ $exit_code -eq 0 ]; then + echo "$result" + break + fi + if [ $attempt -lt $MAX_RETRIES ]; then + sleep $RETRY_DELAY + RETRY_DELAY=$((RETRY_DELAY * 2)) + else + echo "WARNING: $MODEL failed after $MAX_RETRIES attempts, using fallback" + fi +done ``` -Detection: - - Connection timeout - - ECONNREFUSED - - ETIMEDOUT - - DNS resolution failure - -Recovery: - Retry up to 3 times with exponential backoff: - - async function retryWithBackoff(fn, maxRetries = 3) { - for (let i = 0; i < maxRetries; i++) { - try { - return await fn(); - } catch (error) { - if (!isNetworkError(error)) throw error; // Not retriable - if (i === maxRetries - 1) throw error; // Max retries reached - const delay = Math.pow(2, i) * 1000; // 1s, 2s, 4s - log(`Network error, retrying in ${delay}ms (attempt ${i+1}/${maxRetries})`); - await sleep(delay); - } - } - } +**Verification:** Confirm the retry loop exits cleanly on both success and max-retry exhaustion. - result = await retryWithBackoff(() => claudish(model, prompt)); - -Rationale: Network errors are often transient (temporary) -``` - -**429 Rate Limiting:** - -``` -Detection: - API returns 429 status code - Response may include Retry-After header +## Partial Success Strategy -Recovery: - 1. Check Retry-After header (seconds to wait) - 2. If present: Wait for specified time - 3. If not present: Wait 60s (default) - 4. Retry ONCE after waiting - 5. 
If still rate limited: Skip model +When running parallel agents and some fail: -Example: - if (error.status === 429) { - const retryAfter = error.headers['retry-after'] || 60; - log(`Rate limited. Waiting ${retryAfter}s before retry...`); - await sleep(retryAfter * 1000); +1. **Collect all results** — successful and failed. +2. **Check minimum threshold** — if 2+ of 4 agents succeeded, proceed with available results. +3. **Note gaps** — flag which perspectives are missing in the final output. +4. **Never block on optional agents** — only block on critical path agents. - try { - result = await claudish(model, prompt); - } catch (retryError) { - log("Still rate limited after retry. Skipping model."); - skipModel(model); - } - } +## Model Failover Chain -Note: Respect Retry-After header (avoid hammering API) ``` - -**Graceful Degradation for All API Failures:** - -``` -Fallback Strategy: - -If ALL external models fail (401, 500, network, etc.): - 1. Log all failures - 2. Notify user: - "⚠️ All external models failed. Falling back to embedded Claude. - Errors: - - Grok: Network timeout - - Gemini: 500 Internal Server Error - - GPT-5: Rate limited (429) - - DeepSeek: Authentication failed (401) - - Proceeding with Claude Sonnet (embedded) only." - - 3. Run embedded Claude review - 4. Present results with disclaimer: - "Review completed using Claude only (external models unavailable). - For multi-model consensus, try again later." - -Benefits: - - User still gets results (better than nothing) - - Workflow completes (not aborted) - - Clear error communication (user knows what happened) -``` - ---- - -### Pattern 3: Partial Success Strategies - -**Scenario: 2 of 4 Models Complete Successfully** - -In multi-model workflows, it's common for some models to succeed while others fail. 
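The failover chain above can be sketched as a small shell helper. This is illustrative only: `call_model` is a hypothetical stand-in for the timeout-wrapped claudish invocation shown earlier, and the model IDs and the `good/` prefix exist purely for the demo.

```shell
# Hypothetical failover chain; call_model stands in for a real,
# timeout-wrapped claudish call and is stubbed here for illustration.
call_model() {  # $1 = model id, $2 = prompt
  case "$1" in
    good/*) echo "response from $1" ;;  # stub: only "good" models answer
    *) return 1 ;;                      # stub: everything else fails
  esac
}

failover() {  # $1 = prompt, remaining args = models in preference order
  prompt="$1"; shift
  for model in "$@"; do
    if call_model "$model" "$prompt"; then
      return 0                          # first success wins
    fi
    echo "WARN: $model failed, trying next model" >&2
  done
  echo "WARN: all external models failed, fall back to embedded Claude" >&2
  return 1
}

failover "review this diff" "bad/grok-code-fast-1" "good/gemini-2.5-flash"
# prints: response from good/gemini-2.5-flash
```

In a real pipeline the final `return 1` is the signal to run the embedded-Claude fallback rather than abort.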
- -**Tracking Success/Failure:** - -``` -const results = await Promise.allSettled([ - Task({ subagent: "reviewer", model: "claude" }), - Task({ subagent: "reviewer", model: "grok" }), - Task({ subagent: "reviewer", model: "gemini" }), - Task({ subagent: "reviewer", model: "gpt-5" }) -]); - -const successful = results.filter(r => r.status === 'fulfilled'); -const failed = results.filter(r => r.status === 'rejected'); - -log(`Success: ${successful.length}/4`); -log(`Failed: ${failed.length}/4`); -``` - -**Decision Logic:** - +Primary model → Retry same model → Alternative model → Embedded Claude +Example: grok-code-fast-1 → retry → gemini-2.5-flash → embedded Claude Sonnet ``` -If N ≥ 2 successful: - → Proceed with consolidation - → Use N reviews (not all 4) - → Notify user about failures -If N < 2 successful: - → Insufficient data for consensus - → Offer user choice: - 1. Retry failures - 2. Abort workflow - 3. Proceed with embedded Claude only - -Example: - -successful.length = 2 (Claude, Gemini) -failed.length = 2 (Grok timeout, GPT-5 500 error) - -Action: - notifyUser("2/4 models completed successfully. Proceeding with consolidation using 2 reviews."); - - consolidateReviews([ - "ai-docs/claude-review.md", - "ai-docs/gemini-review.md" - ]); - - presentResults({ - totalModels: 4, - successful: 2, - failureReasons: { - grok: "Timeout after 30s", - gpt5: "500 Internal Server Error" - } - }); -``` - -**Communication Strategy:** - -``` -Be transparent with user about partial success: - -❌ WRONG: - "Multi-model review complete!" - (User assumes all 4 models ran) - -✅ CORRECT: - "Multi-model review complete (2/4 models succeeded). - - Successful: - - Claude Sonnet ✓ - - Gemini 2.5 Flash ✓ - - Failed: - - Grok: Timeout after 30s - - GPT-5 Codex: 500 Internal Server Error - - Proceeding with 2-model consensus. 
- Top issues: [...]" - -User knows: - - What succeeded (Claude, Gemini) - - What failed (Grok, GPT-5) - - Why they failed (timeout, 500 error) - - What action was taken (2-model consensus) -``` - -**Consolidation Adapts to N Models:** - -``` -Consolidation logic must handle variable N: - -✅ CORRECT - Flexible N: - function consolidateReviews(reviewFiles) { - const N = reviewFiles.length; - log(`Consolidating ${N} reviews`); - - // Consensus thresholds adapt to N - const unanimousThreshold = N; // All N agree - const strongThreshold = Math.ceil(N * 0.67); // 67%+ agree - const majorityThreshold = Math.ceil(N * 0.5); // 50%+ agree - - // Apply consensus analysis with dynamic thresholds - ... - } - -❌ WRONG - Hardcoded N: - // Assumes always 4 models - const unanimousThreshold = 4; // Breaks if N = 2! -``` - ---- - -### Pattern 4: User Cancellation Handling (Ctrl+C) - -**Scenario: User Presses Ctrl+C During Workflow** - -Users may cancel long-running workflows for various reasons: -- Taking too long -- Realized they want different configuration -- Accidentally triggered workflow -- Need to prioritize other work - -**Cleanup Strategy:** - -``` -process.on('SIGINT', async () => { - log("⚠️ User cancelled workflow (Ctrl+C)"); - - // Step 1: Stop all running processes gracefully - await stopAllAgents(); - - // Step 2: Save partial results to files - const partialResults = await collectPartialResults(); - await writeFile('ai-docs/partial-review.md', partialResults); - - // Step 3: Log what was completed vs cancelled - log("Workflow cancelled"); - log("Completed:"); - log(" - PHASE 1: Requirements gathering ✓"); - log(" - PHASE 2: Architecture planning ✓"); - log("Cancelled:"); - log(" - PHASE 3: Implementation (in progress)"); - log(" - PHASE 4: Testing (not started)"); - log(" - PHASE 5: Review (not started)"); - - // Step 4: Notify user - console.log("\n⚠️ Workflow cancelled by user."); - console.log("Partial results saved to ai-docs/partial-review.md"); - 
console.log("Completed phases: 2/5"); - - // Step 5: Clean exit - process.exit(0); -}); -``` - -**Save Partial Results:** - -``` -Partial Results Format: - -# Workflow Cancelled by User - -**Status:** Cancelled during PHASE 3 (Implementation) -**Completed:** 2/5 phases (40%) -**Duration:** 8 minutes (of estimated 20 minutes) -**Timestamp:** 2025-11-22T14:30:00Z - -## Completed Phases - -### PHASE 1: Requirements Gathering ✓ -- User requirements documented -- See: ai-docs/requirements.md - -### PHASE 2: Architecture Planning ✓ -- Architecture plan generated -- See: ai-docs/architecture-plan.md - -## Cancelled Phases - -### PHASE 3: Implementation (IN PROGRESS) -- Status: 30% complete -- Files created: src/auth.ts (partial) -- Files pending: src/routes.ts, src/services.ts - -### PHASE 4: Testing (NOT STARTED) -- Pending: Test suite creation - -### PHASE 5: Code Review (NOT STARTED) -- Pending: Multi-model review - -## How to Resume - -To resume from PHASE 3: -1. Review partial implementation in src/auth.ts -2. Complete remaining implementation -3. Continue with PHASE 4 (Testing) - -Or restart workflow from beginning with updated requirements. -``` - -**Resumable Workflows (Advanced):** - -``` -Save workflow state for potential resume: - -// During workflow execution -await saveWorkflowState({ - currentPhase: 3, - totalPhases: 5, - completedPhases: [1, 2], - pendingPhases: [3, 4, 5], - partialResults: { - phase1: "ai-docs/requirements.md", - phase2: "ai-docs/architecture-plan.md", - phase3: "src/auth.ts (partial)" - } -}, '.claude/workflow-state.json'); - -// On next invocation -const state = await loadWorkflowState('.claude/workflow-state.json'); -if (state) { - askUser("Found incomplete workflow from previous session. Resume? 
(Yes/No)"); - - if (userSaysYes) { - resumeFromPhase(state.currentPhase); - } else { - deleteWorkflowState(); - startFresh(); - } -} -``` - ---- - -### Pattern 5: Claudish Not Installed - -**Scenario: User Requests Multi-Model Review but Claudish Missing** - -**Detection:** - -``` -Check if claudish CLI is installed: - -Bash: which claudish -Exit code 0: Installed ✓ -Exit code 1: Not installed ✗ - -Or: - -Bash: claudish --version -Output: "claudish version 2.2.1" → Installed ✓ -Error: "command not found" → Not installed ✗ -``` - -**Recovery Strategy:** - -``` -Step 1: Detect Missing Claudish - hasClaudish = checkCommand('which claudish'); - - if (!hasClaudish) { - log("Claudish CLI not found"); - notifyUser(); - } - -Step 2: Notify User with Installation Instructions - "⚠️ Claudish CLI not found. External AI models unavailable. - - To enable multi-model review: - 1. Install: npm install -g claudish - 2. Configure: Set OPENROUTER_API_KEY in .env - 3. Re-run this command - - For now, falling back to embedded Claude Sonnet only." - -Step 3: Fallback to Embedded Claude - log("Falling back to embedded Claude review"); - runEmbeddedReviewOnly(); - -Benefits: - - Workflow doesn't fail (graceful degradation) - - User gets results (Claude review) - - Clear instructions for enabling multi-model (future use) -``` - -**Example Implementation:** - -``` -Phase 2: Model Selection - -Bash: which claudish -if [ $? -ne 0 ]; then - # Claudish not installed - echo "⚠️ Claudish CLI not found." - echo "Install: npm install -g claudish" - echo "Falling back to embedded Claude only." 
- - # Skip external model selection - selectedModels=["claude-sonnet"] -else - # Claudish available - echo "Claudish CLI found ✓" - # Proceed with external model selection - selectedModels=["claude-sonnet", "grok", "gemini", "gpt-5"] -fi -``` - ---- - -### Pattern 6: Out of OpenRouter Credits - -**Scenario: External Model API Call Fails Due to Insufficient Credits** - -**Detection:** - -``` -API returns: - - 402 Payment Required (HTTP status) - - Or error message contains "credits", "quota", "billing" - -Example error messages: - - "Insufficient credits" - - "Credit balance too low" - - "Quota exceeded" - - "Payment required" -``` - -**Recovery Strategy:** - -``` -Step 1: Detect Credit Exhaustion - if (error.status === 402 || error.message.includes('credits')) { - handleCreditExhaustion(); - } - -Step 2: Log Event - log("OpenRouter credits exhausted"); - -Step 3: Notify User - "⚠️ OpenRouter credits exhausted. External models unavailable. - - To fix: - 1. Visit https://openrouter.ai - 2. Add credits to your account - 3. Re-run this command - - For now, falling back to embedded Claude Sonnet." 
- -Step 4: Skip All External Models - skipAllExternalModels(); - -Step 5: Fallback to Embedded Claude - runEmbeddedReviewOnly(); - -Benefits: - - Workflow completes (doesn't fail) - - User gets results (Claude review) - - Clear instructions for adding credits -``` - -**Proactive Credit Check (Advanced):** - -``` -Before expensive multi-model operation: - -Step 1: Check OpenRouter Credit Balance - Bash: curl -H "Authorization: Bearer $OPENROUTER_API_KEY" \ - https://openrouter.ai/api/v1/auth/key - - Response: { "data": { "usage": 1.23, "limit": 10.00 } } - -Step 2: Estimate Cost - estimatedCost = 0.008 // From cost estimation pattern - -Step 3: Check if Sufficient Credits - remainingCredits = 10.00 - 1.23 = 8.77 - if (estimatedCost > remainingCredits) { - warnUser("Insufficient credits ($8.77 remaining, $0.008 needed)"); - } - -Benefits: - - Warn before operation (not after failure) - - User can add credits first (avoid wasted time) -``` - ---- - -### Pattern 7: Retry Strategies - -**Exponential Backoff:** - -``` -Retry with increasing delays to avoid overwhelming services: - -Retry Schedule: - 1st retry: Wait 1 second - 2nd retry: Wait 2 seconds - 3rd retry: Wait 4 seconds - Max retries: 3 - -Formula: delay = 2^attempt × 1000ms - -async function retryWithBackoff(fn, maxRetries = 3) { - for (let attempt = 0; attempt < maxRetries; attempt++) { - try { - return await fn(); - } catch (error) { - if (!isRetriable(error)) { - throw error; // Don't retry non-retriable errors - } - - if (attempt === maxRetries - 1) { - throw error; // Max retries reached - } - - const delay = Math.pow(2, attempt) * 1000; - log(`Retry ${attempt + 1}/${maxRetries} after ${delay}ms`); - await sleep(delay); - } - } -} -``` - -**When to Retry:** - -``` -Retriable Errors (temporary, retry likely to succeed): - ✓ Network errors (ETIMEDOUT, ECONNREFUSED) - ✓ 500 Internal Server Error (service temporarily down) - ✓ 503 Service Unavailable (overloaded, retry later) - ✓ 429 Rate Limiting (wait for 
reset, then retry) - -Non-Retriable Errors (permanent, retry won't help): - ✗ 401 Unauthorized (bad credentials) - ✗ 403 Forbidden (insufficient permissions) - ✗ 404 Not Found (model doesn't exist) - ✗ 400 Bad Request (invalid input) - ✗ User cancellation (SIGINT) - -Function: - function isRetriable(error) { - const retriableCodes = [500, 503, 429]; - const retriableTypes = ['ETIMEDOUT', 'ECONNREFUSED', 'ENOTFOUND']; - - return ( - retriableCodes.includes(error.status) || - retriableTypes.includes(error.code) - ); - } -``` - -**Max Retry Limits:** - -``` -Set appropriate max retries by operation type: - -Network requests: 3 retries (transient failures) -API calls: 1-2 retries (avoid long delays) -User input: 0 retries (ask user to retry manually) - -Example: - result = await retryWithBackoff( - () => claudish(model, prompt), - maxRetries: 2 // 2 retries for API calls - ); -``` - ---- - -## Integration with Other Skills - -**error-recovery + multi-model-validation:** - -``` -Use Case: Handling external model failures in parallel execution - -Step 1: Parallel Execution (multi-model-validation) - Launch 5 models simultaneously - -Step 2: Error Recovery (error-recovery) - Model 1: Success ✓ - Model 2: Timeout → Skip (timeout handling pattern) - Model 3: 500 error → Retry once, then skip - Model 4: Success ✓ - Model 5: Success ✓ - -Step 3: Partial Success Strategy (error-recovery) - 3/5 successful (≥ 2 threshold) - Proceed with consolidation using 3 reviews - -Step 4: Consolidation (multi-model-validation) - Consolidate 3 successful reviews - Notify user about 2 failures -``` - -**error-recovery + quality-gates:** - -``` -Use Case: Test-driven loop with error recovery - -Step 1: Run Tests (quality-gates TDD pattern) - Bash: bun test - -Step 2: If Test Execution Fails (error-recovery) - Error type: Syntax error in test file - - Recovery: - - Fix syntax error - - Retry test execution - - If still fails: Notify user, skip TDD phase - -Step 3: If Tests Pass (quality-gates) 
- Proceed to code review -``` - -**error-recovery + multi-agent-coordination:** - -``` -Use Case: Agent selection with fallback - -Step 1: Agent Selection (multi-agent-coordination) - Preferred: ui-developer-codex (external validation) - -Step 2: Check Tool Availability (error-recovery) - Bash: which claudish - Result: Not found - -Step 3: Fallback Strategy (error-recovery) - Log: "Claudish not installed, falling back to embedded ui-developer" - Use: ui-developer (embedded) - -Step 4: Execution (multi-agent-coordination) - Task: ui-developer -``` - ---- - -## Best Practices - -**Do:** -- ✅ Set timeout limits (30s default, 60s for complex tasks) -- ✅ Retry transient errors (network, 500, 503) -- ✅ Use exponential backoff (avoid hammering services) -- ✅ Skip non-retriable errors (401, 404, don't retry) -- ✅ Provide graceful degradation (fallback to embedded Claude) -- ✅ Save partial results on cancellation -- ✅ Communicate transparently (tell user what failed and why) -- ✅ Adapt to partial success (N ≥ 2 reviews is useful) - -**Don't:** -- ❌ Retry indefinitely (set max retry limits) -- ❌ Retry non-retriable errors (waste time on 401, 404) -- ❌ Fail entire workflow for single model failure (graceful degradation) -- ❌ Hide errors from user (be transparent) -- ❌ Discard partial results on failure (save what succeeded) -- ❌ Ignore user cancellation (handle SIGINT gracefully) -- ❌ Retry without delay (use backoff) - -**Performance:** -- Exponential backoff: Prevents overwhelming services -- Max retries: Limits wasted time (3 retries = <10s overhead) -- Graceful degradation: Workflows complete despite failures - ---- - -## Examples - -### Example 1: Timeout with Retry - -**Scenario:** Grok model times out, user retries with longer timeout - -**Execution:** - -``` -Attempt 1: - Bash: timeout 30s claudish --model x-ai/grok-code-fast-1 ... - Result: Timeout after 30s - - Notify user: - "⚠️ Grok timed out after 30s. - Options: - 1. Retry with 60s timeout - 2. Skip Grok - 3. 
Cancel workflow" - - User selects: 1 (Retry) - -Attempt 2: - Bash: timeout 60s claudish --model x-ai/grok-code-fast-1 ... - Result: Success after 45s - - Log: "Grok review completed on retry (45s)" - Write: ai-docs/grok-review.md - Continue with workflow -``` - ---- - -### Example 2: Partial Success (2/4 Models) - -**Scenario:** 4 models selected, 2 fail, proceed with 2 - -**Execution:** - -``` -Launch 4 models in parallel: - Task: Claude (embedded) - Task: Grok (external) - Task: Gemini (external) - Task: GPT-5 (external) - -Results: - Claude: Success ✓ (2 min) - Grok: Timeout ✗ (30s) - Gemini: 500 error ✗ (retry failed) - GPT-5: Success ✓ (3 min) - -successful.length = 2 (Claude, GPT-5) -2 ≥ 2 ✓ (threshold met) - -Notify user: - "2/4 models completed successfully. - - Successful: - - Claude Sonnet ✓ - - GPT-5 Codex ✓ - - Failed: - - Grok: Timeout after 30s - - Gemini: 500 Internal Server Error (retry failed) - - Proceeding with 2-model consensus." - -Consolidate: - consolidateReviews([ - "ai-docs/claude-review.md", - "ai-docs/gpt5-review.md" - ]); - -Present results with 2-model consensus -``` - ---- - -### Example 3: User Cancellation - -**Scenario:** User presses Ctrl+C during PHASE 3 - -**Execution:** - -``` -Workflow starts: - PHASE 1: Requirements ✓ (30s) - PHASE 2: Architecture ✓ (2 min) - PHASE 3: Implementation (in progress, 3 min elapsed) - -User presses Ctrl+C: - Signal: SIGINT received - -Handler executes: - Log: "User cancelled workflow (Ctrl+C)" - - Stop agents: - - backend-developer (currently executing) - - Terminate gracefully - - Collect partial results: - - ai-docs/requirements.md ✓ - - ai-docs/architecture-plan.md ✓ - - src/auth.ts (30% complete) - - Save to file: - Write: ai-docs/partial-implementation.md - "# Workflow Cancelled - Completed: PHASE 1, PHASE 2 - Partial: PHASE 3 (30%) - Pending: PHASE 4, PHASE 5" - - Notify user: - "⚠️ Workflow cancelled by user. 
- Partial results saved to ai-docs/partial-implementation.md - Completed: 2/5 phases (40%)" - - Exit: process.exit(0) -``` - ---- - -## Troubleshooting - -**Problem: Workflow fails after single model timeout** - -Cause: No graceful degradation - -Solution: Continue with remaining models - -``` -❌ Wrong: - if (timeout) { - throw new Error("Model timed out"); - } - -✅ Correct: - if (timeout) { - log("Model timed out, skipping"); - skipModel(); - continueWithRemaining(); - } -``` - ---- - -**Problem: Retrying 401 errors indefinitely** - -Cause: Retrying non-retriable errors - -Solution: Check if error is retriable - -``` -❌ Wrong: - for (let i = 0; i < 10; i++) { - try { return await fn(); } - catch (e) { /* retry all errors */ } - } - -✅ Correct: - for (let i = 0; i < 3; i++) { - try { return await fn(); } - catch (e) { - if (!isRetriable(e)) throw e; // Don't retry 401 - await sleep(delay); - } - } -``` - ---- - -**Problem: No visibility into what failed** - -Cause: Not communicating errors to user - -Solution: Transparently report all failures - -``` -❌ Wrong: - "Review complete!" (hides 2 failures) - -✅ Correct: - "Review complete (2/4 models succeeded). - Failed: Grok (timeout), Gemini (500 error)" -``` - ---- - -## Summary - -Error recovery ensures resilient workflows through: - -- **Timeout handling** (detect, retry with longer timeout, or skip) -- **API failure recovery** (retry transient, skip permanent) -- **Partial success strategies** (N ≥ 2 threshold, adapt to failures) -- **User cancellation** (graceful Ctrl+C, save partial results) -- **Missing tools** (claudish not installed, fallback to embedded) -- **Out of credits** (402 error, fallback to free models) -- **Retry strategies** (exponential backoff, max 3 retries) - -With these patterns, workflows are **production-ready** and **resilient** to inevitable failures. 
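The Ctrl+C handling described above can be reduced to a short shell sketch: trap SIGINT, write whatever completed to disk, and exit cleanly. The file path and phase names are invented for illustration.

```shell
# Illustrative SIGINT handler that preserves partial results.
# PARTIAL path and the phase names are placeholders, not plugin API.
PARTIAL=ai-docs/partial-review.md
mkdir -p ai-docs
completed=""

on_cancel() {
  {
    echo "# Workflow Cancelled by User"
    echo "## Completed phases"
    printf '%s' "$completed"
  } > "$PARTIAL"
  echo "Partial results saved to $PARTIAL" >&2
  exit 0
}
trap on_cancel INT

# Record progress as phases finish, so the handler always has current state.
completed="${completed}- PHASE 1: Requirements
"
completed="${completed}- PHASE 2: Architecture
"
# ... long-running PHASE 3 would execute here; Ctrl+C now triggers on_cancel
```

The key design choice is that progress is appended as each phase completes, so the handler never has to reconstruct state at cancellation time.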
- ---- +## Notes -**Extracted From:** -- `/review` command error handling (external model failures) -- `/implement` command PHASE 2.5 (test-driven loop error recovery) -- Production experience with Claudish proxy failures -- Multi-model validation resilience requirements +- Always set explicit timeouts on external model calls (30s default, 60s for Opus-tier) +- Log all failures with error type, model, and recovery action for performance tracking +- Inform users when falling back to degraded mode — never silently downgrade +- Preserve partial results even when the workflow cannot fully complete diff --git a/plugins/multimodel/skills/multi-agent-coordination/SKILL.md b/plugins/multimodel/skills/multi-agent-coordination/SKILL.md index 1e31ce6..df6ad97 100644 --- a/plugins/multimodel/skills/multi-agent-coordination/SKILL.md +++ b/plugins/multimodel/skills/multi-agent-coordination/SKILL.md @@ -1,744 +1,68 @@ --- name: multi-agent-coordination -description: Coordinate multiple agents in parallel or sequential workflows. Use when running agents simultaneously, delegating to sub-agents, switching between specialized agents, or managing agent selection. Trigger keywords - "parallel agents", "sequential workflow", "delegate", "multi-agent", "sub-agent", "agent switching", "task decomposition". -version: 0.1.0 -tags: [orchestration, multi-agent, parallel, sequential, delegation, coordination] -keywords: [parallel, sequential, delegate, sub-agent, agent-switching, multi-agent, task-decomposition, coordination] -plugin: multimodel -updated: 2026-01-20 +description: "Orchestrates multiple specialized agents in parallel or sequential workflows using Task tool delegation. Decomposes complex tasks, manages agent selection, aggregates results, and handles inter-agent dependencies. Use when running agents simultaneously, delegating to sub-agents, coordinating multi-step workflows, or decomposing tasks across specialists." 
--- # Multi-Agent Coordination -**Version:** 1.0.0 -**Purpose:** Patterns for coordinating multiple agents in complex workflows -**Status:** Production Ready +Patterns for orchestrating multiple specialized agents in parallel and sequential workflows via the Task tool. -## Overview +## Workflow -Multi-agent coordination is the foundation of sophisticated Claude Code workflows. This skill provides battle-tested patterns for orchestrating multiple specialized agents to accomplish complex tasks that are beyond the capabilities of a single agent. +1. **Decompose the task** into subtasks, identifying dependencies between them. +2. **Select agents** for each subtask based on role specialization. +3. **Determine execution order** — sequential for dependencies, parallel for independent work. +4. **Delegate via Task tool** with file-based context passing (not inline prompts). +5. **Aggregate results** from all agents into a unified output. -The key challenge in multi-agent systems is **dependencies**. Some tasks must execute sequentially (one agent's output feeds into another), while others can run in parallel (independent validations from different perspectives). Getting this right is the difference between a 5-minute workflow and a 15-minute one. 
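The workflow above — decompose, order by dependency, run independent work in parallel — can be sketched as a small scheduler. This is an illustrative sketch only: `runAgent` is a stand-in for real Task tool delegation, and the subtask shape is an assumption.

```typescript
// Hypothetical sketch: run subtasks with dependencies sequentially and
// independent subtasks in parallel, memoizing each subtask's promise.
type Subtask = { id: string; agent: string; deps: string[] };

async function runAgent(task: Subtask): Promise<string> {
  // Stand-in for delegating to a specialized agent via the Task tool.
  return `${task.agent} finished ${task.id}`;
}

async function orchestrate(tasks: Subtask[]): Promise<Map<string, string>> {
  const started = new Map<string, Promise<string>>();

  const start = (t: Subtask): Promise<string> => {
    if (!started.has(t.id)) {
      started.set(
        t.id,
        // Wait for all dependencies, then run this agent.
        Promise.all(
          t.deps.map((d) => start(tasks.find((x) => x.id === d)!)),
        ).then(() => runAgent(t)),
      );
    }
    return started.get(t.id)!;
  };

  await Promise.all(tasks.map(start));
  const results = new Map<string, string>();
  for (const [id, p] of started) results.set(id, await p);
  return results;
}
```

Because each subtask's promise is memoized, a diamond dependency (two subtasks sharing one prerequisite) runs the prerequisite exactly once while the independent branches proceed in parallel.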
+## Sequential vs Parallel -This skill teaches you: -- When to run agents in **parallel** vs **sequential** -- How to **select the right agent** for each task -- How to **delegate** to sub-agents without polluting context -- How to manage **context windows** across multiple agent calls +| Pattern | When to Use | Example | +|---------|-------------|---------| +| Sequential | Agent B needs Agent A's output | Plan → implement → test → review | +| Parallel | Agents are independent | 3 reviewers validating same code simultaneously | +| Fan-out/fan-in | Independent work then merge | Parallel reviews → consolidated verdict | -## Core Patterns - -### Pattern 1: Sequential vs Parallel Execution - -**When to Use Sequential:** - -Use sequential execution when there are **dependencies** between agents: -- Agent B needs Agent A's output as input -- Workflow phases must complete in order (plan → implement → test → review) -- Each agent modifies shared state (same files) - -**Example: Multi-Phase Implementation** - -``` -Phase 1: Architecture Planning - Task: api-architect - Output: ai-docs/architecture-plan.md - Wait for completion ✓ - -Phase 2: Implementation (depends on Phase 1) - Task: backend-developer - Input: Read ai-docs/architecture-plan.md - Output: src/auth.ts, src/routes.ts - Wait for completion ✓ - -Phase 3: Testing (depends on Phase 2) - Task: test-architect - Input: Read src/auth.ts, src/routes.ts - Output: tests/auth.test.ts -``` - -**When to Use Parallel:** - -Use parallel execution when agents are **independent**: -- Multiple validation perspectives (designer + tester + reviewer) -- Multiple AI models reviewing same code (Grok + Gemini + Claude) -- Multiple feature implementations in separate files - -**Example: Multi-Perspective Validation** - -``` -Single Message with Multiple Task Calls: - -Task: designer - Prompt: Validate UI against Figma design - Output: ai-docs/design-review.md ---- -Task: ui-manual-tester - Prompt: Test UI in browser for usability - 
Output: ai-docs/testing-report.md ---- -Task: senior-code-reviewer - Prompt: Review code quality and patterns - Output: ai-docs/code-review.md - -All three execute simultaneously (3x speedup!) -Wait for all to complete, then consolidate results. -``` - -**The 4-Message Pattern for True Parallel Execution:** - -This is **CRITICAL** for achieving true parallelism: - -``` -Message 1: Preparation (Bash Only) - - Create workspace directories - - Validate inputs - - Write context files - - NO Task calls, NO Tasks - -Message 2: Parallel Execution (Task Only) - - Launch ALL agents in SINGLE message - - ONLY Task tool calls - - Each Task is independent - - All execute simultaneously - -Message 3: Consolidation (Task Only) - - Launch consolidation agent - - Automatically triggered when N agents complete - -Message 4: Present Results - - Show user final consolidated results - - Include links to detailed reports -``` - -**Anti-Pattern: Mixing Tool Types Breaks Parallelism** - -``` -❌ WRONG - Executes Sequentially: - await TaskCreate({...}); // Tool 1 - await Task({...}); // Tool 2 - waits for Tasks - await Bash({...}); // Tool 3 - waits for Task - await Task({...}); // Tool 4 - waits for Bash - -✅ CORRECT - Executes in Parallel: - await Task({...}); // Task 1 - await Task({...}); // Task 2 - await Task({...}); // Task 3 - // All execute simultaneously -``` - -**Why Mixing Fails:** - -Claude Code sees different tool types and assumes there are dependencies between them, forcing sequential execution. Using a single tool type (all Task calls) signals that operations are independent and can run in parallel. 
- ---- - -### Pattern 2: Agent Selection by Task Type - -**Task Detection Logic:** - -Intelligent workflows automatically detect task type and select appropriate agents: - -``` -Task Type Detection: - -IF request mentions "API", "endpoint", "backend", "database": - → API-focused workflow - → Use: api-architect, backend-developer, test-architect - → Skip: designer, ui-developer (not relevant) - -ELSE IF request mentions "UI", "component", "design", "Figma": - → UI-focused workflow - → Use: designer, ui-developer, ui-manual-tester - → Optional: ui-developer-codex (external validation) - -ELSE IF request mentions both API and UI: - → Mixed workflow - → Use all relevant agents from both categories - → Coordinate between backend and frontend agents - -ELSE IF request mentions "test", "coverage", "bug": - → Testing-focused workflow - → Use: test-architect, ui-manual-tester - → Optional: codebase-detective (for bug investigation) - -ELSE IF request mentions "review", "validate", "feedback": - → Review-focused workflow - → Use: senior-code-reviewer, designer, ui-developer - → Optional: external model reviewers -``` - -**Agent Capability Matrix:** - -| Task Type | Primary Agent | Secondary Agent | Optional External | -|-----------|---------------|-----------------|-------------------| -| API Implementation | backend-developer | api-architect | - | -| UI Implementation | ui-developer | designer | ui-developer-codex | -| Testing | test-architect | ui-manual-tester | - | -| Code Review | senior-code-reviewer | - | codex-code-reviewer | -| Architecture Planning | api-architect OR frontend-architect | - | plan-reviewer | -| Bug Investigation | codebase-detective | test-architect | - | -| Design Validation | designer | ui-developer | designer-codex | - -**Agent Switching Pattern:** - -Some workflows benefit from **adaptive agent selection** based on context: - -``` -Example: UI Development with External Validation - -Base Implementation: - Task: ui-developer - Prompt: Implement 
navbar component from design - -User requests external validation: - → Switch to ui-developer-codex OR add parallel ui-developer-codex - → Run both: embedded ui-developer + external ui-developer-codex - → Consolidate feedback from both - -Scenario 1: User wants speed - → Use ONLY ui-developer (embedded, fast) - -Scenario 2: User wants highest quality - → Use BOTH ui-developer AND ui-developer-codex (parallel) - → Consensus analysis on feedback - -Scenario 3: User is out of credits - → Fallback to ui-developer only - → Notify user external validation unavailable -``` - ---- - -### Pattern 3: Sub-Agent Delegation - -**File-Based Instructions (Context Isolation):** - -When delegating to sub-agents, use **file-based instructions** to avoid context pollution: - -``` -✅ CORRECT - File-Based Delegation: - -Step 1: Write instructions to file - Write: ai-docs/architecture-instructions.md - Content: "Design authentication system with JWT tokens..." - -Step 2: Delegate to agent with file reference - Task: api-architect - Prompt: "Read instructions from ai-docs/architecture-instructions.md - and create architecture plan." - -Step 3: Agent reads file, does work, writes output - Agent reads: ai-docs/architecture-instructions.md - Agent writes: ai-docs/architecture-plan.md - -Step 4: Agent returns brief summary ONLY - Return: "Architecture plan complete. 
See ai-docs/architecture-plan.md" - -Step 5: Orchestrator reads output file if needed - Read: ai-docs/architecture-plan.md - (Only if orchestrator needs to process the output) -``` - -**Why File-Based?** - -- **Avoids context pollution:** Long user requirements don't bloat orchestrator context -- **Reusable:** Multiple agents can read same instruction file -- **Debuggable:** Files persist after workflow completes -- **Clean separation:** Input file, output file, orchestrator stays lightweight - -**Anti-Pattern: Inline Delegation** - -``` -❌ WRONG - Context Pollution: - -Task: api-architect - Prompt: "Design authentication system with: - - JWT tokens with refresh token rotation - - Email/password login with bcrypt hashing - - OAuth2 integration with Google, GitHub - - Rate limiting on login endpoint (5 attempts per 15 min) - - Password reset flow with time-limited tokens - - Email verification on signup - - Role-based access control (admin, user, guest) - - Session management with Redis - - Security headers (CORS, CSP, HSTS) - - ... (500 more lines of requirements)" - -Problem: Orchestrator's context now contains 500+ lines of requirements - that are only relevant to the architect agent. -``` - -**Brief Summary Returns:** - -Sub-agents should return **2-5 sentence summaries**, not full output: - -``` -✅ CORRECT - Brief Summary: - "Architecture plan complete. Designed 3-layer authentication: - JWT with refresh tokens, OAuth2 integration (Google/GitHub), - and Redis session management. See ai-docs/architecture-plan.md - for detailed component breakdown." - -❌ WRONG - Full Output: - "Architecture plan: - [500 lines of detailed architecture documentation] - Components: AuthController, TokenService, OAuthService... 
- [another 500 lines]" -``` - -**Proxy Mode Invocation:** - -For external AI models (Claudish), use the claudish CLI directive: - -``` -Task: codex-code-reviewer claudish CLI: x-ai/grok-code-fast-1 - Prompt: "Review authentication implementation for security issues. - Code context in ai-docs/code-review-context.md" - -Agent Behavior: - 1. Detects claudish CLI directive - 2. Extracts model: x-ai/grok-code-fast-1 - 3. Extracts task: "Review authentication implementation..." - 4. Executes: claudish --model x-ai/grok-code-fast-1 --stdin <<< "..." - 5. Waits for full response (blocking execution) - 6. Writes: ai-docs/grok-review.md (full detailed review) - 7. Returns: "Grok review complete. Found 3 CRITICAL issues. See ai-docs/grok-review.md" -``` - -**Key: Blocking Execution** - -External models MUST execute synchronously (blocking) so the agent waits for the full response: - -``` -✅ CORRECT - Blocking: - RESULT=$(claudish --model x-ai/grok-code-fast-1 --stdin <<< "$PROMPT") - echo "$RESULT" > ai-docs/grok-review.md - echo "Review complete - see ai-docs/grok-review.md" - -❌ WRONG - Background (returns before completion): - claudish --model x-ai/grok-code-fast-1 --stdin <<< "$PROMPT" & - echo "Review started..." # Agent returns immediately, review not done! 
-``` - ---- - -### Pattern 4: Context Window Management - -**When to Delegate:** - -Delegate to sub-agents when: -- Task is self-contained (clear input → output) -- Output is large (architecture plan, test suite, review report) -- Task requires specialized expertise (designer, tester, reviewer) -- Multiple independent tasks can run in parallel - -**When to Execute in Main Context:** - -Execute in main orchestrator when: -- Task is small (simple file edit, command execution) -- Output is brief (yes/no decision, status check) -- Task depends on orchestrator state (current phase, iteration count) -- Context pollution risk is low - -**Context Size Estimation:** - -**Note:** Token estimates below are approximations based on typical usage. Actual context consumption varies by skill complexity, Claude model version, and conversation history. Use these as guidelines, not exact measurements. - -Estimate context usage to decide delegation strategy: - -``` -Context Budget: ~200k tokens (Claude Sonnet 4.5 - actual varies by model) - -Current context usage breakdown: - - System prompt: 10k tokens - - Skill content (5 skills): 10k tokens - - Command instructions: 5k tokens - - User request: 1k tokens - - Conversation history: 20k tokens - ─────────────────────────────────── - Total used: 46k tokens - Remaining: 154k tokens - -Safe threshold for delegation: If task will consume >30k tokens, delegate - -Example: Architecture planning for large system - - Requirements: 5k tokens - - Expected output: 20k tokens - - Total: 25k tokens - ─────────────────────────────────── - Decision: Delegate (keeps orchestrator lightweight) -``` - -**Delegation Strategy by Context Size:** - -| Task Output Size | Strategy | -|------------------|----------| -| < 1k tokens | Execute in orchestrator | -| 1k - 10k tokens | Delegate with summary return | -| 10k - 30k tokens | Delegate with file-based output | -| > 30k tokens | Multi-agent decomposition | - -**Example: Multi-Agent Decomposition** - -``` 
-User Request: "Implement complete e-commerce system" - -This is >100k tokens if done by single agent. Decompose: - -Phase 1: Break into sub-systems - - Product catalog - - Shopping cart - - Checkout flow - - User authentication - - Order management - - Payment integration - -Phase 2: Delegate each sub-system to separate agent - Task: backend-developer - Instruction file: ai-docs/product-catalog-requirements.md - Output file: ai-docs/product-catalog-implementation.md - - Task: backend-developer - Instruction file: ai-docs/shopping-cart-requirements.md - Output file: ai-docs/shopping-cart-implementation.md - - ... (6 parallel agent invocations) - -Phase 3: Integration agent - Task: backend-developer - Instruction: "Integrate 6 sub-systems. Read output files: - ai-docs/*-implementation.md" - Output: ai-docs/integration-plan.md - -Total context per agent: ~20k tokens (manageable) -vs. Single agent: 120k+ tokens (context overflow risk) -``` - ---- - -## Integration with Other Skills - -**multi-agent-coordination + multi-model-validation:** - -``` -Use Case: Code review with multiple AI models - -Step 1: Agent Selection (multi-agent-coordination) - - Detect task type: Code review - - Select agents: senior-code-reviewer (embedded) + external models - -Step 2: Parallel Execution (multi-model-validation) - - Follow 4-Message Pattern - - Launch all reviewers simultaneously - - Wait for all to complete - -Step 3: Consolidation (multi-model-validation) - - Auto-consolidate reviews - - Apply consensus analysis -``` - -**multi-agent-coordination + quality-gates:** - -``` -Use Case: Iterative UI validation - -Step 1: Agent Selection (multi-agent-coordination) - - Detect task type: UI validation - - Select agents: designer, ui-developer - -Step 2: Iteration Loop (quality-gates) - - Run designer validation - - If not PASS: delegate to ui-developer for fixes - - Loop until PASS or max iterations - -Step 3: User Validation Gate (quality-gates) - - MANDATORY user approval - - Collect 
feedback if issues found -``` - -**multi-agent-coordination + task-orchestration:** - -``` -Use Case: Multi-phase implementation workflow - -Step 1: Initialize Tasks (task-orchestration) - - Create task list for all phases - -Step 2: Sequential Agent Delegation (multi-agent-coordination) - - Phase 1: api-architect - - Phase 2: backend-developer (depends on Phase 1) - - Phase 3: test-architect (depends on Phase 2) - - TaskUpdate after each phase -``` - ---- - -## Best Practices - -**Do:** -- ✅ Use parallel execution for independent tasks (3-5x speedup) -- ✅ Use sequential execution when there are dependencies -- ✅ Use file-based instructions to avoid context pollution -- ✅ Return brief summaries (2-5 sentences) from sub-agents -- ✅ Select agents based on task type (API/UI/Testing/Review) -- ✅ Decompose large tasks into multiple sub-agent calls -- ✅ Estimate context usage before delegating - -**Don't:** -- ❌ Mix tool types in parallel execution (breaks parallelism) -- ❌ Inline long instructions in Task prompts (context pollution) -- ❌ Return full output from sub-agents (use files instead) -- ❌ Use parallel execution for dependent tasks (wrong results) -- ❌ Use single agent for >100k token tasks (context overflow) -- ❌ Forget to wait for all parallel tasks before consolidating - -**Performance Tips:** -- Parallel execution: 3-5x faster than sequential (5min vs 15min) -- File-based delegation: Saves 50-80% context usage -- Agent switching: Adapt to user preferences (speed vs quality) -- Context decomposition: Enables tasks that would otherwise overflow - ---- - -## Examples - -### Example 1: Parallel Multi-Model Code Review - -**Scenario:** User requests "Review my authentication code with Grok and Gemini" - -**Agent Selection:** -- Task type: Code review -- Agents: senior-code-reviewer (embedded), external Grok, external Gemini - -**Execution:** - -``` -Message 1: Preparation - - Write code context to ai-docs/code-review-context.md - -Message 2: Parallel Execution (3 
Task calls in single message) - Task: senior-code-reviewer - Prompt: "Review ai-docs/code-review-context.md for security issues" - --- - Task: codex-code-reviewer claudish CLI: x-ai/grok-code-fast-1 - Prompt: "Review ai-docs/code-review-context.md for security issues" - --- - Task: codex-code-reviewer claudish CLI: google/gemini-2.5-flash - Prompt: "Review ai-docs/code-review-context.md for security issues" - - All 3 execute simultaneously (3x faster than sequential) - -Message 3: Auto-Consolidation - Task: senior-code-reviewer - Prompt: "Consolidate 3 reviews from: - - ai-docs/claude-review.md - - ai-docs/grok-review.md - - ai-docs/gemini-review.md - Prioritize by consensus." - -Message 4: Present Results - "Review complete. 3 models analyzed your code. - Top 5 issues by consensus: - 1. [UNANIMOUS] Missing input validation on login endpoint - 2. [STRONG] SQL injection risk in user query - 3. [MAJORITY] Weak password requirements - See ai-docs/consolidated-review.md for details." -``` - -**Result:** 5 minutes total (vs 15+ if sequential), consensus-based prioritization - ---- - -### Example 2: Sequential Multi-Phase Implementation - -**Scenario:** User requests "Implement payment integration feature" - -**Agent Selection:** -- Task type: API implementation -- Agents: api-architect → backend-developer → test-architect → senior-code-reviewer - -**Execution:** +## Example: Sequential Pipeline ``` -Phase 1: Architecture Planning - Write: ai-docs/payment-requirements.md - "Integrate Stripe payment processing with webhook support..." - - Task: api-architect - Prompt: "Read ai-docs/payment-requirements.md - Create architecture plan" - Output: ai-docs/payment-architecture.md - Return: "Architecture plan complete. Designed 3-layer payment system." 
- - Wait for completion ✓ - -Phase 2: Implementation (depends on Phase 1) - Task: backend-developer - Prompt: "Read ai-docs/payment-architecture.md - Implement payment integration" - Output: src/payment.ts, src/webhooks.ts - Return: "Payment integration implemented. 2 new files, 500 lines." - - Wait for completion ✓ - -Phase 3: Testing (depends on Phase 2) - Task: test-architect - Prompt: "Write tests for src/payment.ts and src/webhooks.ts" - Output: tests/payment.test.ts, tests/webhooks.test.ts - Return: "Test suite complete. 20 tests covering payment flows." - - Wait for completion ✓ - -Phase 4: Code Review (depends on Phase 3) - Task: senior-code-reviewer - Prompt: "Review payment integration implementation" - Output: ai-docs/payment-review.md - Return: "Review complete. 2 MEDIUM issues found." - - Wait for completion ✓ -``` - -**Result:** Sequential execution ensures each phase has correct inputs - ---- - -### Example 3: Adaptive Agent Switching - -**Scenario:** User requests "Validate navbar implementation" with optional external AI - -**Agent Selection:** -- Task type: UI validation -- Base agent: designer -- Optional: designer-codex (if user wants external validation) - -**Execution:** - +Phase 1: api-architect → outputs architecture-plan.md +Phase 2: backend-developer → reads plan, outputs src/auth.ts (depends on Phase 1) +Phase 3: test-architect → reads src/auth.ts, outputs tests/ (depends on Phase 2) ``` -Step 1: Ask user preference - "Do you want external AI validation? (Yes/No)" - -Step 2a: If user says NO (speed mode) - Task: designer - Prompt: "Validate navbar against Figma design" - Output: ai-docs/design-review.md - Return: "Design validation complete. PASS with 2 minor suggestions." 
- -Step 2b: If user says YES (quality mode) - Message 1: Parallel Validation - Task: designer - Prompt: "Validate navbar against Figma design" - --- - Task: designer claudish CLI: design-review-codex - Prompt: "Validate navbar against Figma design" - Message 2: Consolidate - Task: designer - Prompt: "Consolidate 2 design reviews. Prioritize by consensus." - Output: ai-docs/design-review-consolidated.md - Return: "Consolidated review complete. Both agree on 1 CRITICAL issue." +## Example: Parallel Validation -Step 3: User validation - Present consolidated review to user for approval +```typescript +// Launch 3 reviewers in parallel via Task tool +Task({ agent: "senior-code-reviewer", description: "Review auth implementation" }) +Task({ agent: "test-architect", description: "Review test coverage" }) +Task({ agent: "security-reviewer", description: "Audit auth security" }) +// Aggregate: collect all 3 results, consolidate findings ``` -**Result:** Adaptive workflow based on user preference (speed vs quality) - ---- - -## Troubleshooting - -**Problem: Parallel tasks executing sequentially** +**Verification:** Confirm all parallel tasks completed before aggregating results. 
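The fan-in step after parallel validation can be sketched as a consensus merge. This is a minimal illustration under assumed types — the `Finding` shape and vote-counting heuristic are examples, not the plugin's actual consolidation logic:

```typescript
// Hypothetical fan-in sketch: merge findings from several reviewers and
// rank issues by consensus (how many reviewers reported each one).
type Finding = { issue: string; severity: string };

function consolidate(
  reviews: Finding[][],
): { issue: string; votes: number }[] {
  const votes = new Map<string, number>();
  for (const review of reviews)
    for (const f of review) votes.set(f.issue, (votes.get(f.issue) ?? 0) + 1);
  // Highest-consensus issues first.
  return [...votes.entries()]
    .map(([issue, v]) => ({ issue, votes: v }))
    .sort((a, b) => b.votes - a.votes);
}
```

Issues flagged by every reviewer surface at the top, which is the same consensus-priority ordering described for multi-model reviews.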
-Cause: Mixed tool types in same message - -Solution: Use 4-Message Pattern with ONLY Task calls in Message 2 - -``` -❌ Wrong: - await TaskCreate({...}); - await Task({...}); - await Task({...}); - -✅ Correct: - Message 1: await Bash({...}); (prep only) - Message 2: await Task({...}); await Task({...}); (parallel) -``` - ---- +## Agent Selection Guide -**Problem: Orchestrator context overflowing** +| Task Type | Recommended Agent | Reasoning | +|-----------|-------------------|-----------| +| Architecture design | api-architect / frontend-architect | System-level decisions | +| Feature implementation | backend-developer / typescript-frontend-dev | Code writing | +| Testing | test-architect | Coverage analysis | +| Code review | senior-code-reviewer | Quality validation | +| UI work | ui-developer / designer | Visual implementation | +| Debugging | debugger agent | Root cause analysis | -Cause: Inline instructions or full output returns +## Context Passing -Solution: Use file-based delegation + brief summaries +- **File-based:** Agent A writes to `ai-docs/plan.md`, Agent B reads it. Best for large outputs. +- **Prompt-based:** Pass key findings inline in the Task description. Best for small context. +- **Never share full context windows** between agents — extract only what the next agent needs. -``` -❌ Wrong: - Task: agent - Prompt: "[1000 lines of inline requirements]" - Return: "[500 lines of full output]" - -✅ Correct: - Write: ai-docs/requirements.md - Task: agent - Prompt: "Read ai-docs/requirements.md" - Return: "Complete. 
See ai-docs/output.md" -``` - ---- - -**Problem: Wrong agent selected for task** - -Cause: Task type detection failed - -Solution: Explicitly detect task type using keywords - -``` -Check user request for keywords: - - API/endpoint/backend → api-architect, backend-developer - - UI/component/design → designer, ui-developer - - test/coverage → test-architect - - review/validate → senior-code-reviewer - -Default: Ask user to clarify task type -``` - ---- - -**Problem: Agent returns immediately before external model completes** - -Cause: Background execution (non-blocking claudish call) - -Solution: Use synchronous (blocking) execution - -``` -❌ Wrong: - claudish --model grok ... & (background, returns immediately) - -✅ Correct: - RESULT=$(claudish --model grok ...) (blocks until complete) -``` - ---- - -## Summary - -Multi-agent coordination is about choosing the right execution strategy: - -- **Parallel** when tasks are independent (3-5x speedup) -- **Sequential** when tasks have dependencies (correct results) -- **File-based delegation** to avoid context pollution (50-80% savings) -- **Brief summaries** from sub-agents (clean orchestrator context) -- **Task type detection** for intelligent agent selection -- **Context decomposition** for large tasks (avoid overflow) - -Master these patterns and you can orchestrate workflows of any complexity. 
- ---- +## Notes -**Extracted From:** -- `/implement` command (task detection, sequential workflows) -- `/validate-ui` command (adaptive agent switching) -- `/review` command (parallel execution, 4-Message Pattern) -- `CLAUDE.md` Parallel Multi-Model Execution Protocol +- Keep agent context windows clean — delegate to sub-agents instead of doing everything inline +- Use session-isolation for file artifacts to prevent collisions across concurrent workflows +- Set explicit timeouts and use error-recovery patterns for external model agents +- Fan-out/fan-in is the most common pattern: split work, run in parallel, merge results diff --git a/plugins/multimodel/skills/performance-tracking/SKILL.md b/plugins/multimodel/skills/performance-tracking/SKILL.md index 4bf9a26..7604cea 100644 --- a/plugins/multimodel/skills/performance-tracking/SKILL.md +++ b/plugins/multimodel/skills/performance-tracking/SKILL.md @@ -1,1856 +1,78 @@ --- name: performance-tracking -description: Track agent, skill, and model performance metrics for optimization. Use when measuring agent success rates, tracking model latency, analyzing routing effectiveness, or optimizing cost-per-task. Trigger keywords - "performance", "metrics", "tracking", "success rate", "agent performance", "model latency", "cost tracking", "optimization", "routing metrics". -version: 0.1.0 -tags: [orchestration, performance, metrics, tracking, agent-performance, optimization] -keywords: [performance, metrics, tracking, agent, skill, success-rate, latency, cost, optimization, history, routing] -plugin: multimodel -updated: 2026-01-28 +description: "Tracks agent success rates, model latency, cost-per-task, and routing effectiveness across multi-agent workflows. Logs metrics to JSON, generates trend reports, and alerts on performance degradation. Use when measuring agent performance, optimizing routing, analyzing cost efficiency, or identifying failing patterns." 
--- # Performance Tracking -**Version:** 1.0.0 -**Purpose:** Track agent, skill, and model performance metrics for continuous optimization -**Status:** Production Ready +Logs agent, skill, and model metrics to enable data-driven routing optimization and cost reduction. -## Overview +## Workflow -Performance tracking transforms workflows from "fire and forget" to **data-driven optimization systems**. By measuring what actually works, you can route tasks more effectively, identify failing patterns early, and reduce costs. +1. **Log metrics after each task** — record agent, model, tokens, cost, duration, and success/failure. +2. **Aggregate periodically** — generate daily/weekly reports with success rates, cost breakdowns, and tier distribution. +3. **Detect degradation** — alert when agent success rate drops below 70% or model latency exceeds thresholds. +4. **Optimize routing** — use historical data to adjust tier thresholds and agent selection. -This skill provides battle-tested patterns for: -- **Agent success tracking** (completion rates, confidence scores, task type affinity) -- **Skill effectiveness** (activation counts, success correlation, usage patterns) -- **Model performance** (latency, cost, quality, provider comparison) -- **Routing optimization** (tier distribution, routing accuracy, cost efficiency) -- **Historical analysis** (trend detection, degradation alerts, pattern discovery) - -Performance tracking enables **continuous improvement** by providing the data needed to make informed decisions about agent selection, model choice, and workflow routing. 
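The log-then-aggregate workflow can be sketched in a few lines. This is an assumed setup, not the plugin's implementation: the JSONL file path and the `TaskMetric` field names (mirroring the schema shown in this skill) are illustrative.

```typescript
// Hypothetical sketch: append one JSON metric per line after each task,
// then compute success rate from the accumulated log lines.
import { appendFileSync } from "node:fs";

interface TaskMetric {
  task: string;
  agent: string;
  tier: number;
  model: string;
  tokens_in: number;
  tokens_out: number;
  cost: number;
  duration_seconds: number;
  success: boolean;
  timestamp: string;
}

function logMetric(
  metric: TaskMetric,
  path = ".claude/metrics.jsonl", // assumed location
): void {
  // One JSON object per line keeps appends cheap and parsing trivial.
  appendFileSync(path, JSON.stringify(metric) + "\n");
}

function successRate(lines: string[]): number {
  const metrics = lines.map((l) => JSON.parse(l) as TaskMetric);
  const ok = metrics.filter((m) => m.success).length;
  return metrics.length ? ok / metrics.length : 0;
}
```

A periodic aggregation pass over the same log can group by agent or tier to drive the degradation alerts (for example, flagging any agent whose rate falls below 0.7).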
- -### Why Track Performance - -**Optimize Routing:** -- Identify which agents excel at specific task types -- Route complex tasks to high-confidence agents -- Avoid agents with low success rates for critical work - -**Identify Failing Agents:** -- Detect agents with <70% success rate -- Alert when agent performance degrades -- Replace or retrain underperforming agents - -**Reduce Costs:** -- Find cost-effective model alternatives -- Identify expensive agents with low success rates -- Optimize tier thresholds based on actual performance - -**Improve Quality:** -- Track correlation between confidence scores and success -- Identify patterns in successful implementations -- Learn which models produce best results for task types - -### What We Track - -**Agent Metrics:** -- Total runs, success/failure counts -- Average confidence scores -- Task type distribution -- Last used timestamp -- Individual execution history - -**Skill Metrics:** -- Activation counts per skill -- Last activation timestamps -- Success correlation (when skill active, what's success rate?) -- Co-activation patterns - -**Model Metrics:** -- Total runs, success/failure counts -- Average latency (response time) -- Total cost (cumulative spend) -- Cost per successful task -- Last used timestamp - -**Routing Metrics:** -- Tier distribution (how often each tier selected) -- Routing accuracy (did tier match complexity?) -- Cost efficiency (tier1 vs tier4 cost ratio) -- Decision history with outcomes - -### Integration with task-complexity-router - -The performance tracker provides critical feedback to the task-complexity-router: - -``` -Routing Feedback Loop: - -1. Router selects tier based on complexity - → task-complexity-router analyzes task - → Routes to tier2 (medium complexity) - -2. Agent executes task - → Records: tier=2, agent=ui-developer, result=success - -3. Performance tracker updates metrics - → tier2 usage +1 - → ui-developer success +1 - → Confidence in tier2 routing increases - -4. 
Future routing decisions informed by history - → Router sees tier2 has 85% success rate - → Router sees ui-developer excels at UI tasks - → Router confidently routes similar tasks to tier2 -``` - ---- - -## Metrics Schema - -### JSON Structure (Version 1.0.0) - -Store performance metrics in `.claude/agent-performance.json`: +## Metric Schema ```json { - "schemaVersion": "1.0.0", - "lastUpdated": "2026-01-28T15:30:00Z", - "agents": { - "ui-developer": { - "totalRuns": 42, - "successCount": 38, - "failureCount": 4, - "avgConfidence": 0.85, - "lastUsed": "2026-01-28T15:30:00Z", - "taskTypes": { - "implement-component": 15, - "fix-styling": 12, - "refactor-ui": 8, - "review-code": 7 - }, - "history": [ - { - "timestamp": "2026-01-28T15:30:00Z", - "taskType": "implement-component", - "result": "success", - "confidence": 0.90, - "duration": 45000, - "tier": 2, - "model": "claude-sonnet-4-5-20250929" - }, - { - "timestamp": "2026-01-28T14:20:00Z", - "taskType": "fix-styling", - "result": "success", - "confidence": 0.85, - "duration": 30000, - "tier": 1, - "model": "claude-sonnet-4-5-20250929" - } - ] - }, - "backend-developer": { - "totalRuns": 28, - "successCount": 25, - "failureCount": 3, - "avgConfidence": 0.88, - "lastUsed": "2026-01-28T14:00:00Z", - "taskTypes": { - "implement-api": 12, - "fix-bug": 8, - "database-migration": 5, - "write-tests": 3 - }, - "history": [] - } - }, - "skills": { - "multi-model-validation": { - "activations": 15, - "lastActivated": "2026-01-28T15:00:00Z", - "successCorrelation": 0.92, - "coActivations": { - "quality-gates": 12, - "error-recovery": 8 - } - }, - "task-complexity-router": { - "activations": 68, - "lastActivated": "2026-01-28T15:30:00Z", - "successCorrelation": 0.85, - "coActivations": { - "multi-agent-coordination": 45, - "hierarchical-coordinator": 30 - } - } - }, - "models": { - "claude-sonnet-4-5-20250929": { - "totalRuns": 120, - "successCount": 108, - "failureCount": 12, - "avgLatency": 2500, - "totalCost": 0.45, - 
"lastUsed": "2026-01-28T15:30:00Z", - "taskTypePerformance": { - "code-review": { "success": 25, "failure": 2 }, - "implementation": { "success": 40, "failure": 5 }, - "testing": { "success": 20, "failure": 3 } - } - }, - "x-ai/grok-code-fast-1": { - "totalRuns": 35, - "successCount": 30, - "failureCount": 5, - "avgLatency": 1800, - "totalCost": 0.08, - "lastUsed": "2026-01-28T13:00:00Z", - "taskTypePerformance": { - "code-review": { "success": 18, "failure": 2 }, - "implementation": { "success": 12, "failure": 3 } - } - } - }, - "routing": { - "tierDistribution": { - "tier1": 45, - "tier2": 30, - "tier3": 15, - "tier4": 8 - }, - "decisions": [ - { - "timestamp": "2026-01-28T15:30:00Z", - "taskType": "implement-component", - "complexity": "medium", - "selectedTier": 2, - "agent": "ui-developer", - "result": "success", - "cost": 0.003 - }, - { - "timestamp": "2026-01-28T14:20:00Z", - "taskType": "fix-styling", - "complexity": "low", - "selectedTier": 1, - "agent": "ui-developer", - "result": "success", - "cost": 0.001 - } - ] - } + "task": "Implement profile page", + "agent": "typescript-frontend-dev", + "tier": 2, + "model": "claude-sonnet-4-5", + "tokens_in": 1200, + "tokens_out": 2800, + "cost": 0.0456, + "duration_seconds": 8, + "success": true, + "timestamp": "2026-01-28T10:30:00Z" } ``` -### Schema Field Definitions - -**Agent Metrics:** -- `totalRuns`: Total task executions -- `successCount`: Tasks completed successfully -- `failureCount`: Tasks that failed or required retry -- `avgConfidence`: Rolling average of agent confidence scores (0.0-1.0) -- `lastUsed`: ISO-8601 timestamp of last execution -- `taskTypes`: Distribution of task types (understand agent specialization) -- `history`: Array of recent executions (max 100 entries, FIFO) - -**Skill Metrics:** -- `activations`: Total times skill was triggered -- `lastActivated`: ISO-8601 timestamp -- `successCorrelation`: Success rate when this skill is active (0.0-1.0) -- `coActivations`: Skills frequently 
activated together (detect patterns) - -**Model Metrics:** -- `totalRuns`: Total executions -- `successCount`/`failureCount`: Outcome tracking -- `avgLatency`: Average response time in milliseconds -- `totalCost`: Cumulative spend in USD -- `lastUsed`: ISO-8601 timestamp -- `taskTypePerformance`: Success/failure breakdown by task type - -**Routing Metrics:** -- `tierDistribution`: Count of tasks routed to each tier -- `decisions`: Array of routing decisions with outcomes (max 100, FIFO) - ---- - -## Tracking Patterns - -### Pattern 1: Capturing Agent Performance - -**After Agent Completes Task:** - -``` -Execution Flow: - -1. Agent executes task - Task: ui-developer - Input: "Implement login form component" - Result: Success - Confidence: 0.90 - Duration: 45 seconds - Tier: 2 - Model: claude-sonnet-4-5-20250929 - -2. Update agent metrics - Read: .claude/agent-performance.json - Update: - agents["ui-developer"].totalRuns += 1 - agents["ui-developer"].successCount += 1 - agents["ui-developer"].avgConfidence = rolling_avg(0.90) - agents["ui-developer"].lastUsed = NOW - agents["ui-developer"].taskTypes["implement-component"] += 1 - agents["ui-developer"].history.push({ - timestamp: NOW, - taskType: "implement-component", - result: "success", - confidence: 0.90, - duration: 45000, - tier: 2, - model: "claude-sonnet-4-5-20250929" - }) - Trim history if > 100 entries - Write: .claude/agent-performance.json - -3. Calculate derived metrics - Success rate: successCount / totalRuns = 38/42 = 90.5% - Avg duration: sum(history.duration) / history.length - Task affinity: taskTypes sorted by count -``` - -**After Agent Fails:** - -``` -Failure Flow: - -1. Agent fails task - Task: backend-developer - Input: "Implement complex payment flow" - Result: Failure (error, timeout, or low quality) - Confidence: 0.65 - Tier: 3 - -2. 
Update failure metrics - agents["backend-developer"].totalRuns += 1 - agents["backend-developer"].failureCount += 1 - agents["backend-developer"].avgConfidence = rolling_avg(0.65) - agents["backend-developer"].history.push({ - timestamp: NOW, - taskType: "implement-api", - result: "failure", - confidence: 0.65, - duration: 120000, - tier: 3, - error: "Exceeded max iterations" - }) - -3. Check for degradation - If failureCount / totalRuns > 0.30: - Alert: "backend-developer has >30% failure rate" - Recommendation: "Review recent failures, retrain, or replace" -``` - -### Pattern 2: Tracking Model Performance - -**After Model Execution:** - -``` -Execution Flow: - -1. Model completes task - Model: x-ai/grok-code-fast-1 - Task: Code review - Latency: 1800ms - Cost: $0.002 - Result: Success - -2. Update model metrics - models["x-ai/grok-code-fast-1"].totalRuns += 1 - models["x-ai/grok-code-fast-1"].successCount += 1 - models["x-ai/grok-code-fast-1"].avgLatency = rolling_avg(1800) - models["x-ai/grok-code-fast-1"].totalCost += 0.002 - models["x-ai/grok-code-fast-1"].lastUsed = NOW - models["x-ai/grok-code-fast-1"].taskTypePerformance["code-review"].success += 1 - -3. Compare model performance - Claude Sonnet: avgLatency=2500ms, cost=$0.45 (120 runs) - Grok Fast: avgLatency=1800ms, cost=$0.08 (35 runs) - - Analysis: - - Grok is 28% faster (1800ms vs 2500ms) - - Grok is 82% cheaper per run ($0.0023 vs $0.0038) - - Both have similar success rates (86% vs 90%) - - Recommendation: - - Use Grok for cost-sensitive tasks - - Use Claude for critical tasks (higher success rate) -``` - -### Pattern 3: Recording Skill Activation - -**After Skill Activation:** - -``` -Activation Flow: - -1. Skill triggers - Skill: multi-model-validation - Context: User requested /review with 3 models - -2. Update skill metrics - skills["multi-model-validation"].activations += 1 - skills["multi-model-validation"].lastActivated = NOW - -3. 
Track co-activation - Active skills: ["multi-model-validation", "quality-gates"] - skills["multi-model-validation"].coActivations["quality-gates"] += 1 - -4. Calculate success correlation - Tasks with this skill active: 15 - Successful tasks: 14 - Success correlation: 14/15 = 93.3% - -5. Pattern detection - Observation: multi-model-validation + quality-gates = 100% success (12/12) - Recommendation: Always pair these skills for high-quality reviews -``` - -### Pattern 4: Routing Decision Tracking - -**After Routing Decision:** - -``` -Routing Flow: - -1. Router selects tier - Task: "Implement user profile page" - Analysis: Medium complexity (multiple components, state management) - Selected tier: 2 - Agent: ui-developer - Model: claude-sonnet-4-5-20250929 - -2. Record routing decision - routing.tierDistribution["tier2"] += 1 - routing.decisions.push({ - timestamp: NOW, - taskType: "implement-component", - complexity: "medium", - selectedTier: 2, - agent: "ui-developer", - result: "pending" - }) - -3. After task completes - Update decision with result: - routing.decisions[last].result = "success" - routing.decisions[last].cost = 0.003 - -4. Trim decision history if > 100 entries -``` - -### Pattern 5: Session-Level Aggregation - -**End of Session Summary:** - -``` -Session Summary Flow: - -1. Aggregate session metrics - Session ID: 2026-01-28-session-15 - Duration: 2 hours - Tasks executed: 15 - Success rate: 14/15 = 93.3% - Total cost: $0.045 - Models used: Claude (12), Grok (3) - -2. Create session snapshot - File: ai-docs/performance-history/2026-01-28-session-15.json - Content: - { - "sessionId": "2026-01-28-session-15", - "startTime": "2026-01-28T13:00:00Z", - "endTime": "2026-01-28T15:00:00Z", - "duration": 7200000, - "tasks": 15, - "successRate": 0.933, - "totalCost": 0.045, - "modelUsage": { "claude": 12, "grok": 3 }, - "topAgents": ["ui-developer", "backend-developer"], - "activeSkills": ["task-complexity-router", "multi-model-validation"] - } - -3. 
Update rolling metrics - .claude/agent-performance.json (persistent) - ai-docs/performance-history/ (snapshots) - -4. Cleanup old snapshots - Keep last 100 session snapshots - Delete older entries -``` - ---- - -## File Location and Management - -### Primary Performance File - -**Location:** `.claude/agent-performance.json` - -**Purpose:** Persistent, project-level performance tracking - -**When to Update:** -- After every agent execution -- After every model execution -- After every skill activation -- After every routing decision - -**Format:** JSON schema version 1.0.0 (see Metrics Schema section) - -**Rotation:** Keep full history, but trim individual history arrays to 100 entries - -### Session Snapshots - -**Location:** `ai-docs/performance-history/` - -**Purpose:** Point-in-time session summaries for historical analysis - -**Naming:** `{YYYY-MM-DD}-session-{N}.json` - -**Example:** -``` -ai-docs/performance-history/ - 2026-01-28-session-1.json - 2026-01-28-session-2.json - 2026-01-27-session-1.json - ... 
-``` - -**Retention:** Keep last 100 sessions, delete older - -### Integration with Existing Files - -**Relationship with ai-docs/llm-performance.json:** - -``` -Comparison: - -llm-performance.json (existing): - - Model-specific performance - - Cost tracking per model - - Response time tracking - - Used by multi-model-validation - -agent-performance.json (new): - - Agent-level metrics (multi-run aggregation) - - Skill activation tracking - - Routing decision history - - Task type affinity - -Integration: - - agent-performance.json imports model data from llm-performance.json - - Both files updated in parallel - - llm-performance.json focuses on single-run details - - agent-performance.json focuses on aggregate trends -``` - -**Migration Path:** - -``` -Step 1: Create .claude/agent-performance.json with schema 1.0.0 -Step 2: Import historical data from llm-performance.json -Step 3: Update both files going forward -Step 4: Deprecate llm-performance.json after 6 months (optional) -``` - -### Data Cleanup and Rotation - -**Automatic Cleanup:** - -``` -Cleanup Rules: - -1. Agent history arrays - Max entries: 100 - Strategy: FIFO (oldest removed first) - Trigger: After every agent execution - -2. Routing decision arrays - Max entries: 100 - Strategy: FIFO - Trigger: After every routing decision - -3. Session snapshots - Max files: 100 - Strategy: FIFO (delete oldest session files) - Trigger: After every session ends - -4. Skill co-activation maps - Max entries per skill: 50 - Strategy: Keep top 50 by count - Trigger: Weekly cleanup -``` - -**Manual Cleanup:** - -``` -When to manually reset: - -1. After major workflow changes - - Agent capabilities changed - - New skills added - - Routing logic updated - → Reset metrics to start fresh - -2. After agent retraining - - Agent prompt updated - - Agent model changed - → Reset agent-specific metrics - -3. 
After prolonged period (>6 months)
-   - Metrics may be outdated
-   → Archive old data, start fresh
-
-How to reset:
-  Backup: cp .claude/agent-performance.json .claude/agent-performance-backup-{DATE}.json
-  Reset: echo '{"schemaVersion":"1.0.0","lastUpdated":"...","agents":{},...}' > .claude/agent-performance.json
-```
-
----
-
-## Using Metrics for Optimization
-
-### Optimization 1: Identify Underperforming Agents
+**Verification:** Confirm the JSON entry is written to the metrics log after each task completes.
-**Detection:**
-
-```
-Analyze agent success rates:
+## What to Track
-agents["ui-developer"]:
-  successCount: 38
-  totalRuns: 42
-  success rate: 38/42 = 90.5% ✅ GOOD
+| Category | Metrics | Purpose |
+|----------|---------|---------|
+| Agent | Success rate, avg duration, task type affinity | Route tasks to best-fit agents |
+| Model | Latency, cost, quality score | Select optimal model per tier |
+| Routing | Tier distribution, escalation count | Tune scoring thresholds |
+| Cost | Daily total, cost per task, savings vs baseline | Budget tracking |
-agents["test-architect"]:
-  successCount: 15
-  totalRuns: 25
-  success rate: 15/25 = 60% ❌ UNDERPERFORMING
+## Example: Daily Report
-Threshold: <70% success rate = underperforming
 ```
+Daily Cost Report:
+  Tier 0 (Native): 45 tasks, $0.00
+  Tier 1 (Haiku): 30 tasks, $0.12
+  Tier 2 (Sonnet): 20 tasks, $0.60
+  Tier 3 (Opus): 5 tasks, $0.75
+  Total: 100 tasks, $1.47
-**Action:**
-
-```
-For test-architect (60% success):
-
-1. Analyze failure patterns
-   Review history entries where result="failure"
-   Common failure reasons:
-   - "Tests too brittle" (8 occurrences)
-   - "Missing test coverage" (5 occurrences)
-   - "Test timeout" (2 occurrences)
-
-2. Identify root cause
-   Pattern: test-architect struggles with async/timing tests
-   Evidence: All timeout failures involved async code
-
-3.
Take action - Option A: Retrain agent - - Update prompt with async testing best practices - - Add examples of proper async test patterns - - Reset metrics after retraining - - Option B: Route differently - - Route async test tasks to backend-developer (90% success on async) - - Keep test-architect for synchronous unit tests - - Option C: Replace agent - - Create new specialized-async-test-architect - - Deprecate test-architect for async work -``` - -### Optimization 2: Find Cost-Effective Model Alternatives - -**Analysis:** - -``` -Compare model cost-effectiveness: - -Model: claude-sonnet-4-5-20250929 - Total cost: $0.45 - Total runs: 120 - Success count: 108 - Cost per task: $0.0038 - Cost per success: $0.0042 - Success rate: 90% - -Model: x-ai/grok-code-fast-1 - Total cost: $0.08 - Total runs: 35 - Success count: 30 - Cost per task: $0.0023 - Cost per success: $0.0027 - Success rate: 86% - -Model: google/gemini-2.5-flash - Total cost: $0.02 - Total runs: 20 - Success count: 16 - Cost per task: $0.0010 - Cost per success: $0.0013 - Success rate: 80% - -Cost Efficiency Ranking: - 1. Gemini Flash: $0.0013 per success (80% success rate) - 2. Grok Fast: $0.0027 per success (86% success rate) - 3. 
Claude Sonnet: $0.0042 per success (90% success rate) - -Quality-Cost Tradeoff: - - Gemini: 69% cheaper than Claude, but 10% lower success rate - - Grok: 36% cheaper than Claude, but 4% lower success rate -``` - -**Action:** - -``` -Optimization strategy: - -Tier 1 (Simple tasks): - Use: Gemini Flash - Reason: Lowest cost, acceptable success rate for simple work - Example: "Fix typo in comment", "Format code" - -Tier 2 (Medium tasks): - Use: Grok Fast - Reason: Good balance of cost and quality - Example: "Implement CRUD endpoint", "Add validation" - -Tier 3 (Complex tasks): - Use: Claude Sonnet - Reason: Highest success rate justifies cost - Example: "Design architecture", "Complex refactoring" - -Tier 4 (Critical tasks): - Use: Claude Sonnet + Multi-model validation - Reason: Quality > cost for critical work - Example: "Security review", "Production bug fix" - -Expected savings: - Current: 90% Claude usage × $0.0042 = $0.00378 avg per task - Optimized: 20% Claude + 50% Grok + 30% Gemini = $0.00257 avg per task - Savings: 32% cost reduction with minimal quality impact -``` - -### Optimization 3: Optimize Routing Tier Thresholds - -**Analysis:** - -``` -Review tier distribution: - -routing.tierDistribution: - tier1: 45 tasks (45.9%) - tier2: 30 tasks (30.6%) - tier3: 15 tasks (15.3%) - tier4: 8 tasks (8.2%) - -Analyze tier accuracy: - -Tier 1 (Simple): - Tasks: 45 - Success: 42 - Failures: 3 - Success rate: 93.3% ✅ - Verdict: Well-calibrated - -Tier 2 (Medium): - Tasks: 30 - Success: 25 - Failures: 5 - Success rate: 83.3% ⚠️ - Verdict: Slightly low (target 90%) - -Tier 3 (Complex): - Tasks: 15 - Success: 12 - Failures: 3 - Success rate: 80.0% ⚠️ - Verdict: Too low (target 90%) - -Tier 4 (Critical): - Tasks: 8 - Success: 8 - Failures: 0 - Success rate: 100% ✅ - Verdict: Well-calibrated -``` - -**Action:** - -``` -Adjust tier thresholds: - -Current thresholds (task-complexity-router): - tier1: complexity score 0-3 - tier2: complexity score 4-6 - tier3: complexity 
score 7-9
-  tier4: complexity score 10+
-
-Problem: tier2 and tier3 have lower success rates
-Root cause: Tasks slightly too complex for assigned tier
-
-Optimized thresholds:
-  tier1: complexity score 0-2 (narrower range)
-  tier2: complexity score 3-5 (shift down)
-  tier3: complexity score 6-8 (shift down)
-  tier4: complexity score 9+ (broader range)
-
-Rationale:
-  - Shift more borderline tasks to higher tiers
-  - Accept slightly higher cost for better success rates
-  - tier2/tier3 success should improve to 90%+
+Agent Success Rates:
+  typescript-frontend-dev: 94% (47/50)
+  backend-developer: 88% (22/25)
+  test-writer: 96% (24/25)
-Expected impact:
-  - tier1 usage: 45 → 35 tasks (fewer simple tasks)
-  - tier2 usage: 30 → 32 tasks (more medium tasks)
-  - tier3 usage: 15 → 18 tasks (more complex tasks)
-  - tier4 usage: 8 → 13 tasks (more critical tasks)
-  - Overall success rate: 88% → 92%
-  - Overall cost: +15% (acceptable tradeoff for quality)
+
+Alerts:
+  backend-developer success rate dropped 8% this week
 ```
-### Optimization 4: Detect Model-Task Affinity Patterns
-
-**Analysis:**
-
-```
-Analyze task type performance by model:
-
-Task type: code-review
-
-Claude Sonnet:
-  Success: 25, Failure: 2
-  Success rate: 92.6% ✅
-
-Grok Fast:
-  Success: 18, Failure: 2
-  Success rate: 90.0% ✅
-
-Gemini Flash:
-  Success: 10, Failure: 4
-  Success rate: 71.4% ⚠️
-
-→ Pattern: Claude and Grok excel at code review, Gemini struggles
-
-Task type: implementation
-
-Claude Sonnet:
-  Success: 40, Failure: 5
-  Success rate: 88.9% ✅
-
-Grok Fast:
-  Success: 12, Failure: 3
-  Success rate: 80.0% ⚠️
-
-Gemini Flash:
-  Success: 6, Failure: 1
-  Success rate: 85.7% ✅
-
-→ Pattern: Claude best for implementation, Grok/Gemini acceptable
-
-Task type: testing
-
-Claude Sonnet:
-  Success: 20, Failure: 3
-  Success rate: 87.0% ✅
-
-Grok Fast:
-  Success: 0, Failure: 0
-  Success rate: N/A
-
-Gemini Flash:
-  Success: 0, Failure: 0
-  Success rate: N/A
-
-→ Pattern: Only Claude has testing data
(others not used for this) -``` - -**Action:** - -``` -Task-specific model routing: - -code-review tasks: - tier1: Grok Fast (90% success, low cost) - tier2: Grok Fast (90% success, low cost) - tier3: Claude Sonnet (93% success, high quality) - tier4: Multi-model (Claude + Grok consensus) - -implementation tasks: - tier1: Gemini Flash (86% success, lowest cost) - tier2: Grok Fast (80% success, medium cost) - tier3: Claude Sonnet (89% success, highest quality) - tier4: Claude Sonnet (89% success, proven) - -testing tasks: - tier1-4: Claude Sonnet (only model with proven testing capability) - -Expected impact: - - 25% cost savings on code reviews (use Grok instead of Claude) - - 10% cost savings on implementation (use Gemini for simple) - - Maintain quality (route by proven success rates) -``` - -### Optimization 5: Alert on Performance Degradation - -**Detection:** - -``` -Monitor for degradation: - -Week 1 (baseline): - ui-developer success rate: 90.5% - Average task duration: 45s - -Week 2: - ui-developer success rate: 88.2% (↓2.3%) - Average task duration: 48s (↑3s) - -Week 3: - ui-developer success rate: 85.1% (↓5.4% from baseline) - Average task duration: 52s (↑7s from baseline) - -Week 4: - ui-developer success rate: 78.3% (↓12.2% from baseline) 🚨 - Average task duration: 58s (↑13s from baseline) 🚨 - -Threshold exceeded: - ❌ Success rate dropped >10% (78.3% vs 90.5%) - ❌ Duration increased >20% (58s vs 45s) - -→ ALERT: ui-developer performance degraded significantly -``` - -**Action:** - -``` -Degradation response: - -1. Investigate root cause - Review recent history: - - Task complexity increased? (Check taskTypes distribution) - - Model changed? (Check model field in history) - - Failures clustered around specific task type? - - Finding: All recent failures on "complex-state-management" tasks - Root cause: New task type introduced, agent not trained for it - -2. 
Take corrective action
-   Option A: Retrain agent
-   - Update prompt with state management patterns
-   - Add examples of successful state management
-   - Reset metrics after retraining
-
-   Option B: Route differently
-   - Route state management tasks to specialized agent
-   - Keep ui-developer for simpler UI tasks
-
-   Option C: Escalate to human
-   - Alert: "ui-developer performance degraded"
-   - Request: "Manual review of recent failures needed"
-
-3. Monitor recovery
-   Week 5 (after retraining):
-     Success rate: 85.0% (recovering)
-   Week 6:
-     Success rate: 89.2% (near baseline)
-   Week 7:
-     Success rate: 91.0% (recovered ✅)
-```
-
----
-
-## Integration with Orchestration Plugin
-
-### Integration 1: multi-model-validation
-
-**How multi-model-validation records model performance:**
-
-```
-Multi-Model Review Flow:
-
-1. Execute parallel review
-   Models: [claude-sonnet, grok-fast, gemini-flash]
-   Task: Code review of auth.ts
-
-2. Collect model responses
-   Each model returns:
-   - Review findings
-   - Confidence score
-   - Latency
-   - Cost
-
-3. Record individual model performance
-   For each model:
-     models[modelId].totalRuns += 1
-     models[modelId].avgLatency = rolling_avg(latency)
-     models[modelId].totalCost += cost
-
-4. Determine success/failure
-   If review found critical issues → success (doing its job)
-   If review crashed/errored → failure
-
-5. Update success counts
-   models[modelId].successCount += 1 (or failureCount)
-   models[modelId].taskTypePerformance["code-review"].success += 1
+## Degradation Thresholds
-6. Consolidate findings
-   Generate consensus report
-   Track which models agreed (co-occurrence patterns)
+| Metric | Warning | Critical |
+|--------|---------|----------|
+| Agent success rate | < 80% | < 70% |
+| Model latency | > 2x baseline | > 5x baseline |
+| Escalation rate | > 20% of tasks | > 40% of tasks |
+| Daily cost | > 150% of average | > 200% of average |
-7.
User feedback (optional) - User rates review quality: "Helpful" | "Not helpful" - Update successCorrelation for multi-model-validation skill -``` - -### Integration 2: task-complexity-router - -**How task-complexity-router reads performance data:** - -``` -Routing Decision Flow: - -1. Analyze task complexity - Input: "Implement user authentication with OAuth" - Analysis: Complex (multiple components, external API, security) - Base tier: 3 - -2. Read performance history - Load: .claude/agent-performance.json - Check: routing.tierDistribution - -3. Adjust tier based on history - tier3 historical success rate: 80% (below 90% target) - tier4 historical success rate: 100% - - Decision: Bump to tier4 for higher success probability - -4. Select agent based on task type affinity - Task type: "implement-api" - Candidates: backend-developer, full-stack-developer - - Check affinity: - backend-developer.taskTypes["implement-api"]: 12 (high affinity) - full-stack-developer.taskTypes["implement-api"]: 3 (low affinity) - - Decision: Select backend-developer (proven track record) - -5. Select model based on tier + task type - tier4 + implement-api: - models[claude].taskTypePerformance["implementation"]: 89% success - models[grok].taskTypePerformance["implementation"]: 80% success - - Decision: Select Claude (higher success rate for tier4) - -6. Record routing decision - routing.decisions.push({ - timestamp: NOW, - taskType: "implement-api", - complexity: "complex", - selectedTier: 4, - agent: "backend-developer", - model: "claude-sonnet-4-5-20250929", - result: "pending" - }) - -7. After execution, update result - routing.decisions[last].result = "success" - routing.decisions[last].cost = 0.005 -``` - -### Integration 3: hierarchical-coordinator - -**How hierarchical-coordinator tracks phase success:** - -``` -Phase Execution Tracking: - -1. 
Execute workflow phases - Phase 1: Planning (architect agent) - Phase 2: Implementation (developer agent) - Phase 3: Testing (tester agent) - Phase 4: Review (reviewer agent) - -2. Track phase-level metrics - Create phase-specific tracking: - - agents["architect"].phasePerformance = { - "planning": { success: 15, failure: 2 }, - "architecture": { success: 8, failure: 1 } - } - - agents["developer"].phasePerformance = { - "implementation": { success: 25, failure: 5 }, - "refactoring": { success: 10, failure: 2 } - } - -3. Detect phase-specific issues - Analysis: developer has 20% failure rate on implementation phase - But: developer has 83% success rate overall - - Insight: Failures concentrated in specific phase - -4. Optimize phase assignment - Current: developer handles all implementation - Optimized: Split by complexity - - Simple implementation → junior-developer (cheaper) - - Complex implementation → senior-developer (higher success) - -5. Track coordinator effectiveness - skills["hierarchical-coordinator"].activations += 1 - skills["hierarchical-coordinator"].successCorrelation = 0.92 - - Insight: Workflows using coordinator have 92% success (vs 80% without) -``` - -### Integration 4: quality-gates - -**How quality-gates uses performance thresholds:** - -``` -Quality Gate Decision: - -1. Agent completes task - Agent: ui-developer - Task: "Implement dashboard component" - Confidence: 0.75 - -2. Check agent performance history - Load: agents["ui-developer"] - Historical avg confidence: 0.85 - Current confidence: 0.75 (below average 🚨) - -3. Apply quality gate - Threshold: If confidence < avg - 0.10, trigger validation - - Decision: 0.75 < 0.75 (borderline) - Action: Trigger designer validation (extra quality check) - -4. Designer validates - Result: Found 3 minor issues - Verdict: Quality gate prevented low-quality work from proceeding - -5. 
Update metrics - Without gate: ui-developer would have 1 more failure - With gate: Issues caught early, fixed before user sees - - skills["quality-gates"].successCorrelation += 1 - (Success correlation increases when gate prevents failures) - -6. Continuous improvement - Pattern: Low-confidence tasks benefit from extra validation - Threshold: Automatically adjust based on correlation data - Future: If confidence < 0.80, always trigger validation -``` - ---- - -## Best Practices - -### Do - -- ✅ **Track all agent executions** (success and failure provide learning signal) -- ✅ **Record model latency and cost** (optimize for cost-effectiveness) -- ✅ **Maintain execution history** (detect patterns and trends) -- ✅ **Set success rate thresholds** (<70% = investigate, <50% = replace) -- ✅ **Alert on performance degradation** (>10% drop from baseline) -- ✅ **Use task type affinity** (route tasks to agents with proven success) -- ✅ **Compare model cost-effectiveness** (cost per success, not just cost per task) -- ✅ **Track skill co-activation** (identify successful skill combinations) -- ✅ **Rotate history data** (keep last 100 entries, prevent unbounded growth) -- ✅ **Create session snapshots** (point-in-time analysis) -- ✅ **Integrate with routing** (feed performance data back to router) - -### Don't - -- ❌ **Track only successes** (failures provide valuable learning signal) -- ❌ **Ignore degradation** (small drops compound into big problems) -- ❌ **Use stale data** (>6 months old metrics may not reflect current state) -- ❌ **Over-optimize on cost alone** (balance cost and quality) -- ❌ **Forget to update metrics** (incomplete data leads to poor decisions) -- ❌ **Store unbounded history** (trim arrays to prevent file bloat) -- ❌ **Mix session metrics** (isolate session data for cleaner analysis) -- ❌ **Ignore task type affinity** (agents specialize, use it) -- ❌ **Skip validation after major changes** (reset metrics when workflows change) - -### Privacy Considerations - 
-**What to Track:** -- Aggregate metrics (counts, averages, distributions) -- Task types (generic categories like "implement-component") -- Success/failure outcomes -- Model performance data -- Timing and cost data - -**What NOT to Track:** -- User-specific data (usernames, emails) -- Sensitive code snippets -- API keys or credentials -- Personal information -- Business logic details - -**Data Retention:** -- Keep aggregate metrics indefinitely (no PII) -- Rotate detailed history after 100 entries -- Delete session snapshots after 100 sessions -- Archive old metrics before major resets - -### When to Reset Metrics - -**Situations Requiring Reset:** - -1. **Agent capabilities changed** - - Prompt updated significantly - - Agent model changed - - Agent skills added/removed - → Reset agent-specific metrics - -2. **Workflow architecture changed** - - New routing logic - - New tier definitions - - New skill combinations - → Reset routing and skill metrics - -3. **Model pricing changed** - - Cost per token updated - - New pricing tier - → Reset cost calculations (keep counts) - -4. 
**After prolonged period (>6 months)** - - Metrics may be outdated - - Workflow patterns changed - → Archive and reset all metrics - -**How to Reset:** - -```bash -# Backup current metrics -cp .claude/agent-performance.json .claude/agent-performance-backup-$(date +%Y%m%d).json - -# Reset to empty state -cat > .claude/agent-performance.json << 'EOF' -{ - "schemaVersion": "1.0.0", - "lastUpdated": "2026-01-28T16:00:00Z", - "agents": {}, - "skills": {}, - "models": {}, - "routing": { - "tierDistribution": {}, - "decisions": [] - } -} -EOF - -# Archive old session snapshots -mkdir -p ai-docs/performance-history/archive-$(date +%Y%m%d) -mv ai-docs/performance-history/*.json ai-docs/performance-history/archive-$(date +%Y%m%d)/ -``` - -### Metric Hygiene - -**Regular Maintenance:** - -``` -Weekly: - - Review top agents (ensure success rates >70%) - - Check model cost trends (identify cost spikes) - - Trim co-activation maps (keep top 50 per skill) - -Monthly: - - Analyze task type affinity changes - - Compare model cost-effectiveness - - Review tier distribution accuracy - - Archive old session snapshots (keep last 100) - -Quarterly: - - Deep analysis of performance trends - - Optimize routing thresholds - - Identify underperforming patterns - - Consider agent retraining or replacement - -Annually: - - Full metrics review and reset (if needed) - - Archive all historical data - - Update baseline success rates - - Document lessons learned -``` - ---- - -## Examples - -### Example 1: Tracking a Multi-Model Review Session - -**Scenario:** User requests `/review` with 3 models (Claude, Grok, Gemini) - -**Execution:** - -``` -Step 1: Initialize session - Session ID: 2026-01-28-session-15 - Start time: 15:00:00Z - -Step 2: Execute multi-model review - Models: [claude-sonnet, grok-fast, gemini-flash] - Task: Code review of auth/login.ts (450 lines) - -Step 3: Track individual model executions - - Model: claude-sonnet-4-5-20250929 - Start: 15:00:05Z - End: 15:00:08Z - Latency: 
3000ms - Cost: $0.003 - Result: Found 5 issues (2 CRITICAL, 3 HIGH) - Outcome: Success - - Update metrics: - models["claude-sonnet-4-5-20250929"].totalRuns = 121 - models["claude-sonnet-4-5-20250929"].successCount = 109 - models["claude-sonnet-4-5-20250929"].avgLatency = 2520ms - models["claude-sonnet-4-5-20250929"].totalCost = $0.453 - models["claude-sonnet-4-5-20250929"].taskTypePerformance["code-review"].success = 26 - - Model: x-ai/grok-code-fast-1 - Start: 15:00:05Z - End: 15:00:07Z - Latency: 2000ms - Cost: $0.002 - Result: Found 4 issues (2 CRITICAL, 2 HIGH) - Outcome: Success - - Update metrics: - models["x-ai/grok-code-fast-1"].totalRuns = 36 - models["x-ai/grok-code-fast-1"].successCount = 31 - models["x-ai/grok-code-fast-1"].avgLatency = 1820ms - models["x-ai/grok-code-fast-1"].totalCost = $0.082 - models["x-ai/grok-code-fast-1"].taskTypePerformance["code-review"].success = 19 - - Model: google/gemini-2.5-flash - Start: 15:00:05Z - End: 15:00:06Z - Latency: 1500ms - Cost: $0.001 - Result: Found 3 issues (1 CRITICAL, 2 MEDIUM) - Outcome: Success - - Update metrics: - models["google/gemini-2.5-flash"].totalRuns = 21 - models["google/gemini-2.5-flash"].successCount = 17 - models["google/gemini-2.5-flash"].avgLatency = 1480ms - models["google/gemini-2.5-flash"].totalCost = $0.021 - models["google/gemini-2.5-flash"].taskTypePerformance["code-review"].success = 11 - -Step 4: Track skill activation - skills["multi-model-validation"].activations = 16 - skills["multi-model-validation"].lastActivated = 15:00:08Z - skills["multi-model-validation"].coActivations["quality-gates"] = 13 - -Step 5: Consolidate findings - Consensus issues (all 3 models agreed): - - CRITICAL: SQL injection vulnerability (UNANIMOUS) - - CRITICAL: Missing authentication check (UNANIMOUS) - - Majority issues (2/3 models agreed): - - HIGH: Insufficient input validation (Claude, Grok) - - HIGH: Missing error handling (Claude, Grok) - - Divergent issues (1/3 models): - - MEDIUM: Code 
duplication (Gemini only) - -Step 6: Record session summary - Session complete: - Duration: 8 seconds - Models used: 3 - Total cost: $0.006 - Issues found: 5 (2 unanimous, 2 majority, 1 divergent) - Result: Success - - Create snapshot: - ai-docs/performance-history/2026-01-28-session-15.json - -Step 7: Update aggregate metrics - Overall session success rate: 3/3 models successful = 100% - Cost efficiency: $0.002 per model = good value -``` - -**Insights from Tracking:** - -``` -Performance comparison: - Fastest: Gemini (1500ms) - 50% faster than Claude - Most thorough: Claude (5 issues) - Found 1 extra issue - Best value: Gemini ($0.001, 3 issues) - Lowest cost, good coverage - -Cost analysis: - Total: $0.006 for 3-model review - vs Single Claude: $0.003 (double cost, but 2x validation) - ROI: Found 2 CRITICAL issues all models agreed on = high confidence - -Consensus validation: - UNANIMOUS issues (100% confidence) → Fix immediately - MAJORITY issues (67% confidence) → Fix before merge - DIVERGENT issues (33% confidence) → Low priority (possible false positive) - -Recommendation: - Multi-model validation worth the cost for critical code (auth, payments, security) - Single-model sufficient for non-critical code (UI components, docs) -``` - ---- - -### Example 2: Identifying Model Performance Differences - -**Scenario:** After 100 tasks, compare model performance for optimization - -**Execution:** - -``` -Step 1: Load performance data - Read: .claude/agent-performance.json - -Step 2: Extract model metrics - - Claude Sonnet: - Total runs: 120 - Success: 108, Failure: 12 - Success rate: 90.0% - Avg latency: 2500ms - Total cost: $0.45 - Cost per task: $0.00375 - Cost per success: $0.00417 - - Grok Fast: - Total runs: 35 - Success: 30, Failure: 5 - Success rate: 85.7% - Avg latency: 1800ms - Total cost: $0.08 - Cost per task: $0.00229 - Cost per success: $0.00267 - - Gemini Flash: - Total runs: 20 - Success: 16, Failure: 4 - Success rate: 80.0% - Avg latency: 1500ms - 
Total cost: $0.02 - Cost per task: $0.00100 - Cost per success: $0.00125 - -Step 3: Analyze task type performance - - Code Review: - Claude: 25 success, 2 failure = 92.6% - Grok: 18 success, 2 failure = 90.0% - Gemini: 10 success, 4 failure = 71.4% - - Winner: Claude (highest quality) - Best value: Grok (90% at lower cost) - - Implementation: - Claude: 40 success, 5 failure = 88.9% - Grok: 12 success, 3 failure = 80.0% - Gemini: 6 success, 1 failure = 85.7% - - Winner: Claude (highest quality) - Surprising: Gemini performs well here (86% success) - - Testing: - Claude: 20 success, 3 failure = 87.0% - Grok: No data - Gemini: No data - - Winner: Claude (only option) - Action: Try Grok/Gemini for testing tasks to gather data - -Step 4: Calculate cost-effectiveness by task type - - Code Review Cost-Effectiveness: - Claude: $0.00417 per success, 92.6% quality - Grok: $0.00267 per success, 90.0% quality (36% cheaper, -2.6% quality) - Gemini: $0.00125 per success, 71.4% quality (70% cheaper, -21.2% quality) - - Recommendation: Use Grok for cost-effective reviews (minimal quality loss) - - Implementation Cost-Effectiveness: - Claude: $0.00417 per success, 88.9% quality - Grok: $0.00267 per success, 80.0% quality (36% cheaper, -8.9% quality) - Gemini: $0.00125 per success, 85.7% quality (70% cheaper, -3.2% quality) - - Recommendation: Use Gemini for simple implementation (best value) - -Step 5: Generate optimization plan - - Current usage (last 120 tasks): - Claude: 100 tasks (83%) - Grok: 15 tasks (13%) - Gemini: 5 tasks (4%) - - Optimized usage (maintain quality >85%): - tier1 (Simple): Gemini (30% of tasks) - tier2 (Medium): Grok (40% of tasks) - tier3 (Complex): Claude (25% of tasks) - tier4 (Critical): Claude + Multi-model (5% of tasks) - - Expected impact: - Current avg cost: $0.00375 per task - Optimized avg cost: $0.00240 per task - Savings: 36% cost reduction - - Current avg success: 88.5% - Optimized avg success: 86.2% (projected) - Quality impact: -2.3% 
(acceptable tradeoff) - -Step 6: Implement gradual rollout - - Week 1: Route 20% of tier1 tasks to Gemini - Monitor: Success rate, cost savings - Target: >80% success rate - - Week 2: Route 40% of tier2 tasks to Grok - Monitor: Success rate, cost savings - Target: >85% success rate - - Week 3: Evaluate results - If successful: Increase percentages - If unsuccessful: Rollback and investigate - -Step 7: Track optimization results - - After 2 weeks: - Gemini tier1 success: 82% ✅ (above 80% target) - Grok tier2 success: 87% ✅ (above 85% target) - Cost savings: 28% ✅ (approaching 36% target) - - Decision: Continue rollout - Next: Route 50% tier1 to Gemini, 60% tier2 to Grok -``` - -**Insights from Analysis:** - -``` -Key findings: - 1. Grok is best value for code reviews (90% quality at 36% lower cost) - 2. Gemini surprisingly good for implementation (86% vs 89% Claude) - 3. Claude still best for critical work (92% code review success) - 4. Latency varies significantly (Gemini 40% faster than Claude) - -Optimization strategy: - - Use Gemini for simple, latency-sensitive tasks - - Use Grok for medium-complexity, cost-sensitive tasks - - Use Claude for critical, quality-sensitive tasks - - Use multi-model for maximum confidence (despite cost) - -Expected ROI: - - 36% cost reduction (from $0.00375 to $0.00240 per task) - - 2.3% quality tradeoff (from 88.5% to 86.2% success) - - Worth it: Save $135 per 100,000 tasks with minimal quality impact -``` - ---- - -### Example 3: Optimizing Routing Based on Accumulated Data - -**Scenario:** After 100 routing decisions, optimize tier thresholds - -**Execution:** - -``` -Step 1: Load routing data - Read: .claude/agent-performance.json - Focus: routing.tierDistribution, routing.decisions - -Step 2: Analyze tier distribution - - Current distribution: - tier1: 45 tasks (45.9%) - tier2: 30 tasks (30.6%) - tier3: 15 tasks (15.3%) - tier4: 8 tasks (8.2%) - - Skew analysis: - Heavy on tier1 (46%) - Router prefers simple classification - 
Light on tier4 (8%) - Router rarely escalates - -Step 3: Calculate tier success rates - - tier1 (Simple tasks): - Total: 45 - Success: 42, Failure: 3 - Success rate: 93.3% ✅ - Avg cost: $0.001 - Avg duration: 25s - - tier2 (Medium tasks): - Total: 30 - Success: 25, Failure: 5 - Success rate: 83.3% ⚠️ (target: 90%) - Avg cost: $0.002 - Avg duration: 45s - - tier3 (Complex tasks): - Total: 15 - Success: 12, Failure: 3 - Success rate: 80.0% ⚠️ (target: 90%) - Avg cost: $0.004 - Avg duration: 90s - - tier4 (Critical tasks): - Total: 8 - Success: 8, Failure: 0 - Success rate: 100% ✅ - Avg cost: $0.008 - Avg duration: 120s - -Step 4: Analyze tier2/tier3 failures - - tier2 failures (5 tasks): - 1. "Implement complex state management" (complexity: 6) - - Should have been tier3 (underestimated) - 2. "Add authentication to API" (complexity: 6) - - Should have been tier3 (security = critical) - 3. "Refactor component with hooks" (complexity: 5) - - Should have been tier2 (correctly routed, agent issue) - 4. "Implement drag-and-drop" (complexity: 6) - - Should have been tier3 (complex interaction) - 5. "Add real-time updates" (complexity: 6) - - Should have been tier3 (WebSocket complexity) - - Pattern: 4/5 failures were borderline tier2/tier3 (complexity 6) - Root cause: tier2 upper threshold too high (should be 5, not 6) - - tier3 failures (3 tasks): - 1. "Design microservices architecture" (complexity: 9) - - Should have been tier4 (architecture = critical) - 2. "Implement payment processing" (complexity: 9) - - Should have been tier4 (money = critical) - 3. 
"Refactor authentication system" (complexity: 8) - - Correctly routed, agent struggled with complexity - - Pattern: 2/3 failures should have been tier4 (complexity 9) - Root cause: tier3 upper threshold too high (should be 8, not 9) - -Step 5: Propose threshold adjustments - - Current thresholds: - tier1: complexity 0-3 - tier2: complexity 4-6 - tier3: complexity 7-9 - tier4: complexity 10+ - - Problem: Borderline tasks (6, 9) cause failures - - Optimized thresholds: - tier1: complexity 0-2 (narrower, more confident) - tier2: complexity 3-5 (shift down, avoid borderline 6) - tier3: complexity 6-8 (shift down, avoid borderline 9) - tier4: complexity 9+ (broader, include borderline cases) - - Rationale: - - Move borderline complexity 6 from tier2 → tier3 - - Move borderline complexity 9 from tier3 → tier4 - - Accept 15% higher cost for 10% better success rate - -Step 6: Simulate new distribution - - Reclassify historical tasks with new thresholds: - - tier1 (0-2): 35 tasks (35%) - Success rate: 34/35 = 97.1% ↑ (was 93.3%) - - tier2 (3-5): 32 tasks (32%) - Success rate: 30/32 = 93.8% ↑ (was 83.3%) - - tier3 (6-8): 18 tasks (18%) - Success rate: 17/18 = 94.4% ↑ (was 80.0%) - - tier4 (9+): 13 tasks (13%) - Success rate: 13/13 = 100% ✓ (was 100%) - - Overall success rate: 94/98 = 95.9% ↑ (was 88.8%) - -Step 7: Calculate cost impact - - Current avg cost: $0.00240 per task - Optimized avg cost: $0.00276 per task (+15%) - - Cost breakdown: - tier1 (35%): $0.001 × 0.35 = $0.00035 - tier2 (32%): $0.002 × 0.32 = $0.00064 - tier3 (18%): $0.004 × 0.18 = $0.00072 - tier4 (13%): $0.008 × 0.13 = $0.00104 - Total: $0.00275 (rounded $0.00276) - - ROI calculation: - Cost increase: +$0.00036 per task (+15%) - Success increase: +7.1% (from 88.8% to 95.9%) - Failure reduction: 11 → 4 failures (64% reduction) - - Value: Preventing 7 failures per 100 tasks worth the 15% cost increase - -Step 8: Implement new thresholds - - Update task-complexity-router skill: - OLD: - if (complexity <= 3) 
return "tier1"; - if (complexity <= 6) return "tier2"; - if (complexity <= 9) return "tier3"; - return "tier4"; - - NEW: - if (complexity <= 2) return "tier1"; - if (complexity <= 5) return "tier2"; - if (complexity <= 8) return "tier3"; - return "tier4"; - - Document change: - Reason: Performance data showed borderline tasks caused failures - Expected: 8% success rate improvement, 15% cost increase - Monitoring: Track next 100 tasks to validate improvement - -Step 9: Monitor post-optimization - - After 50 tasks with new thresholds: - tier1: 18 tasks, 18 success = 100% ✅ - tier2: 16 tasks, 15 success = 93.8% ✅ - tier3: 10 tasks, 9 success = 90.0% ✅ - tier4: 6 tasks, 6 success = 100% ✅ - - Overall: 48/50 = 96.0% success ✅ (matches projection) - Avg cost: $0.00280 ✅ (matches projection) - - Verdict: Optimization successful, keep new thresholds -``` - -**Insights from Optimization:** - -``` -Key findings: - 1. Borderline complexity scores (6, 9) caused most failures - 2. Router was too aggressive in keeping tasks at lower tiers - 3. 
Small threshold adjustments (6→5, 9→8) had big impact - -Optimization results: - - Success rate: 88.8% → 96.0% (+7.2%) - - Failure rate: 11.2% → 4.0% (-64%) - - Cost per task: $0.00240 → $0.00280 (+17%) - - ROI: Strong (quality improvement worth cost increase) - -Lessons learned: - - Track tier success rates, not just overall success - - Borderline cases benefit from tier escalation - - Performance data reveals routing blind spots - - Continuous monitoring enables iterative improvement - -Next steps: - - Continue monitoring for 100 more tasks - - Consider dynamic thresholds (adjust based on live data) - - Explore agent-specific routing (some agents handle complexity better) -``` - ---- - -## Troubleshooting - -**Problem: agent-performance.json file growing too large** - -**Cause:** History arrays not being trimmed - -**Solution:** Implement automatic trimming after each update - -```javascript -function updateAgentMetrics(agentId, execution) { - const agent = metrics.agents[agentId]; - - // Update aggregates - agent.totalRuns += 1; - agent.successCount += execution.result === "success" ? 1 : 0; - - // Add to history - agent.history.push(execution); - - // Trim to max 100 entries (FIFO) - if (agent.history.length > 100) { - agent.history = agent.history.slice(-100); - } -} -``` - ---- - -**Problem: Metrics don't reflect recent changes** - -**Cause:** Stale data from old workflows - -**Solution:** Reset metrics after major changes - -```bash -# Backup current metrics -cp .claude/agent-performance.json .claude/agent-performance-backup-$(date +%Y%m%d).json - -# Reset relevant sections (keep models, reset agents) -# Edit .claude/agent-performance.json manually or use script -``` - ---- - -**Problem: Success rate calculations seem wrong** - -**Cause:** Inconsistent result values ("success", "SUCCESS", "completed", etc.) 
- -**Solution:** Normalize result values - -```javascript -function normalizeResult(result) { - // Compare case-insensitively so "Success", "PASS", "Completed" all match - const normalized = String(result).toLowerCase(); - const successValues = ["success", "completed", "pass"]; - const failureValues = ["failure", "error", "fail"]; - - if (successValues.includes(normalized)) return "success"; - if (failureValues.includes(normalized)) return "failure"; - return "unknown"; -} - -// Use normalized values in metrics -const normalizedResult = normalizeResult(execution.result); -agent.successCount += normalizedResult === "success" ? 1 : 0; -agent.failureCount += normalizedResult === "failure" ? 1 : 0; -``` - ---- - -## Summary - -Performance tracking enables **data-driven orchestration optimization** through: - -- **Agent success tracking** (identify high-performers and underperformers) -- **Model performance comparison** (find cost-effective alternatives) -- **Skill effectiveness analysis** (discover successful patterns) -- **Routing optimization** (adjust tier thresholds based on actual results) -- **Historical trend detection** (alert on degradation, celebrate improvements) - -Key metrics to monitor: -- Agent success rate (target >70%, alert if <60%) -- Model cost-effectiveness (cost per success, not just cost per task) -- Routing tier accuracy (target >90% success per tier) -- Skill activation correlation (identify high-value skills) - -Master performance tracking and your orchestration workflows will continuously improve, delivering better results at lower costs. 
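The monitored targets in this summary can be computed mechanically from the stored aggregates. A minimal sketch (the inline sample mirrors the model entries from Example 2; the `summarize` helper and its output field names are illustrative, not part of the plugin):

```javascript
// Compute the recommended monitoring metrics (success rate, cost per
// success) from model entries shaped like agent-performance.json.
const metrics = {
  models: {
    "claude-sonnet": { totalRuns: 120, successCount: 108, totalCost: 0.45 },
    "grok-fast": { totalRuns: 35, successCount: 30, totalCost: 0.08 },
    "gemini-flash": { totalRuns: 20, successCount: 16, totalCost: 0.02 },
  },
};

function summarize(model) {
  const successRate = model.successCount / model.totalRuns;
  return {
    successRate: Number((successRate * 100).toFixed(1)),
    costPerTask: Number((model.totalCost / model.totalRuns).toFixed(5)),
    // Cost per success, not per task, is the recommended comparison
    costPerSuccess: Number((model.totalCost / model.successCount).toFixed(5)),
    alert: successRate < 0.6, // guideline above: alert if success rate < 60%
  };
}

for (const [id, model] of Object.entries(metrics.models)) {
  console.log(id, summarize(model));
}
```

To run it against real data, replace the inline sample with a `JSON.parse` of `.claude/agent-performance.json`; the alert threshold matches the target listed above.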
- ---- +## Notes -**Inspired By:** -- `/review` command (multi-model performance tracking) -- `/dev` command (agent success rate monitoring) -- task-complexity-router skill (routing feedback loops) -- Production workflows (cost optimization, quality tracking) +- Store metrics in a JSONL file for easy append and analysis +- Integrate with task-complexity-router to feed routing decisions +- Track across sessions to build historical baselines +- Focus on actionable metrics that drive routing improvements diff --git a/plugins/multimodel/skills/quality-gates/SKILL.md b/plugins/multimodel/skills/quality-gates/SKILL.md index 8a14718..aca18d5 100644 --- a/plugins/multimodel/skills/quality-gates/SKILL.md +++ b/plugins/multimodel/skills/quality-gates/SKILL.md @@ -1,998 +1,76 @@ --- name: quality-gates -description: Implement quality gates, user approval, iteration loops, and test-driven development. Use when validating with users, implementing feedback loops, classifying issue severity, running test-driven loops, or building multi-iteration workflows. Trigger keywords - "approval", "user validation", "iteration", "feedback loop", "severity", "test-driven", "TDD", "quality gate", "consensus". -version: 0.1.0 -tags: [orchestration, quality-gates, approval, iteration, feedback, severity, test-driven, TDD] -keywords: [approval, validation, iteration, feedback-loop, severity, test-driven, TDD, quality-gate, consensus, user-approval] -plugin: multimodel -updated: 2026-01-20 +description: "Implements approval checkpoints, iteration loops, severity classification, and test-driven validation in multi-agent workflows. Pauses execution for user review, runs fix-test cycles until passing, and enforces quality thresholds. Use when adding user approval gates, building feedback loops, running TDD iterations, or classifying issue severity in pipelines." 
--- # Quality Gates -**Version:** 1.0.0 -**Purpose:** Patterns for approval gates, iteration loops, and quality validation in multi-agent workflows -**Status:** Production Ready +Checkpoints in workflows where execution pauses for validation before proceeding. Prevents low-quality work from advancing through the pipeline. -## Overview +## Workflow -Quality gates are checkpoints in workflows where execution pauses for validation before proceeding. They prevent low-quality work from advancing through the pipeline and ensure user expectations are met. +1. **Define gate criteria** — what must be true to pass (tests green, user approves, score above threshold). +2. **Execute the work phase** — agent implements, reviews, or generates output. +3. **Evaluate at the gate** — run checks, present to user, or score results. +4. **Branch on outcome** — pass (continue), fail (iterate), or abort (stop workflow). +5. **Log gate results** — record pass/fail, iteration count, and feedback for performance tracking. -This skill provides battle-tested patterns for: -- **User approval gates** (cost gates, quality gates, final acceptance) -- **Iteration loops** (automated refinement until quality threshold met) -- **Issue severity classification** (CRITICAL, HIGH, MEDIUM, LOW) -- **Multi-reviewer consensus** (unanimous vs majority agreement) -- **Feedback loops** (user reports issues → agent fixes → user validates) -- **Test-driven development loops** (write tests → run → analyze failures → fix → repeat) +## Gate Types -Quality gates transform "fire and forget" workflows into **iterative refinement systems** that consistently produce high-quality results. 
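Whatever the gate type, evaluation reduces to the same evaluate-and-branch shape with an iteration cap. A minimal sketch (the `evaluate` and `onFail` callbacks are illustrative stand-ins for real checks such as a test run, a score, or an AskUserQuestion prompt):

```javascript
// Generic quality gate: evaluate, then branch on pass / fail / abort,
// with a max-iteration limit to prevent infinite fix loops.
async function runGate(evaluate, { maxIterations = 10, onFail } = {}) {
  for (let i = 1; i <= maxIterations; i++) {
    const result = await evaluate(); // test run, user prompt, score, etc.
    if (result.status === "pass") {
      return { passed: true, iterations: i }; // continue the workflow
    }
    if (result.status === "abort") {
      return { passed: false, aborted: true, iterations: i }; // stop entirely
    }
    if (onFail) await onFail(result); // iterate: apply fixes, then re-check
  }
  return { passed: false, iterations: maxIterations }; // cap hit: escalate
}
```

A test-driven gate would pass a test-runner invocation as `evaluate` and the fixing agent as `onFail`; the returned iteration count feeds gate logging.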
+| Gate Type | Mechanism | Example | +|-----------|-----------|---------| +| User approval | AskUserQuestion | "Review this plan before I implement it" | +| Test-driven | Run tests, check results | Fix code until `bun test` passes | +| Score threshold | Evaluate output quality | Design fidelity score must be >= 40/50 | +| Consensus | Multiple reviewers agree | 2 of 3 AI reviewers flag the same issue | +| Severity filter | Classify and filter | Only block on CRITICAL/HIGH issues | -## Core Patterns - -### Pattern 1: User Approval Gates - -**When to Ask for Approval:** - -Use approval gates for: -- **Cost gates:** Before expensive operations (multi-model review, large-scale refactoring) -- **Quality gates:** Before proceeding to next phase (design validation before implementation) -- **Final validation:** Before completing workflow (user acceptance testing) -- **Irreversible operations:** Before destructive actions (delete files, database migrations) - -**How to Present Approval:** - -``` -Good Approval Prompt: - -"You selected 5 AI models for code review: - - Claude Sonnet (embedded, free) - - Grok Code Fast (external, $0.002) - - Gemini 2.5 Flash (external, $0.001) - - GPT-5 Codex (external, $0.004) - - DeepSeek Coder (external, $0.001) - - Estimated total cost: $0.008 ($0.005 - $0.010) - Expected duration: ~5 minutes - - Proceed with multi-model review? (Yes/No/Cancel)" - -Why it works: -✓ Clear context (what will happen) -✓ Cost transparency (range, not single number) -✓ Time expectation (5 minutes) -✓ Multiple options (Yes/No/Cancel) -``` - -**Anti-Pattern: Vague Approval** - -``` -❌ Wrong: - -"This will cost money. Proceed? (Yes/No)" - -Why it fails: -✗ No cost details (how much?) -✗ No context (what will happen?) -✗ No alternatives (what if user says no?) -``` - -**Handling User Responses:** - -``` -User says YES: - → Proceed with workflow - → Track approval in logs - → Continue to next step - -User says NO: - → Offer alternatives: - 1. 
Use fewer models (reduce cost) - 2. Use only free embedded Claude - 3. Skip this step entirely - 4. Cancel workflow - → Ask user to choose alternative - → Proceed based on choice - -User says CANCEL: - → Gracefully exit workflow - → Save partial results (if any) - → Log cancellation reason - → Clean up temporary files - → Notify user: "Workflow cancelled. Partial results saved to..." -``` - -**Approval Bypasses (Advanced):** - -For automated workflows, allow approval bypass: - -``` -Automated Workflow Mode: - -If workflow is triggered by CI/CD or scheduled task: - → Skip user approval gates - → Use predefined defaults (e.g., max cost $0.10) - → Log decisions for audit trail - → Email report to stakeholders after completion - -Example: - if (isAutomatedMode) { - if (estimatedCost <= maxAutomatedCost) { - log("Auto-approved: " + estimatedCost + " <= " + maxAutomatedCost + " threshold"); - proceed(); - } else { - log("Auto-rejected: " + estimatedCost + " > " + maxAutomatedCost + " threshold"); - notifyStakeholders("Cost exceeds automated threshold"); - abort(); - } - } -``` - ---- - -### Pattern 2: Iteration Loop Patterns - -**Max Iteration Limits:** - -Always set a **max iteration limit** to prevent infinite loops: - -``` -Typical Iteration Limits: - -Automated quality loops: 10 iterations - - Designer validation → Developer fixes → Repeat - - Test failures → Developer fixes → Repeat - -User feedback loops: 5 rounds - - User reports issues → Developer fixes → User validates → Repeat - -Code review loops: 3 rounds - - Reviewer finds issues → Developer fixes → Re-review → Repeat - -Multi-model consensus: 1 iteration (no loop) - - Parallel review → Consolidate → Present -``` - -**Exit Criteria:** - -Define clear **exit criteria** for each loop type: - -``` -Loop Type: Design Validation - -Exit Criteria (checked after each iteration): - 1. Designer assessment = PASS → Exit loop (success) - 2. Iteration count >= 10 → Exit loop (max iterations) - 3. User manually approves → Exit loop (user override) - 4. 
No changes made by developer → Exit loop (stuck, escalate) - -Example: - for (let i = 1; i <= 10; i++) { - const review = await designer.validate(); - - if (review.assessment === "PASS") { - log("Design validation passed on iteration " + i); - break; // Success exit - } - - if (i === 10) { - log("Max iterations reached. Escalating to user validation."); - break; // Max iterations exit - } - - await developer.fix(review.issues); - } -``` - -**Progress Tracking:** - -Show clear progress to user during iterations: - -``` -Iteration Loop Progress: - -Iteration 1/10: Designer found 5 issues → Developer fixing... -Iteration 2/10: Designer found 3 issues → Developer fixing... -Iteration 3/10: Designer found 1 issue → Developer fixing... -Iteration 4/10: Designer assessment: PASS ✓ - -Loop completed in 4 iterations. -``` - -**Iteration History Documentation:** - -Track what happened in each iteration: - -``` -Iteration History (ai-docs/iteration-history.md): - -## Iteration 1 -Designer Assessment: NEEDS IMPROVEMENT -Issues Found: - - Button color doesn't match design (#3B82F6 vs #2563EB) - - Spacing between elements too tight (8px vs 16px) - - Font size incorrect (14px vs 16px) -Developer Actions: - - Updated button color to #2563EB - - Increased spacing to 16px - - Changed font size to 16px - -## Iteration 2 -Designer Assessment: NEEDS IMPROVEMENT -Issues Found: - - Border radius too large (8px vs 4px) -Developer Actions: - - Reduced border radius to 4px - -## Iteration 3 -Designer Assessment: PASS ✓ -Issues Found: None -Result: Design validation complete -``` - ---- - -### Pattern 3: Issue Severity Classification - -**Severity Levels:** - -Use 4-level severity classification: - -``` -CRITICAL - Must fix immediately - - Blocks core functionality - - Security vulnerabilities (SQL injection, XSS, auth bypass) - - Data loss risk - - System crashes - - Build failures - - Action: STOP workflow, fix immediately, re-validate - -HIGH - Should fix soon - - Major bugs (incorrect 
behavior) - - Performance issues (>3s page load, memory leaks) - - Accessibility violations (keyboard navigation broken) - - User experience blockers - - Action: Fix in current iteration, proceed after fix - -MEDIUM - Should fix - - Minor bugs (edge cases, visual glitches) - - Code quality issues (duplication, complexity) - - Non-blocking performance issues - - Incomplete error handling - - Action: Fix if time permits, or schedule for next iteration - -LOW - Nice to have - - Code style inconsistencies - - Minor refactoring opportunities - - Documentation improvements - - Polish and optimization - - Action: Log for future improvement, proceed without fixing -``` - -**Severity-Based Prioritization:** - -``` -Issue List (sorted by severity): - -CRITICAL Issues (must fix all before proceeding): - 1. SQL injection in user search endpoint - 2. Missing authentication check on admin routes - 3. Password stored in plaintext - -HIGH Issues (fix before code review): - 4. Memory leak in WebSocket connection - 5. Missing error handling in payment flow - 6. Accessibility: keyboard navigation broken - -MEDIUM Issues (fix if time permits): - 7. Code duplication in auth controllers - 8. Inconsistent error messages - 9. Missing JSDoc comments - -LOW Issues (defer to future): - 10. Variable naming inconsistency - 11. Redundant type annotations - 12. 
CSS could use more specificity - -Action Plan: - - Fix CRITICAL (1-3) immediately → Re-run tests - - Fix HIGH (4-6) before code review - - Log MEDIUM (7-9) for next iteration - - Ignore LOW (10-12) for now -``` - -**Severity Escalation:** - -Issues can escalate in severity based on context: - -``` -Context-Based Escalation: - -Issue: "Missing error handling in payment flow" - Base Severity: MEDIUM (code quality issue) - - Context 1: Development environment - → Severity: MEDIUM (not user-facing yet) - - Context 2: Production environment - → Severity: HIGH (affects real users, money involved) - - Context 3: Production + recent payment failures - → Severity: CRITICAL (actively causing issues) - -Rule: Escalate severity when: - - Issue affects production users - - Issue involves money/security/data - - Issue is currently causing failures -``` - ---- - -### Pattern 4: Multi-Reviewer Consensus - -**Consensus Levels:** - -When multiple reviewers evaluate the same code/design: - -``` -UNANIMOUS (100% agreement): - - ALL reviewers flagged this issue - - VERY HIGH confidence - - Highest priority (likely a real problem) - -Example: - 3/3 reviewers: "SQL injection in search endpoint" - → UNANIMOUS consensus - → CRITICAL priority (all agree it's critical) - -STRONG CONSENSUS (67-99% agreement): - - MOST reviewers flagged this issue - - HIGH confidence - - High priority (probably a real problem) - -Example: - 2/3 reviewers: "Missing input validation" - → STRONG consensus (67%) - → HIGH priority - -MAJORITY (50-66% agreement): - - HALF or more flagged this issue - - MEDIUM confidence - - Medium priority (worth investigating) - -Example: - 3/5 reviewers: "Code duplication in controllers" - → MAJORITY consensus (60%) - → MEDIUM priority - -DIVERGENT (< 50% agreement): - - Only 1-2 reviewers flagged this issue - - LOW confidence - - Low priority (may be model-specific or false positive) - -Example: - 1/3 reviewers: "Variable naming could be better" - → DIVERGENT (33%) - → LOW 
priority (one reviewer's opinion) -``` - -**Consensus-Based Prioritization:** - -``` -Prioritized Issue List (by consensus + severity): - -1. [UNANIMOUS - CRITICAL] SQL injection in search - ALL reviewers agree: Claude, Grok, Gemini (3/3) - -2. [UNANIMOUS - HIGH] Missing input validation - ALL reviewers agree: Claude, Grok, Gemini (3/3) - -3. [STRONG - HIGH] Memory leak in WebSocket - MOST reviewers agree: Claude, Grok (2/3) - -4. [STRONG - MEDIUM] Code duplication - MOST reviewers agree: Claude, Gemini (2/3) - -5. [DIVERGENT - LOW] Variable naming - SINGLE reviewer: Claude only (1/3) - -Action: - - Fix issues 1-2 immediately (unanimous + CRITICAL/HIGH) - - Fix issue 3 before review (strong consensus) - - Consider issue 4 (strong consensus, but medium severity) - - Ignore issue 5 (divergent, likely false positive) -``` - ---- - -### Pattern 5: Feedback Loop Implementation - -**User Feedback Loop:** - -``` -Workflow: User Validation with Feedback - -Step 1: Initial Implementation - Developer implements feature - Designer/Tester validates - Present to user for manual validation - -Step 2: User Validation Gate (MANDATORY) - Present to user: - "Implementation complete. Please manually verify: - - Open app at http://localhost:3000 - - Test feature: [specific instructions] - - Compare to design reference - - Does it meet expectations? (Yes/No)" - -Step 3a: User says YES - → ✅ Feature approved - → Generate final report - → Mark workflow complete - -Step 3b: User says NO - → Collect specific feedback - -Step 4: Collect Specific Feedback - Ask user: "Please describe the issues you found:" - - User response: - "1. Button color is wrong (should be blue, not green) - 2. Spacing is too tight between elements - 3. 
Font size is too small" - -Step 5: Extract Structured Feedback - Parse user feedback into structured issues: - - Issue 1: - Component: Button - Problem: Color incorrect - Expected: Blue (#2563EB) - Actual: Green (#10B981) - Severity: MEDIUM - - Issue 2: - Component: Container - Problem: Spacing too tight - Expected: 16px - Actual: 8px - Severity: MEDIUM - - Issue 3: - Component: Text - Problem: Font size too small - Expected: 16px - Actual: 14px - Severity: LOW - -Step 6: Launch Fixing Agent - Task: ui-developer - Prompt: "Fix user-reported issues: - - 1. Button color: Change from #10B981 to #2563EB - 2. Container spacing: Increase from 8px to 16px - 3. Text font size: Increase from 14px to 16px - - User feedback: [user's exact words]" - -Step 7: Re-validate - After fixes: - - Re-run designer validation - - Loop back to Step 2 (user validation) - -Step 8: Max Feedback Rounds - Limit: 5 feedback rounds (prevent infinite loop) - - If round > 5: - Escalate to human review - "Unable to meet user expectations after 5 rounds. - Manual intervention required." -``` - -**Feedback Round Tracking:** - -``` -Feedback Round History: - -Round 1: - User Issues: Button color, spacing, font size - Fixes Applied: Updated all 3 issues - Result: Re-validate - -Round 2: - User Issues: Border radius too large - Fixes Applied: Reduced border radius - Result: Re-validate - -Round 3: - User Issues: None - Result: ✅ APPROVED - -Total Rounds: 3/5 -``` - ---- - -### Pattern 6: Test-Driven Development Loop - -**When to Use:** - -Use TDD loop **after implementing code, before code review**: - -``` -Workflow Phases: - -Phase 1: Architecture Planning -Phase 2: Implementation -Phase 2.5: Test-Driven Development Loop ← THIS PATTERN -Phase 3: Code Review -Phase 4: User Acceptance -``` - -**The TDD Loop Pattern:** - -``` -Step 1: Write Tests First - Task: test-architect - Prompt: "Write comprehensive tests for authentication feature. 
- Requirements: [link to requirements] - Implementation: [link to code]" - Output: tests/auth.test.ts - -Step 2: Run Tests - Bash: bun test tests/auth.test.ts - Capture output and exit code - -Step 3: Check Test Results - If all tests pass: - → ✅ TDD loop complete - → Proceed to code review (Phase 3) - - If tests fail: - → Analyze failure (continue to Step 4) - -Step 4: Analyze Test Failure - Task: test-architect - Prompt: "Analyze test failure output: - - [test failure logs] - - Determine root cause: - - TEST_ISSUE: Test has bug (bad assertion, missing mock, wrong expectation) - - IMPLEMENTATION_ISSUE: Code has bug (logic error, missing validation, incorrect behavior) - - Provide detailed analysis." - - test-architect returns: - verdict: TEST_ISSUE | IMPLEMENTATION_ISSUE - analysis: Detailed explanation - recommendation: Specific fix needed - -Step 5a: If TEST_ISSUE (test is wrong) - Task: test-architect - Prompt: "Fix test based on analysis: - [analysis from Step 4]" - - After fix: - → Re-run tests (back to Step 2) - → Loop continues - -Step 5b: If IMPLEMENTATION_ISSUE (code is wrong) - Provide structured feedback to developer: - - Task: backend-developer - Prompt: "Fix implementation based on test failure: - - Test Failure: - [failure output] - - Root Cause: - [analysis from test-architect] - - Recommended Fix: - [specific fix needed]" - - After fix: - → Re-run tests (back to Step 2) - → Loop continues - -Step 6: Max Iteration Limit - Limit: 10 iterations - - Iteration tracking: - Iteration 1/10: 5 tests failed → Fix implementation - Iteration 2/10: 2 tests failed → Fix test (bad mock) - Iteration 3/10: All tests pass ✅ - - If iteration > 10: - Escalate to human review - "Unable to pass all tests after 10 iterations. - Manual debugging required." 
-``` - -**Example TDD Loop:** - -``` -Phase 2.5: Test-Driven Development Loop - -Iteration 1: - Tests Run: 20 tests - Results: 5 failed, 15 passed - Failure: "JWT token validation fails with expired token" - Analysis: IMPLEMENTATION_ISSUE - Missing expiration check - Fix: Added expiration validation in TokenService - Re-run: Continue to Iteration 2 - -Iteration 2: - Tests Run: 20 tests - Results: 2 failed, 18 passed - Failure: "Mock database not reset between tests" - Analysis: TEST_ISSUE - Missing beforeEach cleanup - Fix: Added database reset in test setup - Re-run: Continue to Iteration 3 - -Iteration 3: - Tests Run: 20 tests - Results: All passed ✅ - Result: TDD loop complete, proceed to code review - -Total Iterations: 3/10 -Duration: ~5 minutes -Benefits: - - Caught 2 bugs before code review - - Fixed 1 test quality issue - - All tests passing gives confidence in implementation -``` - -**Benefits of TDD Loop:** - -``` -Benefits: - -1. Catch bugs early (before code review, not after) -2. Ensure test quality (test-architect fixes bad tests) -3. Automated quality assurance (no manual testing needed) -4. Fast feedback loop (seconds to run tests, not minutes) -5. 
Confidence in implementation (all tests passing) - -Performance: - Traditional: Implement → Review → Find bugs → Fix → Re-review - Time: 30+ minutes, multiple review rounds - - TDD Loop: Implement → Test → Fix → Test → Review (with confidence) - Time: 15 minutes, single review round (fewer issues) -``` - ---- - -## Integration with Other Skills - -**quality-gates + multi-model-validation:** - -``` -Use Case: Cost approval before multi-model review - -Step 1: Estimate costs (multi-model-validation) -Step 2: User approval gate (quality-gates) - If approved: Proceed with parallel execution - If rejected: Offer alternatives -Step 3: Execute review (multi-model-validation) -``` - -**quality-gates + multi-agent-coordination:** - -``` -Use Case: Iteration loop with designer validation - -Step 1: Agent selection (multi-agent-coordination) - Select designer + ui-developer - -Step 2: Iteration loop (quality-gates) - For i = 1 to 10: - - Run designer validation - - If PASS: Exit loop - - Else: Delegate to ui-developer for fixes - -Step 3: User validation gate (quality-gates) - Mandatory manual approval -``` - -**quality-gates + error-recovery:** - -``` -Use Case: Test-driven loop with error recovery - -Step 1: Run tests (quality-gates TDD pattern) -Step 2: If test execution fails (error-recovery) - - Syntax error → Fix and retry - - Framework crash → Notify user, skip TDD -Step 3: If tests pass (quality-gates) - - Proceed to code review -``` - ---- - -## Best Practices - -**Do:** -- ✅ Set max iteration limits (prevent infinite loops) -- ✅ Define clear exit criteria (PASS, max iterations, user override) -- ✅ Track iteration history (document what happened) -- ✅ Show progress to user ("Iteration 3/10 complete") -- ✅ Classify issue severity (CRITICAL → HIGH → MEDIUM → LOW) -- ✅ Prioritize by consensus + severity -- ✅ Ask user approval for expensive operations -- ✅ Collect specific feedback (not vague complaints) -- ✅ Use TDD loop to catch bugs early - -**Don't:** -- ❌ Create 
infinite loops (no exit criteria) -- ❌ Skip user validation gates (mandatory for UX) -- ❌ Ignore consensus (unanimous issues are real) -- ❌ Batch all severities together (prioritize CRITICAL) -- ❌ Proceed without approval for >$0.01 operations -- ❌ Collect vague feedback ("it's wrong" → what specifically?) -- ❌ Skip TDD loop (catches bugs before expensive review) - -**Performance:** -- Iteration loops: 5-10 iterations typical, max 10-15 min -- TDD loop: 3-5 iterations typical, max 5-10 min -- User feedback: 1-3 rounds typical, max 5 rounds - ---- - -## Examples - -### Example 1: User Approval Gate for Multi-Model Review - -**Scenario:** User requests multi-model review, costs $0.008 - -**Execution:** +## Example: Test-Driven Iteration Loop ``` -Step 1: Estimate Costs - Input: 450 lines × 1.5 = 675 tokens per model - Output: 2000-4000 tokens per model - Total: 3 models × 3000 avg = 9000 output tokens - Cost: ~$0.008 ($0.005 - $0.010) - -Step 2: Present Approval Gate - "Multi-model review will analyze 450 lines with 3 AI models: - - Claude Sonnet (embedded, free) - - Grok Code Fast (external, $0.002) - - Gemini 2.5 Flash (external, $0.001) - - Estimated cost: $0.008 ($0.005 - $0.010) - Duration: ~5 minutes - - Proceed? (Yes/No/Cancel)" - -Step 3a: User says YES - → Proceed with parallel execution - → Track approval: log("User approved $0.008 cost") +max_iterations = 5 -Step 3b: User says NO - → Offer alternatives: - 1. Use only free Claude (no external models) - 2. Use only 1 external model (reduce cost to $0.002) - 3. Skip review entirely - → Ask user to choose - -Step 3c: User says CANCEL - → Exit gracefully - → Log: "User cancelled multi-model review" - → Clean up temporary files +for iteration in 1..max_iterations: + 1. Run tests: bun test + 2. If all pass → GATE PASSED, continue to next phase + 3. If failures: + a. Parse test output for failing tests + b. Fix identified issues + c. Continue loop + 4. 
If max_iterations reached → GATE FAILED, ask user ``` ---- - -### Example 2: Designer Validation Iteration Loop +**Verification:** Confirm tests pass before marking the gate as passed. Never skip the final test run. -**Scenario:** UI implementation with automated iteration until PASS - -**Execution:** +## Example: User Approval Gate +```typescript +// Present plan for approval before implementation +AskUserQuestion({ + questions: [{ + question: "Review the architecture plan. Approve to proceed with implementation?", + header: "Quality Gate: Plan Review", + options: [ + { label: "Approve", description: "Proceed with implementation" }, + { label: "Request changes", description: "I have feedback" }, + { label: "Reject", description: "Start over with different approach" } + ] + }] +}) ``` -Iteration 1: - Task: designer - Prompt: "Validate navbar against Figma design" - Output: ai-docs/design-review-1.md - Assessment: NEEDS IMPROVEMENT - Issues: - - Button color: #3B82F6 (expected #2563EB) - - Spacing: 8px (expected 16px) - - Task: ui-developer - Prompt: "Fix issues from ai-docs/design-review-1.md" - Changes: Updated button color, increased spacing - - Result: Continue to Iteration 2 - -Iteration 2: - Task: designer - Prompt: "Re-validate navbar" - Output: ai-docs/design-review-2.md - Assessment: NEEDS IMPROVEMENT - Issues: - - Border radius: 8px (expected 4px) - - Task: ui-developer - Prompt: "Fix border radius issue" - Changes: Reduced border radius to 4px - - Result: Continue to Iteration 3 - -Iteration 3: - Task: designer - Prompt: "Re-validate navbar" - Output: ai-docs/design-review-3.md - Assessment: PASS ✓ - Issues: None - - Result: Exit loop (success) - -Summary: - Total Iterations: 3/10 - Duration: ~8 minutes - Automated Fixes: 3 issues resolved - Result: PASS, proceed to user validation -``` - ---- - -### Example 3: Test-Driven Development Loop - -**Scenario:** Authentication implementation with TDD - -**Execution:** - -``` -Phase 2.5: Test-Driven Development 
Loop - -Iteration 1: - Task: test-architect - Prompt: "Write tests for authentication feature" - Output: tests/auth.test.ts (20 tests) - - Bash: bun test tests/auth.test.ts - Result: 5 failed, 15 passed - - Task: test-architect - Prompt: "Analyze test failures" - Verdict: IMPLEMENTATION_ISSUE - Analysis: "Missing JWT expiration validation" - - Task: backend-developer - Prompt: "Add JWT expiration validation" - Changes: Updated TokenService.verify() - - Bash: bun test tests/auth.test.ts - Result: Continue to Iteration 2 - -Iteration 2: - Bash: bun test tests/auth.test.ts - Result: 2 failed, 18 passed - - Task: test-architect - Prompt: "Analyze test failures" - Verdict: TEST_ISSUE - Analysis: "Mock database not reset between tests" - - Task: test-architect - Prompt: "Fix test setup" - Changes: Added beforeEach cleanup - - Bash: bun test tests/auth.test.ts - Result: Continue to Iteration 3 -Iteration 3: - Bash: bun test tests/auth.test.ts - Result: All 20 passed ✅ +## Severity Classification - Result: TDD loop complete, proceed to code review +| Severity | Impact | Gate Action | +|----------|--------|-------------| +| CRITICAL | Breaks functionality, security risk | Block — must fix before proceeding | +| HIGH | Significant quality issue | Block — fix recommended | +| MEDIUM | Minor quality concern | Warn — proceed with note | +| LOW | Cosmetic or suggestion | Log — proceed without blocking | -Summary: - Total Iterations: 3/10 - Duration: ~5 minutes - Bugs Caught: 1 implementation bug, 1 test bug - Result: All tests passing, high confidence in code -``` - ---- - -## Troubleshooting - -**Problem: Infinite iteration loop** - -Cause: No exit criteria or max iteration limit - -Solution: Always set max iterations (10 for automated, 5 for user feedback) - -``` -❌ Wrong: - while (true) { - if (review.assessment === "PASS") break; - fix(); - } - -✅ Correct: - for (let i = 1; i <= 10; i++) { - if (review.assessment === "PASS") break; - if (i === 10) escalateToUser(); - fix(); 
- } -``` - ---- - -**Problem: User approval skipped for expensive operation** - -Cause: Missing approval gate - -Solution: Always ask approval for costs >$0.01 - -``` -❌ Wrong: - if (userRequestedMultiModel) { - executeReview(); - } - -✅ Correct: - if (userRequestedMultiModel) { - const cost = estimateCost(); - if (cost > 0.01) { - const approved = await askUserApproval(cost); - if (!approved) return offerAlternatives(); - } - executeReview(); - } -``` - ---- - -**Problem: All issues treated equally** - -Cause: No severity classification - -Solution: Classify by severity, prioritize CRITICAL - -``` -❌ Wrong: - issues.forEach(issue => fix(issue)); - -✅ Correct: - const critical = issues.filter(i => i.severity === "CRITICAL"); - const high = issues.filter(i => i.severity === "HIGH"); - - critical.forEach(issue => fix(issue)); // Fix critical first - high.forEach(issue => fix(issue)); // Then high - // MEDIUM and LOW deferred or skipped -``` - ---- - -## Summary - -Quality gates ensure high-quality results through: - -- **User approval gates** (cost, quality, final validation) -- **Iteration loops** (automated refinement, max 10 iterations) -- **Severity classification** (CRITICAL → HIGH → MEDIUM → LOW) -- **Consensus prioritization** (unanimous → strong → majority → divergent) -- **Feedback loops** (collect specific issues, fix, re-validate) -- **Test-driven development** (write tests, run, fix, repeat until pass) - -Master these patterns and your workflows will consistently produce high-quality, validated results. 
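The iteration-loop and severity-classification patterns above can be combined into one minimal TypeScript sketch. `runReview` and `applyFix` are hypothetical stand-ins for real agent calls, and the gate actions mirror the severity table (CRITICAL/HIGH block, MEDIUM warns, LOW logs):

```typescript
// Sketch: iteration loop with severity classification and a max-iteration cap.
// runReview / applyFix are illustrative placeholders, not real plugin APIs.

type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
interface Issue { component: string; severity: Severity; }
interface Review { assessment: "PASS" | "NEEDS_IMPROVEMENT"; issues: Issue[]; }

// Gate action per the severity table: CRITICAL/HIGH block, MEDIUM warns, LOW logs.
function gateAction(severity: Severity): "block" | "warn" | "log" {
  if (severity === "CRITICAL" || severity === "HIGH") return "block";
  return severity === "MEDIUM" ? "warn" : "log";
}

function runIterationLoop(
  runReview: () => Review,
  applyFix: (issues: Issue[]) => void,
  maxIterations = 10,
): { passed: boolean; iterations: number } {
  for (let i = 1; i <= maxIterations; i++) {
    const review = runReview();
    if (review.assessment === "PASS") return { passed: true, iterations: i };
    // Only blocking issues force another fix round; MEDIUM/LOW are noted and passed through.
    const blocking = review.issues.filter(iss => gateAction(iss.severity) === "block");
    if (blocking.length === 0) return { passed: true, iterations: i };
    applyFix(blocking);
  }
  // Exit criterion reached without a PASS: escalate to human review.
  return { passed: false, iterations: maxIterations };
}
```

With a simulated reviewer that passes on the third round, this exits at iteration 3 with `passed: true` — the same shape as the designer-validation example above.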
- ---- +## Notes -**Extracted From:** -- `/review` command (user approval for costs, consensus analysis) -- `/validate-ui` command (iteration loops, user validation gates, feedback collection) -- `/implement` command (PHASE 2.5 test-driven development loop) -- Multi-model review patterns (consensus-based prioritization) +- Always set max iterations to prevent infinite loops (default: 5) +- User approval gates should show enough context for informed decisions +- Log all gate outcomes for performance tracking and workflow optimization +- Consensus gates work well with multi-model review (2 of 3 agreement threshold) diff --git a/plugins/multimodel/skills/session-isolation/SKILL.md b/plugins/multimodel/skills/session-isolation/SKILL.md index 07c58c0..e1ce698 100644 --- a/plugins/multimodel/skills/session-isolation/SKILL.md +++ b/plugins/multimodel/skills/session-isolation/SKILL.md @@ -1,48 +1,21 @@ --- name: session-isolation -description: Use when orchestrating workflows that generate multiple files (designs, reviews, reports) to prevent file collisions across concurrent or sequential sessions with unique session directories. -keywords: [session-isolation, artifact-isolation, file-collision, concurrent-sessions, sequential-workflows, session-directory, multi-artifact, session-metadata] -plugin: multimodel -updated: 2026-01-20 +description: "Creates unique session directories to prevent file collisions across concurrent or sequential multi-artifact workflows. Generates timestamped paths, tracks metadata, and supports legacy fallback. Use when orchestrating workflows that generate multiple files such as designs, reviews, or reports." --- # Session Isolation Pattern -Session-based artifact isolation for multi-artifact workflows. Use when orchestrating workflows that generate multiple files (designs, reviews, reports) to prevent file collisions across concurrent or sequential sessions. 
+Isolates artifacts from multi-artifact workflows into unique session directories to prevent file collisions. -## Problem +## Workflow -When multiple workflows run (even sequentially), artifacts with the same name collide: +1. **Generate a unique session path** using workflow type, target slug, timestamp, and random suffix. +2. **Create the directory structure** with subdirectories for reviews and artifacts. +3. **Write session metadata** (`session-meta.json`) with session ID, type, and start time. +4. **Pass SESSION_PATH** to all sub-agent prompts so they write to isolated paths. +5. **Update metadata on completion** with status and completion timestamp. -``` -Session 1 (SEO): writes ai-docs/plan-review-grok.md -Session 2 (API): writes ai-docs/plan-review-grok.md <-- OVERWRITES! -``` - -## Solution - -Use unique session folders to isolate artifacts: - -``` -ai-docs/sessions/agentdev-seo-20260105-143022-a3f2/ -├── session-meta.json # Session tracking -├── design.md # Primary artifact -├── reviews/ -│ ├── plan-review/ # Plan review phase -│ │ ├── internal.md -│ │ ├── grok.md -│ │ └── consolidated.md -│ └── impl-review/ # Implementation review phase -│ ├── internal.md -│ └── consolidated.md -└── report.md # Final report -``` - -## Implementation Pattern - -### 1. 
Session Initialization (Orchestrator) - -Add to Phase 0 of your orchestrator command: +## Example: Session Initialization ```bash # Generate unique session path @@ -52,141 +25,37 @@ SESSION_PATH="ai-docs/sessions/${SESSION_BASE}" # Create directory structure mkdir -p "${SESSION_PATH}/reviews/plan-review" \ - "${SESSION_PATH}/reviews/impl-review" || { - echo "Warning: Cannot create session directory, using legacy mode" - SESSION_PATH="ai-docs" -} + "${SESSION_PATH}/reviews/impl-review" -# Create session metadata (if not legacy mode) -if [[ "$SESSION_PATH" != "ai-docs" ]]; then - cat > "${SESSION_PATH}/session-meta.json" << EOF +# Create session metadata +cat > "${SESSION_PATH}/session-meta.json" << EOF { "session_id": "${SESSION_BASE}", "type": "${WORKFLOW_TYPE}", - "target": "${USER_REQUEST}", "started_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)", "status": "in_progress" } EOF -fi -``` - -### 2. Pass SESSION_PATH to Sub-Agents - -Include in all agent prompts: - -``` -SESSION_PATH: ${SESSION_PATH} - -{actual task description} - -Save output to: ${SESSION_PATH}/{artifact_path} -``` - -### 3. Sub-Agent SESSION_PATH Detection - -Add to agent ``: - -```xml - - **Check for Session Path Directive** - - If prompt contains `SESSION_PATH: {path}`: - 1. Extract the session path - 2. Use it for all output file paths - 3. Primary artifact: `${SESSION_PATH}/{type}.md` - 4. Reviews: `${SESSION_PATH}/reviews/{phase}/{model}.md` - - **If NO SESSION_PATH**: Use legacy paths (ai-docs/) - ``` -### 4. 
Session Completion - -Update metadata when workflow completes: - -```bash -if [[ -f "${SESSION_PATH}/session-meta.json" ]]; then - jq '.status = "completed" | .completed_at = (now | strftime("%Y-%m-%dT%H:%M:%SZ"))' \ - "${SESSION_PATH}/session-meta.json" > "${SESSION_PATH}/session-meta.json.tmp" && \ - mv "${SESSION_PATH}/session-meta.json.tmp" "${SESSION_PATH}/session-meta.json" -fi -``` +**Verification:** Confirm the session directory exists and `session-meta.json` is valid JSON before proceeding. ## Artifact Path Mapping -| Artifact Type | SESSION_PATH Format | Legacy Format | -|---------------|---------------------|---------------| +| Artifact Type | SESSION_PATH Format | Legacy Fallback | +|---------------|---------------------|-----------------| | Design/Context | `${SESSION_PATH}/design.md` | `ai-docs/agent-design-{name}.md` | | Plan Review | `${SESSION_PATH}/reviews/plan-review/{model}.md` | `ai-docs/plan-review-{model}.md` | | Impl Review | `${SESSION_PATH}/reviews/impl-review/{model}.md` | `ai-docs/impl-review-{model}.md` | -| Consolidated | `${SESSION_PATH}/reviews/{phase}/consolidated.md` | `ai-docs/{phase}-consolidated.md` | | Final Report | `${SESSION_PATH}/report.md` | `ai-docs/{workflow}-report-{name}.md` | ## Backward Compatibility -**Legacy Mode Triggers:** -1. `SESSION_PATH` not provided in prompt -2. Directory creation fails (permissions) -3. 
Explicit `LEGACY_MODE: true` in prompt - -**Behavior:** -- Fall back to flat `ai-docs/` paths -- Log warning about legacy mode -- All features still work, just without isolation - -## Session Metadata Schema - -```json -{ - "session_id": "agentdev-seo-20260105-143022-a3f2", - "type": "agentdev", - "target": "SEO agent improvements", - "started_at": "2026-01-05T14:30:22Z", - "completed_at": "2026-01-05T15:45:30Z", - "status": "completed", - "phases_completed": ["init", "design", "plan-review", "implementation", "quality-review"], - "models_used": ["claude-embedded", "x-ai/grok-code-fast-1", "google/gemini-3-pro"], - "artifacts": { - "design": "design.md", - "plan_reviews": ["reviews/plan-review/internal.md", "reviews/plan-review/grok.md"], - "impl_reviews": ["reviews/impl-review/internal.md", "reviews/impl-review/gemini.md"], - "report": "report.md" - } -} -``` - -## Plugins Using Session Isolation - -| Plugin | Command | Session Pattern | -|--------|---------|-----------------| -| **agentdev** | `/develop` | `agentdev-{target}-{timestamp}-{random}` | -| **frontend** | `/review`, `/implement` | `review-{timestamp}-{random}` | -| **seo** | `/review`, `/alternatives` | `seo-review-{timestamp}-{random}` | -| **multimodel** | `/team` | `team-{task-slug}-{timestamp}-{random}` | - -### Team Session Example - -The `/team` command creates a session for multi-model blind voting: - -``` -ai-docs/sessions/team-stats-validation-20260209-143022-a3f2/ -├── task.md # Raw task description (shared by all models) -├── grok-result.md # Grok's investigation findings -├── gemini-result.md # Gemini's investigation findings -├── deepseek-result.md # DeepSeek's investigation findings -├── internal-result.md # Internal Claude's findings -└── verdict.md # Aggregated verdict with vote breakdown -``` - -**Key difference from other plugins:** Team sessions contain results from -multiple AI models investigating the same task independently. 
Each model -writes to its own result file to prevent conflicts during parallel execution. +Falls back to flat `ai-docs/` paths when SESSION_PATH is not provided, directory creation fails, or `LEGACY_MODE: true` is set. -## Best Practices +## Notes -1. **Always initialize early**: Session creation should happen in Phase 0 -2. **Include SESSION_PATH in all prompts**: Sub-agents need it for output paths -3. **Use descriptive slugs**: Include workflow type and target in folder name -4. **Update metadata on completion**: Track status changes -5. **Fallback gracefully**: Never fail the workflow due to session creation issues +- Initialize session in Phase 0 of the orchestrator command +- Include SESSION_PATH in all sub-agent prompts +- Use descriptive slugs (workflow type + target) in folder names +- Never fail the workflow due to session creation issues — fall back to legacy mode diff --git a/plugins/multimodel/skills/task-complexity-router/SKILL.md b/plugins/multimodel/skills/task-complexity-router/SKILL.md index 5cdda8b..11c1595 100644 --- a/plugins/multimodel/skills/task-complexity-router/SKILL.md +++ b/plugins/multimodel/skills/task-complexity-router/SKILL.md @@ -1,961 +1,77 @@ --- name: task-complexity-router -description: Complexity-based task routing for optimal model selection and cost efficiency. Use when deciding which model tier to use, analyzing task complexity, optimizing API costs, or implementing tiered routing. Trigger keywords - "routing", "complexity", "model selection", "tier", "cost optimization", "haiku", "sonnet", "opus", "task analysis". -version: 0.1.0 -tags: [orchestration, routing, complexity, model-selection, cost-optimization, tiered] -keywords: [routing, complexity, model-selection, tier, cost, haiku, sonnet, opus, optimization, task-analysis] -plugin: multimodel -updated: 2026-01-28 +description: "Routes tasks to optimal model tiers (Native Tools, Haiku, Sonnet, Opus) based on keyword scoring and context analysis. 
Reduces AI costs by 60-90% through complexity-based routing with dynamic escalation on failure. Use when selecting model tiers, optimizing API costs, implementing tiered routing, or splitting complex tasks into subtasks." --- # Task Complexity Router -**Version:** 1.0.0 -**Purpose:** Intelligent task routing to optimal model tiers for cost efficiency and performance -**Status:** Production Ready +Routes tasks to the cheapest model tier that can handle them, with automatic escalation on failure. -## Overview +## Workflow -Task complexity routing is the practice of **matching tasks to appropriate model tiers** based on complexity, urgency, and resource requirements. Instead of using expensive premium models for all tasks, routing directs simple tasks to fast/cheap models and reserves expensive models for complex work. +1. **Extract keywords** from the user request and score each one (Tier 0-3). +2. **Apply context modifiers** — file count, security implications, user signals. +3. **Route to the selected tier** based on total score. +4. **Monitor for failure** — escalate to the next tier after 2 failed attempts. +5. **Track cost** — log tier, model, tokens, and cost per task. -This skill provides battle-tested patterns for: -- **4-tier routing system** (Native Tools → Haiku → Sonnet → Opus) -- **Complexity detection heuristics** (keyword-based + context-based) -- **Cost optimization strategies** (save 60-90% on API costs) -- **Dynamic tier escalation** (upgrade when task stalls or fails) -- **Routing integration** (works with multi-agent-coordination, quality-gates, proxy-mode) +## Tier Definitions -Well-designed routing can **reduce AI costs by 60-90%** while maintaining quality, since 70% of tasks can be handled by faster, cheaper models. 
+| Tier | Model | Cost/1M | Speed | When to Use | +|------|-------|---------|-------|-------------| +| 0 | Native Tools | $0 | Instant | File ops, search, format, git | +| 1 | Haiku | $0.80 | 1-3s | Simple edits, docs, typo fixes | +| 2 | Sonnet (default) | $3.00 | 5-10s | Standard dev, multi-file changes | +| 3 | Opus | $15.00 | 15-30s | Architecture, security audits, complex debugging | -## Why Task Routing Matters +## Keyword Scoring -### Cost Comparison (per 1M tokens) +| Score | Tier | Keywords | +|-------|------|----------| +| +0 | Tier 0 | find, search, list, show, format, rename, grep | +| +1 | Tier 1 | simple, basic, quick, minor, add, fix, update, comment | +| +2 | Tier 2 | implement, create, build, refactor, integrate, develop | +| +3 | Tier 3 | architect, design, audit, complex, critical, optimize | -| Model Tier | Model Example | Cost (Input/Output) | Speed | Use Case | -|------------|---------------|---------------------|-------|----------| -| **Tier 0** | Native Tools | $0 | Instant | File operations, searches, formatting | -| **Tier 1** | Claude Haiku 4.5 | $0.80 / $4.00 | Fast | Simple edits, docs, straightforward tasks | -| **Tier 2** | Claude Sonnet 4.5 | $3.00 / $15.00 | Moderate | Standard dev, multi-file changes | -| **Tier 3** | Claude Opus 4.5 | $15.00 / $75.00 | Slower | Architecture, complex debugging, audits | +**Total score:** 0 = Tier 0, 1-2 = Tier 1, 3-5 = Tier 2, 6+ = Tier 3. 
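The keyword-scoring heuristic and the total-score mapping above can be sketched in TypeScript. The keyword lists here mirror the table and are illustrative, not exhaustive; context modifiers would adjust the resulting tier afterward:

```typescript
// Sketch of the keyword-scoring router. Keyword lists mirror the scoring table;
// a production router would extend them and apply context modifiers on top.

const TIER_KEYWORDS: Record<number, string[]> = {
  0: ["find", "search", "list", "show", "format", "rename", "grep"],
  1: ["simple", "basic", "quick", "minor", "add", "fix", "update", "comment"],
  2: ["implement", "create", "build", "refactor", "integrate", "develop"],
  3: ["architect", "design", "audit", "complex", "critical", "optimize"],
};

function scoreRequest(request: string): number {
  const words = request.toLowerCase().split(/\W+/);
  let score = 0;
  for (const word of words) {
    for (const [tier, keywords] of Object.entries(TIER_KEYWORDS)) {
      if (keywords.includes(word)) score += Number(tier); // Tier 0 keywords add 0
    }
  }
  return score;
}

// Map total score to a tier: 0 → Tier 0, 1-2 → Tier 1, 3-5 → Tier 2, 6+ → Tier 3.
function routeTier(request: string): number {
  const score = scoreRequest(request);
  if (score === 0) return 0;
  if (score <= 2) return 1;
  if (score <= 5) return 2;
  return 3;
}
```

For example, "Find all TODO comments" scores 0 and routes to Tier 0, while "Design a complex security audit" scores 9 and routes to Tier 3.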
-**Example Cost Savings:** +## Context Modifiers -``` -Scenario: 100 tasks per day (mix of simple and complex) - -Without Routing (all Sonnet): - 100 tasks × 1000 tokens avg × $0.015 = $1.50/day - Annual: $547.50 - -With Smart Routing: - 50 tasks (native tools) × $0 = $0 - 30 tasks (Haiku) × $0.004 = $0.12 - 15 tasks (Sonnet) × $0.015 = $0.22 - 5 tasks (Opus) × $0.075 = $0.37 - Total: $0.71/day - Annual: $259.15 - -Savings: $288.35/year (52% reduction) for single developer -``` - -### Performance Benefits - -Beyond cost savings, routing improves: -- **Speed:** Fast models return results in 1-2s vs 10-15s for premium -- **Throughput:** Process 5x more simple tasks in parallel -- **Resource efficiency:** Save premium model quota for critical tasks -- **User experience:** Instant results for simple operations - -## The 4-Tier Routing System - -### Tier 0: Native Tools (No LLM) - -**When to Use:** -- File operations (search, rename, move, copy) -- Content search (grep, regex) -- Code formatting (prettier, black, go fmt) -- Git operations (status, log, diff) -- Single-file edits with clear pattern - -**Indicators:** -- Keywords: "find", "search", "format", "rename", "list", "show" -- Patterns: Single regex, exact string replacement, file path operations - -**Cost:** $0 -**Speed:** Instant (< 0.1s) - -**Examples:** -``` -✓ "Find all .tsx files in src/" -✓ "Search for 'TODO' comments" -✓ "Format code with prettier" -✓ "Rename Button.js to Button.tsx" -✓ "Show git status" -✓ "Replace 'oldName' with 'newName' in file.ts" -``` - -**Implementation:** -``` -Task: "Find all TypeScript files" -→ Use Glob tool: *.ts -→ No LLM needed - -Task: "Search for API endpoints" -→ Use Grep tool: "app\.(get|post|put|delete)" -→ No LLM needed - -Task: "Format all code" -→ Use Bash: bun run format -→ No LLM needed -``` - ---- - -### Tier 1: Fast Model (Haiku) - -**When to Use:** -- Simple code changes (add comment, fix typo, rename variable) -- Documentation updates (README, JSDoc, inline 
comments) -- Straightforward bug fixes (missing import, syntax error) -- Code explanation (what does this function do?) -- Simple test writing (unit test for pure function) - -**Indicators:** -- Keywords: "simple", "basic", "small", "quick", "minor", "add", "fix", "update" -- Scope: Single file, < 50 lines changed -- Complexity: No architectural decisions, clear solution - -**Cost:** ~$0.0004 per task (1000 tokens) -**Speed:** Fast (1-3s response time) - -**Examples:** -``` -✓ "Add JSDoc comment to calculateTotal function" -✓ "Fix typo in error message" -✓ "Rename getUserData to fetchUserData" -✓ "Update README with new installation steps" -✓ "Add missing import statement" -✓ "Write unit test for add(a, b) function" -``` - -**Anti-Patterns (Don't Use Haiku For):** -``` -✗ "Design authentication system" (needs Opus) -✗ "Refactor entire codebase" (needs Sonnet + context) -✗ "Debug complex race condition" (needs Opus) -✗ "Architect database schema" (needs Opus) -``` - ---- - -### Tier 2: Standard Model (Sonnet) - -**When to Use:** -- Standard feature implementation (new component, API endpoint) -- Multi-file refactoring (rename class, extract service) -- Integration tasks (connect frontend to backend) -- Moderate bug fixes (logic errors, edge cases) -- Test suites (integration tests, E2E tests) - -**Indicators:** -- Keywords: "implement", "create", "build", "refactor", "integrate", "develop" -- Scope: 2-10 files, 50-500 lines changed -- Complexity: Requires understanding context, moderate problem-solving - -**Cost:** ~$0.003 per task (1000 tokens) -**Speed:** Moderate (5-10s response time) - -**Examples:** -``` -✓ "Implement user profile page with React" -✓ "Create REST API endpoint for /users/:id" -✓ "Refactor authentication logic into AuthService" -✓ "Fix pagination bug in user list" -✓ "Write integration tests for payment flow" -✓ "Add error handling to API calls" -``` - -**This is the Default Tier:** -When in doubt, use Sonnet. 
It handles 70% of standard development tasks well. - ---- - -### Tier 3: Premium Model (Opus) - -**When to Use:** -- Architecture decisions (system design, database schema) -- Complex debugging (race conditions, memory leaks, security issues) -- Security audits (vulnerability analysis, threat modeling) -- Performance optimization (algorithm complexity, bottleneck analysis) -- Code review (deep analysis, architectural feedback) -- Critical bug fixes (production outages, data corruption) - -**Indicators:** -- Keywords: "architect", "design", "audit", "complex", "system-wide", "critical", "optimize" -- Scope: System-wide impact, 10+ files, architectural changes -- Complexity: Requires deep reasoning, multiple trade-offs - -**Cost:** ~$0.015 per task (1000 tokens) -**Speed:** Slower (15-30s response time) - -**Examples:** -``` -✓ "Design microservices architecture for e-commerce platform" -✓ "Audit authentication system for security vulnerabilities" -✓ "Debug intermittent race condition in WebSocket handler" -✓ "Optimize algorithm for 1M+ record processing" -✓ "Review entire codebase for architectural issues" -✓ "Design database schema for multi-tenant SaaS" -``` - -**When to Escalate to Opus:** -``` -Task starts in Sonnet, but: - - Task fails after 2 attempts → Escalate to Opus - - User explicitly says "this is complex" → Escalate to Opus - - Implementation reveals architectural issues → Escalate to Opus - - Performance/security concerns discovered → Escalate to Opus -``` - ---- - -## Complexity Detection Heuristics - -### Keyword-Based Routing - -**Scoring Algorithm:** - -``` -Step 1: Extract keywords from user request - -Step 2: Score each keyword: - Tier 0 indicators: +0 points - - find, search, list, show, format, rename, move, copy, grep - - Tier 1 indicators: +1 point - - simple, basic, small, quick, minor, add, fix, update, comment - - Tier 2 indicators: +2 points - - implement, create, build, refactor, integrate, develop, feature - - Tier 3 indicators: +3 
points - - architect, design, audit, complex, system-wide, critical, optimize - -Step 3: Calculate total score - Score 0: Use native tools (Tier 0) - Score 1-2: Use Haiku (Tier 1) - Score 3-5: Use Sonnet (Tier 2) - Score 6+: Use Opus (Tier 3) - -Step 4: Apply context modifiers (next section) -``` - -**Example Scoring:** - -``` -Request: "Add simple comment to function" -Keywords: "add" (+1), "simple" (+1), "comment" (+1) -Score: 3 → Sonnet (Tier 2) -Context Modifier: Single file → -1 → Score 2 → Haiku (Tier 1) - -Request: "Implement user authentication" -Keywords: "implement" (+2), "authentication" (+3) -Score: 5 → Sonnet (Tier 2) - -Request: "Design microservices architecture" -Keywords: "design" (+3), "microservices" (+3), "architecture" (+3) -Score: 9 → Opus (Tier 3) - -Request: "Find all TODO comments" -Keywords: "find" (+0) -Score: 0 → Native tools (Tier 0) -``` - ---- - -### Context-Based Routing - -**File Count Modifier:** - -``` -Files affected (from user context or codebase analysis): - 1 file → -1 tier (simpler) - 2-5 files → No modifier - 6-10 files → +0 tier (standard) - 11+ files → +1 tier (complex) - -Example: - Task: "Refactor authentication" (base Tier 2) - Context: Affects 15 files - Modifier: +1 tier → Opus (Tier 3) -``` - -**Code Complexity Modifier:** - -``` -Indicators of complexity (increase tier): - - Async/await patterns (+1) - - Error handling required (+1) - - Database transactions (+1) - - Security implications (+2) - - Performance critical (+2) - - System-wide impact (+2) - -Example: - Task: "Fix login bug" (base Tier 2) - Context: Security implications (+2) - Final: Opus (Tier 3) -``` - -**User Context Modifier:** - -``` -User explicitly signals complexity: - "This is simple" → -1 tier - "This is complex" → +1 tier - "Be careful" → +1 tier - "Quick task" → -1 tier - "Critical" → +1 tier - -Example: - Task: "Update README" (base Tier 1) - User: "Be careful, this affects onboarding" - Modifier: +1 tier → Sonnet (Tier 2) -``` - ---- - -### 
Override Patterns - -**User-Specified Model:** - -``` -User explicitly requests tier: - "Use Haiku to add comment" - → Override routing, use Haiku (Tier 1) - - "Use Opus to review this" - → Override routing, use Opus (Tier 3) - -Priority: User override > Routing algorithm -``` - -**Fallback Strategy:** - -``` -When routing is uncertain: - - Score is borderline (e.g., 2.5 between tiers) - - Mixed signals (simple keywords, complex context) - - No clear indicators - -Action: Default to Sonnet (Tier 2) - - Safe choice for most tasks - - Not too expensive ($0.003 vs $0.015) - - Good quality for standard work - - Can escalate to Opus if needed -``` - -**Emergency Escalation:** - -``` -Task fails at current tier: - Attempt 1: Use routed tier (e.g., Haiku) - Attempt 2: Same tier, different approach - Attempt 3: Escalate +1 tier (e.g., Sonnet) - Attempt 4: Escalate to Opus (highest tier) - -Example: - Task: "Fix subtle bug" → Haiku (Tier 1) - Result: Fails to identify root cause - → Retry with Haiku (different prompt) - Result: Still fails - → Escalate to Sonnet (Tier 2) - Result: Identifies bug, fixes it ✓ -``` - ---- - -## Cost Optimization Patterns - -### Cost-Benefit Analysis - -**When to Upgrade Tier:** - -``` -Upgrade from Tier 1 (Haiku) to Tier 2 (Sonnet): - Cost increase: $0.002 (0.5x more) - - Upgrade when: - - Task failed 2+ times at Tier 1 - - Task requires multi-file context - - Risk of incorrect solution is high - - Time spent debugging > cost savings - -Upgrade from Tier 2 (Sonnet) to Tier 3 (Opus): - Cost increase: $0.012 (5x more) - - Upgrade when: - - Task has critical security/performance implications - - Architecture decisions needed - - Task failed 2+ times at Tier 2 - - Complex reasoning required (trade-offs, edge cases) -``` - -**When to Downgrade Tier:** - -``` -Downgrade from Tier 2 (Sonnet) to Tier 1 (Haiku): - Cost savings: $0.002 (50% reduction) - - Downgrade when: - - Subtask is simpler than parent task - - Clear, straightforward solution exists 
- - Single file, < 50 lines changed - - No architectural decisions needed - -Example: - Main task: "Implement user profile" → Sonnet (Tier 2) - Subtask 1: "Add JSDoc to ProfileCard" → Haiku (Tier 1) - Subtask 2: "Write unit test for formatDate" → Haiku (Tier 1) - Subtask 3: "Integrate with API" → Sonnet (Tier 2) -``` - ---- - -### Cost Tracking Integration - -**Track Costs Per Task:** - -``` -Task: "Implement user authentication" -Model: Claude Sonnet 4.5 -Tokens: 1500 input, 3000 output -Cost: (1500 × $0.003 / 1000) + (3000 × $0.015 / 1000) - = $0.0045 + $0.045 - = $0.0495 - -Log to performance tracking: - { - "task": "Implement user authentication", - "tier": 2, - "model": "claude-sonnet-4-5", - "tokens_in": 1500, - "tokens_out": 3000, - "cost": 0.0495, - "duration_seconds": 8, - "success": true - } -``` - -**Aggregate Cost Metrics:** - -``` -Daily Cost Report: - Tier 0 (Native): 45 tasks, $0.00 - Tier 1 (Haiku): 30 tasks, $0.12 ($0.004 avg) - Tier 2 (Sonnet): 20 tasks, $0.60 ($0.030 avg) - Tier 3 (Opus): 5 tasks, $0.75 ($0.150 avg) - - Total: 100 tasks, $1.47 - Routing efficiency: 75% tasks in Tier 0-1 (cheap) -``` - -**Cost Optimization Insights:** - -``` -Analysis: Top 5 expensive tasks - 1. "Design database schema" - Opus - $0.25 - 2. "Security audit" - Opus - $0.20 - 3. "Refactor auth system" - Sonnet - $0.08 - 4. "Implement profile page" - Sonnet - $0.06 - 5. 
"Debug race condition" - Opus - $0.15 - -Recommendation: - - Tasks 1-2 correctly used Opus (architecture + security) - - Task 3 could have been split into subtasks (Haiku for simple parts) - - Task 5 correctly escalated to Opus after Sonnet failed -``` - ---- +| Signal | Modifier | +|--------|----------| +| 1 file affected | -1 tier | +| 11+ files affected | +1 tier | +| Security implications | +2 tiers | +| User says "simple" / "quick" | -1 tier | +| User says "complex" / "critical" | +1 tier | -## Integration with Orchestration Plugin - -### Routing + Multi-Agent Coordination - -**Pattern: Task Delegation with Routing** - -``` -Workflow: Implement feature with multiple agents - -Step 1: Architect designs system (Tier 3 - Opus) - Task: "Design authentication architecture" - Model: Opus (complex, architectural) - Output: Architecture plan - -Step 2: Split into subtasks with routing - Subtask 1: "Implement JWT service" → Sonnet (Tier 2) - Subtask 2: "Add JSDoc comments" → Haiku (Tier 1) - Subtask 3: "Write unit tests" → Haiku (Tier 1) - Subtask 4: "Integrate with API" → Sonnet (Tier 2) - -Step 3: Delegate to appropriate agents - backend-developer (Sonnet): Subtasks 1, 4 - documenter (Haiku): Subtask 2 - test-writer (Haiku): Subtask 3 - -Cost: - Without routing: 4 tasks × Sonnet = $0.12 - With routing: 1×Opus ($0.015) + 2×Sonnet ($0.06) + 2×Haiku ($0.008) = $0.083 - Savings: 31% -``` - ---- - -### Routing + Multi-Model Validation - -**Pattern: Tiered Review for Cost Efficiency** - -``` -Use Case: Code review with budget constraints - -Step 1: Fast pre-review (Tier 1 - Haiku) - Task: "Check for obvious issues: syntax, imports, formatting" - Model: Haiku (fast, cheap) - Output: Found 3 obvious issues (missing import, typo, formatting) - -Step 2: Fix obvious issues - Developer fixes issues found by Haiku - -Step 3: Deep review (Tier 3 - Opus) - Task: "Security audit and architectural review" - Model: Opus (complex, critical) - Output: Found 2 security issues - -Cost: - 
Without tiering: Opus review all issues = $0.025
-  With tiering: Haiku pre-review ($0.001) + Opus deep review ($0.015) = $0.016
-  Savings: 36%
-  Benefit: Haiku caught simple issues fast, Opus focused on complex issues
-```
-
----
-
-### Routing + Quality Gates
-
-**Pattern: Escalation on Failure**
+## Example: Routing a Request
 
 ```
-Workflow: Test-driven development with escalation
+Request: "Add user authentication"
+Keywords: "add" (+1), "authentication" (+3) → Score 4 → Sonnet (Tier 2)
+Context: Security implications (+2) → Score 6 → Opus (Tier 3)
 
-Iteration 1: Run tests (Tier 2 - Sonnet)
-  Task: "Analyze test failures and fix"
-  Model: Sonnet
-  Result: Fixed 8/10 failures
-  Cost: $0.003
-
-Iteration 2: Re-run tests (Tier 2 - Sonnet)
-  Task: "Fix remaining 2 failures"
-  Model: Sonnet
-  Result: Still failing (complex race condition)
-  Cost: $0.003
-
-Iteration 3: Escalate to Opus (Tier 3)
-  Task: "Debug complex race condition"
-  Model: Opus (escalated due to failure)
-  Result: Identified root cause, fixed ✓
-  Cost: $0.015
-
-Total Cost: $0.021
-Without escalation: 3 × Opus = $0.045 (114% more expensive)
-With escalation: Try cheaper first, upgrade only when needed
+Subtask breakdown:
+  "Design auth architecture" → Opus ($0.015)
+  "Implement JWT service" → Sonnet ($0.030)
+  "Add JSDoc comments" → Haiku ($0.004)
+  "Write unit tests" → Haiku ($0.004)
+  Total: $0.053 vs $0.150 as one monolithic Opus task (65% savings)
 ```
 
----
-
-### Routing + Proxy Mode
-
-**Pattern: External Model Routing**
-
-```
-Use Case: Use external fast models for simple tasks
-
-Task: "Add comments to functions"
-Complexity: Simple (Tier 1)
-
-Option 1: Claude Haiku 4.5 (OpenRouter)
-  Cost: $0.004
-  Speed: 2s
-
-Option 2: DeepSeek Coder (OpenRouter)
-  Cost: $0.001 (75% cheaper than Haiku)
-  Speed: 3s
-
-Option 3: Grok Code Fast (OpenRouter)
-  Cost: $0.002 (50% cheaper than Haiku)
-  Speed: 1s (fastest)
-
-Routing Decision:
-  If speed priority: Use Grok Code Fast
-  If cost priority: Use DeepSeek Coder
-  If
balance: Use Claude Haiku (best quality/cost/speed) - -Implementation: - Task: task-executor - Model: grok-code-fast-1 - Prompt: "Add JSDoc comments to all functions in UserService.ts" - claudish: x-ai/grok-code-fast-1 -``` - ---- - -## Best Practices - -**Do:** -- ✅ Use native tools (Tier 0) for file operations and searches (instant, free) -- ✅ Start with cheaper tiers and escalate only when needed -- ✅ Track costs per task for optimization insights -- ✅ Set max cost budgets per workflow (e.g., $0.10 per feature) -- ✅ Split complex tasks into simpler subtasks (route each separately) -- ✅ Use Haiku for documentation, comments, simple tests -- ✅ Use Sonnet as default for standard development -- ✅ Reserve Opus for architecture, security, complex debugging -- ✅ Escalate tier after 2 failures at current tier -- ✅ Document routing decisions for team alignment - -**Don't:** -- ❌ Use Opus for every task (waste money, slower) -- ❌ Use Haiku for complex tasks (will fail, waste time) -- ❌ Ignore user context (security/critical tasks need higher tier) -- ❌ Skip cost tracking (can't optimize what you don't measure) -- ❌ Over-optimize for cost (quality matters more than pennies) -- ❌ Use LLM when native tools work (search, format, rename) -- ❌ Downgrade tier for critical tasks (security, production bugs) -- ❌ Route without fallback strategy (always have Sonnet default) - -**Performance Benchmarks:** - -``` -Response Time (avg): - Native Tools: < 0.1s - Haiku: 1-3s - Sonnet: 5-10s - Opus: 15-30s - -Cost Efficiency (per 1000 tasks): - All Opus: $150 (baseline) - All Sonnet: $30 (80% savings) - Smart Routing: $7 (95% savings vs Opus, 77% vs Sonnet) - -Quality (task success rate): - Native Tools: 100% (deterministic) - Haiku: 85% (simple tasks only) - Sonnet: 95% (most tasks) - Opus: 98% (all tasks) -``` - ---- - -## Examples - -### Example 1: Analyzing User Request and Determining Tier - -**Scenario:** User requests feature implementation - -**User Request:** -``` -"Add user 
authentication to the app" -``` - -**Routing Analysis:** - -``` -Step 1: Extract keywords - Keywords: "add" (+1), "authentication" (+3) - Base Score: 4 → Sonnet (Tier 2) - -Step 2: Analyze context - Scope: Multiple files (routes, middleware, database) - Complexity: Security implications (+2) - Adjusted Score: 6 → Opus (Tier 3) - -Step 3: Routing decision - Tier: 3 (Opus) - Reason: Security-critical, architectural decision - Cost: ~$0.015 per subtask - -Step 4: Task breakdown with routing - Main task → Opus: - "Design authentication architecture" - Output: Architecture plan with security considerations - - Subtasks → Route separately: - 1. "Implement JWT service" → Sonnet (Tier 2) - 2. "Add password hashing" → Sonnet (Tier 2, security) - 3. "Create login endpoint" → Sonnet (Tier 2) - 4. "Add JSDoc comments" → Haiku (Tier 1) - 5. "Write unit tests" → Haiku (Tier 1) - 6. "Write integration tests" → Sonnet (Tier 2) - -Total Cost: - 1 × Opus ($0.015) + 4 × Sonnet ($0.012) + 2 × Haiku ($0.008) - = $0.035 - -Without routing (all Opus): - 7 × Opus = $0.105 - Savings: $0.07 (67% reduction) -``` - ---- - -### Example 2: Multi-Task Workflow with Mixed Tiers - -**Scenario:** Daily development workflow - -**Tasks:** - -``` -Task 1: "Find all TypeScript files in src/" - Keywords: "find" (+0) - Routing: Tier 0 (Native Tools) - Tool: Glob "src/**/*.ts" - Cost: $0 - Time: < 0.1s +**Verification:** Confirm the tier selection matches the scoring algorithm before dispatching. -Task 2: "Format all code" - Keywords: "format" (+0) - Routing: Tier 0 (Native Tools) - Tool: Bash "bun run format" - Cost: $0 - Time: 2s +## Escalation Protocol -Task 3: "Add JSDoc to UserService" - Keywords: "add" (+1), "simple" (implied) - Routing: Tier 1 (Haiku) - Model: claude-haiku-4-5 - Cost: $0.004 - Time: 2s +1. Attempt 1: Use routed tier. +2. Attempt 2: Same tier, different approach. +3. Attempt 3: Escalate +1 tier. +4. Attempt 4: Escalate to Opus (maximum). 
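The routing signals and escalation protocol above can be sketched in a few lines. This is an illustrative sketch only: the keyword weights and modifiers mirror this skill's tables, but the exact tier cut-offs and function names are assumptions, not a published API.

```python
# Illustrative sketch of the routing rules above. Weights mirror this
# skill's tables; the tier cut-offs and names are assumptions.

KEYWORD_SCORES = {
    "add": 1, "fix": 1,            # simple action verbs
    "implement": 2, "bug": 2,      # standard development work
    "authentication": 3, "security": 3, "audit": 3,  # complexity signals
}

def route_task(description, files_affected=1, security=False):
    """Return a tier (0=native, 1=Haiku, 2=Sonnet, 3=Opus) for a task."""
    words = description.lower().split()
    score = sum(KEYWORD_SCORES.get(w, 0) for w in words)
    if files_affected == 1:        # single-file change: -1 tier signal
        score -= 1
    elif files_affected >= 11:     # wide blast radius: +1 tier signal
        score += 1
    if security:                   # security implications: +2 tiers
        score += 2
    if "simple" in words or "quick" in words:
        score -= 1
    if "complex" in words or "critical" in words:
        score += 1
    if score <= 0:
        return 0
    if score <= 2:
        return 1
    if score <= 4:
        return 2                   # ambiguous mid-range defaults to Sonnet
    return 3

def escalate(tier, attempt):
    """Escalation protocol: retry same tier twice, then climb."""
    if attempt <= 2:
        return tier                # same tier, different approach
    if attempt == 3:
        return min(3, tier + 1)    # escalate +1 tier
    return 3                       # attempt 4+: go straight to Opus
```

With these assumed thresholds, "Add user authentication" across several files scores 4 (Sonnet), and the security modifier pushes it to 6 (Opus), matching the worked example.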
-Task 4: "Implement user profile page" - Keywords: "implement" (+2) - Context: Multiple files (component, styles, API) - Routing: Tier 2 (Sonnet) - Model: claude-sonnet-4-5 - Cost: $0.030 - Time: 8s - -Task 5: "Fix pagination bug" - Keywords: "fix" (+1), "bug" (+2) - Context: Requires debugging - Routing: Tier 2 (Sonnet) - Model: claude-sonnet-4-5 - Cost: $0.025 - Time: 7s - -Task 6: "Security audit of auth system" - Keywords: "security" (+3), "audit" (+3) - Context: Critical, system-wide - Routing: Tier 3 (Opus) - Model: claude-opus-4-5 - Cost: $0.150 - Time: 25s - -Daily Summary: - Total Tasks: 6 - Native Tools: 2 tasks, $0, 2s - Haiku: 1 task, $0.004, 2s - Sonnet: 2 tasks, $0.055, 15s - Opus: 1 task, $0.150, 25s - - Total Cost: $0.209 - Total Time: 44s - -Without Routing (all Sonnet): - 6 tasks × $0.030 avg = $0.180 (but slower, less optimized) - -Without Routing (all Opus): - 6 tasks × $0.150 avg = $0.900 (4.3× more expensive) -``` - ---- - -### Example 3: Cost Comparison Showing Savings from Routing - -**Scenario:** Weekly development for small team (3 developers) - -**Task Distribution:** - -``` -Week 1: 150 total tasks - -Task Breakdown (with smart routing): - Tier 0 (Native): 50 tasks (file ops, searches, formatting) - Cost: $0 - Time: 50 × 1s = 50s - - Tier 1 (Haiku): 45 tasks (comments, docs, simple fixes) - Cost: 45 × $0.004 = $0.18 - Time: 45 × 2s = 90s - - Tier 2 (Sonnet): 45 tasks (features, refactors, integrations) - Cost: 45 × $0.030 = $1.35 - Time: 45 × 8s = 360s - - Tier 3 (Opus): 10 tasks (architecture, security, complex bugs) - Cost: 10 × $0.150 = $1.50 - Time: 10 × 25s = 250s - -Total with Routing: - Cost: $3.03 - Time: 750s (12.5 min) - Tasks completed: 150 - -Cost Comparison: - -Strategy 1: All Opus (premium everywhere) - 150 tasks × $0.150 = $22.50 - Difference: +$19.47 (7.4× more expensive) - -Strategy 2: All Sonnet (standard everywhere) - 150 tasks × $0.030 = $4.50 - Difference: +$1.47 (1.5× more expensive) - -Strategy 3: Random 50/50 
Haiku/Sonnet (no routing logic) - 75 × $0.004 + 75 × $0.030 = $2.55 - Quality issues: 15 tasks failed (Haiku used for complex) - Re-work: 15 × $0.030 = $0.45 - Total: $3.00 (similar cost, but failures and delays) - -Strategy 4: Smart Routing (this approach) - Cost: $3.03 - Quality: High (each task matched to appropriate tier) - Speed: Optimal (fast models for simple, premium for complex) - -Annual Savings (3 developers × 52 weeks): - vs All Opus: 52 × $19.47 = $1,012.44 - vs All Sonnet: 52 × $1.47 = $76.44 - vs Random: 52 × $0.45 (re-work) = $23.40 - -ROI: - Save $1,000+ annually vs premium-everywhere approach - Save $76 annually vs standard-everywhere approach - Better quality than random routing -``` - ---- - -## Troubleshooting - -**Problem: Task routed to Haiku, but failed** - -Cause: Task more complex than keywords suggested - -Solution: Escalate to next tier (Sonnet) and retry - -``` -❌ Wrong: - Task fails with Haiku - → Give up or ask user for help - -✅ Correct: - Task fails with Haiku (Attempt 1) - → Retry with different prompt (Attempt 2) - → Still fails? 
Escalate to Sonnet (Attempt 3) - → Fixed ✓ -``` - ---- - -**Problem: Too many tasks routed to Opus (high costs)** - -Cause: Overly aggressive routing, not splitting tasks - -Solution: Break complex tasks into simpler subtasks - -``` -❌ Wrong: - Task: "Implement user authentication" → Opus - Cost: $0.150 - -✅ Correct: - Main task: "Design auth architecture" → Opus ($0.015) - Subtask 1: "Implement JWT service" → Sonnet ($0.030) - Subtask 2: "Add JSDoc comments" → Haiku ($0.004) - Subtask 3: "Write tests" → Haiku ($0.004) - Total: $0.053 (65% savings) -``` - ---- - -**Problem: Routing adds complexity, slows down workflow** - -Cause: Over-engineering routing for small projects - -Solution: Use simple heuristics for small teams - -``` -Simple Routing Rule (for small projects): - - File operations → Native tools - - Single file < 50 lines → Haiku - - Everything else → Sonnet - - User says "complex" → Opus - -No need for complex scoring algorithms for < 50 tasks/day. -``` - ---- - -**Problem: Can't track costs (no visibility)** - -Cause: Missing cost tracking integration - -Solution: Add logging to track cost per task - -``` -After each task: - Log to performance-tracking.json: - { - "task": "Implement profile page", - "tier": 2, - "model": "claude-sonnet-4-5", - "tokens_in": 1200, - "tokens_out": 2800, - "cost": 0.0456, - "duration_seconds": 8, - "success": true, - "timestamp": "2026-01-28T10:30:00Z" - } - -Weekly aggregation: - Generate cost report with breakdown by tier -``` - ---- - -## Summary - -Task complexity routing optimizes AI workflows by: - -- **4-tier system** (Native → Haiku → Sonnet → Opus) -- **Complexity detection** (keyword-based + context-based scoring) -- **Cost optimization** (60-90% savings vs premium-everywhere) -- **Dynamic escalation** (upgrade tier when task fails) -- **Quality maintenance** (match task complexity to model capability) - -**Key Takeaways:** - -1. **Use native tools first** (free, instant) -2. 
**Start cheap, escalate as needed** (Haiku → Sonnet → Opus) -3. **Track costs to optimize** (measure, analyze, improve) -4. **Split complex tasks** (route subtasks separately) -5. **Reserve premium for critical** (security, architecture, debugging) - -Master routing and you'll deliver faster results at a fraction of the cost. - ---- +## Notes -**Inspired By:** -- Cost optimization patterns from production AI workflows -- Multi-tier routing systems (CDN, load balancers, database replication) -- Google's multi-model serving strategy (PaLM vs Gemini routing) -- AWS Lambda tiered pricing (pay for what you use) -- Performance budgets in frontend development (optimize critical path) +- Default to Sonnet (Tier 2) when scoring is ambiguous +- User model override always takes priority over routing algorithm +- Split complex tasks into subtasks and route each separately for best cost efficiency +- Use native tools (Tier 0) for all file operations — they are instant and free diff --git a/plugins/nanobanana/skills/gemini-api/SKILL.md b/plugins/nanobanana/skills/gemini-api/SKILL.md index 5266ca3..5b43779 100644 --- a/plugins/nanobanana/skills/gemini-api/SKILL.md +++ b/plugins/nanobanana/skills/gemini-api/SKILL.md @@ -1,9 +1,7 @@ --- name: gemini-api -description: Google Gemini 3 Pro Image API reference. Covers text-to-image, editing, reference images, aspect ratios, and error handling. +description: "Google Gemini 3 Pro Image API reference for text-to-image generation, image editing, and reference-based styling. Covers API key setup, CLI flags, aspect ratios, error codes, and retry behavior. Use when generating images with Gemini, configuring image generation parameters, or troubleshooting API errors." 
--- -plugin: nanobanana -updated: 2026-01-20 # Gemini Image API Reference @@ -40,11 +38,8 @@ uv run python main.py output.png "A minimal 3D cube" | 1:1 | Social media, icons | | 3:4 | Portrait photos | | 4:3 | Traditional photos | -| 4:5 | Instagram portrait | -| 5:4 | Landscape photos | -| 9:16 | Mobile, stories | | 16:9 | YouTube, desktop | -| 21:9 | Cinematic, ultrawide | +| 9:16 | Mobile, stories | ## CLI Flags @@ -61,33 +56,18 @@ uv run python main.py output.png "A minimal 3D cube" | Code | Meaning | Recovery | |------|---------|----------| -| `SUCCESS` | Operation completed | N/A | | `API_KEY_MISSING` | GEMINI_API_KEY not set | Export the variable | -| `FILE_NOT_FOUND` | Referenced file missing | Check path | -| `INVALID_INPUT` | Bad prompt or argument | Fix input | | `RATE_LIMITED` | Too many requests | Wait, uses auto-retry | -| `NETWORK_ERROR` | Connection failed | Check network, auto-retry | -| `API_ERROR` | Gemini API error | Check logs | | `CONTENT_POLICY` | Blocked prompt | Adjust content | -| `TIMEOUT` | Request timed out | Retry | -| `PARTIAL_FAILURE` | Some batch items failed | Check individual results | +| `NETWORK_ERROR` | Connection failed | Check network, auto-retry | ## Retry Behavior -The script automatically retries on transient errors: -- Rate limits (429) -- Server errors (502, 503) -- Connection timeouts -- Network errors - -Retry uses exponential backoff: 1s, 2s, 4s, 8s, etc. -Maximum retries configurable with `--max-retries` (default: 3) +Automatically retries on rate limits (429), server errors (502, 503), and connection timeouts using exponential backoff (1s, 2s, 4s, 8s). Configure max retries with `--max-retries` (default: 3). -## Best Practices +## Notes -1. **Prompts**: Be specific about style, lighting, composition -2. **Styles**: Use markdown templates for consistent results -3. **References**: Provide visual examples for style matching -4. **Batch**: Generate variations to pick the best -5. 
**Iteration**: Edit results to refine -6. **Retries**: Increase `--max-retries` for unreliable connections +- Be specific about style, lighting, and composition in prompts +- Use markdown style templates for consistent results +- Provide visual reference images for style matching +- Increase `--max-retries` for unreliable connections diff --git a/plugins/nanobanana/skills/style-format/SKILL.md b/plugins/nanobanana/skills/style-format/SKILL.md index c3555fb..0e25755 100644 --- a/plugins/nanobanana/skills/style-format/SKILL.md +++ b/plugins/nanobanana/skills/style-format/SKILL.md @@ -1,16 +1,11 @@ --- name: style-format -description: Style template format specification. Single markdown files that describe artistic direction. +description: "Defines the markdown-based style template format for AI image generation. Specifies file structure, color palettes, technical rendering notes, and security validation rules. Use when creating style templates, customizing artistic direction for image generation, or combining styles with reference images." --- -plugin: nanobanana -updated: 2026-01-20 # Style Format Specification -## Overview - -Styles are **single markdown files** in the `styles/` directory. -The entire file content is prepended to the user's prompt. +Styles are single markdown files in the `styles/` directory. The entire file content is prepended to the user's prompt to define artistic direction. ## File Location @@ -27,14 +22,12 @@ The entire file content is prepended to the user's prompt. ```markdown # Style Name -{Overall description of the visual style. Be vivid and specific. 
-Include mood, atmosphere, key visual characteristics.} +{Overall description of the visual style — mood, atmosphere, key visual characteristics.} ## Color Palette - Primary: {color} ({hex}) - Secondary: {color} ({hex}) - Background: {color} ({hex}) -- Accents: {colors} ## Technical Notes {Rendering style, lighting, materials, post-processing} @@ -50,7 +43,6 @@ A photorealistic 3D render with blue glass material. Objects should have: - Subtle reflections and refractions - Solid black background - Soft studio lighting from above-left -- Sharp shadows ## Color Palette - Primary: Deep blue (#1a4b8c) @@ -60,7 +52,6 @@ A photorealistic 3D render with blue glass material. Objects should have: ## Technical Notes - Use ray-traced rendering appearance - Include caustic light effects -- Maintain consistent material across objects ``` ## Usage @@ -69,10 +60,12 @@ A photorealistic 3D render with blue glass material. Objects should have: # Apply style to generation uv run python main.py out.png "gear icon" --style styles/glass.md -# Combine with reference +# Combine style with reference image uv run python main.py out.png "cube" --style styles/glass.md --ref prev.png ``` +**Verification:** Confirm the style file exists at the specified path before passing to the generator. + ## Style vs Reference | Concept | Type | Purpose | @@ -82,20 +75,13 @@ uv run python main.py out.png "cube" --style styles/glass.md --ref prev.png Both can be combined for best results. -## Security Notes - -Style files are validated for potential injection patterns: -- No bash/shell code blocks -- No variable expansion (${ }) -- No command substitution ($( )) -- No shell operators (& | ; `) +## Security -Suspicious patterns generate warnings but don't block creation. +Style files are validated for injection patterns (no shell code blocks, variable expansion, or command substitution). Suspicious patterns generate warnings. -## Writing Effective Styles +## Notes -1. 
**Be Specific**: "Soft watercolor washes with visible paper texture" -2. **Include Colors**: Hex codes ensure consistency -3. **Describe Mood**: "Mysterious, slightly unsettling" -4. **Technical Details**: Lighting, camera angle, rendering style -5. **Keep It Focused**: One style per file +- Be specific: "Soft watercolor washes with visible paper texture" +- Include hex color codes for consistency +- Describe mood and atmosphere explicitly +- One style per file for clarity diff --git a/plugins/seo/skills/analytics-interpretation/SKILL.md b/plugins/seo/skills/analytics-interpretation/SKILL.md index b1c606b..9ffad86 100644 --- a/plugins/seo/skills/analytics-interpretation/SKILL.md +++ b/plugins/seo/skills/analytics-interpretation/SKILL.md @@ -1,25 +1,20 @@ --- -plugin: seo -updated: 2026-01-20 name: analytics-interpretation -description: Interpret GA4 and GSC data with benchmarks, status indicators, and actionable insights +description: "Interpret GA4 and GSC data by comparing metrics against benchmarks, diagnosing cross-source patterns, and calculating content health scores. Use when analyzing content performance or explaining analytics to stakeholders." --- -plugin: seo -updated: 2026-01-20 # Analytics Interpretation -## When to Use +## Workflow -- Analyzing content performance reports -- Understanding traffic patterns -- Interpreting search console data -- Making data-driven content decisions -- Explaining metrics to stakeholders +1. **Collect GA4 and GSC metrics** -- gather engagement and search performance data +2. **Compare against benchmarks** -- classify each metric as Good, Warning, or Poor +3. **Cross-reference signals** -- combine GA4 engagement with GSC search data for diagnosis +4. **Calculate health score** -- compute weighted content health score (0-100) +5. **Detect anomalies** -- flag significant week-over-week changes +6. 
**Generate report** -- produce executive summary with prioritized actions -## Metric Benchmarks - -### Google Analytics 4 (GA4) +## Step 1: GA4 Benchmarks | Metric | Good | Warning | Poor | Action When Poor | |--------|------|---------|------|------------------| @@ -29,124 +24,48 @@ updated: 2026-01-20 | Scroll Depth | >75% | 50-75% | <50% | Add visual breaks, improve structure | | Pages/Session | >2.5 | 1.5-2.5 | <1.5 | Improve internal linking | -### Google Search Console (GSC) +## Step 2: GSC Benchmarks | Metric | Good | Warning | Poor | Action When Poor | |--------|------|---------|------|------------------| | CTR | >5% | 2-5% | <2% | Improve title/meta description | | Avg Position | 1-3 | 4-10 | >10 | Strengthen content, build links | | Impressions Trend | Growing | Stable | Declining | Refresh content, target new keywords | -| Mobile Usability | PASS | - | FAIL | Fix mobile issues immediately | | Core Web Vitals | GOOD | NEEDS_IMPROVEMENT | POOR | Optimize performance | -## Interpreting Combined Signals - -### Traffic Quality Matrix +## Step 3: Cross-Source Diagnosis -``` - High Engagement - │ - ┌──────────────┼──────────────┐ - │ HIDDEN GEM │ STAR │ - │ Low traffic │ High traffic│ - │ High quality│ High quality│ - │ → Promote │ → Maintain │ -Low ───────┼──────────────┼──────────────┼─── High -Traffic │ │ │ Traffic - │ UNDERPERFORM│ LEAKY │ - │ Low traffic │ High traffic│ - │ Low quality │ Low quality │ - │ → Rework │ → Optimize │ - └──────────────┼──────────────┘ - │ - Low Engagement -``` +| GSC Signal | GA4 Signal | Diagnosis | Action | +|------------|------------|-----------|--------| +| High impressions | Low clicks | Title/meta mismatch | Rewrite meta description | +| High CTR | High bounce | Content doesn't deliver | Align content with search intent | +| Low CTR | High engagement | Hidden gem | Improve snippet to get more clicks | +| Growing impressions | Stable clicks | Ranking improving | Optimize CTR while momentum builds | -### Search Intent 
Alignment - -| GSC Signal | GA4 Signal | Interpretation | -|------------|------------|----------------| -| High impressions | Low clicks | Title/meta mismatch with intent | -| High CTR | High bounce | Content doesn't deliver on promise | -| Low CTR | High engagement (when clicked) | Hidden gem, improve snippet | -| Growing impressions | Stable clicks | Ranking improving, CTR opportunity | - -## Score Calculation Methodology - -### Content Health Score (0-100) +### Traffic Quality Matrix -``` -health_score = ( - engagement_score × 0.30 + - seo_score × 0.30 + - ranking_score × 0.20 + - trend_score × 0.20 -) -``` +| Quadrant | Traffic | Engagement | Action | +|----------|---------|------------|--------| +| STAR | High | High | Maintain, replicate pattern | +| HIDDEN GEM | Low | High | Promote via internal links and social | +| LEAKY | High | Low | Optimize content quality | +| UNDERPERFORMER | Low | Low | Rework or consolidate | -**Component Calculations:** +## Step 4: Calculate Health Score ``` -engagement_score = normalize( - time_on_page_score × 0.4 + - bounce_rate_score × 0.3 + - scroll_depth_score × 0.3 -) - -seo_score = normalize( - ctr_score × 0.4 + - position_score × 0.4 + - impressions_growth × 0.2 -) - -ranking_score = normalize( - avg_position × 0.5 + - visibility_score × 0.3 + - keyword_coverage × 0.2 -) - -trend_score = normalize( - traffic_trend × 0.4 + - ranking_trend × 0.3 + - engagement_trend × 0.3 -) -``` - -### Score Interpretation - -| Score | Rating | Status | Action | -|-------|--------|--------|--------| -| 90-100 | Excellent | Performing optimally | Maintain, minor tweaks | -| 75-89 | Good | Solid performance | Optimize weak areas | -| 60-74 | Fair | Room for improvement | Address key issues | -| 40-59 | Poor | Underperforming | Major revision needed | -| 0-39 | Critical | Failing | Complete overhaul | - -## Trend Analysis - -### Week-over-Week Comparison - -```markdown -| Metric | This Week | Last Week | Change | Status | 
-|--------|-----------|-----------|--------|--------| -| Sessions | 1,245 | 1,180 | +5.5% | ↑ GROWING | -| Avg Position | 4.2 | 4.8 | +0.6 | ↑ IMPROVING | -| CTR | 2.8% | 2.6% | +0.2pp | ↑ IMPROVING | -| Bounce Rate | 42% | 38% | +4pp | ↓ DECLINING | +health_score = engagement_score * 0.30 + seo_score * 0.30 + ranking_score * 0.20 + trend_score * 0.20 ``` -### Interpreting Trends +| Score | Rating | Action | +|-------|--------|--------| +| 90-100 | Excellent | Maintain, minor tweaks | +| 75-89 | Good | Optimize weak areas | +| 60-74 | Fair | Address key issues | +| 40-59 | Poor | Major revision needed | +| 0-39 | Critical | Complete overhaul | -| Trend Pattern | Interpretation | Recommended Action | -|---------------|----------------|-------------------| -| ↑↑↑ All metrics up | Content gaining momentum | Double down, create related content | -| ↑↓↑ Mixed signals | Transition period | Monitor closely, identify cause | -| ↓↓↓ All metrics down | Content declining | Urgent refresh needed | -| →→→ All flat | Plateau reached | Experiment with new angles | - -## Anomaly Detection - -### Significant Change Thresholds +## Step 5: Detect Anomalies | Metric | Significant Change | Alert Level | |--------|-------------------|-------------| @@ -155,33 +74,9 @@ trend_score = normalize( | Position | ±5 positions | HIGH | | Bounce Rate | ±10pp WoW | MEDIUM | -### Common Anomaly Causes - -| Anomaly | Possible Causes | -|---------|-----------------| -| Sudden traffic drop | Algorithm update, technical issue, competitor | -| CTR spike | SERP feature win, seasonal interest | -| Position fluctuation | Google testing, competitor changes | -| Engagement drop | Content staleness, UX issue | - -## Output Templates - -### Metric Summary Card - -```markdown -## {Metric Name} - -**Current Value**: {value} -**Benchmark**: {benchmark} -**Status**: {GOOD|WARNING|POOR} -**Trend**: {↑|→|↓} ({change}% vs last period) - -**Interpretation**: {1-2 sentence explanation} +Common causes: sudden traffic 
drop (algorithm update, technical issue), CTR spike (SERP feature win), position fluctuation (Google testing), engagement drop (content staleness). -**Recommended Action**: {specific action if needed} -``` - -### Executive Summary +## Step 6: Generate Report ```markdown ## Content Performance Summary @@ -193,11 +88,30 @@ trend_score = normalize( - {positive finding 2} ### Concerns -- {issue 1} -- {issue 2} +- {issue 1} -- {metric}: {value} vs benchmark {benchmark} ### Priority Actions 1. {highest priority action} 2. {second priority action} 3. {third priority action} ``` + +## Example: Interpreting a Blog Post + +``` +URL: /blog/seo-guide +GA4: Avg Time 4:12 (GOOD), Bounce 35% (GOOD), Scroll 82% (GOOD) +GSC: CTR 1.8% (POOR), Position 6.2 (WARNING), Impressions +15% WoW + +Diagnosis: HIDDEN GEM -- strong engagement but low CTR +Health Score: 90*0.30 + 40*0.30 + 55*0.20 + 70*0.20 = 64 (Fair) +Priority Action: Rewrite title tag and meta description to improve CTR +``` + +## Validation Checklist + +- [ ] Both GA4 and GSC data collected for the same date range +- [ ] Each metric classified against correct benchmark tier +- [ ] Cross-source signals checked for contradictions +- [ ] Health score calculated with all four components +- [ ] Anomalies flagged if any metric exceeds change thresholds diff --git a/plugins/seo/skills/data-extraction-patterns/SKILL.md b/plugins/seo/skills/data-extraction-patterns/SKILL.md index 39a8595..0633f6d 100644 --- a/plugins/seo/skills/data-extraction-patterns/SKILL.md +++ b/plugins/seo/skills/data-extraction-patterns/SKILL.md @@ -1,29 +1,23 @@ --- -plugin: seo -updated: 2026-01-20 name: data-extraction-patterns -description: Common patterns for extracting analytics data from GA4 and GSC with API handling +description: "Extract and combine analytics data from GA4 and GSC APIs using parallel fetching, caching, rate-limit handling, and graceful degradation. Use when building data pipelines or fetching metrics for SEO analysis." 
--- -plugin: seo -updated: 2026-01-20 # Data Extraction Patterns -## When to Use +## Workflow -- Setting up analytics data pipelines -- Combining data from multiple sources -- Handling API rate limits and errors -- Caching frequently accessed data -- Building data collection workflows +1. **Configure API connections** -- identify GA4 property ID and GSC site URL +2. **Fetch data in parallel** -- issue GA4 and GSC requests simultaneously +3. **Handle errors and rate limits** -- retry with exponential backoff on 429s +4. **Cache results** -- store responses for the session to avoid redundant calls +5. **Merge into unified model** -- combine sources into a single JSON structure +6. **Degrade gracefully** -- produce partial reports when a source is unavailable -## API Reference +## Step 1: GA4 API Reference -### Google Analytics 4 (GA4) +MCP Server: `mcp-server-google-analytics` -**MCP Server**: `mcp-server-google-analytics` - -**Key Operations**: ``` get_report({ propertyId: "properties/123456789", @@ -33,29 +27,12 @@ get_report({ }) ``` -**Useful Metrics**: -| Metric | Description | Use Case | -|--------|-------------|----------| -| screenPageViews | Total page views | Traffic volume | -| sessions | User sessions | Visitor count | -| averageSessionDuration | Avg time in session | Engagement | -| bounceRate | Single-page visits | Content quality | -| engagementRate | Engaged sessions % | True engagement | -| scrolledUsers | Users who scrolled | Content consumption | - -**Useful Dimensions**: -| Dimension | Description | -|-----------|-------------| -| pagePath | URL path | -| date | Date (for trending) | -| sessionSource | Traffic source | -| deviceCategory | Desktop/mobile/tablet | - -### Google Search Console (GSC) - -**MCP Server**: `mcp-server-gsc` - -**Key Operations**: +Key metrics: `screenPageViews`, `sessions`, `averageSessionDuration`, `bounceRate`, `engagementRate`, `scrolledUsers`. Key dimensions: `pagePath`, `date`, `sessionSource`, `deviceCategory`. 
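The report request above can be assembled by a small helper so every fetch uses the same shape. A sketch -- the payload fields follow the `get_report` call shown here, but `build_ga4_request` is a hypothetical name, not part of the MCP server:

```python
# Hypothetical helper; field names mirror the get_report payload above.

def build_ga4_request(property_id, start_date, end_date, metrics, dimensions):
    return {
        "propertyId": f"properties/{property_id}",
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "metrics": [{"name": m} for m in metrics],
        "dimensions": [{"name": d} for d in dimensions],
    }

request = build_ga4_request(
    "123456789", "2026-01-01", "2026-01-31",
    metrics=["screenPageViews", "bounceRate"],
    dimensions=["pagePath", "date"],
)
```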
+ +## Step 2: GSC API Reference + +MCP Server: `mcp-server-gsc` + ``` search_analytics({ siteUrl: "https://example.com", @@ -64,316 +41,91 @@ search_analytics({ dimensions: ["query", "page"], rowLimit: 1000 }) - -get_url_inspection({ - siteUrl: "https://example.com", - inspectionUrl: "https://example.com/page" -}) -``` - -**Available Metrics**: -| Metric | Description | Use Case | -|--------|-------------|----------| -| clicks | Total clicks from search | Traffic from Google | -| impressions | Times shown in results | Visibility | -| ctr | Click-through rate | Snippet effectiveness | -| position | Average ranking | SEO success | - -**Dimensions**: -| Dimension | Description | -|-----------|-------------| -| query | Search query | -| page | Landing page URL | -| country | User country | -| device | Desktop/mobile/tablet | -| date | Date (for trending) | - -## Parallel Execution Pattern - -### Optimal Data Fetch (All Sources) - -```markdown -## Parallel Data Fetch Pattern - -When fetching from multiple sources, issue all requests in a SINGLE message -for parallel execution: - -┌─────────────────────────────────────────────────────────────────┐ -│ MESSAGE 1: Parallel Data Requests │ -├─────────────────────────────────────────────────────────────────┤ -│ │ -│ [MCP Call 1]: google-analytics.get_report(...) │ -│ [MCP Call 2]: google-search-console.search_analytics(...) │ -│ │ -│ → All execute simultaneously │ -│ → Results return when all complete │ -│ → ~2x faster than sequential │ -│ │ -└─────────────────────────────────────────────────────────────────┘ ``` -### Sequential (When Needed) +Key metrics: `clicks`, `impressions`, `ctr`, `position`. Key dimensions: `query`, `page`, `country`, `device`, `date`. 
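Per-query rows returned by `search_analytics` are typically rolled up into page-level totals before interpretation. A sketch of that roll-up, assuming rows carry the metrics listed above (the helper itself is illustrative): recompute CTR from summed clicks and impressions rather than averaging row CTRs, and weight average position by impressions:

```python
def aggregate_gsc_rows(rows):
    """Roll up per-query GSC rows into page totals.
    CTR is recomputed from totals (percent); position is impression-weighted."""
    clicks = sum(r["clicks"] for r in rows)
    impressions = sum(r["impressions"] for r in rows)
    if impressions == 0:
        return {"clicks": 0, "impressions": 0, "ctr": 0.0, "avgPosition": 0.0}
    return {
        "clicks": clicks,
        "impressions": impressions,
        "ctr": round(100 * clicks / impressions, 2),
        "avgPosition": round(
            sum(r["position"] * r["impressions"] for r in rows) / impressions, 1
        ),
    }

rows = [
    {"query": "seo guide", "clicks": 156, "impressions": 5200, "position": 4.0},
    {"query": "seo tips", "clicks": 80, "impressions": 4800, "position": 6.5},
]
print(aggregate_gsc_rows(rows))
# {'clicks': 236, 'impressions': 10000, 'ctr': 2.36, 'avgPosition': 5.2}
```

Impression-weighting matters: a naive mean of row positions would over-count low-volume queries.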
-Some operations require sequential execution: +## Step 3: Parallel Fetching -```markdown -## Sequential Pattern (Dependencies) +Issue all MCP calls in a single message for simultaneous execution: -When one request depends on another's result: - -┌─────────────────────────────────────────────────────────────────┐ -│ MESSAGE 1: Get list of pages │ -│ → Returns: ["/page1", "/page2", "/page3"] │ -├─────────────────────────────────────────────────────────────────┤ -│ MESSAGE 2: Get details for each page │ -│ → Uses page list from Message 1 │ -│ → Can parallelize within this message │ -└─────────────────────────────────────────────────────────────────┘ +``` +Message 1 (parallel): + [MCP Call 1]: google-analytics.get_report(...) + [MCP Call 2]: google-search-console.search_analytics(...) + → Both execute simultaneously, ~2x faster than sequential ``` -## Rate Limiting +Use sequential fetching only when one call depends on another's result (e.g., fetch page list first, then details per page). -### API Rate Limits +## Step 4: Rate Limits and Retry | API | Limit | Strategy | |-----|-------|----------| -| GA4 | 10 QPS per property | Batch dimensions | +| GA4 | 10 QPS per property | Batch dimensions in one call | | GSC | 1,200 requests/min | Paginate large exports | -### Retry Pattern - -```bash -#!/bin/bash -# Retry with exponential backoff - -MAX_RETRIES=3 -RETRY_DELAY=5 +Retry pattern: on HTTP 429, wait 5s, then double delay on each retry, max 3 attempts. -fetch_with_retry() { - local url="$1" - local attempt=1 +## Step 5: Caching - while [ $attempt -le $MAX_RETRIES ]; do - response=$(curl -s -w "%{http_code}" -o /tmp/response.json "$url") - http_code="${response: -3}" +Cache responses per session with a 1-hour TTL. Key format: `{source}_{url}_{dateRange}`. Check cache before every API call; save response after successful fetch. 
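The session cache in Step 5 can be sketched as a small TTL map. The key format matches `{source}_{url}_{dateRange}` above; the in-memory dict and the `fetch_with_cache` helper are illustrative stand-ins for whatever session storage is actually available:

```python
import time

CACHE_TTL = 3600  # 1 hour, matching the session TTL above
_cache = {}

def cache_key(source, url, date_range):
    return f"{source}_{url}_{date_range}"

def fetch_with_cache(source, url, date_range, fetch, now=None):
    """Check the cache before calling the API; save the response on a miss."""
    now = time.time() if now is None else now
    key = cache_key(source, url, date_range)
    entry = _cache.get(key)
    if entry is not None and now - entry["at"] < CACHE_TTL:
        return entry["value"]        # fresh hit: skip the API call
    value = fetch()                  # miss or expired: fetch and store
    _cache[key] = {"at": now, "value": value}
    return value
```

A second call with the same key inside the TTL returns the cached value without invoking `fetch` again, which is what keeps repeated analyses under the rate limits in Step 4.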
- if [ "$http_code" = "200" ]; then - cat /tmp/response.json - return 0 - elif [ "$http_code" = "429" ]; then - echo "Rate limited, waiting ${RETRY_DELAY}s..." >&2 - sleep $RETRY_DELAY - RETRY_DELAY=$((RETRY_DELAY * 2)) - else - echo "Error: HTTP $http_code" >&2 - return 1 - fi - - attempt=$((attempt + 1)) - done - - echo "Max retries exceeded" >&2 - return 1 -} -``` - -## Caching Pattern - -### Session-Based Cache - -```bash -# Cache structure -SESSION_PATH="/tmp/seo-performance-20251227-143000-example" -CACHE_DIR="${SESSION_PATH}/cache" -CACHE_TTL=3600 # 1 hour in seconds - -mkdir -p "$CACHE_DIR" - -# Cache key generation -cache_key() { - echo "$1" | md5sum | cut -d' ' -f1 -} - -# Check cache -get_cached() { - local key=$(cache_key "$1") - local cache_file="${CACHE_DIR}/${key}.json" - - if [ -f "$cache_file" ]; then - local age=$(($(date +%s) - $(stat -f%m "$cache_file" 2>/dev/null || stat -c%Y "$cache_file"))) - if [ $age -lt $CACHE_TTL ]; then - cat "$cache_file" - return 0 - fi - fi - return 1 -} - -# Save to cache -save_cache() { - local key=$(cache_key "$1") - local cache_file="${CACHE_DIR}/${key}.json" - cat > "$cache_file" -} - -# Usage -CACHE_KEY="ga4_${URL}_${DATE_RANGE}" -if ! 
RESULT=$(get_cached "$CACHE_KEY"); then - RESULT=$(fetch_from_api) - echo "$RESULT" | save_cache "$CACHE_KEY" -fi -``` - -## Date Range Standardization - -### Common Date Ranges - -```bash -# Standard date range calculations -TODAY=$(date +%Y-%m-%d) - -case "$RANGE" in - "7d") - START_DATE=$(date -v-7d +%Y-%m-%d 2>/dev/null || date -d "7 days ago" +%Y-%m-%d) - ;; - "30d") - START_DATE=$(date -v-30d +%Y-%m-%d 2>/dev/null || date -d "30 days ago" +%Y-%m-%d) - ;; - "90d") - START_DATE=$(date -v-90d +%Y-%m-%d 2>/dev/null || date -d "90 days ago" +%Y-%m-%d) - ;; - "mtd") - START_DATE=$(date +%Y-%m-01) - ;; - "ytd") - START_DATE=$(date +%Y-01-01) - ;; -esac - -END_DATE="$TODAY" -``` - -### API-Specific Formats - -| API | Format | Example | -|-----|--------|---------| -| GA4 | Relative or ISO | "30daysAgo", "2025-12-01" | -| GSC | ISO 8601 | "2025-12-01" | - -## Graceful Degradation - -### Data Source Fallback - -```markdown -## Fallback Strategy - -When a data source is unavailable: - -┌─────────────────────────────────────────────────────────────────┐ -│ PRIMARY SOURCE │ FALLBACK │ LAST RESORT │ -├──────────────────────┼─────────────────────┼────────────────────┤ -│ GA4 traffic data │ GSC clicks │ Estimate from GSC │ -│ GSC search perf │ Manual SERP check │ WebSearch SERP │ -│ CWV (CrUX) │ PageSpeed API │ Lighthouse CLI │ -└──────────────────────┴─────────────────────┴────────────────────┘ -``` - -### Partial Data Output - -```markdown -## Analysis Report (Partial Data) - -### Data Availability - -| Source | Status | Impact | -|--------|--------|--------| -| GA4 | NOT CONFIGURED | Missing engagement metrics | -| GSC | AVAILABLE | Full search data | - -### Analysis Notes - -This analysis is based on limited data sources: -- Search performance metrics are complete (GSC) -- Engagement metrics unavailable (no GA4) - -**Recommendation**: Configure GA4 for complete analysis. -Run `/setup-analytics` to add Google Analytics. 
-``` - -## Unified Data Model - -### Combined Output Structure +## Step 6: Unified Output Model ```json { "metadata": { "url": "https://example.com/page", "fetchedAt": "2025-12-27T14:30:00Z", - "dateRange": { - "start": "2025-11-27", - "end": "2025-12-27" - } + "dateRange": { "start": "2025-11-27", "end": "2025-12-27" } }, "sources": { "ga4": { "available": true, - "metrics": { - "pageViews": 2450, - "avgTimeOnPage": 222, - "bounceRate": 38.2, - "engagementRate": 64.5 - } + "metrics": { "pageViews": 2450, "bounceRate": 38.2, "engagementRate": 64.5 } }, "gsc": { "available": true, - "metrics": { - "impressions": 15200, - "clicks": 428, - "ctr": 2.82, - "avgPosition": 4.2 - }, - "topQueries": [ - {"query": "seo guide", "clicks": 156, "position": 4} - ] + "metrics": { "impressions": 15200, "clicks": 428, "ctr": 2.82, "avgPosition": 4.2 } } }, - "computed": { - "healthScore": 72, - "status": "GOOD" - } + "computed": { "healthScore": 72, "status": "GOOD" } } ``` -## Error Handling +## Graceful Degradation + +| Primary Source | Fallback | Last Resort | +|----------------|----------|-------------| +| GA4 traffic data | GSC clicks | Estimate from GSC impressions * CTR | +| GSC search perf | Manual SERP check | WebSearch SERP | +| CWV (CrUX) | PageSpeed API | Lighthouse CLI | -### Common Errors +When a source is unavailable, note it in the report and recommend configuring the missing source. 
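The fallback table above can be encoded as ordered chains. A minimal resolver sketch (the source identifiers and the `resolve_source` helper are illustrative) picks the first available option and emits a report note whenever a non-primary source is used:

```python
FALLBACK_CHAINS = {
    "traffic": ["ga4", "gsc_clicks", "gsc_estimate"],
    "search_perf": ["gsc", "manual_serp", "websearch_serp"],
    "cwv": ["crux", "pagespeed_api", "lighthouse_cli"],
}

def resolve_source(metric, available):
    """Return (source, note): first available source in the chain,
    with a report note whenever a fallback is used instead of the primary."""
    for rank, source in enumerate(FALLBACK_CHAINS[metric]):
        if source in available:
            note = None if rank == 0 else f"{metric}: using fallback source '{source}'"
            return source, note
    return None, f"{metric}: no source available -- omit from report"

source, note = resolve_source("traffic", {"gsc", "gsc_clicks"})
print(source)  # gsc_clicks
print(note)    # traffic: using fallback source 'gsc_clicks'
```

Collecting the non-`None` notes gives exactly the "note it in the report" requirement: the partial report lists which metrics came from fallbacks and which are missing entirely.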
+ +## Error Handling | Error | Cause | Resolution | |-------|-------|------------| -| 401 Unauthorized | Invalid/expired credentials | Re-run /setup-analytics | -| 403 Forbidden | Missing permissions | Check API access in console | -| 429 Too Many Requests | Rate limit | Wait and retry with backoff | +| 401 Unauthorized | Invalid credentials | Re-run /setup-analytics | +| 403 Forbidden | Missing permissions | Grant Viewer role in GA4 Admin | +| 429 Too Many Requests | Rate limit hit | Retry with exponential backoff | | 404 Not Found | Invalid property/site | Verify IDs in configuration | -| 500 Server Error | API issue | Retry later, check status page | - -### Error Output Pattern - -```markdown -## Data Fetch Error -**Source**: Google Analytics 4 -**Error**: 403 Forbidden -**Message**: "User does not have sufficient permissions for this property" +## Date Range Reference -### Troubleshooting Steps +| API | Format | Example | +|-----|--------|---------| +| GA4 | Relative or ISO | "30daysAgo", "2025-12-01" | +| GSC | ISO 8601 only | "2025-12-01" | -1. Verify Service Account email in GA4 Admin -2. Ensure "Viewer" role is granted -3. Check Analytics Data API is enabled -4. Wait 5 minutes for permission propagation +Common ranges: `7d`, `30d`, `90d`, `mtd` (month-to-date), `ytd` (year-to-date). -### Workaround +## Validation Checklist -Proceeding with available data sources (GSC). -GA4 engagement metrics will not be included in this analysis. 
-``` +- [ ] GA4 property ID and GSC site URL configured correctly +- [ ] Parallel fetch used when requests are independent +- [ ] Cache checked before each API call +- [ ] Error responses handled with retry or fallback +- [ ] Partial data reports note which sources are missing diff --git a/plugins/seo/skills/link-strategy/SKILL.md b/plugins/seo/skills/link-strategy/SKILL.md index 5b1c264..44bd31c 100644 --- a/plugins/seo/skills/link-strategy/SKILL.md +++ b/plugins/seo/skills/link-strategy/SKILL.md @@ -1,105 +1,60 @@ --- -plugin: seo -updated: 2026-01-20 name: link-strategy -description: Internal linking strategy and anchor text optimization patterns. Use when planning internal links or optimizing site structure. +description: "Audit internal links, optimize anchor text, and build topic clusters to improve site authority and rankings. Use when planning internal links, fixing orphan pages, or restructuring site architecture." --- -plugin: seo -updated: 2026-01-20 # Link Strategy -## When to Use +## Workflow -- Planning internal linking structure -- Optimizing anchor text -- Building topic clusters -- Improving site architecture +1. **Audit existing links** -- inventory current internal links and identify gaps +2. **Find orphan pages** -- locate pages with zero internal links pointing to them +3. **Fix broken links** -- test all internal links and remove or repair broken ones +4. **Optimize anchor text** -- replace generic anchors with keyword-rich descriptions +5. **Build topic clusters** -- group related content into pillar-cluster structures +6. **Verify improvements** -- confirm new links are crawlable and passing authority -## Internal Linking Principles +## Step 1: Internal Linking Principles -### 1. 
Link from High to Low Authority -- Homepage -> Category Pages -> Individual Posts -- Old established pages -> New pages -- High-traffic pages -> Pages you want to rank +| Principle | Rule | Example | +|-----------|------|---------| +| Link high to low authority | Homepage → Category → Post | Established pages link to new pages | +| Descriptive anchor text | Use target keyword naturally | Good: "SEO keyword research guide" Bad: "click here" | +| Contextual placement | Body content links > nav links | Link within relevant paragraphs | +| Reasonable density | 3-5 internal links per 1000 words | Avoid 100+ links on one page | -### 2. Use Descriptive Anchor Text -- Good: "SEO keyword research guide" -- Bad: "click here", "read more" -- Include target keyword naturally +## Step 2: Identify Orphan Pages -### 3. Link Contextually -- Links within body content > Navigation links -- Relevant context around link -- Natural reading flow +Orphan pages have no internal links pointing to them. To find them: -### 4. Maintain Reasonable Link Count -- 3-5 internal links per 1000 words -- Avoid excessive linking (100+ links) -- Focus on most relevant pages +1. Use Glob to collect all internal link targets across the site +2. Compare against sitemap or page inventory +3. Pages not in the link target list are orphans +4. 
Add links from topically relevant content to each orphan -## Topic Cluster Model +## Step 3: Anchor Text Optimization -``` -PILLAR PAGE: "Content Marketing" (broad, high volume) - | - +-- Supporting Article: "Content Marketing Strategy" - | (links to and from pillar) - | - +-- Supporting Article: "Content Marketing Examples" - | (links to and from pillar) - | - +-- Supporting Article: "Content Marketing Tools" - (links to and from pillar) -``` - -**Benefits:** -- Establishes topical authority -- Passes PageRank efficiently -- Improves user navigation -- Signals content relationships - -## Anchor Text Optimization - -### Anchor Text Types - -| Type | Example | When to Use | -|------|---------|-------------| -| Exact Match | "SEO tools" | Sparingly (1-2x per page) | +| Type | Example | Usage | +|------|---------|-------| +| Exact Match | "SEO tools" | Sparingly, 1-2x per page | | Partial Match | "best SEO tools for startups" | Primary usage | | Branded | "SEMrush" | Brand mentions | -| Generic | "click here" | Avoid if possible | -| Naked URL | "https://example.com" | Occasional | +| Generic | "click here" | Avoid | -### Best Practices -- Vary anchor text naturally -- Use target keyword in some anchors -- Avoid over-optimization (100% exact match) -- Make text descriptive and clickable +Rule: Vary anchor text naturally. Never use 100% exact match anchors for the same target. -## Link Audit Process +## Step 4: Build Topic Clusters -1. **Inventory existing links** - - Use Glob to find all internal links - - Map current link structure - -2. **Identify orphan pages** - - Pages with no internal links - - Add links from relevant content - -3. **Find broken links** - - Test all internal links - - Fix or remove broken ones - -4. 
**Optimize anchor text** - - Replace generic anchors - - Add keyword-rich descriptions +``` +PILLAR: "Content Marketing" (broad, high volume keyword) + ├── Supporting: "Content Marketing Strategy" (links to/from pillar) + ├── Supporting: "Content Marketing Examples" (links to/from pillar) + └── Supporting: "Content Marketing Tools" (links to/from pillar) +``` -5. **Build topic clusters** - - Group related content - - Implement pillar-cluster model +Benefits: establishes topical authority, passes PageRank efficiently, improves user navigation, signals content relationships to search engines. -## Output Format +## Step 5: Generate Linking Plan ```markdown ## Internal Linking Plan @@ -112,17 +67,42 @@ PILLAR PAGE: "Content Marketing" (broad, high volume) 1. **From**: {source_page} - **Anchor**: {anchor_text} - **Context**: {surrounding_sentence} - - **Priority**: HIGH/MEDIUM/LOW + - **Priority**: HIGH 2. **From**: {source_page} - **Anchor**: {anchor_text} - **Context**: {surrounding_sentence} - - **Priority**: HIGH/MEDIUM/LOW - -### Topic Cluster Structure + - **Priority**: MEDIUM +### Topic Cluster PILLAR: {main_topic_page} - Supporting: {page1} - Supporting: {page2} -- Supporting: {page3} ``` + +## Example: Auditing a Blog + +``` +Site: example.com/blog (45 posts) + +Audit results: +- 8 orphan pages found (no internal links) +- 12 generic anchors ("click here", "read more") +- 3 broken internal links (404s) + +Actions taken: +1. Added links to 8 orphan pages from related posts +2. Replaced 12 generic anchors with keyword-rich text +3. Fixed 2 broken links, removed 1 (page deleted) +4. 
Created 3 topic clusters around pillar pages + +Expected impact: +15-25% crawl coverage, improved rankings for cluster topics +``` + +## Validation Checklist + +- [ ] All orphan pages now have at least one internal link +- [ ] No broken internal links remain (all 404s fixed or removed) +- [ ] Generic anchors replaced with descriptive, keyword-relevant text +- [ ] Each topic cluster has bidirectional links between pillar and supporting pages +- [ ] No page exceeds 100 internal links diff --git a/plugins/seo/skills/performance-correlation/SKILL.md b/plugins/seo/skills/performance-correlation/SKILL.md index 8f97985..0638e25 100644 --- a/plugins/seo/skills/performance-correlation/SKILL.md +++ b/plugins/seo/skills/performance-correlation/SKILL.md @@ -1,180 +1,60 @@ --- -plugin: seo -updated: 2026-01-20 name: performance-correlation -description: Correlate content attributes with GA4 and GSC metrics to identify performance drivers +description: "Correlate content changes with GA4 and GSC metric movements to identify performance drivers and build optimization hypotheses. Use when diagnosing why metrics changed or planning content optimizations." --- -plugin: seo -updated: 2026-01-20 # Performance Correlation -## When to Use +## Workflow -- Connecting content changes to metric changes -- Identifying what drives performance -- Building optimization hypotheses -- A/B test analysis -- Content audit findings +1. **Match symptoms to patterns** -- identify which cross-source pattern applies +2. **Gather evidence** -- collect supporting data from GA4, GSC, and competitor analysis +3. **Log content changes** -- maintain a timeline of modifications +4. **Map metric responses** -- correlate metric shifts to specific changes +5. **Rate confidence** -- assess cause-effect reliability +6. 
**Build hypothesis** -- propose next optimization with success criteria -## Cross-Source Correlation Patterns +## Step 1: Pattern Library -### Pattern Library +### Pattern 1: High Impressions + Low CTR + Good Position (3-7) -#### Pattern 1: High Impressions + Low CTR + Good Position +**Diagnosis**: Title/meta description not compelling enough. -``` -GSC: Impressions ↑ | CTR ↓ | Position 3-7 -GA4: N/A (users don't click) -``` - -**Diagnosis**: Title/meta description not compelling enough - -**Evidence Needed**: -- Compare your snippet to competitors in positions 1-2 -- Check for SERP features stealing attention -- Analyze query intent match - -**Recommended Actions**: -1. Rewrite title with power words, numbers, or year -2. Add compelling meta description with clear benefit -3. Target featured snippet if applicable - -**Expected Impact**: +50-100% CTR improvement possible - ---- -plugin: seo -updated: 2026-01-20 - -#### Pattern 2: High CTR + Low Engagement - -``` -GSC: CTR ↑ | Position stable -GA4: Bounce ↑ | Time on Page ↓ | Scroll Depth ↓ -``` - -**Diagnosis**: Content doesn't match search intent or promise - -**Evidence Needed**: -- Compare content to search query expectations -- Check if title oversells/misleads -- Analyze competing content that ranks - -**Recommended Actions**: -1. Align content opening with search intent -2. Deliver promised value in first 100 words -3. Add table of contents for scanners - -**Expected Impact**: -20-30% bounce rate, +50% time on page +Actions: Rewrite title with power words/numbers/year, add benefit-driven meta description, target featured snippet. Expected impact: +50-100% CTR. 
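The diagnosis above reduces to a predicate over GSC metrics. In this sketch the position band (3-7) comes from the pattern itself, while the CTR and impression cutoffs are assumed thresholds chosen for illustration, not values from the skill:

```python
def matches_pattern_1(gsc):
    """High impressions + low CTR + good position (3-7): snippet problem."""
    return (
        3 <= gsc["position"] <= 7        # good position band from the pattern
        and gsc["ctr"] < 3.0             # assumed "low CTR" cutoff, percent
        and gsc["impressions"] >= 1000   # assumed "high impressions" floor
    )

print(matches_pattern_1({"position": 4.2, "ctr": 2.8, "impressions": 15200}))  # True
print(matches_pattern_1({"position": 4.2, "ctr": 5.1, "impressions": 15200}))  # False
```

Encoding each pattern this way lets Step 1 run as a mechanical first pass before the evidence-gathering in Step 2.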
---- -plugin: seo -updated: 2026-01-20 - -#### Pattern 3: High Engagement + Low Rankings - -``` -GA4: Time on Page ↑ | Bounce ↓ | Scroll Depth ↑ -GSC: Position ↓ | Impressions ↓ -``` - -**Diagnosis**: Good content but weak SEO signals - -**Evidence Needed**: -- Check backlink profile vs competitors -- Analyze internal linking to this page -- Review technical SEO factors - -**Recommended Actions**: -1. Build quality backlinks to page -2. Add internal links from high-authority pages -3. Improve on-page SEO (keyword density, headers) +### Pattern 2: High CTR + Low Engagement (high bounce, low time on page) -**Expected Impact**: +5-15 position improvement over 2-3 months +**Diagnosis**: Content doesn't match search intent or promise. ---- -plugin: seo -updated: 2026-01-20 - -#### Pattern 4: Declining Rankings + Stable Traffic +Actions: Align opening with search intent, deliver value in first 100 words, add table of contents. Expected impact: -20-30% bounce rate, +50% time on page. -``` -GSC: Position ↓ | Impressions → | Clicks → (or slight ↓) -GA4: Traffic → (from brand/direct) -``` +### Pattern 3: High Engagement + Low Rankings -**Diagnosis**: Competitors advancing, brand queries protecting you +**Diagnosis**: Good content but weak SEO signals. -**Evidence Needed**: -- Competitor content comparison -- Content freshness analysis -- Backlink velocity comparison +Actions: Build quality backlinks, add internal links from high-authority pages, improve on-page SEO. Expected impact: +5-15 positions over 2-3 months. -**Recommended Actions**: -1. Content refresh with updated data/examples -2. Add new sections competitors have -3. Accelerate link building +### Pattern 4: Declining Rankings + Stable Traffic -**Expected Impact**: Prevent further decline, regain positions +**Diagnosis**: Competitors advancing; brand queries protecting traffic. 
---- -plugin: seo -updated: 2026-01-20 +Actions: Refresh content with updated data, add sections competitors have, accelerate link building. -#### Pattern 5: Good Rankings + Low Impressions +### Pattern 5: Good Rankings (1-5) + Low Impressions -``` -GSC: Position 1-5 | Impressions ↓ | CTR normal -GA4: Traffic ↓ -``` +**Diagnosis**: Keyword losing search volume. -**Diagnosis**: Keyword losing search volume +Actions: Target related growing keywords, expand content for adjacent queries, consider pivoting topic angle. -**Evidence Needed**: -- Google Trends for keyword -- Seasonal patterns analysis -- Industry shifts +### Pattern 6: Position Volatility (fluctuates +/-10 daily) -**Recommended Actions**: -1. Target related growing keywords -2. Expand content for related queries -3. Consider pivoting topic angle +**Diagnosis**: Google testing content quality or thin content threshold. -**Expected Impact**: Capture adjacent search demand +Actions: Strengthen E-E-A-T signals, add depth and originality, improve page experience. Expected stabilization: 2-4 weeks. ---- -plugin: seo -updated: 2026-01-20 - -#### Pattern 6: Position Volatility - -``` -GSC: Position fluctuates ±10 daily -``` - -**Diagnosis**: Google testing your content, or thin content threshold - -**Evidence Needed**: -- Content depth vs competitors -- E-E-A-T signals present -- Page experience metrics - -**Recommended Actions**: -1. Strengthen E-E-A-T signals -2. Add depth and originality -3. 
Improve page experience - -**Expected Impact**: Position stabilization within 2-4 weeks - -## Correlation Matrix Template - -### Content Changes Timeline - -Track all modifications to correlate with metrics: +## Step 2: Log Content Changes ```markdown -## Content Change Log: {URL} - | Date | Change Type | Description | Scope | |------|-------------|-------------|-------| | 2025-12-01 | Content | Added 500 words on AI SEO | Major | @@ -183,13 +63,9 @@ Track all modifications to correlate with metrics: | 2025-12-20 | Technical | Improved page speed | Technical | ``` -### Metric Response Timeline - -Map metric changes to content changes: +## Step 3: Map Metric Responses ```markdown -## Metric Response Analysis - | Date | Metric | Before | After | Change | Likely Cause | |------|--------|--------|-------|--------|--------------| | Dec 5 | Position | 8.2 | 6.1 | +2.1 | Content expansion | @@ -198,101 +74,54 @@ Map metric changes to content changes: | Dec 22 | LCP | 2.8s | 1.9s | -0.9s | Speed optimization | ``` -### Correlation Confidence - -Rate confidence in cause-effect relationships: - -```markdown -## Correlation Confidence Assessment +## Step 4: Rate Confidence | Change | Metric Impact | Confidence | Reasoning | |--------|---------------|------------|-----------| | +500 words | Position +2.1 | HIGH | Timing matches, logical connection | | Title update | CTR +1.7pp | HIGH | Direct relationship, immediate effect | -| Internal links | ? 
| LOW | Too recent, effect delayed | +| Internal links | Unknown | LOW | Too recent, effect delayed 2-4 weeks | | Speed fix | Bounce -5% | MEDIUM | Timing matches, indirect relationship | -``` - -## Multi-Source Correlation - -### Unified Performance View - -```markdown -## Cross-Platform Correlation: {URL} - -### Traffic & Visibility -| Source | Metric | Value | Trend | Correlation | -|--------|--------|-------|-------|-------------| -| GSC | Impressions | 15,200 | ↑ +12% | Search visibility growing | -| GSC | Clicks | 428 | ↑ +8% | Traffic following visibility | -| GA4 | Sessions | 512 | ↑ +10% | Confirms GSC data | - -### Engagement Quality -| Source | Metric | Value | Trend | Correlation | -|--------|--------|-------|-------|-------------| -| GSC | CTR | 2.8% | → stable | Snippet unchanged | -| GA4 | Bounce Rate | 38% | ↓ -4% | Content improvements working | -| GA4 | Avg Time | 3:42 | ↑ +0:45 | Users more engaged | -| GA4 | Scroll Depth | 72% | ↑ +8% | Content structure improved | - -### Ranking Performance -| Source | Keyword | Position | Change | Opportunity | -|--------|---------|----------|--------|-------------| -| GSC | seo guide | 4 | +2 | Target position 1-3 | -| GSC | seo best practices | 7 | +1 | Content gap vs leader | -| GSC | seo tips 2025 | 12 | -3 | Needs freshness update | -``` - -## Hypothesis Building -### Template +## Step 5: Build Hypothesis ```markdown ## Optimization Hypothesis **Observation**: {what the data shows} - **Hypothesis**: {proposed cause-effect relationship} **Test Plan**: 1. {specific change to make} -2. {metrics to monitor} -3. {timeframe for evaluation} +2. {metrics to monitor}: CTR, position, clicks +3. 
{timeframe}: evaluate after 2 weeks **Success Criteria**: -- Primary: {main metric target} -- Secondary: {supporting metric targets} +- Primary: {main metric target, e.g., CTR > 4%} +- Secondary: {supporting target, e.g., clicks +30%} -**Risk Assessment**: -- Probability of success: {HIGH|MEDIUM|LOW} -- Potential downside: {risk description} -- Mitigation: {how to minimize risk} +**Risk**: {HIGH|MEDIUM|LOW} -- {risk description and mitigation} ``` -### Example Hypothesis +## Example: Full Correlation Analysis -```markdown -## Optimization Hypothesis: CTR Improvement +``` +URL: /blog/seo-guide +Change: Title updated from "SEO Guide" to "Complete SEO Guide 2025: 15 Strategies" -**Observation**: Page ranks #4 for "seo guide 2025" with 15K monthly -impressions but only 2.8% CTR (below 5% benchmark). +Before (Dec 1-7): Position 4.0, CTR 2.8%, Clicks 420 +After (Dec 15-21): Position 3.8, CTR 4.5%, Clicks 680 -**Hypothesis**: Updating title to include "Complete" and current year -will increase CTR by appealing to users seeking comprehensive, fresh content. +Pattern match: Pattern 1 (High Impressions + Low CTR) +Confidence: HIGH -- direct relationship, timing matches within days +Result: CTR +60%, Clicks +62%, Position stable +Next hypothesis: Add FAQ section to capture People Also Ask SERP feature +``` -**Test Plan**: -1. Change title from "SEO Guide: Tips for Success" to - "Complete SEO Guide 2025: 15 Proven Strategies" -2. Monitor: CTR, impressions, position, clicks -3. 
Evaluate after 2 weeks of data +## Validation Checklist -**Success Criteria**: -- Primary: CTR increases from 2.8% to >4% -- Secondary: Clicks increase by >30% -- Position maintains or improves - -**Risk Assessment**: -- Probability of success: HIGH (title changes typically show results) -- Potential downside: Slight position fluctuation during testing -- Mitigation: Don't change other page elements simultaneously -``` +- [ ] Content change log maintained with dates and scope +- [ ] Metric data compared for matching time periods (before vs after) +- [ ] At least 2 weeks of post-change data before drawing conclusions +- [ ] Confidence rated for each correlation (HIGH/MEDIUM/LOW) +- [ ] Only one major change made at a time for clear attribution diff --git a/plugins/seo/skills/serp-analysis/SKILL.md b/plugins/seo/skills/serp-analysis/SKILL.md index 3e5f00b..03d3b77 100644 --- a/plugins/seo/skills/serp-analysis/SKILL.md +++ b/plugins/seo/skills/serp-analysis/SKILL.md @@ -1,88 +1,70 @@ --- -plugin: seo -updated: 2026-01-20 name: serp-analysis -description: SERP analysis techniques for intent classification, feature identification, and competitive intelligence. Use when analyzing search results for content strategy. +description: "Analyze search engine results pages to classify intent, identify SERP feature opportunities, and assess competitor content gaps. Use when researching keywords, planning content strategy, or evaluating competitive positioning." --- -plugin: seo -updated: 2026-01-20 # SERP Analysis -## When to Use +## Workflow -- Analyzing search results for a keyword -- Classifying search intent -- Identifying SERP feature opportunities -- Competitive intelligence gathering +1. **Search the keyword** -- use WebSearch to fetch current SERP results +2. **Classify intent** -- determine informational, commercial, transactional, or navigational +3. **Identify SERP features** -- note featured snippets, PAA, image packs, and other features +4. 
**Analyze competitors** -- collect format, word count, heading structure, and gaps for top 10 +5. **Find content gaps** -- identify topics competitors miss or cover weakly +6. **Generate recommendations** -- propose content format, word count, and differentiator -## Intent Classification +## Step 1: Classify Search Intent -### Intent Types +| Intent | SERP Signals | Content Format | +|--------|--------------|----------------| +| Informational | Wikipedia, knowledge panels, "what is" queries | Guide, tutorial, explainer | +| Commercial | Reviews, comparisons, "best X" queries | Comparison, listicle, review | +| Transactional | Product pages, shopping results, "buy X" | Product page, pricing | +| Navigational | Brand homepage, login pages | Homepage, login page | -| Intent | SERP Signals | User Goal | Content Format | -|--------|--------------|-----------|----------------| -| **Informational** | Wikipedia, knowledge panels, "what is" queries | Learn something | Guide, tutorial, explainer | -| **Commercial** | Reviews, comparisons, "best X" queries | Compare options | Comparison, listicle, review | -| **Transactional** | Product pages, shopping results, "buy X" | Purchase something | Product page, pricing | -| **Navigational** | Brand homepage, login pages | Find specific site | Homepage, login page | +Process: Search the keyword, count result types, assign intent based on dominant type, note confidence as percentage of results supporting classification. -### Classification Process +## Step 2: Identify SERP Features -1. **Search the keyword** using WebSearch -2. **Analyze result types**: - - All informational = Informational intent - - Mix of reviews/comparisons = Commercial intent - - Product pages dominant = Transactional intent - - Single brand dominant = Navigational intent -3. **Check for mixed intent** (common for broad keywords) -4. 
**Note confidence level** (% of results supporting classification) +| Feature | Optimization Strategy | +|---------|----------------------| +| Featured Snippet | Direct answer in first 100 words (40-60 words for paragraph type) | +| People Also Ask | Add FAQ section answering common questions | +| Image Pack | High-quality images with descriptive alt text | +| Video Results | Create relevant YouTube content | +| Local Pack | GMB optimization, create location pages | +| Knowledge Panel | Add Schema markup, establish Wikipedia presence | +| Sitelinks | Clear site structure with strong internal linking | -## SERP Features +### Featured Snippet Formats -### Feature Identification +| Type | How to Optimize | +|------|-----------------| +| Paragraph | 40-60 word direct answer immediately after the H2 | +| List | Use ordered/unordered HTML lists | +| Table | Use HTML tables with clear headers | -| Feature | How to Identify | Optimization Strategy | -|---------|-----------------|----------------------| -| **Featured Snippet** | Box at top with answer | Direct answer in first 100 words | -| **People Also Ask** | Expandable question boxes | FAQ section, answer common questions | -| **Image Pack** | Row of images | High-quality images with alt text | -| **Video Results** | YouTube thumbnails | Create video content | -| **Local Pack** | Map with business listings | GMB optimization, location pages | -| **Knowledge Panel** | Right sidebar info box | Schema markup, Wikipedia presence | -| **Sitelinks** | Sub-links under main result | Clear site structure, internal linking | +## Step 3: Analyze Competitors -### Featured Snippet Types +For each top 10 result, collect: -| Type | Format | How to Optimize | -|------|--------|-----------------| -| Paragraph | Text block | 40-60 word direct answer | -| List | Numbered/bulleted list | Use ordered/unordered lists | -| Table | Data table | Use HTML tables | -| Video | YouTube embed | Create relevant video content | +| Data Point | Why It 
Matters | +|------------|---------------| +| Content format | Reveals what Google rewards for this query | +| Word count | Sets minimum content depth target | +| Heading structure | Shows required subtopics | +| Unique angle | Identifies differentiation opportunities | +| Content gaps | Reveals what to add for competitive advantage | -## Competitive Analysis - -### Competitor Data to Collect - -For each top 10 result, note: - -1. **Domain authority** (relative, not exact) -2. **Content format** (guide, listicle, comparison, etc.) -3. **Word count** (approximate) -4. **Heading structure** (H2 topics covered) -5. **Unique angle** (what makes them different) -6. **Content gaps** (what they miss) - -### Competitor Matrix Template +### Competitor Matrix | Rank | Domain | Format | Words | Unique Angle | Gap | |------|--------|--------|-------|--------------|-----| | 1 | {domain} | {format} | {count} | {angle} | {gap} | | 2 | {domain} | {format} | {count} | {angle} | {gap} | -| ... | | | | | | -## Output Format +## Step 4: Generate Report ```markdown ## SERP Analysis: {keyword} @@ -98,22 +80,41 @@ For each top 10 result, note: - [ ] Image Pack - [ ] Video Results - [ ] Local Pack -- [ ] Knowledge Panel -- [ ] Sitelinks ### Competitor Analysis | Rank | Domain | Format | Words | Unique Angle | |------|--------|--------|-------|--------------| | 1 | {domain} | {format} | {count} | {angle} | -... -### Content Gaps Identified -1. {gap} - {which competitors miss this} -2. {gap} - {which competitors miss this} +### Content Gaps +1. {gap} -- {which competitors miss this} ### Recommendations -1. **Content Format**: {recommended format based on SERP} -2. **Word Count**: {recommended based on competitors + 20%} -3. **Featured Snippet**: {opportunity and how to capture} +1. **Content Format**: {based on what ranks} +2. **Word Count**: {competitor avg + 20%} +3. **Featured Snippet**: {opportunity and capture strategy} 4. 
**Differentiator**: {unique angle to stand out} ``` + +## Example: Analyzing "best project management tools" + +``` +Intent: Commercial (90% confidence) -- reviews and comparisons dominate +SERP Features: Featured Snippet (list), PAA (6 questions), Image Pack + +Top 3 competitors: +1. pcmag.com -- Listicle, 4500 words, detailed reviews with screenshots +2. forbes.com -- Comparison, 3200 words, pricing tables +3. zapier.com -- Listicle, 2800 words, integration focus + +Content gaps: No competitor covers AI features, team size recommendations, or migration guides +Recommendation: 4000+ word comparison with pricing tables, AI feature analysis, and team-size matrix +``` + +## Validation Checklist + +- [ ] At least top 10 SERP results analyzed +- [ ] Intent classified with confidence percentage +- [ ] All visible SERP features documented +- [ ] Content gaps identified from competitor analysis +- [ ] Recommended word count exceeds competitor average by 20% diff --git a/plugins/statusline/skills/statusline-customization/SKILL.md b/plugins/statusline/skills/statusline-customization/SKILL.md index 014ad72..3c54e72 100644 --- a/plugins/statusline/skills/statusline-customization/SKILL.md +++ b/plugins/statusline/skills/statusline-customization/SKILL.md @@ -1,10 +1,6 @@ --- name: statusline-customization -description: Configuration reference and troubleshooting for the statusline plugin — sections, themes, bar widths, and script architecture -globs: - - "**/statusline*" - - "**/.claude/settings.json" - - "**/statusline-config.json" +description: "Configures statusline sections (model, branch, cost, context bar, plan limits), themes, and bar widths via JSON config. Covers troubleshooting for jq, plan limits, and script permissions. Use when customizing the statusline display, changing themes, adjusting bar widths, or diagnosing statusline rendering issues." 
--- # Statusline Customization Reference @@ -13,8 +9,6 @@ globs: **Location:** `~/.claude/statusline-config.json` -### Schema - ```json { "sections": { @@ -26,126 +20,58 @@ globs: "context_bar": true, // Context window usage bar "plan_limits": true // Plan limit bars with reset countdowns }, - "context_bar_width": 12, // Width of context bar in chars (8-20) - "plan_bar_width": 10, // Width of plan limit bar in chars (6-16) + "context_bar_width": 12, // Width in chars (8-20) + "plan_bar_width": 10, // Width in chars (6-16) "theme": "default" // Color theme name } ``` All fields are optional. Missing fields use defaults shown above. -## Sections Reference +## Sections | Section | Color | Description | |---------|-------|-------------| | `model` | Cyan (bold) | Shortened model name with `*` prefix | | `branch` | Green | Current git branch or short commit hash | -| `worktree` | Orange (bold) | `wt:name` — only shown when inside a linked worktree | +| `worktree` | Orange (bold) | `wt:name` — only inside linked worktrees | | `cost` | Yellow | Cumulative session cost in USD | | `duration` | Magenta | Session duration in minutes/seconds | -| `context_bar` | Green→Red gradient | Visual bar + token count (90k/200k) + compaction indicator (⟳) | -| `plan_limits` | Teal→Red gradient | Dual bar: top=5h, bottom=7d plan usage with reset countdowns | - -### Plan Limits Bar Characters - -- `█` — both 5h and 7d usage at this position -- `▀` — only 5h usage (top half) -- `▄` — only 7d usage (bottom half) -- `-` — empty (unused capacity) - -### Reset Countdown Format - -After each percentage, a countdown shows when the limit resets: - -- `↻1h40m` — resets in 1 hour 40 minutes -- `↻3d12h` — resets in 3 days 12 hours -- `↻now` — resetting now - -Example: `█▄▄------- 5h:18% ↻1h40m 7d:35% ↻3d12h` - -### Context Bar Token Count - -After the percentage, a dim token count shows current/max context usage: - -- `45% 90k/200k` — 90k tokens used out of 200k window -- `72% 144k/200k` — approaching 
limit -- Only shown when Claude Code provides token data in stdin - -### Compaction Detection - -A bold magenta `⟳` appears after the token count when auto-compaction is detected: - -- `25% 50k/200k ⟳` — compaction just happened (tokens dropped) -- The indicator appears for one render only, then disappears -- Detection works by caching `total_input_tokens` between renders; a drop means compaction occurred -- Cache file: `~/.claude/.statusline-token-cache` +| `context_bar` | Green-Red gradient | Visual bar + token count + compaction indicator | +| `plan_limits` | Teal-Red gradient | Dual bar: 5h/7d plan usage with reset countdowns | ## Themes | Theme | Description | |-------|-------------| -| `default` | Warm/cool ANSI palette — bright cyan, green, yellow, orange, red | -| `monochrome` | White and gray only — no colors | -| `minimal` | Muted dim ANSI colors (30-series) — subtle and low-contrast | -| `neon` | 256-color bright variants — vivid and high-contrast | - -## Script Architecture - -### Data Flow - -1. Claude Code pipes JSON session data to stdin -2. Script reads config from `~/.claude/statusline-config.json` -3. Extracts fields with `jq` -4. Detects git branch and worktree from `cwd` -5. Reads plan usage from non-blocking background cache -6. Renders ANSI-colored output to stdout - -### Non-Blocking API Cache - -- **Cache file:** `~/.claude/.statusline-usage-cache.json` -- **TTL:** 60 seconds -- **Mechanism:** Background subshell `( ... 
) &` fires API call; current render uses stale cache -- **Token source:** macOS Keychain (`security find-generic-password -s "Claude Code-credentials"`) -- **API endpoint:** `https://api.anthropic.com/api/oauth/usage` - -### Input JSON Schema (from Claude Code) - -```json -{ - "model": { "display_name": "Claude Opus 4.6" }, - "cost": { "total_cost_usd": 1.23, "total_duration_ms": 180000 }, - "context_window": { "used_percentage": 45.2 }, - "cwd": "/path/to/project" -} -``` +| `default` | Warm/cool ANSI palette — bright colors | +| `monochrome` | White and gray only | +| `minimal` | Muted dim ANSI colors | +| `neon` | 256-color bright variants | ## Troubleshooting -### jq not found -The script requires `jq` for JSON parsing. Install with: +**jq not found:** ```bash brew install jq ``` -### No plan limits showing -- Check if cache file exists: `ls -la ~/.claude/.statusline-usage-cache.json` +**No plan limits showing:** +- Check cache: `ls -la ~/.claude/.statusline-usage-cache.json` - Verify Keychain access: `security find-generic-password -s "Claude Code-credentials" -w | head -c 20` -- If Keychain prompts are denied, the API call silently fails — grant access when prompted -- Plan limits only show when both 5h and 7d utilization data are available -### Config not taking effect +**Config not taking effect:** - Verify JSON syntax: `jq . 
~/.claude/statusline-config.json` -- After changing config, the script picks it up on next render (no restart needed) -- To redeploy the script itself after a plugin update, run `/statusline:install-statusline` +- Changes apply on next render — no restart needed -### Script not executable +**Script not executable:** ```bash chmod +x ~/.claude/statusline-command.sh -# or for project-level: -chmod +x .claude/statusline-command.sh ``` -### Reset countdown not showing -- Reset times come from the `resets_at` field in the usage API response -- If the field is missing from the API response, no countdown is shown -- Verify with: `jq '.five_hour.resets_at, .seven_day.resets_at' ~/.claude/.statusline-usage-cache.json` +## Notes + +- Data flow: Claude Code pipes JSON to stdin, script renders ANSI output to stdout +- Plan limits use a non-blocking background cache (TTL: 60s) to avoid blocking renders +- Compaction detection shows `⟳` when auto-compaction is detected (one render only) +- Token source on macOS: Keychain (`security find-generic-password -s "Claude Code-credentials"`) diff --git a/pr_description.md b/pr_description.md new file mode 100644 index 0000000..f88fd08 --- /dev/null +++ b/pr_description.md @@ -0,0 +1,110 @@ +Hey 👋 @erudenko + +I ran your skills through `tessl skill review` at work and found some targeted improvements. 
Here's the full before/after: + +![Score Card](https://github.com/user-attachments/assets/score_card.png) + +| Skill | Before | After | Change | +|-------|--------|-------|--------| +| claudemem-orchestration | 0% | 80% | +80% | +| claudemem-search | 0% | 70% | +70% | +| claudish-integration | 0% | 69% | +69% | +| sequence-best-practices | 44% | 100% | +56% | +| email-deliverability | 43% | 96% | +53% | +| ab-testing-patterns | 44% | 96% | +52% | +| campaign-metrics | 48% | 100% | +52% | +| setup | 35% | 81% | +46% | +| status | 48% | 93% | +45% | +| proof-of-work | 49% | 93% | +44% | +| state-machine | 47% | 89% | +42% | +| task-complexity-router | 59% | 100% | +41% | +| deep-analysis | 56% | 96% | +40% | +| analytics-interpretation | 49% | 89% | +40% | +| data-extraction-patterns | 49% | 89% | +40% | +| search-interceptor | 55% | 93% | +38% | +| help | 43% | 81% | +38% | +| revert | 48% | 85% | +37% | +| brainstorming | 40% | 77% | +37% | +| ultrathink-detective | 60% | 96% | +36% | +| ui-analyse | 50% | 86% | +36% | +| architect-detective | 65% | 100% | +35% | +| error-recovery | 65% | 100% | +35% | +| performance-correlation | 49% | 84% | +35% | +| tag-command-mapping | 55% | 89% | +34% | +| developer-detective | 66% | 100% | +34% | +| implement | 55% | 89% | +34% | +| new-track | 55% | 89% | +34% | +| statusline-customization | 60% | 94% | +34% | +| code-search-selector | 68% | 100% | +32% | +| linear-integration | 66% | 96% | +30% | +| link-strategy | 59% | 89% | +30% | +| session-isolation | 64% | 93% | +29% | +| gemini-api | 61% | 90% | +29% | +| error-handling | 56% | 83% | +27% | +| cross-plugin-detective | 63% | 88% | +25% | +| bunjs-architecture | 61% | 86% | +25% | +| ui-implement | 57% | 81% | +24% | +| multi-agent-coordination | 65% | 89% | +24% | +| yaml-agent-format | 66% | 89% | +23% | +| dependency-check | 55% | 78% | +23% | +| quality-gates | 70% | 93% | +23% | +| serp-analysis | 66% | 89% | +23% | +| investigate | 54% | 76% | +22% | +| 
design-references | 59% | 81% | +22% | +| performance-tracking | 64% | 86% | +22% | +| context-detection | 54% | 74% | +20% | +| style-format | 61% | 81% | +20% | +| debugging-strategies | 61% | 79% | +18% | +| testing-strategies | 61% | 79% | +18% | +| ui-design-review | 66% | 83% | +17% | +| claudish-usage (bun) | 66% | 81% | +15% | +| auth-patterns | 65% | 79% | +14% | +| python | 65% | 79% | +14% | +| rust | 69% | 83% | +14% | +| universal-patterns | 46% | 60% | +14% | +| documentation-standards | 57% | 71% | +14% | +| testing-frontend | 69% | 83% | +14% | +| css-modules | 70% | 83% | +13% | +| xml-standards | 74% | 83% | +9% | +| optimize | 55% | 64% | +9% | +| ui-style-format | 63% | 68% | +5% | +| test-coverage | 59% | 64% | +5% | +| claudish-usage (shared) | 76% | 81% | +5% | +| database-patterns | 69% | 71% | +2% | + +**65 skills improved** across 10 plugins, average improvement **+30%** (55% → 85%). + +
+Changes made + +### Frontmatter fixes (3 skills, 0% → 69-80%) +- Added missing YAML frontmatter to `claudish-integration` (had no `---` block at all) +- Fixed invalid frontmatter in `claudemem-orchestration` and `claudemem-search` (non-standard keys causing parse failures) + +### Description improvements (65 skills) +- Converted chevron/pipe (`>` / `|`) description formats to quoted strings +- Added "Use when..." trigger clauses for better skill discovery +- Expanded descriptions with specific concrete actions in third-person voice +- Added natural language trigger terms users might actually say +- Removed unknown frontmatter keys (`version`, `tags`, `keywords`, `plugin`, `updated`) that aren't part of the skill spec + +### Content improvements (65 skills) +- Added numbered workflow steps with validation checkpoints +- Added copy-paste ready code examples where missing +- Improved progressive disclosure — referenced external files for detailed content +- Reduced verbosity by removing explanations of concepts Claude already knows +- Added error recovery guidance and troubleshooting sections +- Strengthened workflow clarity with explicit verification steps + +### Skills left untouched +- 7 skills already scoring 90%+ (tanstack-router at 100%, tooling-setup at 94%, etc.) +- 2 skills where changes didn't improve scores (reverted) +- ~57 skills in the 75-89% range that already had good structure + +
+ +Honest disclosure — I work at @tesslio where we build tooling around skills like these. Not a pitch - just saw room for improvement and wanted to contribute. + +Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill. Ping me - [@rohan-tessl](https://github.com/rohan-tessl) - if you hit any snags. + +Thanks in advance 🙏 diff --git a/shared/skills/claudish-usage/SKILL.md b/shared/skills/claudish-usage/SKILL.md index 872d242..4d67ede 100644 --- a/shared/skills/claudish-usage/SKILL.md +++ b/shared/skills/claudish-usage/SKILL.md @@ -1,6 +1,6 @@ --- name: claudish-usage -description: CRITICAL - Guide for using Claudish CLI ONLY through sub-agents to run Claude Code with OpenRouter models (Grok, GPT-5, Gemini, MiniMax). NEVER run Claudish directly in main context unless user explicitly requests it. Use when user mentions external AI models, Claudish, OpenRouter, or alternative models. Includes mandatory sub-agent delegation patterns, agent selection guide, file-based instructions, and strict rules to prevent context window pollution. +description: "Delegates Claudish CLI calls to sub-agents for running Claude Code with OpenRouter models (Grok, GPT-5, Gemini, MiniMax). Enforces sub-agent-only execution to prevent context window pollution, provides agent selection guides and file-based instruction patterns. Use when invoking external AI models, running Claudish, or setting up OpenRouter-based multi-model workflows." --- # Claudish Usage Skill diff --git a/skills/claudish-integration/SKILL.md b/skills/claudish-integration/SKILL.md index 4c64f8b..49bee79 100644 --- a/skills/claudish-integration/SKILL.md +++ b/skills/claudish-integration/SKILL.md @@ -1,3 +1,8 @@ +--- +name: claudish-integration +description: "Guides agents on querying Claudish for OpenRouter model recommendations. 
Use when building commands that need external AI model selection, creating proxy mode agents, or implementing multi-model validation workflows." +--- + # Claudish Integration Skill **Version:** 1.0.0 diff --git a/skills/openrouter-trending-models/SKILL.md b/skills/openrouter-trending-models/SKILL.md index 2333de3..6503bc1 100644 --- a/skills/openrouter-trending-models/SKILL.md +++ b/skills/openrouter-trending-models/SKILL.md @@ -1,6 +1,6 @@ --- name: openrouter-trending-models -description: Fetch trending programming models from OpenRouter rankings. Use when selecting models for multi-model review, updating model recommendations, or researching current AI coding trends. Provides model IDs, context windows, pricing, and usage statistics from the most recent week. +description: "Fetches the top 9 trending programming models from OpenRouter rankings via a Bun script. Returns model IDs, context windows, pricing, and weekly usage statistics as structured JSON. Use when selecting models for multi-model review, comparing model pricing, updating model recommendations, or researching current AI coding trends." 
--- # OpenRouter Trending Models Skill From 39b285f299ce41db0e98b9a8e7e7baf8a90c12df Mon Sep 17 00:00:00 2001 From: rohan-tessl Date: Tue, 24 Mar 2026 14:53:30 +0530 Subject: [PATCH 2/3] feat: optimize 4 additional skills to 100% MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Further improvements from extended review: - debug-mode: 75% → 100% (condensed to ~90 lines with workflow) - browser-debugger: 64% → 100% (condensed to ~110 lines with vision model table) - dependency-check: 55% → 100% (improved trigger terms) - openrouter-trending-models: 71% → 100% (condensed with jq examples) Total: 68 skills improved, 12 at 100%, avg +30% (56% → 86%) --- plugins/agentdev/skills/debug-mode/SKILL.md | 402 +------- .../frontend/skills/browser-debugger/SKILL.md | 942 ++---------------- .../frontend/skills/dependency-check/SKILL.md | 79 +- pr_description.md | 10 +- skills/openrouter-trending-models/SKILL.md | 555 +---------- 5 files changed, 201 insertions(+), 1787 deletions(-) diff --git a/plugins/agentdev/skills/debug-mode/SKILL.md b/plugins/agentdev/skills/debug-mode/SKILL.md index c2b11f5..1d1c72e 100644 --- a/plugins/agentdev/skills/debug-mode/SKILL.md +++ b/plugins/agentdev/skills/debug-mode/SKILL.md @@ -1,393 +1,89 @@ --- name: debug-mode -description: "Enables debug mode that records tool invocations, skill activations, hook triggers, and agent delegations to JSONL files. Manages per-project debug configuration and provides session analysis. Use when debugging agent behavior, optimizing workflow performance, or analyzing session event logs." +description: "Records tool invocations, skill activations, hook triggers, and agent delegations to JSONL files in claude-code-session-debug/. Manages per-project debug configuration with three capture levels and provides jq-based session analysis. Use when debugging agent behavior, profiling workflow performance, or auditing session event sequences." 
--- # AgentDev Debug Mode -Debug mode captures detailed session information for analysis, debugging, and optimization. -All events are recorded to a JSONL file in `claude-code-session-debug/`. +Captures session events to append-only JSONL files for debugging, profiling, and auditing agent workflows. -## Configuration +## Workflow -Debug mode uses **per-project configuration** stored in `.claude/agentdev-debug.json`. +1. **Enable debug mode** — run `/agentdev:debug-enable` or create `.claude/agentdev-debug.json` with `{"enabled": true, "level": "standard"}`. +2. **Run agent workflow** — all events (tool calls, delegations, phase transitions, errors) are appended to `claude-code-session-debug/agentdev-{slug}-{timestamp}-{id}.jsonl`. +3. **Analyze session** — use jq queries to extract statistics, find failures, trace delegations. +4. **Adjust capture level** — switch between minimal/standard/verbose as needed. +5. **Disable or clean up** — run `/agentdev:debug-disable` or delete old JSONL files. -### Config File Format - -Location: `.claude/agentdev-debug.json` (in project root) - -```json -{ - "enabled": true, - "level": "standard", - "created_at": "2026-01-09T07:00:00Z" -} -``` - -**Fields:** -- `enabled`: boolean - Whether debug mode is active -- `level`: string - Debug level (minimal, standard, verbose) -- `created_at`: string - ISO timestamp when config was created - -## Enabling Debug Mode - -Use the command to create the config file: - -``` -/agentdev:debug-enable -``` +## Debug Levels -This creates `.claude/agentdev-debug.json` with `enabled: true`. 
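As a rough sketch of how a script might gate on this config (key names from the format above; treating a missing or malformed file as disabled is an assumption, since the skill only states that missing fields use defaults):

```shell
# Demo config in a temp dir; a real project reads .claude/agentdev-debug.json
CONFIG="$(mktemp -d)/agentdev-debug.json"
printf '{"enabled": true, "level": "standard"}\n' > "$CONFIG"

# Gate: a missing or malformed config means debug mode is off
enabled=$(jq -r '.enabled // false' "$CONFIG" 2>/dev/null || echo false)
level=$(jq -r '.level // "standard"' "$CONFIG" 2>/dev/null || echo standard)
echo "debug=$enabled level=$level"   # -> debug=true level=standard
```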
+| Level | Captures | +|-------|----------| +| `minimal` | Phase transitions, errors, session start/end | +| `standard` | + tool invocations, agent delegations | +| `verbose` | + skill activations, hook triggers, full parameters | -Or manually create the file: +## Example: Enable and Configure ```bash +# Create config (or use /agentdev:debug-enable) mkdir -p .claude cat > .claude/agentdev-debug.json << 'EOF' -{ - "enabled": true, - "level": "standard", - "created_at": "2026-01-09T07:00:00Z" -} +{"enabled": true, "level": "standard", "created_at": "2026-01-09T07:00:00Z"} EOF -``` - -## Debug Levels - -| Level | Captured Events | -|-------|-----------------| -| `minimal` | Phase transitions, errors, session start/end | -| `standard` | All of minimal + tool invocations, agent delegations | -| `verbose` | All of standard + skill activations, hook triggers, full parameters | - -Default level is `standard`. - -### Changing Debug Level -Using jq: -```bash +# Change level jq '.level = "verbose"' .claude/agentdev-debug.json > tmp.json && mv tmp.json .claude/agentdev-debug.json ``` -## Output Location - -Debug sessions are saved to: - -``` -claude-code-session-debug/agentdev-{slug}-{timestamp}-{id}.jsonl -``` - -Example: -``` -claude-code-session-debug/agentdev-graphql-reviewer-20260109-063623-ba71.jsonl -``` - -## JSONL Format +**Verification:** Run `/agentdev:debug-status` to confirm debug mode is active and at the expected level. -Each line in the JSONL file is a complete JSON event object. This append-only format is: -- Crash-resilient (no data loss on unexpected termination) -- Easy to process with `jq` -- Streamable during the session - -### Event Schema (v1.0.0) - -```json -{ - "event_id": "550e8400-e29b-41d4-a716-446655440001", - "correlation_id": null, - "timestamp": "2026-01-09T06:40:00Z", - "type": "tool_invocation", - "data": { ... 
} -} -``` - -**Fields:** -- `event_id`: Unique UUID for this event -- `correlation_id`: Links related events (e.g., tool_invocation -> tool_result) -- `timestamp`: ISO 8601 timestamp -- `type`: Event type (see below) -- `data`: Type-specific payload - -### Event Types - -| Type | Description | -|------|-------------| -| `session_start` | Session initialization with metadata | -| `session_end` | Session completion | -| `tool_invocation` | Tool called with parameters | -| `tool_result` | Tool execution result | -| `skill_activation` | Skill loaded by agent | -| `hook_trigger` | PreToolUse/PostToolUse hook fired | -| `agent_delegation` | Task delegated to sub-agent | -| `agent_response` | Sub-agent returned result | -| `phase_transition` | Workflow phase changed | -| `user_interaction` | User approval/input requested | -| `proxy_mode_request` | External model request via Claudish | -| `proxy_mode_response` | External model response | -| `error` | Error occurred | - -## What Gets Captured - -### Session Metadata -- Session ID and path -- User request -- Environment (Claudish availability, plugin version) -- Start/end timestamps - -### Tool Invocations -- Tool name -- Parameters (sanitized - credentials redacted) -- Execution context (phase, agent) -- Duration and result size - -### Agent Delegations -- Target agent name -- Prompt preview (first 200 chars) -- Proxy mode model if used -- Session path - -### Proxy Mode -- Model ID -- Request/response duration -- Success/failure status - -### Phase Transitions -- From/to phase numbers and names -- Transition reason (completed, skipped, failed) -- Quality gate results - -### Errors -- Error type (tool_error, hook_error, agent_error, etc.) -- Message and stack trace -- Context (phase, agent, tool) -- Recoverability - -## Sensitive Data Protection - -Debug mode automatically sanitizes sensitive data: - -**Redacted Patterns:** -- API keys (`sk-*`, `ghp_*`, `AKIA*`, etc.) 
-- Tokens (bearer, access, auth) -- Passwords and secrets -- AWS credentials -- Slack tokens (`xox*`) -- Google API keys (`AIza*`) - -## Analyzing Debug Output - -### Prerequisites - -Install `jq` for JSON processing: -```bash -# macOS -brew install jq - -# Linux -apt-get install jq -``` - -### Quick Statistics +## Example: Analyze Session ```bash # Count events by type cat session.jsonl | jq -s 'group_by(.type) | map({type: .[0].type, count: length})' -``` - -### Tool Usage Analysis - -```bash -# Tool invocation counts -cat session.jsonl | jq -s ' - [.[] | select(.type == "tool_invocation") | .data.tool_name] - | group_by(.) - | map({tool: .[0], count: length}) - | sort_by(-.count)' -``` - -### Failed Operations -```bash # Find all errors and failed tool results cat session.jsonl | jq 'select(.type == "error" or (.type == "tool_result" and .data.success == false))' -``` - -### Timeline View - -```bash -# Chronological event summary -cat session.jsonl | jq '"\(.timestamp) [\(.type)] \(.data | keys | join(", "))"' -``` -### Event Correlation +# Tool invocation frequency +cat session.jsonl | jq -s '[.[] | select(.type == "tool_invocation") | .data.tool_name] | group_by(.) 
| map({tool: .[0], count: length}) | sort_by(-.count)' -```bash -# Find tool invocation and its result -INVOCATION_ID="550e8400-e29b-41d4-a716-446655440001" -cat session.jsonl | jq "select(.event_id == \"$INVOCATION_ID\" or .correlation_id == \"$INVOCATION_ID\")" +# Slowest agent delegations +cat session.jsonl | jq -s '[.[] | select(.type == "agent_response")] | sort_by(-.data.duration_ms) | .[:5] | .[] | {agent: .data.agent, duration_sec: (.data.duration_ms / 1000)}' ``` -### Phase Duration Analysis +## Event Types -```bash -# Calculate time between phase transitions -cat session.jsonl | jq -s ' - [.[] | select(.type == "phase_transition")] - | sort_by(.timestamp) - | .[] - | {phase: .data.to_name, timestamp: .timestamp}' -``` - -### Agent Delegation Timing - -```bash -# Find slowest agent delegations -cat session.jsonl | jq -s ' - [.[] | select(.type == "agent_response")] - | sort_by(-.data.duration_ms) - | .[:5] - | .[] - | {agent: .data.agent, duration_sec: (.data.duration_ms / 1000)}' -``` - -### Proxy Mode Performance - -```bash -# External model response times -cat session.jsonl | jq -s ' - [.[] | select(.type == "proxy_mode_response")] - | .[] - | {model: .data.model_id, success: .data.success, duration_sec: (.data.duration_ms / 1000)}' -``` - -## Disabling Debug Mode - -Use the command: -``` -/agentdev:debug-disable -``` - -Or manually update: -```bash -jq '.enabled = false' .claude/agentdev-debug.json > tmp.json && mv tmp.json .claude/agentdev-debug.json -``` - -Or delete the config file: -```bash -rm -f .claude/agentdev-debug.json -``` - -## Cleaning Up Debug Files - -### Remove All Debug Files - -```bash -rm -rf claude-code-session-debug/ -``` - -### Remove Files Older Than 7 Days - -```bash -find claude-code-session-debug/ -name "*.jsonl" -mtime +7 -delete -``` - -### Remove Files Larger Than 10MB - -```bash -find claude-code-session-debug/ -name "*.jsonl" -size +10M -delete -``` - -## File Permissions - -Debug files are created with restrictive 
permissions: -- Directory: `0o700` (owner only) -- Files: `0o600` (owner read/write only) - -This prevents other users from reading potentially sensitive session data. - -## Example Session Output - -```jsonl -{"event_id":"init-1736408183","timestamp":"2026-01-09T06:36:23Z","type":"session_start","data":{"schema_version":"1.0.0","session_id":"agentdev-graphql-reviewer-20260109-063623-ba71","user_request":"Create an agent that reviews GraphQL schemas","session_path":"ai-docs/sessions/agentdev-graphql-reviewer-20260109-063623-ba71","environment":{"claudish_available":true,"plugin_version":"1.4.0","jq_available":true}}} -{"event_id":"550e8400-e29b-41d4-a716-446655440001","timestamp":"2026-01-09T06:36:25Z","type":"tool_invocation","data":{"tool_name":"TodoWrite","parameters":{"todos":"[REDACTED]"},"context":{"phase":0,"agent":null}}} -{"event_id":"550e8400-e29b-41d4-a716-446655440002","correlation_id":"550e8400-e29b-41d4-a716-446655440001","timestamp":"2026-01-09T06:36:25Z","type":"tool_result","data":{"tool_name":"TodoWrite","success":true,"result_size_bytes":156,"duration_ms":12}} -{"event_id":"550e8400-e29b-41d4-a716-446655440003","timestamp":"2026-01-09T06:36:26Z","type":"phase_transition","data":{"from_phase":null,"to_phase":0,"from_name":null,"to_name":"Init","transition_reason":"completed","quality_gate_result":true}} -{"event_id":"550e8400-e29b-41d4-a716-446655440004","timestamp":"2026-01-09T06:36:30Z","type":"agent_delegation","data":{"target_agent":"agentdev:architect","prompt_preview":"SESSION_PATH: ai-docs/sessions/agentdev-graphql-reviewer...","prompt_length":1456,"proxy_mode":null,"session_path":"ai-docs/sessions/agentdev-graphql-reviewer-20260109-063623-ba71"}} -{"event_id":"end-1736408565","timestamp":"2026-01-09T06:42:45Z","type":"session_end","data":{"success":true}} -``` - -## Troubleshooting - -### Debug File Not Created - -1. Check if debug mode is enabled: - ```bash - /agentdev:debug-status - ``` - -2. 
Verify config file: - ```bash - cat .claude/agentdev-debug.json - ``` - -3. Verify the directory is writable: - ```bash - ls -la claude-code-session-debug/ - ``` - -### jq Commands Not Working - -1. Install jq: `brew install jq` or `apt-get install jq` -2. Verify JSONL format (each line should be valid JSON): - ```bash - head -1 session.jsonl | jq . - ``` - -### Large Debug Files - -Debug files can grow large in verbose mode. Use `minimal` level for lighter capture: - -Update config: -```bash -jq '.level = "minimal"' .claude/agentdev-debug.json > tmp.json && mv tmp.json .claude/agentdev-debug.json -``` - -Or clean up old files regularly: -```bash -find claude-code-session-debug/ -name "*.jsonl" -mtime +3 -delete -``` - -## Integration with Other Tools - -### Viewing in VS Code - -The JSONL format works with JSON syntax highlighting. For better viewing: -1. Install "JSON Lines" VS Code extension -2. Use "Format Document" on each line individually - -### Importing to Analytics +| Type | Description | +|------|-------------| +| `session_start` / `session_end` | Session lifecycle with metadata | +| `tool_invocation` / `tool_result` | Tool calls with parameters and outcomes | +| `skill_activation` | Skill loaded by agent | +| `hook_trigger` | PreToolUse/PostToolUse hook fired | +| `agent_delegation` / `agent_response` | Sub-agent task lifecycle | +| `phase_transition` | Workflow phase changes with quality gate results | +| `error` | Errors with type, context, and recoverability | -```bash -# Convert to CSV for spreadsheet import -cat session.jsonl | jq -rs ' - (.[0] | keys_unsorted) as $keys - | ($keys | @csv), - (.[] | [.[$keys[]]] | @csv)' > session.csv -``` +Events link via `correlation_id` (e.g., tool_invocation to its tool_result). 
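For example, a tool invocation can be traced to its result by matching `event_id` against `correlation_id` — a sketch against a hand-built two-event session; real session files carry more fields per the schema:

```shell
# Two sample events: an invocation and its correlated result
SESSION="$(mktemp)"
cat > "$SESSION" << 'EOF'
{"event_id":"inv-1","correlation_id":null,"type":"tool_invocation","data":{"tool_name":"Read"}}
{"event_id":"res-1","correlation_id":"inv-1","type":"tool_result","data":{"tool_name":"Read","success":true}}
EOF

# Select the invocation plus every event that points back at it
trace=$(jq -c 'select(.event_id == "inv-1" or .correlation_id == "inv-1")
               | {type, tool: .data.tool_name}' "$SESSION")
echo "$trace"
```

This prints one JSON line for the invocation and one for its result, which is usually enough to spot a call that never got a correlated `tool_result`.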
-### Streaming to External Service +## Sensitive Data Protection -```bash -# Tail and send to logging service -tail -f session.jsonl | while read line; do - curl -X POST -d "$line" https://logging.example.com/ingest -done -``` +API keys (`sk-*`, `ghp_*`, `AKIA*`), tokens, passwords, and credentials are automatically redacted in captured events. -## Commands Reference +## Commands | Command | Description | |---------|-------------| -| `/agentdev:debug-enable` | Enable debug mode (creates config file) | -| `/agentdev:debug-disable` | Disable debug mode (updates config file) | -| `/agentdev:debug-status` | Check current debug mode status | +| `/agentdev:debug-enable` | Enable debug mode | +| `/agentdev:debug-disable` | Disable debug mode | +| `/agentdev:debug-status` | Check current status | + +## Notes + +- JSONL format is crash-resilient — no data loss on unexpected termination +- Debug files use restrictive permissions (directory 0o700, files 0o600) +- Clean old files: `find claude-code-session-debug/ -name "*.jsonl" -mtime +7 -delete` +- Use `minimal` level to reduce file size in long sessions +- Process with jq, import to CSV, or stream to external logging services diff --git a/plugins/frontend/skills/browser-debugger/SKILL.md b/plugins/frontend/skills/browser-debugger/SKILL.md index 0a252d8..fc06640 100644 --- a/plugins/frontend/skills/browser-debugger/SKILL.md +++ b/plugins/frontend/skills/browser-debugger/SKILL.md @@ -1,930 +1,110 @@ --- name: browser-debugger -description: Systematically tests UI functionality, validates design fidelity with AI visual analysis, monitors console output, tracks network requests, and provides debugging reports using Chrome DevTools MCP. Use after implementing UI features, for design validation, when investigating console errors, for regression testing, or when user mentions testing, browser bugs, console errors, or UI verification. 
+description: "Tests UI in a real browser via Chrome DevTools MCP — captures screenshots, monitors console errors, tracks network requests, and validates design fidelity with optional vision models through Claudish. Use after implementing UI features, for design validation, when investigating console errors, or for responsive regression testing."
 allowed-tools: Task, Bash
 ---

 # Browser Debugger

-This Skill provides comprehensive browser-based UI testing, visual analysis, and debugging capabilities using Chrome DevTools MCP server and optional external vision models via Claudish.
+Browser-based UI testing and debugging via Chrome DevTools MCP with optional vision model analysis through Claudish.

-## When to Use This Skill
+## Workflow

-Claude and agents (developer, reviewer, tester, ui-developer) should invoke this Skill when:
-
-- **Validating Own Work**: After implementing UI features, agents should verify their work in a real browser
-- **Design Fidelity Checks**: Comparing implementation screenshots against design references
-- **Visual Regression Testing**: Detecting layout shifts, styling issues, or visual bugs
-- **Console Error Investigation**: User reports console errors or warnings
-- **Form/Interaction Testing**: Verifying user interactions work correctly
-- **Pre-Commit Verification**: Before committing or deploying code
-- **Bug Reproduction**: User describes UI bugs that need investigation
+1. **Verify dependencies** — confirm Chrome DevTools MCP is available via `mcp__chrome-devtools__list_pages`. Check OpenRouter API key for vision models.
+2. **Navigate and capture** — load the target URL, resize viewport, take screenshot.
+3. **Inspect console and network** — check for errors, failed requests, and warnings.
+4. **Analyze visually** — compare screenshot against design reference using embedded Claude or external vision model.
+5. **Report findings** — categorize issues by severity, provide actionable fixes.
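The dependency check in workflow step 1 can be sketched in shell. This is a minimal sketch, not part of the patch: `claudeup` and `OPENROUTER_API_KEY` are the names this skill already uses, while the function name and messages are illustrative.

```shell
# Sketch of the step-1 dependency check: report whether the Chrome
# DevTools MCP installer and the OpenRouter key are available.
check_deps() {
  if command -v claudeup >/dev/null 2>&1; then
    echo "chrome-devtools: claudeup available"
  else
    echo "chrome-devtools: install with 'npm install -g claudeup'"
  fi
  if [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo "vision: OpenRouter configured"
  else
    echo "vision: embedded Claude only (OPENROUTER_API_KEY not set)"
  fi
}
check_deps
```

Either branch is non-fatal by design: the skill can fall back to embedded Claude when no external vision model is configured.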
 ## Prerequisites

-### Required: Chrome DevTools MCP
-
-This skill requires Chrome DevTools MCP. Check availability and install if needed:
-
 ```bash
-# Check if available
-mcp__chrome-devtools__list_pages 2>/dev/null && echo "Available" || echo "Not available"
-
-# Install via claudeup (recommended)
-npm install -g claudeup@latest
+# Required: Chrome DevTools MCP
 claudeup mcp add chrome-devtools
-```
-### Optional: External Vision Models (via OpenRouter)
-
-For advanced visual analysis, use external vision-language models via Claudish:
-
-```bash
-# Check OpenRouter API key
-[[ -n "${OPENROUTER_API_KEY}" ]] && echo "OpenRouter configured" || echo "Not configured"
-
-# Install claudish
+# Optional: Vision models via Claudish + OpenRouter
 npm install -g claudish
+# Set OPENROUTER_API_KEY for external vision models
 ```
-
----
-
-## Visual Analysis Models (Recommended)
-
-For best visual analysis of UI screenshots, use these models via Claudish:
-
-### Tier 1: Best Quality (Recommended for Design Validation)
-
-| Model | Strengths | Cost | Best For |
-|-------|-----------|------|----------|
-| **qwen/qwen3-vl-32b-instruct** | Best OCR, spatial reasoning, GUI automation, 32+ languages | ~$0.06/1M input | Design fidelity, OCR, element detection |
-| **google/gemini-2.5-flash** | Fast, excellent price/performance, 1M context | ~$0.05/1M input | Real-time validation, large pages |
-| **openai/gpt-4o** | Most fluid multimodal, strong all-around | ~$0.15/1M input | Complex visual reasoning |
-
-### Tier 2: Fast & Affordable
+## Vision Model Selection

-| Model | Strengths | Cost | Best For |
-|-------|-----------|------|----------|
-| **qwen/qwen3-vl-30b-a3b-instruct** | Good balance, MoE architecture | ~$0.04/1M input | Quick checks, multiple iterations |
-| **google/gemini-2.5-flash-lite** | Ultrafast, very cheap | ~$0.01/1M input | High-volume testing |
+| Task | Recommended Model | Cost |
+|------|-------------------|------|
+| Design fidelity | `qwen/qwen3-vl-32b-instruct` | ~$0.06/1M |
+| Quick smoke tests | `google/gemini-2.5-flash` | ~$0.05/1M |
+| Complex layouts | `openai/gpt-4o` | ~$0.15/1M |
+| Budget/free | `openrouter/polaris-alpha` | Free |

-### Tier 3: Free Options
+Before first analysis in a session, ask user which model to use via AskUserQuestion. Save choice to session metadata.

-| Model | Notes |
-|-------|-------|
-| **openrouter/polaris-alpha** | FREE, good for testing workflows |
-
-### Model Selection Guide
-
-```
-Design Fidelity Validation → qwen/qwen3-vl-32b-instruct (best OCR & spatial)
-Quick Smoke Tests → google/gemini-2.5-flash (fast & cheap)
-Complex Layout Analysis → openai/gpt-4o (best reasoning)
-High Volume Testing → google/gemini-2.5-flash-lite (ultrafast)
-Budget Conscious → openrouter/polaris-alpha (free)
-```
-
----
-
-## Visual Analysis Model Selection (Interactive)
-
-**Before the first screenshot analysis in a session, ask the user which model to use.**
-
-### Step 1: Check for Saved Preference
-
-First, check if user has a saved model preference:
-
-```bash
-# Check for saved preference in project settings
-SAVED_MODEL=$(cat .claude/settings.json 2>/dev/null | jq -r '.pluginSettings.frontend.visualAnalysisModel // empty')
-
-# Or check session-specific preference
-if [[ -f "ai-docs/sessions/${SESSION_ID}/session-meta.json" ]]; then
-  SESSION_MODEL=$(jq -r '.visualAnalysisModel // empty' "ai-docs/sessions/${SESSION_ID}/session-meta.json")
-fi
-```
-
-### Step 2: If No Saved Preference, Ask User
-
-Use **AskUserQuestion** with these options:
-
-```markdown
-## Visual Analysis Model Selection
-
-For screenshot analysis and design validation, which AI vision model would you like to use?
-
-**Your choice will be remembered for this session.**
-```
-
-**AskUserQuestion options:**
-
-| Option | Label | Description |
-|--------|-------|-------------|
-| 1 | `qwen/qwen3-vl-32b-instruct` (Recommended) | Best for design fidelity - excellent OCR, spatial reasoning, detailed analysis. ~$0.06/1M tokens |
-| 2 | `google/gemini-2.5-flash` | Fast & affordable - great balance of speed and quality. ~$0.05/1M tokens |
-| 3 | `openai/gpt-4o` | Most capable - best for complex visual reasoning. ~$0.15/1M tokens |
-| 4 | `openrouter/polaris-alpha` (Free) | No cost - good for testing, basic analysis |
-| 5 | Skip visual analysis | Use embedded Claude only (no external models) |
-
-**Recommended based on task type:**
-- Design validation → Option 1 (Qwen VL)
-- Quick iterations → Option 2 (Gemini Flash)
-- Complex layouts → Option 3 (GPT-4o)
-- Budget-conscious → Option 4 (Free)
-
-### Step 3: Save User's Choice
-
-After user selects, save their preference:
-
-**Option A: Save to Session (temporary)**
-```bash
-# Update session metadata
-jq --arg model "$SELECTED_MODEL" '.visualAnalysisModel = $model' \
-  "ai-docs/sessions/${SESSION_ID}/session-meta.json" > tmp.json && \
-  mv tmp.json "ai-docs/sessions/${SESSION_ID}/session-meta.json"
-```
-
-**Option B: Save to Project Settings (persistent)**
-```bash
-# Update project settings for future sessions
-jq --arg model "$SELECTED_MODEL" \
-  '.pluginSettings.frontend.visualAnalysisModel = $model' \
-  .claude/settings.json > tmp.json && mv tmp.json .claude/settings.json
-```
-
-### Step 4: Use Selected Model
-
-Store the selected model in a variable and use it for all subsequent visual analysis:
-
-```bash
-# VISUAL_MODEL is now set to user's choice
-# Use it in all claudish calls:
-
-npx claudish --model "$VISUAL_MODEL" --stdin --quiet <