refactor: strengthen create-expert delegation and instruction quality guardrails (#766)

FL4TLiN3 · claude · web-flow · commit ae642c6d8566 · 2026-03-13T23:03:22.000+09:00
Generated expert definitions had two recurring defects. Coordinators were
missing their `delegates` arrays, which makes delegation fail silently, and
instructions contained code snippets, step-by-step procedures, and
file-by-file output specs that add length without adding value.

Changes across write-definition, design-roles, and verify-test:
- Make the `delegates` array explicitly REQUIRED for coordinators in schema, rules, and checks
- Add verify-test checks for missing delegates, code snippets, file-by-file output specs, and library selection guides
- Strengthen instruction quality rules with concrete anti-pattern examples

Also:
- Bump all expert versions to 1.0.6
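
For reference, a coordinator that satisfies the new `delegates` requirement might look like the fragment below. The expert names (`my-expert`, `@my-expert/review`), versions, and descriptions are hypothetical, and skill fields other than `pick` are left out, so treat this as a sketch rather than a definition taken from this repository.

```toml
# Hypothetical coordinator: it delegates, so `delegates` must be present.
[experts."my-expert"]
defaultModelTier = "high"
version = "1.0.0"
description = "Coordinates a hypothetical review task"
instruction = """
Delegate the review to @my-expert/review and report its verdict.
"""
# Every delegate key the instruction refers to is listed here.
delegates = ["@my-expert/review"]

[experts."my-expert".skills."@perstack/base"]
# Other skill fields are omitted in this sketch.
pick = ["readTextFile", "attemptCompletion"]

# Hypothetical leaf expert: it delegates to no one, so it has no `delegates` field.
[experts."@my-expert/review"]
defaultModelTier = "low"
version = "1.0.0"
description = "Reviews one file. Provide: path to the file."
instruction = """
Report whether the file meets the stated quality bar.
"""

[experts."@my-expert/review".skills."@perstack/base"]
# Other skill fields are omitted in this sketch.
pick = ["readTextFile", "attemptCompletion"]
```

The cross-check added to verify-test amounts to this: any `@scope/name` mentioned as a delegation target in an instruction should also appear in that expert's `delegates` array.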

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/definitions/create-expert/perstack.toml b/definitions/create-expert/perstack.toml
@@ -17,7 +17,7 @@
 
 [experts."create-expert"]
 defaultModelTier = "high"
-version = "1.0.5"
+version = "1.0.6"
 description = "Creates and modifies Perstack expert definitions in perstack.toml"
 instruction = """
 You are the coordinator for creating and modifying Perstack expert definitions. perstack.toml is the single source of truth — your job is to produce or modify it according to the user's request.
@@ -65,7 +65,7 @@ pick = ["readTextFile", "exec", "attemptCompletion"]
 
 [experts."@create-expert/plan"]
 defaultModelTier = "high"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Analyzes the user's request and defines the expert's product requirements.
 Provide: (1) what the expert should do, (2) path to existing perstack.toml if one exists.
@@ -159,7 +159,7 @@ pick = [
 
 [experts."@create-expert/design-roles"]
 defaultModelTier = "high"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Designs the technical architecture for a Perstack expert from a requirements plan.
 Provide: path to plan.md.
@@ -173,7 +173,7 @@ You are a technical architect for Perstack experts. You take a product requireme
 - **description** = public interface. Seen by delegating experts as a tool description. Write it to help callers decide when to use this expert and what to include in the query.
 - **instruction** = private domain knowledge. Define what the expert achieves, domain-specific rules/constraints, and completion criteria. NOT step-by-step procedures.
 - **skills** = MCP tools (file ops, exec, custom MCP servers). Always include attemptCompletion.
-- **delegates** = other experts this one can call. Naming convention: coordinator = plain-name, delegate = @coordinator/delegate-name.
+- **delegates** = REQUIRED array for any expert that delegates. Without this array, the runtime cannot register delegates as callable tools — delegation silently fails. Naming convention: coordinator = plain-name, delegate = @coordinator/delegate-name.
 - **Context isolation**: delegates receive only the query, no parent context. Data exchange happens via workspace files.
 - **Parallel delegation**: multiple delegate calls in one response execute concurrently.
 
@@ -234,7 +234,7 @@ For each expert:
 - Role summary
 - Skills needed: specific @perstack/base tools as a pick list (e.g., `pick = ["readTextFile", "exec", "attemptCompletion"]`). Only include tools the expert actually needs.
 - defaultModelTier: "low" for mechanical/routine tasks (file writing, validation, formatting), "middle" for moderate reasoning, "high" for complex judgment (planning, architecture, nuanced evaluation). Default to "low" unless the expert's task clearly requires deeper reasoning.
-- Delegates (if coordinator)
+- delegates array (REQUIRED for any expert that delegates — list all delegate keys explicitly)
 
 ### MCP Skills
 For each MCP skill found by find-skill:
@@ -266,7 +266,7 @@ pick = [
 
 [experts."@create-expert/build"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Orchestrates the write → test → verify → improve cycle for perstack.toml.
 Provide: path to plan.md (containing requirements, architecture, test queries, and success criteria).
@@ -328,7 +328,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/write-definition"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Writes or modifies a perstack.toml definition from plan.md requirements and architecture.
 Provide: (1) path to plan.md, (2) optionally path to existing perstack.toml to preserve, (3) optionally feedback from a failed test to address.
@@ -345,7 +345,7 @@ description = "Brief description of what this expert does"  # caller-facing: whe
 instruction = \"\"\"
 Domain knowledge and guidelines for the expert.
 \"\"\"
-delegates = ["@expert-name/delegate"]  # optional
+delegates = ["@expert-name/delegate"]  # REQUIRED for coordinators/sub-coordinators
 
 # Skill key MUST be exactly "@perstack/base" — runtime requires this exact key
 [experts."expert-name".skills."@perstack/base"]
@@ -367,6 +367,7 @@ instruction = \"\"\"Domain knowledge.\"\"\"
 
 - **File structure**: start every perstack.toml with a TOML comment block showing the delegation tree as an ASCII diagram, followed by expert definitions in tree order (coordinator first, then depth-first through delegates). This file header serves as a map for anyone reading the definition.
 - **Expert keys**: coordinators = kebab-case (`my-expert`), delegates = `@coordinator/delegate-name` (never omit @)
+- **Delegates (CRITICAL)**: every expert that delegates to others MUST have a `delegates` array listing all delegate keys. Without this array, the runtime cannot register delegates as callable tools and delegation will silently fail. Leaf experts (no delegates) omit this field entirely.
 - **Skills**: minimal set. Always include attemptCompletion. Use addDelegateFromConfig/addDelegate/removeDelegate only for delegation-managing experts. Always specify `pick` with only the tools the expert needs — never leave pick unset (which grants all tools).
 - **defaultModelTier**: always set per expert. Use the tier specified in plan.md's architecture section.
 - **TOML**: triple-quoted strings for multi-line instructions. Every expert needs version, description, instruction. `"@perstack/base"` is the exact required key — never `"base"` or aliases.
@@ -383,13 +384,14 @@ The instruction field is the most impactful part of the definition. Apply these
 - Priority rules for when constraints conflict
 
 ### What does NOT belong in an instruction
-- **Code snippets and implementation templates** — the LLM knows how to write code. Showing it a mulberry32 PRNG implementation or a blessed screen setup teaches it nothing. State the constraint ("use seeded RNG for deterministic tests") and let the LLM implement it.
-- **General programming knowledge** — ECS patterns, A* search, collision detection algorithms, TypeScript configuration. These are well within the LLM's training. Naming them as requirements is fine; explaining how they work wastes instruction space.
-- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps.
-- **File-by-file output specifications** — "create src/engine/ecs.ts, src/engine/state.ts, ..." Let the LLM decide the file structure based on the requirements.
+- **Code snippets and implementation templates** — the LLM knows how to write code. Never include inline code blocks, JSON schema examples, TypeScript interfaces, or mock patterns. State the constraint ("output JSON Lines with one object per turn") and let the LLM implement it. A code snippet in an instruction is always a sign that the author didn't trust the LLM enough.
+- **General programming knowledge** — ECS patterns, A* search, collision detection, terminal ANSI codes, Jest configuration, tsconfig settings, package.json structure. These are well within the LLM's training. Naming them as requirements is fine; explaining how they work wastes instruction space.
+- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps. Numbered implementation checklists and ordered task lists are procedures in disguise.
+- **File-by-file output specifications** — "create src/engine/ecs.ts, src/engine/state.ts, ..." Let the LLM decide the file structure based on the requirements. Specifying exact file paths constrains the LLM without adding value.
+- **Library selection guides** — "prefer ink for React-like, blessed for widgets, chalk as fallback." The LLM can choose appropriate libraries. State the requirement ("interactive TUI with keyboard input"), not the implementation choice.
 
 ### Self-check
-Before writing each instruction, ask: "If I removed this sentence, would the LLM produce a worse result?" If the answer is no — because the LLM already knows this — remove it.
+Before writing each instruction, ask: "If I removed this sentence, would the LLM produce a worse result?" If the answer is no — because the LLM already knows this — remove it. Apply this test to every paragraph, every bullet point, and every sub-heading.
 
 ## Description Rules
 
@@ -432,7 +434,7 @@ pick = [
 
 [experts."@create-expert/verify-test"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Independently verifies test-expert results by inspecting produced artifacts against success criteria.
 Provide: (1) the test-expert's result (status, query, criteria evaluation), (2) the success criteria from plan.md, (3) path to perstack.toml and workspace.
@@ -452,11 +454,14 @@ You are an independent test verifier. You do NOT trust test-expert's verdict at
 ### 2. Definition Quality Verification
 Read the perstack.toml and check for these quality issues. Any violation is grounds for CONTINUE:
 
-- **Bloated instructions**: instructions containing code snippets, implementation templates, or general programming knowledge that the LLM already knows. Instructions should contain only domain-specific constraints, policies, and quality bars.
+- **Missing delegates array (CRITICAL)**: every expert whose instruction references delegating to other experts MUST have a `delegates` array listing those experts. Without it, the runtime cannot register delegates as tools and delegation silently fails. Cross-check: if an instruction mentions delegating to `@scope/name`, then `delegates` must include `"@scope/name"`. This is the single most common and most severe defect.
+- **Bloated instructions**: instructions containing code snippets (```), implementation templates, JSON schema examples, TypeScript interfaces, mock patterns, or general programming knowledge that the LLM already knows. Instructions should contain only domain-specific constraints, policies, and quality bars. Scan for fenced code blocks and inline code — their presence almost always indicates bloat.
 - **Missing pick**: every @perstack/base skill must have an explicit `pick` list. Omitting pick grants all tools, which is almost never correct.
 - **Missing defaultModelTier**: every expert should have a defaultModelTier set.
 - **Flat delegation without justification**: if a coordinator has many direct delegates with interdependencies, suggest grouping related delegates under sub-coordinators based on shared concerns (cohesion).
-- **Procedural instructions**: instructions that read as step-by-step procedures rather than domain knowledge (rules, constraints, policies).
+- **Procedural instructions**: instructions that read as step-by-step procedures rather than domain knowledge (rules, constraints, policies). Numbered implementation checklists are procedures in disguise.
+- **File-by-file output specs**: instructions that specify exact output file paths (e.g., "create src/Game.ts, src/tui.ts"). The LLM should decide file structure.
+- **Library selection guides**: instructions that list library alternatives with selection criteria (e.g., "prefer ink for X, blessed for Y"). State the requirement, not the implementation.
 
 ## Verdicts
 
@@ -482,7 +487,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/test-expert"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Tests a single query against a Perstack expert definition.
 Provide: (1) path to perstack.toml, (2) the test query to execute, (3) the success criteria to evaluate against, (4) the coordinator expert name to test.
@@ -546,7 +551,7 @@ pick = [
 
 [experts."@create-expert/find-skill"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Searches the MCP registry for MCP servers that match a skill requirement.
 Provide: the capability needed and suggested search keywords.