Commit ae642c6

FL4TLiN3 and claude authored
refactor: strengthen create-expert delegation and instruction quality guardrails (#766)
Generated expert definitions were missing `delegates` arrays (making delegation silently fail) and contained code snippets, procedural instructions, and file-by-file output specs that bloat instructions without adding value.

Changes across write-definition, design-roles, and verify-test:
- Make `delegates` array explicitly REQUIRED for coordinators in schema, rules, and checks
- Add verify-test checks for missing delegates, code snippets, file specs, library guides
- Strengthen instruction quality rules with concrete anti-pattern examples
- Bump all expert versions to 1.0.6

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 952c6f6 commit ae642c6

1 file changed: +23 -18 lines changed

definitions/create-expert/perstack.toml

Lines changed: 23 additions & 18 deletions
@@ -17,7 +17,7 @@
 
 [experts."create-expert"]
 defaultModelTier = "high"
-version = "1.0.5"
+version = "1.0.6"
 description = "Creates and modifies Perstack expert definitions in perstack.toml"
 instruction = """
 You are the coordinator for creating and modifying Perstack expert definitions. perstack.toml is the single source of truth — your job is to produce or modify it according to the user's request.
@@ -65,7 +65,7 @@ pick = ["readTextFile", "exec", "attemptCompletion"]
 
 [experts."@create-expert/plan"]
 defaultModelTier = "high"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Analyzes the user's request and defines the expert's product requirements.
 Provide: (1) what the expert should do, (2) path to existing perstack.toml if one exists.
@@ -159,7 +159,7 @@ pick = [
 
 [experts."@create-expert/design-roles"]
 defaultModelTier = "high"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Designs the technical architecture for a Perstack expert from a requirements plan.
 Provide: path to plan.md.
@@ -173,7 +173,7 @@ You are a technical architect for Perstack experts. You take a product requireme
 - **description** = public interface. Seen by delegating experts as a tool description. Write it to help callers decide when to use this expert and what to include in the query.
 - **instruction** = private domain knowledge. Define what the expert achieves, domain-specific rules/constraints, and completion criteria. NOT step-by-step procedures.
 - **skills** = MCP tools (file ops, exec, custom MCP servers). Always include attemptCompletion.
-- **delegates** = other experts this one can call. Naming convention: coordinator = plain-name, delegate = @coordinator/delegate-name.
+- **delegates** = REQUIRED array for any expert that delegates. Without this array, the runtime cannot register delegates as callable tools — delegation silently fails. Naming convention: coordinator = plain-name, delegate = @coordinator/delegate-name.
 - **Context isolation**: delegates receive only the query, no parent context. Data exchange happens via workspace files.
 - **Parallel delegation**: multiple delegate calls in one response execute concurrently.
 
@@ -234,7 +234,7 @@ For each expert:
 - Role summary
 - Skills needed: specific @perstack/base tools as a pick list (e.g., `pick = ["readTextFile", "exec", "attemptCompletion"]`). Only include tools the expert actually needs.
 - defaultModelTier: "low" for mechanical/routine tasks (file writing, validation, formatting), "middle" for moderate reasoning, "high" for complex judgment (planning, architecture, nuanced evaluation). Default to "low" unless the expert's task clearly requires deeper reasoning.
-- Delegates (if coordinator)
+- delegates array (REQUIRED for any expert that delegates — list all delegate keys explicitly)
 
 ### MCP Skills
 For each MCP skill found by find-skill:
@@ -266,7 +266,7 @@ pick = [
 
 [experts."@create-expert/build"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Orchestrates the write → test → verify → improve cycle for perstack.toml.
 Provide: path to plan.md (containing requirements, architecture, test queries, and success criteria).
@@ -328,7 +328,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/write-definition"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Writes or modifies a perstack.toml definition from plan.md requirements and architecture.
 Provide: (1) path to plan.md, (2) optionally path to existing perstack.toml to preserve, (3) optionally feedback from a failed test to address.
@@ -345,7 +345,7 @@ description = "Brief description of what this expert does" # caller-facing: whe
 instruction = \"\"\"
 Domain knowledge and guidelines for the expert.
 \"\"\"
-delegates = ["@expert-name/delegate"] # optional
+delegates = ["@expert-name/delegate"] # REQUIRED for coordinators/sub-coordinators
 
 # Skill key MUST be exactly "@perstack/base" — runtime requires this exact key
 [experts."expert-name".skills."@perstack/base"]
@@ -367,6 +367,7 @@ instruction = \"\"\"Domain knowledge.\"\"\"
 
 - **File structure**: start every perstack.toml with a TOML comment block showing the delegation tree as an ASCII diagram, followed by expert definitions in tree order (coordinator first, then depth-first through delegates). This file header serves as a map for anyone reading the definition.
 - **Expert keys**: coordinators = kebab-case (`my-expert`), delegates = `@coordinator/delegate-name` (never omit @)
+- **Delegates (CRITICAL)**: every expert that delegates to others MUST have a `delegates` array listing all delegate keys. Without this array, the runtime cannot register delegates as callable tools and delegation will silently fail. Leaf experts (no delegates) omit this field entirely.
 - **Skills**: minimal set. Always include attemptCompletion. Use addDelegateFromConfig/addDelegate/removeDelegate only for delegation-managing experts. Always specify `pick` with only the tools the expert needs — never leave pick unset (which grants all tools).
 - **defaultModelTier**: always set per expert. Use the tier specified in plan.md's architecture section.
 - **TOML**: triple-quoted strings for multi-line instructions. Every expert needs version, description, instruction. `"@perstack/base"` is the exact required key — never `"base"` or aliases.
@@ -383,13 +384,14 @@ The instruction field is the most impactful part of the definition. Apply these
 - Priority rules for when constraints conflict
 
 ### What does NOT belong in an instruction
-- **Code snippets and implementation templates** — the LLM knows how to write code. Showing it a mulberry32 PRNG implementation or a blessed screen setup teaches it nothing. State the constraint ("use seeded RNG for deterministic tests") and let the LLM implement it.
-- **General programming knowledge** — ECS patterns, A* search, collision detection algorithms, TypeScript configuration. These are well within the LLM's training. Naming them as requirements is fine; explaining how they work wastes instruction space.
-- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps.
-- **File-by-file output specifications** — "create src/engine/ecs.ts, src/engine/state.ts, ..." Let the LLM decide the file structure based on the requirements.
+- **Code snippets and implementation templates** — the LLM knows how to write code. Never include inline code blocks, JSON schema examples, TypeScript interfaces, or mock patterns. State the constraint ("output JSON Lines with one object per turn") and let the LLM implement it. A code snippet in an instruction is always a sign that the author didn't trust the LLM enough.
+- **General programming knowledge** — ECS patterns, A* search, collision detection, terminal ANSI codes, Jest configuration, tsconfig settings, package.json structure. These are well within the LLM's training. Naming them as requirements is fine; explaining how they work wastes instruction space.
+- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps. Numbered implementation checklists and ordered task lists are procedures in disguise.
+- **File-by-file output specifications** — "create src/engine/ecs.ts, src/engine/state.ts, ..." Let the LLM decide the file structure based on the requirements. Specifying exact file paths constrains the LLM without adding value.
+- **Library selection guides** — "prefer ink for React-like, blessed for widgets, chalk as fallback." The LLM can choose appropriate libraries. State the requirement ("interactive TUI with keyboard input"), not the implementation choice.
 
 ### Self-check
-Before writing each instruction, ask: "If I removed this sentence, would the LLM produce a worse result?" If the answer is no — because the LLM already knows this — remove it.
+Before writing each instruction, ask: "If I removed this sentence, would the LLM produce a worse result?" If the answer is no — because the LLM already knows this — remove it. Apply this test to every paragraph, every bullet point, and every sub-heading.
 
 ## Description Rules
 
@@ -432,7 +434,7 @@ pick = [
 
 [experts."@create-expert/verify-test"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Independently verifies test-expert results by inspecting produced artifacts against success criteria.
 Provide: (1) the test-expert's result (status, query, criteria evaluation), (2) the success criteria from plan.md, (3) path to perstack.toml and workspace.
@@ -452,11 +454,14 @@ You are an independent test verifier. You do NOT trust test-expert's verdict at
 ### 2. Definition Quality Verification
 Read the perstack.toml and check for these quality issues. Any violation is grounds for CONTINUE:
 
-- **Bloated instructions**: instructions containing code snippets, implementation templates, or general programming knowledge that the LLM already knows. Instructions should contain only domain-specific constraints, policies, and quality bars.
+- **Missing delegates array (CRITICAL)**: every expert whose instruction references delegating to other experts MUST have a `delegates` array listing those experts. Without it, the runtime cannot register delegates as tools and delegation silently fails. Cross-check: if an instruction mentions delegating to `@scope/name`, then `delegates` must include `"@scope/name"`. This is the single most common and most severe defect.
+- **Bloated instructions**: instructions containing code snippets (```), implementation templates, JSON schema examples, TypeScript interfaces, mock patterns, or general programming knowledge that the LLM already knows. Instructions should contain only domain-specific constraints, policies, and quality bars. Scan for fenced code blocks and inline code — their presence almost always indicates bloat.
 - **Missing pick**: every @perstack/base skill must have an explicit `pick` list. Omitting pick grants all tools, which is almost never correct.
 - **Missing defaultModelTier**: every expert should have a defaultModelTier set.
 - **Flat delegation without justification**: if a coordinator has many direct delegates with interdependencies, suggest grouping related delegates under sub-coordinators based on shared concerns (cohesion).
-- **Procedural instructions**: instructions that read as step-by-step procedures rather than domain knowledge (rules, constraints, policies).
+- **Procedural instructions**: instructions that read as step-by-step procedures rather than domain knowledge (rules, constraints, policies). Numbered implementation checklists are procedures in disguise.
+- **File-by-file output specs**: instructions that specify exact output file paths (e.g., "create src/Game.ts, src/tui.ts"). The LLM should decide file structure.
+- **Library selection guides**: instructions that list library alternatives with selection criteria (e.g., "prefer ink for X, blessed for Y"). State the requirement, not the implementation.
 
 ## Verdicts
 
@@ -482,7 +487,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/test-expert"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Tests a single query against a Perstack expert definition.
 Provide: (1) path to perstack.toml, (2) the test query to execute, (3) the success criteria to evaluate against, (4) the coordinator expert name to test.
@@ -546,7 +551,7 @@ pick = [
 
 [experts."@create-expert/find-skill"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Searches the MCP registry for MCP servers that match a skill requirement.
 Provide: the capability needed and suggested search keywords.
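
Some of the definition-quality checks this commit adds to verify-test (fenced code blocks, numbered checklists, exact output file paths) lend themselves to a rough mechanical pre-scan. The sketch below is an assumption-laden illustration rather than anything in the repository: `quality_flags` and the three regular expressions are hypothetical heuristics. They will produce false positives and do not replace the judgment the instruction text asks of the verifier.

```python
import re

# Hypothetical heuristics; each pattern approximates one check from the diff.
FENCE = re.compile(r"^\s*(?:`{3}|~{3})", re.MULTILINE)           # fenced code block
NUMBERED_STEP = re.compile(r"^\s*\d+[.)]\s+\S", re.MULTILINE)    # "1. do X" style line
FILE_PATH = re.compile(r"\b[\w-]+(?:/[\w-]+)+\.[a-z]{1,4}\b")    # e.g. src/engine/ecs.ts


def quality_flags(instruction: str) -> list[str]:
    """Return a list of suspected quality issues found in one instruction string."""
    flags = []
    if FENCE.search(instruction):
        flags.append("fenced code block")
    # A single numbered line is tolerated; two or more suggest a checklist.
    if len(NUMBERED_STEP.findall(instruction)) >= 2:
        flags.append("numbered checklist")
    if FILE_PATH.search(instruction):
        flags.append("explicit file path")
    return flags


bloated = "1. Create src/engine/ecs.ts\n2. Add the game loop\n"
clean = "Use seeded RNG for deterministic tests. Output JSON Lines with one object per turn."

print(quality_flags(bloated))  # prints ['numbered checklist', 'explicit file path']
print(quality_flags(clean))    # prints []
```

A non-empty result is only a prompt for closer reading. Whether an instruction is actually bloated or procedural remains a judgment call.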
