refactor: strengthen create-expert delegation and instruction quality guardrails (#766)
Generated expert definitions were missing `delegates` arrays (making delegation
silently fail) and contained code snippets, procedural instructions, and
file-by-file output specs that bloat instructions without adding value.
Changes across write-definition, design-roles, and verify-test:
- Make `delegates` array explicitly REQUIRED for coordinators in schema, rules, and checks
- Add verify-test checks for missing delegates, code snippets, file specs, library guides
- Strengthen instruction quality rules with concrete anti-pattern examples
- Bump all expert versions to 1.0.6
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
definitions/create-expert/perstack.toml
23 additions & 18 deletions
@@ -17,7 +17,7 @@

 [experts."create-expert"]
 defaultModelTier = "high"
-version = "1.0.5"
+version = "1.0.6"
 description = "Creates and modifies Perstack expert definitions in perstack.toml"
 instruction = """
 You are the coordinator for creating and modifying Perstack expert definitions. perstack.toml is the single source of truth — your job is to produce or modify it according to the user's request.
 Analyzes the user's request and defines the expert's product requirements.
 Provide: (1) what the expert should do, (2) path to existing perstack.toml if one exists.
@@ -159,7 +159,7 @@ pick = [

 [experts."@create-expert/design-roles"]
 defaultModelTier = "high"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Designs the technical architecture for a Perstack expert from a requirements plan.
 Provide: path to plan.md.
@@ -173,7 +173,7 @@ You are a technical architect for Perstack experts. You take a product requireme
 - **description** = public interface. Seen by delegating experts as a tool description. Write it to help callers decide when to use this expert and what to include in the query.
 - **instruction** = private domain knowledge. Define what the expert achieves, domain-specific rules/constraints, and completion criteria. NOT step-by-step procedures.
-- **delegates** = other experts this one can call. Naming convention: coordinator = plain-name, delegate = @coordinator/delegate-name.
+- **delegates** = REQUIRED array for any expert that delegates. Without this array, the runtime cannot register delegates as callable tools — delegation silently fails. Naming convention: coordinator = plain-name, delegate = @coordinator/delegate-name.
 - **Context isolation**: delegates receive only the query, no parent context. Data exchange happens via workspace files.
 - **Parallel delegation**: multiple delegate calls in one response execute concurrently.
@@ -234,7 +234,7 @@ For each expert:
 - Role summary
 - Skills needed: specific @perstack/base tools as a pick list (e.g., `pick = ["readTextFile", "exec", "attemptCompletion"]`). Only include tools the expert actually needs.
 - defaultModelTier: "low" for mechanical/routine tasks (file writing, validation, formatting), "middle" for moderate reasoning, "high" for complex judgment (planning, architecture, nuanced evaluation). Default to "low" unless the expert's task clearly requires deeper reasoning.
-- Delegates (if coordinator)
+- delegates array (REQUIRED for any expert that delegates — list all delegate keys explicitly)

 ### MCP Skills
 For each MCP skill found by find-skill:
@@ -266,7 +266,7 @@ pick = [

 [experts."@create-expert/build"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Orchestrates the write → test → verify → improve cycle for perstack.toml.
 Provide: path to plan.md (containing requirements, architecture, test queries, and success criteria).
 - **File structure**: start every perstack.toml with a TOML comment block showing the delegation tree as an ASCII diagram, followed by expert definitions in tree order (coordinator first, then depth-first through delegates). This file header serves as a map for anyone reading the definition.
+- **Delegates (CRITICAL)**: every expert that delegates to others MUST have a `delegates` array listing all delegate keys. Without this array, the runtime cannot register delegates as callable tools and delegation will silently fail. Leaf experts (no delegates) omit this field entirely.
 - **Skills**: minimal set. Always include attemptCompletion. Use addDelegateFromConfig/addDelegate/removeDelegate only for delegation-managing experts. Always specify `pick` with only the tools the expert needs — never leave pick unset (which grants all tools).
 - **defaultModelTier**: always set per expert. Use the tier specified in plan.md's architecture section.
 - **TOML**: triple-quoted strings for multi-line instructions. Every expert needs version, description, instruction. `"@perstack/base"` is the exact required key — never `"base"` or aliases.
@@ -383,13 +384,14 @@ The instruction field is the most impactful part of the definition. Apply these
 - Priority rules for when constraints conflict

 ### What does NOT belong in an instruction
-- **Code snippets and implementation templates** — the LLM knows how to write code. Showing it a mulberry32 PRNG implementation or a blessed screen setup teaches it nothing. State the constraint ("use seeded RNG for deterministic tests") and let the LLM implement it.
-- **General programming knowledge** — ECS patterns, A* search, collision detection algorithms, TypeScript configuration. These are well within the LLM's training. Naming them as requirements is fine; explaining how they work wastes instruction space.
-- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps.
-- **File-by-file output specifications** — "create src/engine/ecs.ts, src/engine/state.ts, ..." Let the LLM decide the file structure based on the requirements.
+- **Code snippets and implementation templates** — the LLM knows how to write code. Never include inline code blocks, JSON schema examples, TypeScript interfaces, or mock patterns. State the constraint ("output JSON Lines with one object per turn") and let the LLM implement it. A code snippet in an instruction is always a sign that the author didn't trust the LLM enough.
+- **General programming knowledge** — ECS patterns, A* search, collision detection, terminal ANSI codes, Jest configuration, tsconfig settings, package.json structure. These are well within the LLM's training. Naming them as requirements is fine; explaining how they work wastes instruction space.
+- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps. Numbered implementation checklists and ordered task lists are procedures in disguise.
+- **File-by-file output specifications** — "create src/engine/ecs.ts, src/engine/state.ts, ..." Let the LLM decide the file structure based on the requirements. Specifying exact file paths constrains the LLM without adding value.
+- **Library selection guides** — "prefer ink for React-like, blessed for widgets, chalk as fallback." The LLM can choose appropriate libraries. State the requirement ("interactive TUI with keyboard input"), not the implementation choice.

 ### Self-check
-Before writing each instruction, ask: "If I removed this sentence, would the LLM produce a worse result?" If the answer is no — because the LLM already knows this — remove it.
+Before writing each instruction, ask: "If I removed this sentence, would the LLM produce a worse result?" If the answer is no — because the LLM already knows this — remove it. Apply this test to every paragraph, every bullet point, and every sub-heading.

 ## Description Rules
@@ -432,7 +434,7 @@ pick = [

 [experts."@create-expert/verify-test"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Independently verifies test-expert results by inspecting produced artifacts against success criteria.
 Provide: (1) the test-expert's result (status, query, criteria evaluation), (2) the success criteria from plan.md, (3) path to perstack.toml and workspace.
@@ -452,11 +454,14 @@ You are an independent test verifier. You do NOT trust test-expert's verdict at
 ### 2. Definition Quality Verification
 Read the perstack.toml and check for these quality issues. Any violation is grounds for CONTINUE:

-- **Bloated instructions**: instructions containing code snippets, implementation templates, or general programming knowledge that the LLM already knows. Instructions should contain only domain-specific constraints, policies, and quality bars.
+- **Missing delegates array (CRITICAL)**: every expert whose instruction references delegating to other experts MUST have a `delegates` array listing those experts. Without it, the runtime cannot register delegates as tools and delegation silently fails. Cross-check: if an instruction mentions delegating to `@scope/name`, then `delegates` must include `"@scope/name"`. This is the single most common and most severe defect.
+- **Bloated instructions**: instructions containing code snippets (```), implementation templates, JSON schema examples, TypeScript interfaces, mock patterns, or general programming knowledge that the LLM already knows. Instructions should contain only domain-specific constraints, policies, and quality bars. Scan for fenced code blocks and inline code — their presence almost always indicates bloat.
 - **Missing pick**: every @perstack/base skill must have an explicit `pick` list. Omitting pick grants all tools, which is almost never correct.
 - **Missing defaultModelTier**: every expert should have a defaultModelTier set.
 - **Flat delegation without justification**: if a coordinator has many direct delegates with interdependencies, suggest grouping related delegates under sub-coordinators based on shared concerns (cohesion).
-- **Procedural instructions**: instructions that read as step-by-step procedures rather than domain knowledge (rules, constraints, policies).
+- **Procedural instructions**: instructions that read as step-by-step procedures rather than domain knowledge (rules, constraints, policies). Numbered implementation checklists are procedures in disguise.
+- **File-by-file output specs**: instructions that specify exact output file paths (e.g., "create src/Game.ts, src/tui.ts"). The LLM should decide file structure.
+- **Library selection guides**: instructions that list library alternatives with selection criteria (e.g., "prefer ink for X, blessed for Y"). State the requirement, not the implementation.
 Tests a single query against a Perstack expert definition.
 Provide: (1) path to perstack.toml, (2) the test query to execute, (3) the success criteria to evaluate against, (4) the coordinator expert name to test.
@@ -546,7 +551,7 @@ pick = [

 [experts."@create-expert/find-skill"]
 defaultModelTier = "low"
-version = "1.0.5"
+version = "1.0.6"
 description = """
 Searches the MCP registry for MCP servers that match a skill requirement.
 Provide: the capability needed and suggested search keywords.
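
The missing-delegates cross-check and the fenced-code scan that the verify-test hunk describes could be sketched roughly as follows. This is an illustrative Python sketch, not part of the commit or of Perstack itself: `lint_experts`, the reference regex, and the dict layout (expert key mapped to a table with `instruction` and an optional `delegates` list) are all assumptions, and the `experts` table is expected to be already parsed from perstack.toml.

```python
import re

# Hypothetical pattern for delegate-style keys such as "@create-expert/design-roles".
DELEGATE_REF = re.compile(r"@[\w-]+/[\w-]+")
# Marker that opens or closes a fenced code block (three backticks).
FENCE = "`" * 3


def lint_experts(experts):
    """Return findings for an already-parsed `experts` table (assumed layout)."""
    findings = []
    for name, expert in experts.items():
        instruction = expert.get("instruction", "")
        declared = set(expert.get("delegates", []))
        # Count only references to keys that are defined as experts, so that
        # skill keys such as "@perstack/base" are not mistaken for delegates.
        referenced = {
            ref
            for ref in DELEGATE_REF.findall(instruction)
            if ref in experts and ref != name
        }
        for ref in sorted(referenced - declared):
            findings.append(f"{name}: instruction mentions {ref} but delegates omits it")
        if FENCE in instruction:
            findings.append(f"{name}: instruction contains a fenced code block")
    return findings


# Usage: a coordinator that mentions a delegate without declaring it.
experts = {
    "create-expert": {
        "instruction": "Delegate design to @create-expert/design-roles using @perstack/base tools.",
    },
    "@create-expert/design-roles": {"instruction": "Design the architecture."},
}
for finding in lint_experts(experts):
    print(finding)
```

A plain text match like this only approximates the rule: it flags any mention of a defined expert key, whether or not the instruction actually delegates to it, so a finding is a prompt for review rather than a verdict.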