refactor: add binary instruction content checklist and make rules domain-agnostic (#774)

FL4TLiN3 · claude · web-flow · commit 3cbbf8065bbd · 2026-03-13T16:21:08.000Z
Two changes:

1. Replace subjective self-check ("would the LLM produce worse results
   without this?") with six binary checks that have clear pass/fail
   criteria: no code blocks, no library/tool names, no file paths,
   no procedures, no technique explanations, ≤15 line budget.

2. Make instruction quality rules domain-agnostic by removing coding-
   specific vocabulary (TypeScript, Jest, blessed/ink, file paths) and
   replacing it with domain-neutral language that applies equally across
   coding, writing, research, design, and operations.
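Most of the six checks in item 1 are mechanical. The following is a minimal Python sketch of how a reviewer could automate them, assuming each instruction is available as a plain string. It is a hypothetical helper, not code from this commit. The path and numbered-step regexes are heuristics, the deny-list of library/tool names is hypothetical and must be supplied by the caller, and line counting ignores blank lines (an assumption). Rule 5 (no technique explanations) needs judgment, so it is not automated.

```python
import re

# Built at runtime so this sketch contains no literal code fence.
FENCE = "`" * 3

# Heuristic for rule 3: a slash-separated token ending in an extension,
# e.g. src/engine/ecs.ts. Scoped names like @app/build do not match.
PATH_RE = re.compile(r"\b[\w.-]+(?:/[\w.-]+)+\.[A-Za-z0-9]{1,5}\b")

# Heuristic for rule 4: a line that starts like "1. ..." or "2) ...".
STEP_RE = re.compile(r"^\s*\d+[.)]\s", re.MULTILINE)

LINE_BUDGET = 15  # rule 6, non-coordinator instructions only


def check_instruction(text: str, is_coordinator: bool, banned_names: set[str]) -> list[str]:
    """Return the failed checks; an empty list means every automated check passes.

    Rule 5 (no technique explanations) needs judgment and is not automated.
    """
    failures = []
    if FENCE in text:
        failures.append("1: contains a fenced code block")
    # Whole-word match so a banned name is not found inside a longer word.
    named = sorted(
        n for n in banned_names if re.search(rf"(?<!\w){re.escape(n)}(?!\w)", text)
    )
    if named:
        failures.append("2: names " + ", ".join(named))
    if PATH_RE.search(text):
        failures.append("3: contains a file path")
    if len(STEP_RE.findall(text)) >= 2:
        failures.append("4: contains a numbered step sequence")
    # Assumption: blank lines do not count toward the budget.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not is_coordinator and len(lines) > LINE_BUDGET:
        failures.append(f"6: {len(lines)} lines exceeds the budget of {LINE_BUDGET}")
    return failures


if __name__ == "__main__":
    banned = {"Jest", "TypeScript", "blessed", "ink"}  # hypothetical deny-list
    good = "Provide a terminal UI with keyboard input.\nUse a seedable RNG."
    bad = "1. Install Jest\n2. Put tests in src/game/engine.test.ts"
    print(check_instruction(good, False, banned))  # []
    print(check_instruction(bad, False, banned))
    # ['2: names Jest', '3: contains a file path', '4: contains a numbered step sequence']
```

Each check returns independently, so a non-empty list is a fail and the reviewer can fix every violation in one pass.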

Also update verify-test Step 3 to use the same binary criteria for
CONTINUE decisions.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/definitions/create-expert/perstack.toml b/definitions/create-expert/perstack.toml
@@ -15,7 +15,7 @@
 
 [experts."create-expert"]
 defaultModelTier = "high"
-version = "1.0.13"
+version = "1.0.15"
 description = "Creates and modifies Perstack expert definitions in perstack.toml"
 instruction = """
 You are the coordinator for creating and modifying Perstack expert definitions. perstack.toml is the single source of truth — your job is to produce or modify it according to the user's request.
@@ -60,7 +60,7 @@ pick = ["readTextFile", "exec", "attemptCompletion"]
 
 [experts."@create-expert/plan"]
 defaultModelTier = "high"
-version = "1.0.13"
+version = "1.0.15"
 description = """
 Analyzes the user's request and produces plan.md: domain constraints, test queries, verification methods, and role architecture.
 Provide: (1) what the expert should do, (2) path to existing perstack.toml if one exists.
@@ -96,7 +96,7 @@ For each test query:
 - What commands to run to verify it works
 
 ### Failure Conditions
-Conditions derived from domain constraints that mean the work must be rejected. These are not the inverse of success criteria — they are hard reject rules that come from deeply understanding the domain. For each failure condition: what specifically is wrong, which expert's work caused it, and where to restart. Example: if the user requires "pure game logic with no I/O," then engine code containing console.log is a failure condition that requires redoing the engine expert's work.
+Conditions derived from domain constraints that mean the work must be rejected. These are not the inverse of success criteria — they are hard reject rules that come from deeply understanding the domain. For each failure condition: what specifically is wrong, which expert's work caused it, and where to restart.
 
 ### Architecture
 Delegation tree with role assignments. Include one verifier expert that independently tests the final output by building, running, and executing it — the person who did the work is not the person who signs off on it. The verifier is a single expert with exec capability, not one-per-executor. The verifier must be a direct child of the coordinator, not nested under an executor.
@@ -126,7 +126,7 @@ pick = [
 
 [experts."@create-expert/build"]
 defaultModelTier = "low"
-version = "1.0.13"
+version = "1.0.15"
 description = """
 Orchestrates the write → test → verify → improve cycle for perstack.toml.
 Provide: path to plan.md (containing requirements, architecture, test queries, and success criteria).
@@ -188,7 +188,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/write-definition"]
 defaultModelTier = "low"
-version = "1.0.13"
+version = "1.0.15"
 description = """
 Writes or modifies a perstack.toml definition from plan.md requirements and architecture.
 Provide: (1) path to plan.md, (2) optionally path to existing perstack.toml to preserve, (3) optionally feedback from a failed test to address.
@@ -250,20 +250,26 @@ The instruction field is the most impactful part of the definition. Apply these
 - Priority rules for when constraints conflict
 
 ### What does NOT belong in an instruction
-- **Code snippets and implementation templates** — the LLM knows how to write code. Never include inline code blocks, JSON schema examples, TypeScript interfaces, or mock patterns. State the constraint ("output JSON Lines with one object per turn") and let the LLM implement it. A code snippet in an instruction is always a sign that the author didn't trust the LLM enough.
-- **General programming knowledge** — ECS patterns, A* search, collision detection, terminal ANSI codes, Jest configuration, tsconfig settings, package.json structure. These are well within the LLM's training. Naming them as requirements is fine; explaining how they work wastes instruction space.
-- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps. Numbered implementation checklists and ordered task lists are procedures in disguise.
-- **File-by-file output specifications** — "create src/engine/ecs.ts, src/engine/state.ts, ..." Let the LLM decide the file structure based on the requirements. Specifying exact file paths constrains the LLM without adding value.
-- **Library selection guides** — "prefer ink for React-like, blessed for widgets, chalk as fallback." The LLM can choose appropriate libraries. State the requirement ("interactive TUI with keyboard input"), not the implementation choice.
-
-### Self-check before writing
-Before finalizing perstack.toml, verify:
-1. **Instruction content**: for every sentence, ask "If I removed this, would the LLM produce a worse result?" If no — the LLM already knows it — remove it.
-2. **Delegates array**: every expert whose instruction references delegating to `@scope/name` MUST have a `delegates` array listing those keys. Without it, delegation silently fails at runtime.
-3. **Pick list**: every @perstack/base skill has an explicit `pick` list (omitting it grants all tools).
-4. **defaultModelTier**: every expert has this set.
-5. **Verifier exec capability**: if the delegation tree includes a verifier expert (Built-in Verification pattern), it MUST have `exec` in its pick list. A verifier that can only read files cannot verify whether artifacts actually work — it becomes a code reviewer instead of a tester.
-6. **Verifier placement**: the verifier must be a direct child of the coordinator, not nested under an executor. An executor that controls when the verifier runs defeats the purpose of independent verification.
+- **Implementation details the LLM already knows** — code snippets, file structure specifications, tool/library recommendations, configuration boilerplate. The LLM has broad training across programming, writing, design, analysis, and other domains. State the constraint or requirement; trust the LLM to choose the implementation. An instruction that explains *how* to do something the LLM already knows is wasted space.
+- **General domain knowledge** — well-known techniques, standard practices, textbook algorithms. Naming them as requirements is fine ("use seedable RNG", "follow APA citation style"); explaining how they work is not.
+- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps. Numbered checklists and ordered task lists are procedures in disguise.
+- **Specific output structures** — exact file paths, section templates, schema definitions. Describe what the output must contain and its quality bar, not its exact shape. The LLM will organize the output appropriately for the task.
+
+### Instruction content checklist
+Before finalizing perstack.toml, check every instruction (coordinator excluded from line limit) against these binary rules. If any check fails, fix it before writing.
+1. **No code blocks**: instruction contains no ``` fenced code. Remove any code snippets, shell commands, JSON examples, or inline templates.
+2. **No library/tool names**: instruction names no specific library, framework, or tool. Replace with capability requirement ("terminal UI library" not "blessed or ink", "test framework" not "Jest").
+3. **No file paths**: instruction specifies no file or directory paths. Remove all path references — the LLM decides file structure.
+4. **No procedures**: instruction contains no numbered step sequences or ordered checklists. State the goal and constraints, not the steps.
+5. **No technique explanations**: instruction does not explain well-known techniques. Name them as requirements if needed ("seedable RNG", "immutable state transitions"), never explain how they work.
+6. **Line budget**: non-coordinator instruction is ≤ 15 lines. If over, re-check each line against rules 1-5.
+
+### Structure checklist
+1. **Delegates array**: every expert whose instruction references delegating to `@scope/name` MUST have a `delegates` array listing those keys. Without it, delegation silently fails at runtime.
+2. **Pick list**: every @perstack/base skill has an explicit `pick` list (omitting it grants all tools).
+3. **defaultModelTier**: every expert has this set.
+4. **Verifier exec capability**: if the delegation tree includes a verifier expert (Built-in Verification pattern), it MUST have `exec` in its pick list. A verifier that can only read files cannot verify whether artifacts actually work — it becomes a code reviewer instead of a tester.
+5. **Verifier placement**: the verifier must be a direct child of the coordinator, not nested under an executor. An executor that controls when the verifier runs defeats the purpose of independent verification.
 
 ## Description Rules
 
@@ -306,7 +312,7 @@ pick = [
 
 [experts."@create-expert/verify-test"]
 defaultModelTier = "low"
-version = "1.0.13"
+version = "1.0.15"
 description = """
 Verifies test-expert results by inspecting produced artifacts, executing them, and reviewing the definition against plan.md.
 Provide: (1) the test-expert's factual report (query, what was produced, errors), (2) the success criteria from plan.md, (3) path to plan.md (for semantic review of instructions), (4) path to perstack.toml.
@@ -326,19 +332,15 @@ Read test-expert's result, then independently inspect every artifact it referenc
 
 ## Step 2: Artifact Execution (MANDATORY)
 
-Use exec to verify that produced artifacts actually work. What to run depends on what was produced:
-- Code projects: build (e.g., `bun install && bun run build`), run tests if they exist, run lint if configured
-- Scripts: execute them and verify output
-- Configuration files: validate syntax (e.g., `toml-lint`, `json5 --validate`)
-- If the artifact type has no meaningful execution step, document why and proceed
+Use exec to verify that produced artifacts actually work. What to run depends on what was produced — build it, run it, validate it. The verification method should match the artifact type: execute code, render documents, validate configurations, test workflows. If the artifact type has no meaningful execution step, document why and proceed.
 
 A success criterion is not met if the artifact looks correct on paper but fails to build, run, or pass its own tests.
 
 ## Step 3: Instruction Semantic Review (MANDATORY)
 
 Read plan.md's Domain Knowledge section and the perstack.toml's instruction fields. Verify:
 - Every domain-specific constraint from plan.md is reflected in the instruction. Missing constraints mean the expert will not enforce them at runtime.
-- No instruction contains content the LLM already knows (code snippets, general programming knowledge, step-by-step procedures, library selection guides). These dilute the domain knowledge.
+- No instruction violates content rules: contains code blocks, names specific libraries/tools, specifies file paths, includes numbered procedures, or explains well-known techniques. Non-coordinator instructions should be ≤ 15 lines. Each violation is a CONTINUE reason.
 - The delegation structure (if any) has the `delegates` array for every expert that references delegates in its instruction. Without it, delegation silently fails at runtime.
 - Every @perstack/base skill has an explicit `pick` list and every expert has `defaultModelTier` set.
 - Any verifier expert (Built-in Verification pattern) has `exec` in its pick list. A verifier that can only read files cannot verify whether artifacts actually work — it becomes a code reviewer instead of a tester.
@@ -367,7 +369,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/test-expert"]
 defaultModelTier = "low"
-version = "1.0.13"
+version = "1.0.15"
 description = """
 Executes a single test query against a Perstack expert definition and reports what happened.
 Provide: (1) path to perstack.toml, (2) the test query to execute, (3) the coordinator expert name to test.