Commit 3cbbf80

FL4TLiN3 and claude authored
refactor: add binary instruction content checklist and make rules domain-agnostic (#774)
Two changes:

1. Replace subjective self-check ("would the LLM produce worse results without this?") with six binary checks that have clear pass/fail criteria: no code blocks, no library/tool names, no file paths, no procedures, no technique explanations, ≤15 line budget.
2. Make instruction quality rules domain-agnostic by removing coding-specific vocabulary (TypeScript, Jest, blessed/ink, file paths) and replacing with domain-neutral language that applies equally across coding, writing, research, design, and operations.

Also update verify-test Step 3 to use the same binary criteria for CONTINUE decisions.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 1592d0c commit 3cbbf80

1 file changed: +29 -27 lines changed

definitions/create-expert/perstack.toml

Lines changed: 29 additions & 27 deletions
@@ -15,7 +15,7 @@
 
 [experts."create-expert"]
 defaultModelTier = "high"
-version = "1.0.13"
+version = "1.0.15"
 description = "Creates and modifies Perstack expert definitions in perstack.toml"
 instruction = """
 You are the coordinator for creating and modifying Perstack expert definitions. perstack.toml is the single source of truth — your job is to produce or modify it according to the user's request.
@@ -60,7 +60,7 @@ pick = ["readTextFile", "exec", "attemptCompletion"]
 
 [experts."@create-expert/plan"]
 defaultModelTier = "high"
-version = "1.0.13"
+version = "1.0.15"
 description = """
 Analyzes the user's request and produces plan.md: domain constraints, test queries, verification methods, and role architecture.
 Provide: (1) what the expert should do, (2) path to existing perstack.toml if one exists.
@@ -96,7 +96,7 @@ For each test query:
 - What commands to run to verify it works
 
 ### Failure Conditions
-Conditions derived from domain constraints that mean the work must be rejected. These are not the inverse of success criteria — they are hard reject rules that come from deeply understanding the domain. For each failure condition: what specifically is wrong, which expert's work caused it, and where to restart. Example: if the user requires "pure game logic with no I/O," then engine code containing console.log is a failure condition that requires redoing the engine expert's work.
+Conditions derived from domain constraints that mean the work must be rejected. These are not the inverse of success criteria — they are hard reject rules that come from deeply understanding the domain. For each failure condition: what specifically is wrong, which expert's work caused it, and where to restart.
 
 ### Architecture
 Delegation tree with role assignments. Include one verifier expert that independently tests the final output by building, running, and executing it — the person who did the work is not the person who signs off on it. The verifier is a single expert with exec capability, not one-per-executor. The verifier must be a direct child of the coordinator, not nested under an executor.
@@ -126,7 +126,7 @@ pick = [
 
 [experts."@create-expert/build"]
 defaultModelTier = "low"
-version = "1.0.13"
+version = "1.0.15"
 description = """
 Orchestrates the write → test → verify → improve cycle for perstack.toml.
 Provide: path to plan.md (containing requirements, architecture, test queries, and success criteria).
@@ -188,7 +188,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/write-definition"]
 defaultModelTier = "low"
-version = "1.0.13"
+version = "1.0.15"
 description = """
 Writes or modifies a perstack.toml definition from plan.md requirements and architecture.
 Provide: (1) path to plan.md, (2) optionally path to existing perstack.toml to preserve, (3) optionally feedback from a failed test to address.
@@ -250,20 +250,26 @@ The instruction field is the most impactful part of the definition. Apply these
 - Priority rules for when constraints conflict
 
 ### What does NOT belong in an instruction
-- **Code snippets and implementation templates** — the LLM knows how to write code. Never include inline code blocks, JSON schema examples, TypeScript interfaces, or mock patterns. State the constraint ("output JSON Lines with one object per turn") and let the LLM implement it. A code snippet in an instruction is always a sign that the author didn't trust the LLM enough.
-- **General programming knowledge** — ECS patterns, A* search, collision detection, terminal ANSI codes, Jest configuration, tsconfig settings, package.json structure. These are well within the LLM's training. Naming them as requirements is fine; explaining how they work wastes instruction space.
-- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps. Numbered implementation checklists and ordered task lists are procedures in disguise.
-- **File-by-file output specifications** — "create src/engine/ecs.ts, src/engine/state.ts, ..." Let the LLM decide the file structure based on the requirements. Specifying exact file paths constrains the LLM without adding value.
-- **Library selection guides** — "prefer ink for React-like, blessed for widgets, chalk as fallback." The LLM can choose appropriate libraries. State the requirement ("interactive TUI with keyboard input"), not the implementation choice.
-
-### Self-check before writing
-Before finalizing perstack.toml, verify:
-1. **Instruction content**: for every sentence, ask "If I removed this, would the LLM produce a worse result?" If no — the LLM already knows it — remove it.
-2. **Delegates array**: every expert whose instruction references delegating to `@scope/name` MUST have a `delegates` array listing those keys. Without it, delegation silently fails at runtime.
-3. **Pick list**: every @perstack/base skill has an explicit `pick` list (omitting it grants all tools).
-4. **defaultModelTier**: every expert has this set.
-5. **Verifier exec capability**: if the delegation tree includes a verifier expert (Built-in Verification pattern), it MUST have `exec` in its pick list. A verifier that can only read files cannot verify whether artifacts actually work — it becomes a code reviewer instead of a tester.
-6. **Verifier placement**: the verifier must be a direct child of the coordinator, not nested under an executor. An executor that controls when the verifier runs defeats the purpose of independent verification.
+- **Implementation details the LLM already knows** — code snippets, file structure specifications, tool/library recommendations, configuration boilerplate. The LLM has broad training across programming, writing, design, analysis, and other domains. State the constraint or requirement; trust the LLM to choose the implementation. An instruction that explains *how* to do something the LLM already knows is wasted space.
+- **General domain knowledge** — well-known techniques, standard practices, textbook algorithms. Naming them as requirements is fine ("use seedable RNG", "follow APA citation style"); explaining how they work is not.
+- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps. Numbered checklists and ordered task lists are procedures in disguise.
+- **Specific output structures** — exact file paths, section templates, schema definitions. Describe what the output must contain and its quality bar, not its exact shape. The LLM will organize the output appropriately for the task.
+
+### Instruction content checklist
+Before finalizing perstack.toml, check every instruction (coordinator excluded from line limit) against these binary rules. If any check fails, fix it before writing.
+1. **No code blocks**: instruction contains no ``` fenced code. Remove any code snippets, shell commands, JSON examples, or inline templates.
+2. **No library/tool names**: instruction names no specific library, framework, or tool. Replace with capability requirement ("terminal UI library" not "blessed or ink", "test framework" not "Jest").
+3. **No file paths**: instruction specifies no file or directory paths. Remove all path references — the LLM decides file structure.
+4. **No procedures**: instruction contains no numbered step sequences or ordered checklists. State the goal and constraints, not the steps.
+5. **No technique explanations**: instruction does not explain well-known techniques. Name them as requirements if needed ("seedable RNG", "immutable state transitions"), never explain how they work.
+6. **Line budget**: non-coordinator instruction is ≤ 15 lines. If over, re-check each line against rules 1-5.
+
+### Structure checklist
+1. **Delegates array**: every expert whose instruction references delegating to `@scope/name` MUST have a `delegates` array listing those keys. Without it, delegation silently fails at runtime.
+2. **Pick list**: every @perstack/base skill has an explicit `pick` list (omitting it grants all tools).
+3. **defaultModelTier**: every expert has this set.
+4. **Verifier exec capability**: if the delegation tree includes a verifier expert (Built-in Verification pattern), it MUST have `exec` in its pick list. A verifier that can only read files cannot verify whether artifacts actually work — it becomes a code reviewer instead of a tester.
+5. **Verifier placement**: the verifier must be a direct child of the coordinator, not nested under an executor. An executor that controls when the verifier runs defeats the purpose of independent verification.
 
 ## Description Rules
 
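Reviewer note: because the new instruction content checklist is binary, most of it could in principle be checked mechanically. The sketch below is one possible approximation in Python, not part of this commit. The function name `check_instruction`, the `KNOWN_TOOL_NAMES` deny list, the extension list in the path pattern, and the choice to count only non-blank lines are all assumptions made for illustration. Check 5 (technique explanations) needs judgment and is deliberately left out.

```python
import re

# Hypothetical deny list for illustration only; the commit defines no such list.
# Plain-word names like "ink" or "blessed" can produce false positives.
KNOWN_TOOL_NAMES = {"jest", "blessed", "ink"}

# Built at runtime so this example does not itself contain a literal fence.
FENCE = "`" * 3

# Assumed heuristic: a path is an optional directory prefix plus a file name
# ending in one of a few common extensions.
PATH_RE = re.compile(r"\b(?:[\w.-]+/)*[\w-]+\.(?:ts|js|py|toml|json|md)\b")

# A line that starts with "1." or "1)" is treated as a numbered step.
STEP_RE = re.compile(r"\s*\d+[.)]\s")


def check_instruction(text: str, is_coordinator: bool = False,
                      max_lines: int = 15) -> list[str]:
    """Return the names of failed checks; an empty list means all pass."""
    failures = []
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # Check 1: no fenced code blocks.
    if FENCE in text:
        failures.append("code-block")

    # Check 2: no library/tool names (deny-list lookup, case-insensitive).
    words = {w.lower() for w in re.findall(r"[A-Za-z][A-Za-z0-9_-]*", text)}
    if words & KNOWN_TOOL_NAMES:
        failures.append("tool-name")

    # Check 3: no file paths.
    if PATH_RE.search(text):
        failures.append("file-path")

    # Check 4: no numbered step sequences (two or more numbered lines).
    if sum(1 for ln in lines if STEP_RE.match(ln)) >= 2:
        failures.append("procedure")

    # Check 6: line budget, with the coordinator exempt as the checklist states.
    if not is_coordinator and len(lines) > max_lines:
        failures.append("line-budget")

    return failures
```

A caller could treat any non-empty result as a reason to revise the instruction before writing perstack.toml. The heuristics are intentionally crude, so a clean result does not prove an instruction passes the checklist.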
@@ -306,7 +312,7 @@ pick = [
 
 [experts."@create-expert/verify-test"]
 defaultModelTier = "low"
-version = "1.0.13"
+version = "1.0.15"
 description = """
 Verifies test-expert results by inspecting produced artifacts, executing them, and reviewing the definition against plan.md.
 Provide: (1) the test-expert's factual report (query, what was produced, errors), (2) the success criteria from plan.md, (3) path to plan.md (for semantic review of instructions), (4) path to perstack.toml.
@@ -326,19 +332,15 @@ Read test-expert's result, then independently inspect every artifact it referenc
 
 ## Step 2: Artifact Execution (MANDATORY)
 
-Use exec to verify that produced artifacts actually work. What to run depends on what was produced:
-- Code projects: build (e.g., `bun install && bun run build`), run tests if they exist, run lint if configured
-- Scripts: execute them and verify output
-- Configuration files: validate syntax (e.g., `toml-lint`, `json5 --validate`)
-- If the artifact type has no meaningful execution step, document why and proceed
+Use exec to verify that produced artifacts actually work. What to run depends on what was produced — build it, run it, validate it. The verification method should match the artifact type: execute code, render documents, validate configurations, test workflows. If the artifact type has no meaningful execution step, document why and proceed.
 
 A success criterion is not met if the artifact looks correct on paper but fails to build, run, or pass its own tests.
 
 ## Step 3: Instruction Semantic Review (MANDATORY)
 
 Read plan.md's Domain Knowledge section and the perstack.toml's instruction fields. Verify:
 - Every domain-specific constraint from plan.md is reflected in the instruction. Missing constraints mean the expert will not enforce them at runtime.
-- No instruction contains content the LLM already knows (code snippets, general programming knowledge, step-by-step procedures, library selection guides). These dilute the domain knowledge.
+- No instruction violates content rules: contains code blocks, names specific libraries/tools, specifies file paths, includes numbered procedures, or explains well-known techniques. Non-coordinator instructions should be ≤ 15 lines. Each violation is a CONTINUE reason.
 - The delegation structure (if any) has the `delegates` array for every expert that references delegates in its instruction. Without it, delegation silently fails at runtime.
 - Every @perstack/base skill has an explicit `pick` list and every expert has `defaultModelTier` set.
 - Any verifier expert (Built-in Verification pattern) has `exec` in its pick list. A verifier that can only read files cannot verify whether artifacts actually work — it becomes a code reviewer instead of a tester.
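Reviewer note: the structural items in Step 3 (delegates array, pick list, `defaultModelTier`, verifier `exec`, verifier placement) are also binary and could be approximated in code. The sketch below is illustrative only and not part of this commit. It assumes the definition has already been parsed into a plain dict of experts (for example with Python's `tomllib`), and it assumes a hypothetical layout in which each expert may carry a `skills` table keyed by skill name with an optional `pick` list; the diff does not show the actual skill layout of perstack.toml. The function name `check_structure` and its parameters are likewise made up for this example.

```python
import re

# Matches scoped expert keys such as "@scope/name" inside instruction text.
DELEGATE_REF = re.compile(r"@[\w-]+/[\w-]+")

# Skill name taken from the checklist text; its location in the file is assumed.
BASE_SKILL = "@perstack/base"


def check_structure(experts: dict, coordinator: str,
                    verifiers: tuple = ()) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    for name, expert in experts.items():
        # defaultModelTier must be set on every expert.
        if "defaultModelTier" not in expert:
            problems.append(f"{name}: defaultModelTier not set")

        # Every @scope/name referenced in the instruction must be in delegates.
        instruction = expert.get("instruction", "")
        referenced = set(DELEGATE_REF.findall(instruction)) - {name, BASE_SKILL}
        for key in sorted(referenced - set(expert.get("delegates", []))):
            problems.append(f"{name}: references {key} but delegates omits it")

        # The base skill, when present, needs an explicit pick list.
        base = expert.get("skills", {}).get(BASE_SKILL)
        if base is not None and "pick" not in base:
            problems.append(f"{name}: {BASE_SKILL} has no pick list")

    for verifier in verifiers:
        skills = experts.get(verifier, {}).get("skills", {})
        picks = {tool for skill in skills.values() for tool in skill.get("pick", [])}
        # A verifier without exec can only read files, not run artifacts.
        if "exec" not in picks:
            problems.append(f"{verifier}: verifier lacks exec")
        # The verifier must be delegated to directly by the coordinator.
        if verifier not in experts.get(coordinator, {}).get("delegates", []):
            problems.append(f"{verifier}: not a direct child of {coordinator}")

    return problems
```

Which experts count as verifiers is passed in explicitly because nothing in the definition, as shown in this diff, marks an expert as a verifier. Each returned problem would map to one CONTINUE reason under the Step 3 wording.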
@@ -367,7 +369,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/test-expert"]
 defaultModelTier = "low"
-version = "1.0.13"
+version = "1.0.15"
 description = """
 Executes a single test query against a Perstack expert definition and reports what happened.
 Provide: (1) path to perstack.toml, (2) the test query to execute, (3) the coordinator expert name to test.
