Commit 3b4283a

FL4TLiN3 and claude authored
feat: add must/should priority to verification signals (#790)
All signals were treated equally — any failure caused CONTINUE and looping. But "app runs" and "all tests pass" have different user impact. Add priority:

- must: failure blocks completion (user cannot use the artifact)
- should: failure reported as known limitation (artifact is usable)

Changes:

- Design Principle 5: signal priority (must/should)
- Plan: Verification Signals require must/should per signal
- Plan self-check: new item 4 — every signal has a priority
- verify-test: only must failures cause CONTINUE
- Bump to 1.0.20

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent c19134a commit 3b4283a

1 file changed: +24 -20 lines changed

definitions/create-expert/perstack.toml

Lines changed: 24 additions & 20 deletions
@@ -57,12 +57,13 @@
 # - Without this boundary, plan bloat leaks directly into instructions.
 #
 # 5. Verification Signal Design
-# - Success checks and reject rules are both expressed as hard signals:
-# a command with a deterministic expected result.
+# - Each signal is classified as must (blocks completion) or should
+# (reported but does not block). Must signals protect core usability;
+# should signals cover polish and secondary quality.
 # - Reject signals are not the inverse of success signals — they detect
 # domain-specific anti-patterns that indicate fundamental failure.
-# - Each signal specifies: what to run, what to expect, and where to
-# restart if it fails.
+# - Each signal specifies: what to run, what to expect, and priority
+# (must/should).
 #
 # 6. Instruction Content = Domain Constraints Only
 # - An instruction should contain ONLY what the LLM cannot derive on
@@ -88,7 +89,7 @@
 
 [experts."create-expert"]
 defaultModelTier = "high"
-version = "1.0.19"
+version = "1.0.20"
 description = "Creates and modifies Perstack expert definitions in perstack.toml"
 instruction = """
 You are the coordinator for creating and modifying Perstack expert definitions. perstack.toml is the single source of truth — your job is to produce or modify it according to the user's request.
@@ -133,7 +134,7 @@ pick = ["readTextFile", "exec", "attemptCompletion"]
 
 [experts."@create-expert/plan"]
 defaultModelTier = "high"
-version = "1.0.19"
+version = "1.0.20"
 description = """
 Analyzes the user's request and produces plan.md: domain constraints, test query, verification signals, and role architecture.
 Provide: (1) what the expert should do, (2) path to existing perstack.toml if one exists.
@@ -164,10 +165,12 @@ Constraints and rules unique to this expert, extracted from the user's request.
 One comprehensive, realistic query that exercises the expert's full capability. Design the query so that its verification signals can cover all domain constraints from the Domain Knowledge section. Coverage comes from signal design depth, not from running multiple queries.
 
 ### Verification Signals
-Hard signals for the test query — verification checks whose results do not depend on LLM judgment:
+Hard signals for the test query — verification checks whose results do not depend on LLM judgment. Each signal specifies:
 - The exact command to run (deterministic, repeatable)
 - The expected result (specific output, presence/absence of content, numeric threshold)
-- Why this checks ground truth, not a proxy
+- Priority: **must** (failure blocks completion — the user cannot use the artifact) or **should** (failure is reported but does not block — the artifact is usable with known limitations)
+
+Must signals protect core usability — can the user run the artifact and get the primary value? Should signals cover polish, testing, and secondary quality.
 
 Include both positive signals (artifact works correctly) and reject signals (domain-specific anti-patterns are absent). Reject signals are not the inverse of positive signals — they detect fundamental failures derived from deeply understanding the domain.
 
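The signal shape described in the hunk above (a command, an expected result, and a must/should priority) can be sketched in Python. This is a minimal illustration under stated assumptions, not Perstack code: `Signal`, `run_signal`, the field names, the `reject` flag, and the substring comparison are all hypothetical, and the commands are stand-ins that print fixed text.

```python
import subprocess
import sys
from dataclasses import dataclass


# Hypothetical model of one verification signal; none of these names come from Perstack.
@dataclass(frozen=True)
class Signal:
    name: str
    command: list[str]    # exact command to run (deterministic, repeatable)
    expected: str         # text that must be present in stdout (or absent, for reject signals)
    priority: str         # "must" blocks completion, "should" is reported only
    reject: bool = False  # assumption: reject signals pass when `expected` is ABSENT

    def __post_init__(self) -> None:
        # Every signal needs an explicit priority.
        if self.priority not in ("must", "should"):
            raise ValueError(f"{self.name}: priority must be 'must' or 'should'")


def run_signal(signal: Signal) -> bool:
    """Run the command and compare stdout mechanically; no LLM judgment involved."""
    result = subprocess.run(signal.command, capture_output=True, text=True, timeout=60)
    found = signal.expected in result.stdout
    return (not found) if signal.reject else found


# Stand-in commands: each just prints a fixed line via the current interpreter.
signals = [
    Signal("app runs", [sys.executable, "-c", "print('server ready')"],
           "server ready", "must"),
    Signal("no stack trace", [sys.executable, "-c", "print('server ready')"],
           "Traceback", "must", reject=True),
    Signal("all tests pass", [sys.executable, "-c", "print('3 passed, 1 failed')"],
           "0 failed", "should"),
]

# The plan self-check in this commit also requires at least one must signal.
assert any(s.priority == "must" for s in signals)

results = {s.name: run_signal(s) for s in signals}
```

With these stand-in commands, both must signals pass and the should signal fails, which is the case the commit targets: the artifact is usable, and the failing test run is a known limitation.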
@@ -193,9 +196,10 @@ Re-read plan.md and verify each rule. If any check fails, fix plan.md before att
 1. **Section names exact match**: plan.md uses exactly these section names and no others — "Expert Purpose", "Domain Knowledge", "Use Cases", "Test Query", "Verification Signals", "Architecture". Extra sections confuse downstream experts.
 2. **Single test query**: "Test Query" section contains exactly one query, not multiple.
 3. **Every signal is a command**: each entry in "Verification Signals" specifies a concrete command to execute and its expected result. Entries that describe what to observe or what correct output "looks like" without a command are not signals — rewrite them.
-4. **No soft language in signals**: signals contain no phrases like "verify that", "check that", "should be", "looks correct", "works properly". Each signal is: run X → expect Y.
-5. **Domain constraint coverage**: every constraint in "Domain Knowledge" is exercised by at least one signal. List which signal covers which constraint.
-6. **Architecture is names only**: "Architecture" section contains expert name, one-line purpose, and role (executor/verifier) per expert. No deliverables, no constraints, no implementation details.
+4. **Every signal has a priority**: each signal is marked as **must** (blocks completion) or **should** (reported, does not block). At least one must signal exists. Must signals protect core usability — can the user run the artifact and get the primary value?
+5. **No soft language in signals**: signals contain no phrases like "verify that", "check that", "should be", "looks correct", "works properly". Each signal is: run X → expect Y.
+6. **Domain constraint coverage**: every constraint in "Domain Knowledge" is exercised by at least one signal. List which signal covers which constraint.
+7. **Architecture is names only**: "Architecture" section contains expert name, one-line purpose, and role (executor/verifier) per expert. No deliverables, no constraints, no implementation details.
 
 After writing plan.md, attemptCompletion with the file path.
 """
@@ -220,7 +224,7 @@ pick = [
 
 [experts."@create-expert/build"]
 defaultModelTier = "low"
-version = "1.0.19"
+version = "1.0.20"
 description = """
 Orchestrates the write → review → test → verify cycle for perstack.toml.
 Provide: path to plan.md (containing requirements, architecture, test query, and verification signals).
@@ -281,7 +285,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/write-definition"]
 defaultModelTier = "low"
-version = "1.0.19"
+version = "1.0.20"
 description = """
 Writes or modifies a perstack.toml definition from plan.md requirements and architecture.
 Provide: (1) path to plan.md, (2) optionally path to existing perstack.toml to preserve, (3) optionally feedback from a failed test to address.
@@ -384,7 +388,7 @@ pick = [
 
 [experts."@create-expert/review-definition"]
 defaultModelTier = "low"
-version = "1.0.19"
+version = "1.0.20"
 description = """
 Reviews perstack.toml against plan.md for domain knowledge alignment and instruction quality.
 Provide: (1) path to plan.md, (2) path to perstack.toml.
@@ -433,7 +437,7 @@ pick = ["readTextFile", "todo", "attemptCompletion"]
 
 [experts."@create-expert/verify-test"]
 defaultModelTier = "low"
-version = "1.0.19"
+version = "1.0.20"
 description = """
 Executes hard signal checks against test-expert's results, verifies their reproducibility, and checks the definition structure.
 Provide: (1) the test-expert's factual report (query, what was produced, errors), (2) the verification signals from plan.md, (3) path to perstack.toml.
@@ -477,12 +481,12 @@ Report each as PASS/FAIL with the command output as evidence.
 
 ## Verdicts
 
-- **PASS** — all signals pass in Step 1, all signals reproduce in Step 2, all structural checks pass in Step 3.
-- **CONTINUE** — any signal failed, any signal did not reproduce, or any structural check failed. Include: which check failed, expected vs actual, specific fix needed.
+- **PASS** — all must signals pass and reproduce. Should signal results are reported but do not affect the verdict.
+- **CONTINUE** — any must signal failed, any must signal did not reproduce, or any structural check failed. Include: which check failed, expected vs actual, specific fix needed.
 
-Default to CONTINUE when any check lacks a clear PASS.
+Should signal failures are included in the report as known limitations but never cause CONTINUE.
 
-attemptCompletion with: verdict, per-signal results from Step 1, reproducibility results from Step 2, structural check results from Step 3, and (if CONTINUE) specific fix feedback.
+attemptCompletion with: verdict, per-signal results (with must/should labels) from Step 1, reproducibility results from Step 2, structural check results from Step 3, should-signal failures as known limitations, and (if CONTINUE) specific fix feedback for must failures only.
 """
 
 [experts."@create-expert/verify-test".skills."@perstack/base"]
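The revised verdict rule in the hunk above can be read as a small decision function. The sketch below is an interpretation, not the verify-test implementation: `SignalResult` and `decide` are hypothetical names, and two points are assumptions. First, a failed structural check still yields CONTINUE, as the CONTINUE bullet states, even though the new PASS bullet does not mention structural checks. Second, a should signal that passes but does not reproduce is listed as a known limitation.

```python
from dataclasses import dataclass


# Hypothetical per-signal record combining the Step 1 and Step 2 outcomes.
@dataclass(frozen=True)
class SignalResult:
    name: str
    priority: str     # "must" or "should"
    passed: bool      # Step 1: the signal's command produced the expected result
    reproduced: bool  # Step 2: re-running the command gave the same result


def decide(results: list[SignalResult], structural_ok: bool) -> tuple[str, list[str], list[str]]:
    """Return (verdict, blocking must failures, known limitations)."""
    failed = [r for r in results if not (r.passed and r.reproduced)]
    blocking = [r.name for r in failed if r.priority == "must"]
    limitations = [r.name for r in failed if r.priority == "should"]
    # Only must failures (or a structural failure) cause CONTINUE.
    verdict = "CONTINUE" if blocking or not structural_ok else "PASS"
    return verdict, blocking, limitations


usable = [
    SignalResult("app runs", "must", passed=True, reproduced=True),
    SignalResult("all tests pass", "should", passed=False, reproduced=True),
]
broken = [
    SignalResult("app runs", "must", passed=False, reproduced=True),
    SignalResult("all tests pass", "should", passed=False, reproduced=True),
]

usable_verdict = decide(usable, structural_ok=True)
broken_verdict = decide(broken, structural_ok=True)
```

Here `usable_verdict` is `("PASS", [], ["all tests pass"])` and `broken_verdict` is `("CONTINUE", ["app runs"], ["all tests pass"])`. Before this commit, both cases would have produced CONTINUE.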
@@ -498,7 +502,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/test-expert"]
 defaultModelTier = "low"
-version = "1.0.19"
+version = "1.0.20"
 description = """
 Executes a single test query against a Perstack expert definition and reports what happened.
 Provide: (1) path to perstack.toml, (2) the test query to execute, (3) the coordinator expert name to test.
