|
14 | 14 | # |
15 | 15 | # 1. Hard Signal Verification |
16 | 16 | # - Verification loop must be driven by hard signals — checks whose |
17 | | -# results do not depend on LLM judgment. Exit codes, test results, |
18 | | -# grep matches, deterministic command output. |
| 17 | +# results do not depend on LLM judgment. Any command that produces |
| 18 | +# deterministic, binary output qualifies. |
19 | 19 | # - Hard signal = Ground truth × Context separation × Determinism. |
20 | 20 | # All three required; missing any one degrades to soft signal (noise). |
21 | 21 | # - Ground truth: verify the final artifact itself, not a proxy. |
@@ -164,7 +164,7 @@ One comprehensive, realistic query that exercises the expert's full capability. |
164 | 164 | ### Verification Signals |
165 | 165 | Hard signals for the test query — verification checks whose results do not depend on LLM judgment: |
166 | 166 | - The exact command to run (deterministic, repeatable) |
167 | | -- The expected result (exit code, output pattern, file existence) |
| 167 | +- The expected result (specific output, presence/absence of content, numeric threshold) |
168 | 168 | - Why this checks ground truth, not a proxy |
169 | 169 |
|
170 | 170 | Include both positive signals (artifact works correctly) and reject signals (domain-specific anti-patterns are absent). Reject signals are not the inverse of positive signals — they detect fundamental failures derived from deeply understanding the domain. |
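As an illustration of the positive/reject distinction in this hunk, here is a minimal sketch of how a signal could be modelled. The `Signal` class, its field names, and the two example commands are hypothetical and do not come from the changed file; the only assumption is that a signal reduces to "a pattern is present in, or absent from, the command output".

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One hard signal (hypothetical structure, not defined in the source)."""
    command: str      # the exact command to run
    pattern: str      # regex searched for in the command's output
    must_match: bool  # True = positive signal, False = reject signal
    rationale: str    # why this checks ground truth, not a proxy


def passes(signal: Signal, output: str) -> bool:
    """Deterministic check: pattern present (positive) or absent (reject)."""
    found = re.search(signal.pattern, output) is not None
    return found == signal.must_match


# Illustrative signals only; the commands are examples, not plan content.
positive = Signal("pytest -q", r"\b\d+ passed\b", True,
                  "runs the real test suite against the artifact")
reject = Signal("grep -rn 'TODO' src/", r"TODO", False,
                "unfinished stubs must not ship")
```

The reject signal here is not the negation of the positive one: it looks for a separate, domain-specific failure (leftover stubs) that a passing test run would not reveal.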
@@ -394,7 +394,7 @@ All three steps below are MANDATORY. Skipping any step is grounds for an invalid |
394 | 394 |
|
395 | 395 | Run every hard signal check defined in plan.md's Verification Signals: |
396 | 396 | - Execute the exact command specified |
397 | | -- Compare the result against the expected output (exit code, pattern match, file existence) |
| 397 | +- Compare the result against the expected output (specific output, presence/absence of content, numeric threshold) |
398 | 398 | - Record per check: command run, expected result, actual result, PASS/FAIL |
399 | 399 |
|
400 | 400 | If a check has no deterministic expected output, flag it as an invalid signal and CONTINUE — the plan must define a proper hard signal. |
|
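The three steps in this hunk (execute, compare, record) could be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: `run_check`, the record keys, and the `INVALID_SIGNAL` verdict name are hypothetical, and the comparison is a plain exact match on stdout.

```python
import subprocess
import sys
from typing import Optional


def run_check(command: list[str], expected: Optional[str]) -> dict:
    """Run one check and record command, expected, actual, and a verdict."""
    if expected is None:
        # No deterministic expected output: flag it and let the caller continue.
        return {"command": command, "expected": None,
                "actual": None, "verdict": "INVALID_SIGNAL"}
    proc = subprocess.run(command, capture_output=True, text=True)
    actual = proc.stdout.strip()
    verdict = "PASS" if actual == expected else "FAIL"
    return {"command": command, "expected": expected,
            "actual": actual, "verdict": verdict}


# Illustrative checks: the second has no expected output, so it is flagged.
checks = [
    ([sys.executable, "-c", "print(2 + 2)"], "4"),
    ([sys.executable, "-c", "print('ok')"], None),
]
results = [run_check(cmd, exp) for cmd, exp in checks]
for r in results:
    print(r["verdict"], r["command"][-1])
```

An invalid signal produces a record instead of raising, so the remaining checks still run and the gap in the plan stays visible in the results.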