feat(issue-101): add nf:harden adversarial hardening loop skill #102
Merged
…ures
- 5 required fixtures covering distinct solve modes (fast, full, skip-layers, focus, zero-residual)
- 1 additional edge-case fixture for invalid --focus value (exits_zero)
- version=1, top-level description field
- bin/nf-benchmark-solve.cjs: benchmark runner for nf:solve end-to-end validation
  - --dry-run flag lists fixtures without invoking nf-solve
  - --fixture flag accepts custom JSON path with pre-flight error on missing file
  - --verbose flag pipes nf-solve stderr to parent stderr
  - --json flag outputs machine-readable JSON summary
  - spawnSync with timeout=300000 to prevent indefinite hangs
  - null residual bounds: assertions skipped when min/max_residual is null
- package.json: added benchmark:solve script entry after formal-verify:petri
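The "null residual bounds" rule above can be illustrated with a minimal sketch. The function name `residualWithinBounds` and the exact shape of the bounds object are assumptions made for illustration; only the `min_residual`/`max_residual` field names and the "skip when null" behavior come from the commit message.

```javascript
// Hypothetical helper (not taken from bin/nf-benchmark-solve.cjs).
// Checks a residual against optional bounds; a null or missing bound
// means "no assertion on that side", so the check is skipped.
function residualWithinBounds(residual, bounds) {
  const { min_residual: min = null, max_residual: max = null } = bounds;
  if (min !== null && residual < min) return false;
  if (max !== null && residual > max) return false;
  return true;
}

// Usage: only the non-null bound is asserted.
const ok = residualWithinBounds(3, { min_residual: null, max_residual: 5 });
console.log(ok); // true
```

Treating null as "no assertion" lets one fixture schema serve both strict fixtures (for example zero-residual) and fixtures that only check that the run exits cleanly.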
… to validate its capacity to solve issues automatically
Add benchmark:solve npm script to package.json for running nf-solve against the full nf-benchmark 205-challenge suite.
Add stub formal artifact files required by specific benchmark challenges:
- .planning/formal/convergence-rules.json (BENCH-108, BENCH-146)
- .planning/formal/evidence/wiring.json (BENCH-061, BENCH-064)
- .planning/formal/model-bias.json (BENCH-135)
- .planning/formal/optimization-priorities.json (BENCH-106)
- config/app.json (BENCH-114 security challenge)
- infrastructure/cloud-config.json (BENCH-140 resource challenge)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nf-benchmark-solve.cjs
- Add snapshotFormalJson/restoreFormalJson helpers for .planning/formal/*.json
- Add extractLayerResidual helper for per-layer residual extraction
- Add setNestedField helper for dot-notation mutations
- Add --track=smoke/autonomy/all CLI flag (default: all)
- Guard smoke loop with runSmoke flag
- Add autonomy fixture runner with try/finally restore guarantee
- Include autonomy_results key in --json output
- Update header and dry-run output for both tracks
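A dot-notation mutation helper like the `setNestedField` named above might look roughly as follows. This is a sketch under assumptions: the real helper's signature and edge-case behavior are not shown in the commit, and the guard against prototype-polluting path segments is an addition made here for safety.

```javascript
// Illustrative sketch only; the signature is assumed, not copied from the runner.
// Sets target[a][b][c] = value for a dotted path "a.b.c", creating
// intermediate objects where they are missing.
function setNestedField(target, dottedPath, value) {
  const keys = dottedPath.split('.');
  // Assumed safety guard: refuse segments that could pollute prototypes.
  if (keys.some((k) => k === '__proto__' || k === 'constructor' || k === 'prototype')) {
    throw new Error(`Unsafe path segment in "${dottedPath}"`);
  }
  let cursor = target;
  for (const key of keys.slice(0, -1)) {
    if (typeof cursor[key] !== 'object' || cursor[key] === null) {
      cursor[key] = {};
    }
    cursor = cursor[key];
  }
  cursor[keys[keys.length - 1]] = value;
  return target;
}

// Usage: mark a requirement as uncovered in a coverage document.
const coverage = { requirements: { 'ACT-01': { covered: true } } };
setNestedField(coverage, 'requirements.ACT-01.covered', false);
```

Note that splitting on `.` means keys that themselves contain dots cannot be addressed; an identifier such as `ACT-01` is unaffected.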
- Add autonomy_fixtures array with seed-f2t-uncover-ACT-01 fixture
- Targets f_to_t layer by marking ACT-01 uncovered in unit-test-coverage.json
- Uses set_field mutation with dot-notation path requirements.ACT-01
- pass_condition: residual_decreased with seeded_delta=1
- Full round-trip verified: snapshot, mutation, nf-solve, restore all work
- Snapshot/restore integrity confirmed (ACT-01 restored to covered: true)
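The snapshot, mutation, restore round-trip described above depends on the restore step running even when the solver step throws. A minimal sketch of that guarantee, using hypothetical names (`snapshotJson`, `restoreJson`, `withSnapshot`) rather than the runner's actual helpers:

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Read every *.json file directly inside dir into memory.
function snapshotJson(dir) {
  const snapshot = new Map();
  for (const name of fs.readdirSync(dir)) {
    if (name.endsWith('.json')) {
      const file = path.join(dir, name);
      snapshot.set(file, fs.readFileSync(file, 'utf8'));
    }
  }
  return snapshot;
}

// Write the captured contents back, overwriting any mutations.
function restoreJson(snapshot) {
  for (const [file, contents] of snapshot) {
    fs.writeFileSync(file, contents);
  }
}

// Run fn with the directory snapshotted; the finally block restores
// the files whether fn returns normally or throws.
function withSnapshot(dir, fn) {
  const snapshot = snapshotJson(dir);
  try {
    return fn();
  } finally {
    restoreJson(snapshot);
  }
}
```

This sketch only restores files that existed when the snapshot was taken; files created during `fn` would need separate tracking and cleanup.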
…nomy — add a real autonomy track with seeded defects and residual reduction scoring
…m nf-solve --json output
…o-end
- Replace artificial preResidual=baseline+delta with real seeded measurement: run --report-only after mutation and skip fixture if layer residual didn't move
- Add array_item_modify and append_array_item mutation types
- Add residual_increased pass condition (tests gap detection, not autoClose)
- Switch fixture from unit-test-coverage.json (output) to requirements.json (input) and inject a fake Complete req with no formal_models: r_to_f goes 0→1, fixture scores PASS, snapshot restored cleanly
Verified: node bin/nf-benchmark-solve.cjs --track=autonomy → 1/1 PASS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
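The two pass conditions above can be read as comparisons between three measurements of one layer's residual: the baseline, the value after seeding the defect, and the value after the fix run. The sketch below is one interpretation; `scoreFixture` and its argument names are hypothetical, and applying the "skip if the residual didn't move" rule only to `residual_decreased` is an assumption.

```javascript
// Hypothetical scoring function (names and return values are illustrative).
// baseline: residual before mutation
// seeded:   residual measured by a report-only run after mutation
// postFix:  residual measured after the fix run
function scoreFixture({ passCondition, baseline, seeded, postFix }) {
  if (passCondition === 'residual_increased') {
    // Detection test: the seeded defect must show up as a larger residual.
    return seeded > baseline ? 'PASS' : 'FAIL';
  }
  if (passCondition === 'residual_decreased') {
    // Assumption: if seeding did not move the residual, nothing can be scored.
    if (seeded === baseline) return 'SKIP';
    return postFix < seeded ? 'PASS' : 'FAIL';
  }
  throw new Error(`Unknown pass condition: ${passCondition}`);
}

// Usage: the r_to_f example from the commit (0 → 1 after seeding).
const detection = scoreFixture({ passCondition: 'residual_increased', baseline: 0, seeded: 1 });
console.log(detection); // PASS
```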
…estore scope
The snapshot/restore cycle captured the benchmark's own config file, causing autonomy_fixtures to be silently wiped on every run. Exclude it via SNAPSHOT_EXCLUDE so the fixture list is durable across benchmark invocations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… solver clobbering
.planning/formal/ is managed by nf-solve and its sub-scripts; any run without --report-only can regenerate files there. Moving solve-benchmark-fixtures.json to bin/ (alongside nf-benchmark-solve.cjs) makes it durable: it's now plain benchmark config, not a formal verification artifact. Also removes the now-unnecessary SNAPSHOT_EXCLUDE workaround.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Gap 1: Smoke regression detection
Add layer-residual-regression fixture with layer_residuals_in_range pass condition. Fails if r_to_f/f_to_t/c_to_f/trace_health/memory_health drift beyond known bounds.
Runner: add evaluatePassCondition branch for the new condition using extractLayerResidual per-layer.

Gap 2: More autonomy detection layers (f_to_t)
Add seed-f2t-inject-property: injects BENCH-TEST-01 into requirements.json AND adds \* @requirement annotation to NFOrchestration.tla. Both steps are required because buildCoverageReport only tracks gaps for requirements in requirements.json.
Runner: add mutations[] array support + applyMutation helper; seed_mutation (single) remains supported for backward compat. target_layer can now be on fixture directly (not buried in seed_mutation).

Gap 3: Remediation (autoClose actually closes a gap)
Add fix-f2t-stub-generation: same seeded state as detection fixture, but pass_condition=residual_decreased. autoClose calls formal-test-sync to generate a stub, then _implement-stubs.cjs upgrades it. Post-fix --report-only sweep confirms f_to_t drops from 1 to 0.
Runner: add post-fix measurement step (Step 4b) so residual_decreased uses actual post-autoClose state, not the fix run's own output which reflects pre-autoClose residuals.

Gap 4: npm script wiring
Add benchmark:solve:local to package.json pointing at bin/nf-benchmark-solve.cjs. benchmark:solve (external) remains unchanged.

Snapshot extended to cover .planning/formal/tla/*.tla files and track generated-stubs directory for cleanup of newly created stub files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
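The `layer_residuals_in_range` condition from Gap 1 amounts to a per-layer bounds check. A minimal sketch follows, assuming the residuals have already been extracted into a flat object keyed by layer name; the function name `layersOutOfRange`, the `{ min, max }` bounds shape, and the numeric bounds in the usage example are illustrative assumptions, not the fixture's real schema or values.

```javascript
// Hypothetical check: returns the names of layers whose residual is
// missing, non-numeric, or outside the inclusive [min, max] range.
function layersOutOfRange(layerResiduals, bounds) {
  const violations = [];
  for (const [layer, { min, max }] of Object.entries(bounds)) {
    const value = layerResiduals[layer];
    if (typeof value !== 'number' || value < min || value > max) {
      violations.push(layer);
    }
  }
  return violations;
}

// Usage with made-up bounds: the fixture passes only when nothing drifted.
const drifted = layersOutOfRange(
  { r_to_f: 0, f_to_t: 7, c_to_f: 2 },
  { r_to_f: { min: 0, max: 0 }, f_to_t: { min: 0, max: 5 }, c_to_f: { min: 0, max: 3 } }
);
console.log(drifted); // [ 'f_to_t' ]
```

Returning the list of offending layers rather than a boolean makes a failing smoke run easier to diagnose.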
Automated commit from nf-solve — includes layer manifests, gate results, evidence snapshots, model registry, and requirements coverage updates.
added 28 commits
April 16, 2026 07:37
- Add callers, implementation, tests, peek subcommands to commands/nf/coderlm.md
- Update frontmatter argument-hint and description to include new subcommands
- Add ensure-running preamble using coderlm-lifecycle.cjs --start before queries
- Use heredoc form for node invocations with env var argument passing
- Add usage help for all four new subcommands
- Add error handling with diagnostic hints for each query subcommand
…orkflow
- Add commands/nf/harden.md with frontmatter, --area and --full flags, execution_context pointing to ~/.claude/nf/workflows/harden.md
- Add core/workflows/harden.md with full adversarial loop:
  - argument parsing (--area, --full, --max) with validation
  - test discovery with empty/baseline guards
  - iterative adversarial agent + fix executor
  - convergence detection (CONSECUTIVE_ZERO_CHANGE)
  - iteration cap (default 10)
  - banners for all terminal states (converged, cap_exhausted, skipped, blocked)
- Sync both files to installed locations (~/.claude/nf/workflows/harden.md, ~/.claude/commands/nf/harden.md)
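The convergence rule and iteration cap described above can be sketched as a small loop. The workflow itself is a Markdown skill driven by agents, so this JavaScript is only a model of the control flow: `runHardenLoop`, `runIteration`, and the option names are hypothetical, and the threshold of 2 consecutive zero-change iterations is taken from the PR summary.

```javascript
// Model of the loop's control flow (not the workflow's actual implementation).
// runIteration(i) stands in for one adversarial-test + fix pass and
// returns the number of changes that pass produced.
function runHardenLoop(runIteration, { maxIterations = 10, zeroChangeTarget = 2 } = {}) {
  let consecutiveZeroChange = 0;
  for (let i = 1; i <= maxIterations; i++) {
    const changes = runIteration(i);
    // Any non-zero iteration resets the streak.
    consecutiveZeroChange = changes === 0 ? consecutiveZeroChange + 1 : 0;
    if (consecutiveZeroChange >= zeroChangeTarget) {
      return { state: 'converged', iterations: i };
    }
  }
  // The cap bounds the loop, so it always terminates.
  return { state: 'cap_exhausted', iterations: maxIterations };
}

// Usage: changes per iteration are 3, 1, 0, 0, so the loop converges at iteration 4.
const changesPerIteration = [3, 1, 0, 0];
const result = runHardenLoop((i) => changesPerIteration[i - 1]);
console.log(result); // { state: 'converged', iterations: 4 }
```

The `skipped` and `blocked` terminal states are omitted here because the commit message does not say what triggers them beyond the empty/baseline guards.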
…age.json
The lib@incompatible_version entry was blocking npm install in this worktree, preventing blessed and xstate from installing and causing 43 test failures. xstate was already declared in devDependencies; removed the duplicate entry.
- .planning/.gitignore: ignore repowise/ cache directory
- commands/nf/coderlm.md: fix require path to use nf-bin portable path
- .planning/quick/400-add-nf-harden-adversarial-skill/scope-contract.json: task scope contract
Summary
- Add the `/nf:harden` skill: an iterative adversarial test-write-fix loop with convergence detection (2 consecutive zero-change iterations) and a configurable iteration cap (`--max N`, default 10)
- Wire `nf:harden` into `nf:quick --full` as Step 6.6, post-verification adversarial hardening (`--max 5`)
- Remove the bogus `lib@incompatible_version` dependency from `package.json`, which blocked `npm install` in this worktree and caused 43 test failures

Key files
- `commands/nf/harden.md`: skill command with `--area`, `--full`, `--max` flags
- `core/workflows/harden.md`: full adversarial loop workflow (terminal states: `converged`, `cap_exhausted`, `skipped`, `blocked`)
- `core/workflows/quick.md`: Step 6.6 adversarial hardening added
- `package.json`: removed `lib@incompatible_version`

Test plan
- `npm run test:ci`: 1537 pass, 0 fail, 0 skip
- `agent-loop` module: 1/1 checks passed (EventuallyTerminates invariant satisfied by iteration cap)

Closes #101
🤖 Generated with Claude Code