feat: ratchet mode — measure-before-after for autonomous improvements #16
Labels: enhancement (New feature or request)
Description
Summary
Add a ratchet mode to the taskrunner that measures outcomes before and after autonomous code changes, automatically reverting changes that don't measurably improve results.
Why
Current flow: the self-improvement loop files an issue → taskrunner creates a PR → the PR gets merged (or not). Nothing verifies that the change actually improved anything. We've shipped "improvements" that introduced regressions (synthesis noise, conversation-facts errors).
The ratchet pattern: improve → re-run → measure → keep only if metrics improve, else revert.
How it works
- Baseline capture — before task runs, snapshot relevant metrics (test pass rate, typecheck, error count from task_runs table)
- Task execution — normal taskrunner flow, creates branch + PR
- Validation run — after task completes, re-run tests/typecheck on the branch
- Comparison — compare post-task metrics against baseline
- Decision — if metrics improved or held steady: keep. If regressed: auto-close PR with explanation
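The keep/revert decision above can be sketched as a small comparison function. This is a minimal sketch, not the taskrunner's real API: the `Metrics` shape and the function names (`metricsHeldOrImproved`, `ratchetDecision`) are assumptions for illustration.

```typescript
// Hypothetical metrics snapshot, assumed to be derived from the
// typecheck run, test suite, and the task_runs error counts.
interface Metrics {
  typecheckPassed: boolean;
  testsPassed: number;
  testsFailed: number;
  errorCount: number;
}

// True when post-task metrics improved or held steady on every axis:
// typecheck must not break, and test failures / error counts must not rise.
function metricsHeldOrImproved(before: Metrics, after: Metrics): boolean {
  if (before.typecheckPassed && !after.typecheckPassed) return false;
  if (after.testsFailed > before.testsFailed) return false;
  if (after.errorCount > before.errorCount) return false;
  return true;
}

// Decide the fate of a completed task branch: keep the PR, or
// auto-close it with an explanation and revert the change.
function ratchetDecision(before: Metrics, after: Metrics): "keep" | "revert" {
  return metricsHeldOrImproved(before, after) ? "keep" : "revert";
}
```

Comparing per-axis (rather than a single blended score) keeps the ratchet conservative: any regression on any tracked metric is enough to revert.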
Scope
- Opt-in per task via `ratchet: true` in task config, or via a category-level default
- Best fit for `refactor` and `bugfix` categories; `docs` and `tests` categories skip ratchet (no regression risk)
- Metrics to compare: typecheck pass/fail, test suite results, error counts
References
- recursive-improve `/ratchet` command (Apache 2.0) — overnight improvement loop with keep/revert
- Existing: `--loop` mode, adversarial governance, PR utility scoring