
feat: ratchet mode — measure-before-after for autonomous improvements #16

@stackbilt-admin

Description

Summary

Add a ratchet mode to the taskrunner that measures outcomes before and after autonomous code changes, automatically reverting changes that don't measurably improve results.

Why

Current flow: the self-improvement loop files an issue → taskrunner creates a PR → the PR gets merged (or not). There is no verification that the change actually improved anything; we've shipped "improvements" that introduced regressions (synthesis noise, conversation-facts errors).

The ratchet pattern: improve → re-run → measure → keep only if metrics improve, else revert.

How it works

  1. Baseline capture — before the task runs, snapshot relevant metrics (test pass rate, typecheck status, error count from the task_runs table)
  2. Task execution — normal taskrunner flow; creates a branch and PR
  3. Validation run — after the task completes, re-run tests and typecheck on the branch
  4. Comparison — compare post-task metrics against the baseline
  5. Decision — if metrics improved or held steady, keep the PR; if they regressed, auto-close it with an explanation
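The keep/revert decision in steps 4–5 can be sketched as below. This is a minimal illustration, not the proposed implementation: the `Metrics` fields mirror the metrics named above, and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    """Snapshot of the metrics the ratchet compares (hypothetical shape)."""
    typecheck_ok: bool
    tests_failed: int
    error_count: int

def is_regression(baseline: Metrics, after: Metrics) -> bool:
    """True if any tracked metric got worse relative to the baseline."""
    if baseline.typecheck_ok and not after.typecheck_ok:
        return True
    if after.tests_failed > baseline.tests_failed:
        return True
    if after.error_count > baseline.error_count:
        return True
    return False

def ratchet_decision(baseline: Metrics, after: Metrics) -> str:
    # Keep if metrics improved or held steady; otherwise revert (auto-close the PR).
    return "revert" if is_regression(baseline, after) else "keep"
```

Note the asymmetry: "held steady" counts as a keep, so only strict regressions trigger a revert.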

Scope

  • Opt-in per task via ratchet: true in task config or category-level default
  • Best fit for refactor and bugfix categories
  • docs and tests categories skip ratchet (no regression risk)
  • Metrics to compare: typecheck pass/fail, test suite results, error counts
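The opt-in rule above (per-task `ratchet: true` overriding a category-level default) could resolve roughly like this. The dict shape and function name are assumptions for illustration only:

```python
# Hypothetical resolution: an explicit per-task `ratchet` flag wins;
# otherwise fall back to the category default (docs/tests skip ratchet).
CATEGORY_DEFAULTS = {"refactor": True, "bugfix": True, "docs": False, "tests": False}

def ratchet_enabled(task: dict) -> bool:
    if "ratchet" in task:
        return bool(task["ratchet"])
    return CATEGORY_DEFAULTS.get(task.get("category", ""), False)
```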

References

  • recursive-improve /ratchet command (Apache 2.0) — overnight improvement loop with keep/revert
  • Existing: --loop mode, adversarial governance, PR utility scoring

Metadata

Labels

enhancement (New feature or request)
