Task Bundle started as a small CLI MVP. This roadmap charts its evolution into a practical foundation for replayable, comparable AI coding tasks.
- Done: `pack --config` with starter config support
- Done: schema validation for bundle metadata, workspace manifests, and event logs
- Done: automatic git metadata detection during packing
- Done: `compare` command
- Done: richer `compare` output with artifact hash differences and score deltas
- Done: `archive` and `extract` commands for `.tar.gz` bundles
- Done: `validate` and `scan` commands for replay checks and bundle collections
- Done: artifact hashes and sizes in `bundle.json`
- Done: benchmark-style outcome fields in bundle metadata
- Done: benchmark report generation with ranking, leaderboard, and Markdown export
- Done: CLI smoke tests and GitHub Actions CI
- Done: Chinese and English documentation
- Done: self-contained HTML benchmark report export
- Done: SVG benchmark badges for README and Pages embeds
- Planned: machine-readable benchmark result fields and scoring conventions
- Planned: bundle collections and directory scans for multi-run comparisons
- Planned: more curated example bundles for benchmark-style demos
- Planned: replay contract tooling that validates whether a bundle is runnable
- Planned: batch runner support for executing many bundles across tools
- Planned: session viewer or benchmark playground on top of the bundle format
- Stabilize the format around `0.2.x` and avoid unnecessary schema churn.
- Add more examples that show the same task solved by different tools.
- Add deeper comparison features before expanding into UI.
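One completed item above records artifact hashes and sizes in `bundle.json`. As a rough sketch of how such entries can be computed (SHA-256 and these field names are assumptions for illustration, not the actual schema):

```python
import hashlib
from pathlib import Path

def artifact_entry(path: Path) -> dict:
    """Return a hash-and-size record for one artifact.

    SHA-256 and the field names below are illustrative
    assumptions, not the real bundle.json schema.
    """
    data = path.read_bytes()
    return {
        "path": path.name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }
```

Recording both a digest and a byte size lets `compare` flag artifacts that changed between runs without shipping the files themselves.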
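The benchmark report generator ranks runs and exports a Markdown leaderboard. A minimal sketch of that idea, assuming a simple `(tool, score)` record shape that is not the real report schema:

```python
def leaderboard_markdown(runs: list[dict]) -> str:
    """Render a ranked Markdown leaderboard from (tool, score) records.

    The field names and descending-score ranking are assumptions
    for illustration; the real generator defines its own schema.
    """
    ranked = sorted(runs, key=lambda r: r["score"], reverse=True)
    lines = ["| Rank | Tool | Score |", "| --- | --- | --- |"]
    for rank, run in enumerate(ranked, start=1):
        lines.append(f"| {rank} | {run['tool']} | {run['score']:.2f} |")
    return "\n".join(lines)
```

A table like this drops straight into the Markdown export or a README.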
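The SVG badges mentioned above can be emitted as small static files with no dependencies. This is an illustrative shields-style layout, not the exporter's actual output:

```python
def score_badge_svg(label: str, value: str) -> str:
    """Return a minimal static SVG badge.

    The two-panel shields-style layout is an assumption for
    illustration; the real exporter may render differently.
    """
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="20">'
        '<rect width="60" height="20" fill="#555"/>'
        '<rect x="60" width="60" height="20" fill="#4c1"/>'
        f'<text x="30" y="14" fill="#fff" font-family="sans-serif" '
        f'font-size="11" text-anchor="middle">{label}</text>'
        f'<text x="90" y="14" fill="#fff" font-family="sans-serif" '
        f'font-size="11" text-anchor="middle">{value}</text>'
        '</svg>'
    )
```

Because the badge is a single self-contained SVG string, it can be committed to the repo or served from GitHub Pages and embedded with a plain image link.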