This is a public GitHub repository. The implementation is intentionally conservative.
- No secrets, API keys, tokens, or private endpoints are included.
- `.env`, virtual environments, caches, editor junk, large result blobs, and local logs are ignored.
- Output directories are explicit and the CLI refuses to overwrite a populated run directory.
- The default backend assumption is local Ollama at `http://localhost:11434/api` with no authentication.
- The only built-in tool is a deterministic local calculator; it is offline, explicit, and budgeted.
- The harness does not auto-push, auto-sync, or call remote services.
- Configs are loaded from plain YAML with explicit schema expectations.
- The repo does not use unsafe `eval`.
- The repo does not use pickle loading from untrusted files.
- The Ollama adapter only targets a local base URL configured by the user; there is no cloud fallback.
- Official GSM8K, MBPP, and HumanEval assets are not redistributed here.
- The benchmark-style samples under `data/samples/` are hand-authored miniature fixtures for plumbing tests only.
- Real benchmark runs should document dataset provenance and licensing externally.
- The current live run artifacts under `results/runs/20260310_170449_*` only contain hand-authored synthetic prompts, machine outputs, and structured logs.
- Run artifacts are machine-readable and local-only: `runs.jsonl`, `events.jsonl`, `summary.json`, `task_diffs.jsonl`, `task_diffs.csv`, and `summary.png`.
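
The rule that a populated run directory is never overwritten could be sketched as follows. This is an illustration only, not the repo's actual CLI code: `prepare_run_dir` and `RunDirectoryNotEmptyError` are hypothetical names, and the real check may differ in detail.

```python
from pathlib import Path


class RunDirectoryNotEmptyError(RuntimeError):
    """Hypothetical error: the run directory already holds artifacts."""


def prepare_run_dir(run_dir: Path) -> Path:
    """Create run_dir if needed, refusing to reuse a populated one.

    Hypothetical helper, assumed for illustration only.
    """
    if run_dir.exists():
        if not run_dir.is_dir():
            raise NotADirectoryError(f"{run_dir} exists and is not a directory")
        # any() stops at the first entry, so large directories are cheap to check.
        if any(run_dir.iterdir()):
            raise RunDirectoryNotEmptyError(
                f"{run_dir} already contains files; choose a new output directory"
            )
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

Failing loudly, rather than merging into or clearing an existing directory, keeps earlier artifacts such as `runs.jsonl` from being silently replaced.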
The dependency set is intentionally small:
- `PyYAML` for configs
- `matplotlib` for simple figures
- `pytest` for development tests
No external orchestration service, vector database, or model provider SDK is required for the included pilots.
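
Because no provider SDK is involved, a local-only guard on the configured base URL needs nothing beyond the standard library. The sketch below shows one way such a check might look; `is_local_base_url` is a hypothetical name, and whether the repo's Ollama adapter enforces anything like this is an assumption, not something stated above.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_base_url(base_url: str) -> bool:
    """Return True if base_url is http(s) and its host is a loopback address.

    Hypothetical helper, assumed for illustration only.
    """
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname  # lowercased, with port and IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name); treat it as non-local.
        return False
```

For example, `is_local_base_url("http://localhost:11434/api")` is accepted, while a URL pointing at a remote hostname is rejected without any DNS lookup or network call.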