forked from karpathy/autoresearch
Home
Dave Graham edited this page Mar 18, 2026 · 12 revisions
Apple Silicon port of karpathy/autoresearch with autonomous LLM-driven experiment optimization across multiple training datasets.
- TUI Dashboard — Real-time terminal dashboard for monitoring training runs
- Multi-Dataset Suite — Framework for running experiments across different training datasets
Autonomous agent runs on the same hardware (Apple M5 Max, 64 GB) reveal that optimal model configuration is dataset-dependent. The agent independently discovers different architectures, learning rates, and regularization strategies for each dataset.

| Date | Chip | Dataset | Experiments | Best val_bpb | Branch |
|---|---|---|---|---|---|
| Mar 17, 2026 — FineWeb-Edu (101 experiments) | Apple M5 Max (64 GB) | FineWeb-Edu 10BT | 101 | 1.295 | master |
| Mar 16, 2026 — Climbmix (81 experiments) | Apple M5 Max (64 GB) | climbmix-400b | 81 | 1.335 | autoresearch/mar16-agent |

Earlier runs, by chip:

| Date | Chip | Best val_bpb | Branch |
|---|---|---|---|
| Mar 15, 2026 — M5 Max | Apple M5 Max (64 GB) | 1.320 | autoresearch/mar14-m5max |
| Mar 14, 2026 — M4 Pro | Apple M4 Pro (24 GB) | 1.429 | autoresearch/mar14 |
| Mar 11, 2026 — M1 Max | Apple M1 Max (64 GB) | 1.621 | autoresearch/mar11 |
karpathy/autoresearch PR #303: "Evaluating Experiment Results at Scale" by Dean Sharon. A guide to noise floor estimation, Pareto efficiency, and reading results.tsv at scale. Adapted for Apple Silicon in docs/evaluating-results.md.
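
For readers unfamiliar with the two terms, here is a minimal Python sketch of what noise floor estimation and Pareto efficiency generally mean for this kind of result set. It is not taken from the guide itself: the function names, the `cost` axis (for example training seconds or parameter count), and all numbers are hypothetical, and the actual columns of results.tsv may differ.

```python
from statistics import stdev


def noise_floor(baseline_bpb):
    """Sample standard deviation of val_bpb over repeated runs of ONE config.

    Assumption: improvements smaller than this are hard to tell apart
    from run-to-run noise.
    """
    return stdev(baseline_bpb)


def pareto_front(runs):
    """Keep the (cost, val_bpb) pairs that no other run beats on both axes.

    Lower is better for both values. `cost` is a hypothetical axis.
    """
    front = []
    best_bpb = float("inf")
    # Sort by cost (then val_bpb); a run stays only if it improves val_bpb
    # over every cheaper-or-equal run seen so far.
    for cost, bpb in sorted(runs):
        if bpb < best_bpb:
            front.append((cost, bpb))
            best_bpb = bpb
    return front


# Made-up numbers, for illustration only.
repeats = [1.300, 1.304, 1.296]
runs = [(1.0, 1.40), (2.0, 1.35), (2.5, 1.37), (4.0, 1.30)]

floor = noise_floor(repeats)
front = pareto_front(runs)
print(f"noise floor ~ {floor:.4f}")
print("pareto front:", front)
```

In this toy data the run at cost 2.5 drops out because a cheaper run already reaches a lower val_bpb. A gain between two runs would only be treated as real if it clearly exceeds the estimated noise floor.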