Dave Graham edited this page Mar 18, 2026 · 12 revisions

Autoresearch — Characterization & Experimentation

Apple Silicon port of karpathy/autoresearch with autonomous LLM-driven experiment optimization across multiple training datasets.

Tools

  • TUI Dashboard — Real-time terminal dashboard for monitoring training runs
  • Multi-Dataset Suite — Framework for running experiments across different training datasets

Cross-Dataset Comparison

Autonomous agent runs on the same hardware (Apple M5 Max, 64 GB) reveal that optimal model configuration is dataset-dependent. The agent independently discovers different architectures, learning rates, and regularization strategies for each dataset.

Read the full analysis →


Autonomous Agent Runs

| Date | Chip | Dataset | Experiments | Best val_bpb | Branch |
| --- | --- | --- | --- | --- | --- |
| Mar 17, 2026 | Apple M5 Max (64 GB) | FineWeb-Edu 10BT | 101 | 1.295 | master |
| Mar 16, 2026 | Apple M5 Max (64 GB) | climbmix-400b | 81 | 1.335 | autoresearch/mar16-agent |
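
For readers unfamiliar with the metric, the sketch below shows one common way a bits-per-byte figure such as val_bpb can be derived from a validation cross-entropy. It assumes val_bpb means validation bits per byte and that the loss is measured in nats; this page does not define the metric, and the function name `bits_per_byte` and all numbers are hypothetical illustrations, not values or code from the repository.

```python
import math


def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy (in nats) into bits per byte.

    total_nats:  sum of per-token cross-entropy losses over the validation set
    total_bytes: number of raw text bytes those tokens decode to
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes by text length rather than by token count.
    return total_nats / (math.log(2) * total_bytes)


# Hypothetical inputs, for illustration only (not taken from any run above).
mean_loss_nats_per_token = 2.0
num_tokens = 1000
num_bytes = 4000

bpb = bits_per_byte(mean_loss_nats_per_token * num_tokens, num_bytes)
print(f"bits per byte: {bpb:.4f}")
```

Normalizing by bytes rather than tokens keeps the number comparable when the tokenizer changes, which is one reason a per-byte metric suits runs that vary model configuration. Whether scores are comparable across datasets is a separate question that this sketch does not address.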

Experiment Logs

| Date | Chip | Best val_bpb | Branch |
| --- | --- | --- | --- |
| Mar 15, 2026 | Apple M5 Max (64 GB) | 1.320 | autoresearch/mar14-m5max |
| Mar 14, 2026 | Apple M4 Pro (24 GB) | 1.429 | autoresearch/mar14 |
| Mar 11, 2026 | Apple M1 Max (64 GB) | 1.621 | autoresearch/mar11 |
