Home

Autoresearch — Characterization & Experimentation

Apple Silicon port of karpathy/autoresearch with autonomous LLM-driven experiment optimization across multiple training datasets.

Tools

TUI Dashboard — Real-time terminal dashboard for monitoring training runs
Multi-Dataset Suite — Framework for running experiments across different training datasets

Cross-Dataset Comparison

Autonomous agent runs on the same hardware (Apple M5 Max, 64 GB) reveal that optimal model configuration is dataset-dependent. The agent independently discovers different architectures, learning rates, and regularization strategies for each dataset.

Read the full analysis →

Autonomous Agent Runs

| Date | Chip | Dataset | Experiments | Best val_bpb | Branch |
|---|---|---|---|---|---|
| Mar 17, 2026 — FineWeb-Edu (101 experiments) | Apple M5 Max (64 GB) | FineWeb-Edu 10BT | 101 | 1.295 | `master` |
| Mar 16, 2026 — Climbmix (81 experiments) | Apple M5 Max (64 GB) | climbmix-400b | 81 | 1.335 | `autoresearch/mar16-agent` |

Experiment Logs

| Date | Chip | Best val_bpb | Branch |
|---|---|---|---|
| Mar 15, 2026 — M5 Max | Apple M5 Max (64 GB) | 1.320 | `autoresearch/mar14-m5max` |
| Mar 14, 2026 — M4 Pro | Apple M4 Pro (24 GB) | 1.429 | `autoresearch/mar14` |
| Mar 11, 2026 — M1 Max | Apple M1 Max (64 GB) | 1.621 | `autoresearch/mar11` |

References

karpathy/autoresearch PR #303 — "Evaluating Experiment Results at Scale" by Dean Sharon. Guide for noise floor estimation, Pareto efficiency, and reading results.tsv at scale. Adapted for Apple Silicon in docs/evaluating-results.md.
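As a rough illustration of the two ideas named in that guide, the sketch below estimates a noise floor from repeated baseline runs and extracts a Pareto front from a `results.tsv` file. It is not the method from PR #303 or from docs/evaluating-results.md. The column names `val_bpb` and `memory_gb`, the choice of standard deviation as the noise estimate, and the helper names are all assumptions made for this example. Lower is treated as better for both columns.

```python
import csv
import io
import statistics


def read_results(tsv_text):
    """Parse results.tsv text; skip rows without numeric val_bpb/memory_gb.

    Column names are assumed for illustration, not confirmed by this wiki.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        try:
            row["val_bpb"] = float(row["val_bpb"])
            row["memory_gb"] = float(row["memory_gb"])
        except (KeyError, TypeError, ValueError):
            continue  # e.g. a crashed run logged with a non-numeric value
        rows.append(row)
    return rows


def noise_floor(baseline_bpbs):
    """Sample standard deviation of val_bpb over repeated identical runs."""
    return statistics.stdev(baseline_bpbs)


def pareto_front(rows):
    """Keep rows that no other row beats on both val_bpb and memory_gb."""
    front = []
    for r in rows:
        dominated = any(
            o["val_bpb"] <= r["val_bpb"]
            and o["memory_gb"] <= r["memory_gb"]
            and (o["val_bpb"] < r["val_bpb"] or o["memory_gb"] < r["memory_gb"])
            for o in rows
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r["memory_gb"])
```

One way to use it: run the unchanged baseline a few times, pass the resulting `val_bpb` values to `noise_floor`, and treat any experiment whose gain over the baseline is smaller than that value as indistinguishable from run-to-run variation. `pareto_front(read_results(open("results.tsv").read()))` then lists the experiments worth comparing on the quality versus memory trade-off.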
