/autoresearch — Autonomous Experiment Loop for Claude Code

Inspired by karpathy/autoresearch (43.7k stars). Adapted for any project type — not just ML.

An autonomous experiment loop that modifies a single file, runs experiments, and keeps only the changes that improve a metric. It runs until you interrupt it.

How It Works

LOOP FOREVER:
  1. Analyze current project state
  2. Generate an improvement idea
  3. Modify the target file
  4. Run the experiment (build / test / train)
  5. Parse the metric from output
  6. Improved? → keep. Same or worse? → discard (git reset --hard HEAD~1).
  7. Log to results.tsv + JSONL
  8. Next idea → repeat
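
The keep-or-discard decision in steps 4 to 6 can be sketched roughly as follows. This is a minimal illustration, not the command's actual implementation: `is_improvement`, `run_one_experiment`, and `git_discard` are hypothetical names, and resetting after a crashed run is an assumption.

```python
import subprocess
from typing import Callable, Optional, Tuple


def is_improvement(new: float, best: float, direction: str = "lower_is_better") -> bool:
    """Strict comparison: an equal metric counts as 'same' and is not kept."""
    if direction == "lower_is_better":
        return new < best
    return new > best


def git_discard() -> None:
    """Hypothetical discard hook: drop the most recent experiment commit."""
    subprocess.run(["git", "reset", "--hard", "HEAD~1"], check=True)


def run_one_experiment(
    run_experiment: Callable[[], Optional[float]],
    best: float,
    direction: str = "lower_is_better",
    discard: Callable[[], None] = lambda: None,
) -> Tuple[str, float]:
    """Run one iteration and return (status, best metric so far)."""
    value = run_experiment()  # None models a crashed or unparsable run
    if value is None:
        discard()  # assumption: a crash is rolled back like a regression
        return "crash", best
    if is_improvement(value, best, direction):
        return "keep", value
    discard()
    return "discard", best
```

In a real loop, `discard=git_discard` would be passed in and the returned status written to the log before the next idea is tried.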

Supported Project Types

Auto-detected from project files — no manual configuration needed.

Type	Detection	Default Target	Default Metric	Direction
ML	`train.py` + `prepare.py`	`train.py`	val_bpb	lower is better
Web (Node.js)	`package.json`	auto-detected main file	bundle size (KB)	lower is better
Flutter	`pubspec.yaml`	`lib/main.dart`	APK size (MB)	lower is better
Java/Kotlin	`pom.xml` / `build.gradle`	auto-detected main	build time (s)	lower is better
Custom	`CLAUDE.md` autoresearch config	user-defined	user-defined	user-defined
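
The detection column above could be implemented along these lines. This is a sketch under assumptions: the function names are hypothetical, only the project root is inspected, and the order in which the checks run (ML first, then Web, Flutter, Java/Kotlin, falling back to Custom) is a guess rather than documented behavior.

```python
import os
from typing import Iterable


def classify_project(filenames: Iterable[str]) -> str:
    """Map the files found in a project root to a project type."""
    names = set(filenames)
    if {"train.py", "prepare.py"} <= names:  # ML needs both files
        return "ML"
    if "package.json" in names:
        return "Web (Node.js)"
    if "pubspec.yaml" in names:
        return "Flutter"
    if "pom.xml" in names or "build.gradle" in names:
        return "Java/Kotlin"
    # Assumed fallback: rely on the autoresearch section in CLAUDE.md.
    return "Custom"


def detect_project_type(root: str = ".") -> str:
    """Classify the project located at `root` by listing its top-level files."""
    return classify_project(os.listdir(root))
```

Keeping the classification separate from the directory listing makes the rules easy to check without touching the filesystem.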

Usage

/autoresearch              # Start autonomous experiment loop
/autoresearch setup        # Initialize environment only (create branch, results.tsv)
/autoresearch results      # View experiment results
/autoresearch train.py     # Use specific file as target

Custom Configuration

Override defaults by adding an autoresearch section to your project's CLAUDE.md:

## autoresearch
- target_file: src/model.py
- run_command: python train.py --epochs 5
- metric_name: accuracy
- metric_parse: grep "accuracy:" run.log | tail -1 | awk '{print $2}'
- metric_direction: higher_is_better
- time_budget: 600
- readonly_files: data/dataset.py, config.yaml

Setting	Description	Default
`target_file`	The single file to modify	Auto-detected
`run_command`	Command to run each experiment	Based on project type
`metric_name`	Name of the metric to track	Based on project type
`metric_parse`	Shell command to extract metric value	Based on project type
`metric_direction`	`lower_is_better` or `higher_is_better`	`lower_is_better`
`time_budget`	Max seconds per experiment	`300`
`readonly_files`	Comma-separated files that must not be modified	None
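
To make the override behavior concrete, here is a minimal sketch of reading the autoresearch section from CLAUDE.md and layering it over the documented defaults. The parser is hypothetical (the command itself is interpreted by Claude Code, not by this code), and it assumes the `- key: value` bullet format shown above. Only the two literal defaults from the table are included; the others depend on project type.

```python
import re
from typing import Dict

# Literal defaults taken from the settings table.
DEFAULTS = {"metric_direction": "lower_is_better", "time_budget": "300"}


def parse_autoresearch_config(markdown: str) -> Dict[str, str]:
    """Return settings from the 'autoresearch' section, merged over DEFAULTS."""
    config = dict(DEFAULTS)
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        heading = re.match(r"#{1,6}\s+(.*)", stripped)
        if heading:
            # Any heading starts a new section; only 'autoresearch' is read.
            in_section = heading.group(1).strip().lower() == "autoresearch"
            continue
        if not in_section:
            continue
        # Split on the first colon only, so values may contain colons.
        item = re.match(r"-\s*(\w+)\s*:\s*(.+)", stripped)
        if item:
            config[item.group(1)] = item.group(2).strip()
    return config
```

Values are kept as strings here; a caller would convert `time_budget` to an integer and split `readonly_files` on commas as needed.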

How It Compares

	karpathy/autoresearch	/autoresearch (this)
Scope	ML model training only	Any project type (ML, Web, Flutter, Java, custom)
Setup	Manual Python environment	Auto-detect from project files
Configuration	Hardcoded in source	CLAUDE.md-based, fully customizable
Logging	TSV only	TSV + JSONL (includes prev, delta, memory_gb, timestamp)
Git integration	Manual	Auto-creates `autoresearch/$TAG` branch
Hardware	NVIDIA GPU required	No hardware requirements (runs in Claude Code)
Metric type	Fixed (val_bpb)	Any metric you can parse from stdout/log

See the full comparison for a detailed analysis.

Logging

Every experiment is recorded in two formats:

results.tsv (human-readable)

commit    metric     value      status    description
a1b2c3d   val_bpb    0.997900   keep      baseline
b2c3d4e   val_bpb    0.993200   keep      increase LR to 0.04
c3d4e5f   val_bpb    1.005000   discard   switch to GeLU activation
d4e5f6a   val_bpb    0.000000   crash     double model width (OOM)
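
One point worth illustrating: crashed runs are logged with a value of 0.000000, so any script that looks for the best result should filter on `status` rather than on the raw value. The sketch below assumes the five-column layout shown above, with no whitespace inside the first four columns; `parse_results` and `best_kept` are hypothetical helper names.

```python
from typing import Dict, List, Optional

FIELDS = ["commit", "metric", "value", "status", "description"]


def parse_results(text: str) -> List[Dict[str, str]]:
    """Parse results.tsv content into a list of row dictionaries."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        line = line.strip()
        if not line:
            continue
        # Split on tabs or runs of spaces; the description keeps its spaces.
        parts = line.split(None, 4)
        rows.append(dict(zip(FIELDS, parts)))
    return rows


def best_kept(rows: List[Dict[str, str]], lower_is_better: bool = True) -> Optional[Dict[str, str]]:
    """Return the best row among kept experiments, or None if there are none."""
    kept = [row for row in rows if row["status"] == "keep"]
    if not kept:
        return None
    pick = min if lower_is_better else max
    return pick(kept, key=lambda row: float(row["value"]))
```

Without the status filter, the crash row would win a lower-is-better comparison.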

JSONL (machine-readable)

Stored at .claude/logs/autoresearch.jsonl with additional fields: prev, delta, memory_gb, tag, timestamp.

Querying Logs

# Recent 10 experiments
grep experiment_done .claude/logs/autoresearch.jsonl | tail -10 | jq .

# Successful improvements only
jq 'select(.details.status == "keep")' .claude/logs/autoresearch.jsonl

# Metric trend (TSV output)
grep experiment_done .claude/logs/autoresearch.jsonl | \
  jq -r '[.local_time[:19], .details.status, .details.value] | @tsv'
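
For environments without jq, the metric-trend query can be approximated in Python. This is a sketch only: the record shape (`local_time`, `details.status`, `details.value`) is inferred from the jq filters above rather than from a documented schema, and the function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def iter_experiments(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield parsed records for lines that mention experiment_done."""
    for line in lines:
        if "experiment_done" not in line:  # same text filter as the grep
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting


def metric_trend(lines: Iterable[str]) -> List[Tuple[str, Any, Any]]:
    """Return (timestamp truncated to seconds, status, value) per experiment."""
    trend = []
    for record in iter_experiments(lines):
        details = record.get("details", {})
        timestamp = str(record.get("local_time", ""))[:19]
        trend.append((timestamp, details.get("status"), details.get("value")))
    return trend
```

A typical call would be `metric_trend(open(".claude/logs/autoresearch.jsonl"))`, which iterates over the file line by line.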

Core Rules

NEVER STOP — Runs until manually interrupted
Single file only — Only modifies target_file; all other files are read-only
Keep or discard — Improved metric → keep. Same or worse → git reset --hard HEAD~1
Log everything — Every experiment is recorded, including crashes
Simpler wins — Same metric improvement with less code → keep
Deletion is best — Removing code while maintaining performance is the ideal outcome
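
The single-file rule can also be checked mechanically before an experiment is committed. The sketch below is one possible guard, not part of the command itself: `out_of_scope` and `changed_files` are hypothetical names, and paths are compared as plain strings relative to the repository root.

```python
import subprocess
from typing import Iterable, List


def out_of_scope(changed: Iterable[str], target_file: str) -> List[str]:
    """Return the changed paths that are not the target file."""
    return sorted(path for path in set(changed) if path != target_file)


def changed_files() -> List[str]:
    """List files that differ from HEAD in the working tree."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

If `out_of_scope(changed_files(), "src/model.py")` returns a non-empty list, the experiment touched a read-only file and could be discarded without running it.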

Install

As a Claude Code Plugin

/plugin marketplace add https://raw.githubusercontent.com/tommilifeless973/autoresearch-builder/main/.claude-plugin/builder-autoresearch-tastefulness.zip
/plugin install autoresearch-builder
/reload-plugins

Manual (copy the command file)

cp autoresearch.md ~/.claude/commands/autoresearch.md

Files

File	Description
`autoresearch.md`	The slash command definition
`autoresearch-dashboard.sh`	Terminal dashboard for viewing experiment results

License

MIT
