Merged
1 change: 1 addition & 0 deletions astro.config.ts
@@ -63,6 +63,7 @@ export default defineConfig({
items: [
{ slug: "expanding-horizons/threads-context-and-caching" },
{ slug: "expanding-horizons/model-pricing" },
{ slug: "expanding-horizons/autoresearch" },
{ slug: "expanding-horizons/what-to-read-next" },
],
},
45 changes: 45 additions & 0 deletions src/content/docs/expanding-horizons/autoresearch.mdx
@@ -0,0 +1,45 @@
---
title: Autoresearch
description: A pattern where a coding agent runs semi-autonomous experiments to discover performance improvements or other optimizations.
---

import ExternalLink from "../../../components/ExternalLink.astro";
// import ClapButton from "../../../components/ClapButton.astro";

You've already seen how a closed feedback loop makes agents more autonomous — tests and scripts let them self-correct without waiting for you.
Autoresearch takes that idea further.
Instead of fixing one known problem, the agent explores a space of potential improvements on its own, running experiments and keeping what works.

It's particularly effective for optimization tasks where you can express the goal as a number.

## How it works

You give the agent two things:

- A **task description** — what to optimize, what constraints to respect, and what "success" means.
- A **benchmark script** — something the agent runs after each experiment to get a measurable result.
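The benchmark script can be very small. A minimal sketch in Python, where `work()` is a hypothetical stand-in for the code being optimized: it times the target and prints a single number the agent can read back after each experiment.

```python
# Hypothetical benchmark script: times a stand-in workload and prints
# one number (mean seconds per run) for the agent to parse.
import time


def work():
    # Stand-in for the code under optimization.
    return sum(i * i for i in range(100_000))


def benchmark(runs=20):
    start = time.perf_counter()
    for _ in range(runs):
        work()
    elapsed = time.perf_counter() - start
    return elapsed / runs  # mean seconds per run


if __name__ == "__main__":
    print(f"{benchmark():.6f}")
```

The only requirement is a stable, machine-readable output; anything the agent can run and compare across experiments will do.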

The agent then runs a loop: propose a change, apply it, measure it, keep it or revert it, repeat.
Each experiment is isolated, so results stay interpretable.
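That loop can be sketched in a few lines of Python. Everything here is hypothetical scaffolding — `propose_change`, `apply_change`, `revert_change`, and `run_tests` stand in for whatever the agent actually does — but it shows the shape: measure a baseline, try one change at a time, and only keep changes that pass the tests and improve the number.

```python
# Sketch of the autoresearch loop, assuming hypothetical helpers and a
# benchmark() that returns a lower-is-better score.
def autoresearch_loop(benchmark, propose_change, apply_change,
                      revert_change, run_tests, iterations=10):
    best = benchmark()  # baseline measurement before any change
    for _ in range(iterations):
        change = propose_change()
        apply_change(change)
        # Discard anything that breaks correctness or regresses the metric.
        if run_tests() and (score := benchmark()) < best:
            best = score  # keep the change; it becomes the new baseline
        else:
            revert_change(change)
    return best
```

The keep-or-revert step is what makes each experiment isolated: at any point the working tree reflects exactly the set of changes that survived.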

{/* <ClapButton slug="expanding-horizons/autoresearch/how-it-works" /> */}

## Why it works

Three conditions make autoresearch effective:

- **A measurable goal.** "Make it faster" becomes actionable when the agent can run a script and read a number.
Without a benchmark, there's no feedback loop.
- **A robust test suite.** Tests let the agent discard changes that break correctness.
Without them, the agent can't safely move fast.
- **Isolated experiments.** Trying one change at a time keeps results interpretable.
If everything changes at once, you can't tell what worked.

These conditions apply broadly — autoresearch works for performance, but also for any goal you can express as a script output.

Read more:

- <ExternalLink href="https://github.com/karpathy/autoresearch" />
- <ExternalLink href="https://simonwillison.net/2026/Mar/13/liquid/" />

{/* <ClapButton slug="expanding-horizons/autoresearch/why-it-works" /> */}
2 changes: 2 additions & 0 deletions src/data/links.csv
@@ -52,6 +52,7 @@ https://github.com/badlogic/pi-mono/blob/380236a003ec7f0e69f54463b0f00b3118d78f3
https://github.com/callstackincubator/agent-device,callstackincubator/agent-device: CLI to control iOS and Android devices for AI agents,Callstack,,2026-03-04
https://github.com/Expensify/App,Expensify/App,GitHub,,2026-03-04
https://github.com/github/github-mcp-server,GitHub - github/github-mcp-server: GitHub&#39;s official MCP Server · GitHub,,,2026-03-13
https://github.com/karpathy/autoresearch,GitHub - karpathy/autoresearch: AI agents running research on single-GPU nanochat training automatically · GitHub,,2026-03-06,2026-04-07
https://github.com/mattpocock/skills/blob/main/grill-me/SKILL.md,grill-me skill,Matt Pocock,2026-02-25,2026-03-06
https://github.com/mcp,GitHub MCP Registry,,,2026-03-13
https://github.com/microsoft/playwright-mcp,microsoft/playwright-mcp,Microsoft,,2026-03-13
@@ -86,6 +87,7 @@ https://registerspill.thorstenball.com/,Register Spill,Thorsten Ball,,2026-03-04
https://simonwillison.net/,Weblog,Simon Willison,,2026-03-04
https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/,"The lethal trifecta for AI agents: private data, untrusted content, and external communication",Simon Willison,2025-06-16,2026-03-16
https://simonwillison.net/2026/Feb/10/showboat-and-rodney/,"Introducing Showboat and Rodney, so agents can demo what they’ve built",Simon Willison,2026-02-10,2026-03-24
https://simonwillison.net/2026/Mar/13/liquid/,"Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations",,2026-03-13,2026-04-07
https://simonwillison.net/guides/agentic-engineering-patterns/anti-patterns/,Anti-patterns: things to avoid,Simon Willison,2026-03-04,2026-03-05
https://simonwillison.net/guides/agentic-engineering-patterns/linear-walkthroughs/,Linear walkthroughs,Simon Willison,2026-02-25,2026-03-04
https://simonwillison.net/guides/agentic-engineering-patterns/red-green-tdd/,Red/green TDD,Simon Willison,2026-02-23,2026-03-04