A multi-agent pipeline that iteratively generates and validates mathematical proofs using LLMs, organized around a managing-editor model: agents draft a proof, a panel of reviewers evaluates it from multiple perspectives, and an editor synthesizes their feedback into a revision decision.
- Researcher — gathers relevant theorems and strategies (runs every loop)
- Mentor — formalizes the problem and proposes proof strategy
- Prover — writes the complete proof
- Editor Dispatch — assigns pool reviewers to perspectives
- Reviewers — one per perspective, from the configured pool
- Editor Decision — synthesizes reviews into `accept`, `right_track`, or `wrong_track`
The loop repeats until the editor accepts or the loop budget is exhausted.
Problem folders live under `runs/<N>/`. When you run `--problem 5`, the
pipeline will:
- Create `runs/5/` if it doesn't exist
- Extract Question 5 from `first_proof.md` into `runs/5/QUESTION.md`
- Create a stub `runs/5/BACKGROUND.md` if missing
You can also create these files manually. Existing files are never overwritten.
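A minimal manual setup sketch, using the default file names above (the placeholder contents are illustrative):

```sh
mkdir -p runs/5
echo 'State Question 5 here.' > runs/5/QUESTION.md              # problem statement (placeholder text)
echo 'Optional background and notation.' > runs/5/BACKGROUND.md # background stub (placeholder text)
```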
Inputs (`QUESTION.md`, `BACKGROUND.md`) and outputs (transcripts, LaTeX
reports, metadata) live side by side in the same directory. The `runs/`
tree is gitignored on `main` and archived on the `live_runs` branch.
```sh
git clone https://github.com/mattrobball/first_proof.git
cd first_proof
```

Python 3.11 or later is required (`tomllib` is used from the standard library).

```sh
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Run the demo (no API keys needed):
```sh
python -m pipeline.runner --problem 5 --backend demo
```

This runs the full pipeline with a deterministic stub backend — researcher,
mentor, prover, reviewers, and editor — and writes a transcript, LaTeX report,
and metadata file to `runs/5/`.
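To check what the demo produced, list the run directory (the exact file names are not specified here, so this is only a sanity check):

```sh
ls runs/5/   # should show the transcript, LaTeX report, and metadata alongside QUESTION.md
```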
Dry-run (validate inputs and prompt rendering only):
```sh
python -m pipeline.runner --problem 5 --dry-run
```

Run with real backends via `pipeline.toml` (requires API keys or local CLI
tools as configured):

```sh
python -m pipeline.runner --problem 5
```

Flags:
- `--max-loops 5` — revision loop budget
- `--rigor graduate` — rigor target label included in prompts
- `--seed 42` — deterministic backend assignment when `randomize_agents` is on
- `--config pipeline.toml` — explicit config file path
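As a sketch, a fully specified invocation combining the flags above might look like:

```sh
python -m pipeline.runner --problem 5 \
  --max-loops 5 --rigor graduate --seed 42 \
  --config pipeline.toml
```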
The pipeline reads `pipeline.toml` for per-agent backend and model selection.
When `randomize_agents` is enabled, each non-reviewer role is randomly
assigned a backend from the agent pool, reshuffled every loop.
The default `pipeline.toml` configures three backends:
```toml
randomize_agents = true

[agent_pool.claude_code]
backend = "cli"
provider = "claude"
model = "claude-opus-4-6"

[agent_pool.codex_cli]
backend = "cli"
provider = "codex"
model = "gpt-5.3-codex"

[agent_pool.gemini_api]
backend = "api"
provider = "gemini"
model = "gemini-3-pro-preview"
```

| Pool entry | Type | Requires |
|---|---|---|
| `claude_code` | CLI | `claude` CLI installed and authenticated |
| `codex_cli` | CLI | `codex` CLI installed and authenticated |
| `gemini_api` | API | `GEMINI_API_KEY` environment variable set |
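Before a real run, a quick check with standard shell tooling (not part of the pipeline) can confirm the CLI backends are on your `PATH`:

```sh
command -v claude || echo "claude CLI not found"
command -v codex || echo "codex CLI not found"
```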
API backends read their key from the environment variable mapped to the
provider (`GEMINI_API_KEY` for Gemini, `ANTHROPIC_API_KEY` for Anthropic,
`OPENAI_API_KEY` for OpenAI/OpenAI-compatible).
The pipeline auto-loads a `.secrets` file (if found in the problem directory
or working directory) containing `export KEY=value` lines, so you can put
credentials there instead of exporting them in your shell.
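For example, a `.secrets` file for the default pool's one API backend might contain (the value is a placeholder):

```sh
export GEMINI_API_KEY=your-key-here
```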
Exit codes:
- `0`: proof accepted within loop budget
- `1`: not accepted after max loops
- `2`: input or prompt validation failure
- `3`: backend or runtime failure
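Because the outcome is encoded in the exit code, a wrapper script can branch on it; a minimal sketch:

```sh
python -m pipeline.runner --problem 5
case $? in
  0) echo "proof accepted" ;;
  1) echo "not accepted after max loops" ;;
  2) echo "input or prompt validation failure" ;;
  3) echo "backend or runtime failure" ;;
esac
```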
Branches:
- `main` — production pipeline code (run artifacts are gitignored)
- `digital_twin` — a more realistic model of the academic peer review process 😉
- `live_runs` — development branch; also archives transcripts, LaTeX reports, and metadata from past pipeline runs
Run the tests:

```sh
python -m pytest -q
```