Chinese README: README.zh-CN.md
This repository is a teaching and research project for AI-assisted graduation-thesis drafting workflows.
Use and further development must stay within:
- school graduation-thesis requirements
- academic integrity and research ethics requirements
- mandatory human review before any generated content is reused externally
Important boundaries:
- generated text, experiments, citations, and LaTeX outputs are drafts, not submission-ready truth
- users must manually verify facts, citations, results, and compliance before any thesis, report, or publication use
- the project must not be presented as a one-click replacement for student authorship, advisor review, or academic evaluation
- if local rules, school policy, or advisor requirements conflict with generated output, those rules take precedence
This repository runs an automated research pipeline:
- idea generation
- experiment planning and execution
- paper writeup
- optional improvement (citation + review)
- single final LaTeX compile
Canonical entrypoints:
- `main.py` (primary)
- `launcher.py` (interactive)
- `main_idea.py`, `main_idea_ch.py`, `main_write.py` (legacy compatibility wrappers)
Runtime flow is orchestrated in `ai_scientist/pipeline/runner.py`:

`idea_stage` -> `experiment_stage` -> `writeup_stage` -> `improvement_stage` (optional) -> `compile_stage` (single compile point)
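The staged flow can be sketched as a simple function chain. This is a hypothetical simplification (the stage functions and state dict here are illustrative only); the real orchestration in `ai_scientist/pipeline/runner.py` carries far more state and error handling:

```python
# Hypothetical sketch of the staged pipeline: each stage takes and returns
# a shared state dict, improvement is optional, and compile runs exactly once.
def idea_stage(state):
    return {**state, "idea": "seed idea"}

def experiment_stage(state):
    return {**state, "results": {"metric": 1.0}}

def writeup_stage(state):
    return {**state, "draft": "draft.tex"}

def improvement_stage(state):
    return {**state, "improved": True}

def compile_stage(state):
    return {**state, "pdf": "thesis.pdf"}

def run_pipeline(run_improvement=False):
    stages = [idea_stage, experiment_stage, writeup_stage]
    if run_improvement:
        stages.append(improvement_stage)
    stages.append(compile_stage)  # single compile point
    state = {}
    for stage in stages:
        state = stage(state)
    return state
```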
Experiment execution is now real-benchmark-first:
- classify a seed idea into a task family
- discover public benchmark candidates from Hugging Face, GitHub, and Papers with Code
- prefer `real_template` execution when a public dataset + public metric are available
- fall back to `proxy` execution only after strict downgrade conditions
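The real-benchmark-first decision can be sketched as follows. The helper name and candidate fields are hypothetical; the actual logic lives in `ai_scientist/benchmark_discovery.py`:

```python
# Hypothetical sketch: prefer real_template only when a candidate offers
# both a public dataset and a public metric; otherwise downgrade to proxy
# execution and record why.
def choose_experiment_kind(candidates):
    for cand in candidates:
        if cand.get("public_dataset") and cand.get("public_metric"):
            return {"experiment_kind": "real_template",
                    "benchmark": cand["name"]}
    return {"experiment_kind": "proxy",
            "downgrade_reason": "no candidate with public dataset + metric"}
```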
Core experiment modules:
- `ai_scientist/benchmark_discovery.py`
- `ai_scientist/benchmark_adapters.py`
- `ai_scientist/experiment_runtime.py`
- `ai_scientist/perform_experiments.py`
The experiment stage no longer treats a single `run_1` as the whole truth. It now does the following:
- Build or coerce `experiment_spec.json`
- Discover benchmark candidates and decide `experiment_kind=real_template` or `experiment_kind=proxy`
- Build an execution matrix
- Try direct `experiment_impl.py`
- Repair direct code when possible
- If needed, execute through a stable system path: `real_benchmark_adapter` or `generic_proxy_fallback`
- Aggregate raw seeded matrix runs into compatibility outputs and structured result bundles
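The layered execution path can be sketched like this. The function names are hypothetical placeholders; the stable system paths live in `ai_scientist/benchmark_adapters.py` and `ai_scientist/experiment_runtime.py`:

```python
# Hypothetical sketch of layered fallback: try direct code, then one
# repair attempt, then a stable system path chosen by experiment kind.
def execute_with_fallback(run_direct, run_repaired, run_adapter, run_proxy,
                          experiment_kind):
    try:
        return "direct", run_direct()
    except Exception:  # broad on purpose: any direct failure triggers repair
        pass
    try:
        return "repaired_direct", run_repaired()
    except Exception:
        pass
    if experiment_kind == "real_template":
        return "real_benchmark_adapter", run_adapter()
    return "generic_proxy_fallback", run_proxy()
```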
Default matrix policy:
- `real_template`: arms `baseline`, `main`, `ablation_1`, `ablation_2`, plus optional `ablation_3`; 3 seeds per arm; at least 2 stress settings when supported
- `proxy`: arms `baseline`, `main`, plus at least 1 ablation; 3 seeds; at least 2 stress settings
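The default policy can be sketched as a small matrix builder. This is a hypothetical helper that omits the optional `ablation_3` arm and models stress settings as a simple count:

```python
# Hypothetical sketch: expand arms x seeds into a seeded execution matrix.
def build_matrix(experiment_kind, seeds=3, stress_settings=2):
    if experiment_kind == "real_template":
        arms = ["baseline", "main", "ablation_1", "ablation_2"]
    else:  # proxy: baseline, main, and at least one ablation
        arms = ["baseline", "main", "ablation_1"]
    return [{"arm": arm, "seed": seed, "stress_settings": stress_settings}
            for arm in arms for seed in range(seeds)]
```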
Raw runs are stored under `matrix_runs/`. Compatibility outputs are still written to `run_0/final_info.json` and `run_1/final_info.json`.
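A compatibility artifact can be sanity-checked with a small validator. This is a hypothetical helper, assuming only the four required keys listed below:

```python
import json

# Required keys for run_0/final_info.json and run_1/final_info.json.
REQUIRED_KEYS = {"metrics", "means", "stds", "notes"}

def validate_final_info(raw_json):
    """Parse a final_info.json payload and fail loudly on missing keys."""
    data = json.loads(raw_json)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"final_info.json missing keys: {sorted(missing)}")
    return data
```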
Experiment outputs now have three layers:

1. Raw matrix runs under `matrix_runs/`.
2. Compatibility artifacts: `run_0/final_info.json` and `run_1/final_info.json`, with required keys `metrics`, `means`, `stds`, `notes`.
3. Structured experiment truth: `results_bundle.json`, `writeup_context.json`, `execution_manifest.json`.

`results_bundle.json` is the authoritative machine-readable experiment summary. It includes `execution_mode`, `experiment_kind`, `fidelity_level`, `benchmark_summary`, `matrix_summary`, `comparisons`, `downgrade_reason`, and `real_benchmark_sources`.

`writeup_context.json` is the authoritative writeup constraint file. It includes `allowed_numbers`, `allowed_comparisons`, `mandatory_disclosures`, `forbidden_claims`, `chapter4_facts`, and `chapter5_facts`.
Writeup must use `results_bundle.json` and `writeup_context.json` as the only factual source for experiment claims. `notes.txt` is not a source of truth for results.

Writeup is constrained by structured results:
- experimental facts must come from `results_bundle.json` / `writeup_context.json`
- `mandatory_disclosures` must be preserved verbatim
- unsupported comparisons or numbers are not allowed
Compile stage performs a writeup audit before LaTeX compile:
- missing disclosure -> fail
- unauthorized number -> fail
- forbidden claim token -> fail
- unsupported comparison sentence -> fail
In quality mode, citation issues are also strict-fail.
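The audit rules can be sketched as one checker. This is a hypothetical, heavily simplified version: the number check only looks at decimals and the comparison check is a keyword heuristic; the field names follow `writeup_context.json`:

```python
import re

def audit_writeup(draft, ctx, quality_mode=False, citation_issues=()):
    """Return a list of (kind, detail) failures; any failure blocks compile."""
    failures = []
    # missing disclosure -> fail (must appear verbatim)
    for disc in ctx.get("mandatory_disclosures", []):
        if disc not in draft:
            failures.append(("missing_disclosure", disc))
    # unauthorized number -> fail (decimals only, to keep the sketch simple)
    allowed = {str(n) for n in ctx.get("allowed_numbers", [])}
    for num in set(re.findall(r"\d+\.\d+", draft)):
        if num not in allowed:
            failures.append(("unauthorized_number", num))
    # forbidden claim token -> fail
    for token in ctx.get("forbidden_claims", []):
        if token in draft:
            failures.append(("forbidden_claim", token))
    # unsupported comparison sentence -> fail (keyword heuristic)
    for sent in re.split(r"(?<=[.!?])\s+", draft):
        if "outperform" in sent and sent not in ctx.get("allowed_comparisons", []):
            failures.append(("unsupported_comparison", sent))
    # in quality mode, citation issues are also strict-fail
    if quality_mode:
        failures.extend(("citation_issue", c) for c in citation_issues)
    return failures
```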
Use `--ops-profile {throttle,quality,custom}`:

- `throttle`: forced model `deepseek-chat`; minimal runnable chain, low cost; improvement and citation improvement disabled by default; experiment LLM edits skipped by default
- `quality`: forced model `deepseek-reasoner`; full chain enabled with quality-first defaults; strict citation policy; thesis review refinement enabled; experiment repair budget enabled
- `custom`: no model forcing; all flags user-controlled
| Mode | Launcher preset args | Effective key settings |
|---|---|---|
| `throttle` (default) | `--template thesis --experiment thesis --engine openalex --ops-profile throttle` | model=deepseek-chat, cost_profile=throttle, citation_quality_profile=light, citation_rounds<=1, citation_result_limit<=6, citation_max_selected<=2, llm_max_calls<=20, llm_retry_max>=3, llm_timeout_sec=20..60, review_reflections=1, improvement=False, improvement_citations=False, paper_template=False, AI_SCIENTIST_SKIP_EXPERIMENT_LLM=1 |
| `quality` | `--template thesis --experiment thesis --engine openalex --ops-profile quality` | model=deepseek-reasoner, cost_profile=quality, citation_quality_profile=strict, citation_rounds>=5, citation_result_limit>=15, citation_max_selected>=5, llm_max_calls>=120, llm_retry_max>=8, llm_timeout_sec>=120, review_reflections=3, improvement=True, improvement_citations=True, paper_template=True, experiment_retries>=3, experiment_fallback=layered, strict_writeup_results=True, AI_SCIENTIST_SKIP_REFINEMENT=0, AI_SCIENTIST_SKIP_REVIEW_REFINEMENT=0 |
| `custom` | `--ops-profile custom` + interactive/custom inputs | No profile-level forcing. User-defined values are used (review_reflections is clamped to a minimum of 1). |
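Profile forcing can be sketched as a settings overlay. This hypothetical helper uses only a few representative values from the table; the real policy is centralized in `ai_scientist/config.py`:

```python
# Hypothetical sketch of profile-level forcing: throttle and quality pin
# settings, custom leaves user input alone; review_reflections is always
# clamped to a minimum of 1.
def apply_ops_profile(profile, user_settings):
    settings = dict(user_settings)
    if profile == "throttle":
        settings.update(model="deepseek-chat", llm_max_calls=20,
                        improvement=False)
    elif profile == "quality":
        settings.update(model="deepseek-reasoner", llm_max_calls=120,
                        improvement=True)
    # custom: no profile-level forcing
    settings["review_reflections"] = max(1, settings.get("review_reflections", 1))
    return settings
```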
Notes:
- `throttle` and `quality` launcher presets now start from idea generation.
- Interactive launcher default selection is `throttle`.
Throttle run:

```
D:/miniconda/envs/aiscientist/python.exe main.py --template thesis --experiment thesis --ops-profile throttle --engine openalex
```

Quality run:

```
D:/miniconda/envs/aiscientist/python.exe main.py --template thesis --experiment thesis --ops-profile quality --engine openalex
```

Launcher:

```
D:/miniconda/envs/aiscientist/python.exe launcher.py
```

Dry-run:

```
python main.py --template thesis --experiment thesis --dry-run
```

Idea-only:

```
python main.py --mode ideas --experiment thesis --ops-profile throttle --engine openalex
```

Writeup-only compatibility mode:

```
python main.py --mode writeup --template thesis --experiment thesis --ops-profile throttle
```

Run tests:

```
python -m pytest tests -q
```

Required: `DEEPSEEK_API_KEY`

Optional: `OPENALEX_MAIL_ADDRESS`, `S2_API_KEY`, `GITHUB_TOKEN`
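A startup check for these variables might look like this. The helper is hypothetical; only `DEEPSEEK_API_KEY` is treated as fatal, and missing optional variables are merely reported:

```python
import os

OPTIONAL_VARS = ("OPENALEX_MAIL_ADDRESS", "S2_API_KEY", "GITHUB_TOKEN")

def check_env(env=None):
    """Fail if the required key is absent; return missing optional vars."""
    env = os.environ if env is None else env
    if not env.get("DEEPSEEK_API_KEY"):
        raise RuntimeError("DEEPSEEK_API_KEY is required")
    return [name for name in OPTIONAL_VARS if not env.get(name)]
```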
See .env.example for runtime switches.
- `--mode {full,ideas,writeup}`
- `--ops-profile {throttle,quality,custom}`
- `--model`
- `--paper-template` / `--no-paper-template`
- `--experiment-retries`
- `--experiment-fallback {layered,direct_only}`
- `--strict-writeup-results` / `--no-strict-writeup-results`
- `--improvement` / `--no-improvement`
- `--improvement-citations` / `--no-improvement-citations`
- `--citation-rounds`
- `--citation-quality-profile {balanced,strict,light}`
- `--llm-max-calls`
- `--llm-retry-max`
- `--llm-timeout-sec`
- `--log-level {DEBUG,INFO,WARNING,ERROR}`
- Do not commit generated files in `results/`.
- Runtime defaults and profile policy are centralized in `ai_scientist/config.py`.
- Any behavior change should be reflected in tests under `tests/`.