
AI Scientist

Chinese README: README.zh-CN.md

Teaching Project Notice

This repository is a teaching and research project for AI-assisted graduation-thesis drafting workflows.

Use and further development must stay within:

  • school graduation-thesis requirements
  • academic integrity and research ethics requirements
  • mandatory human review before any generated content is reused externally

Important boundaries:

  • generated text, experiments, citations, and LaTeX outputs are drafts, not submission-ready truth
  • users must manually verify facts, citations, results, and compliance before any thesis, report, or publication use
  • the project must not be presented as a one-click replacement for student authorship, advisor review, or academic evaluation
  • if local rules, school policy, or advisor requirements conflict with generated output, those rules take precedence

Overview

This repository runs an automated research pipeline:

  • idea generation
  • experiment planning and execution
  • paper writeup
  • optional improvement (citation + review)
  • single final LaTeX compile

Canonical entrypoints:

  • main.py (primary)
  • launcher.py (interactive)
  • main_idea.py, main_idea_ch.py, main_write.py (legacy compatibility wrappers)

Pipeline Architecture

Runtime flow is orchestrated in ai_scientist/pipeline/runner.py:

  • idea_stage
  • experiment_stage
  • writeup_stage
  • improvement_stage (optional)
  • compile_stage (single compile point)
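The stage chain above can be sketched as a simple sequential orchestrator. This is an illustrative sketch only: the real stage functions live in ai_scientist/pipeline/runner.py and their names and signatures may differ.

```python
# Illustrative sketch of the stage chain; real implementations are in
# ai_scientist/pipeline/runner.py and may have different signatures.
def idea_stage(ctx):        ctx["stages"].append("idea"); return ctx
def experiment_stage(ctx):  ctx["stages"].append("experiment"); return ctx
def writeup_stage(ctx):     ctx["stages"].append("writeup"); return ctx
def improvement_stage(ctx): ctx["stages"].append("improvement"); return ctx
def compile_stage(ctx):     ctx["stages"].append("compile"); return ctx

def run_pipeline(improvement=False):
    ctx = {"stages": []}
    stages = [idea_stage, experiment_stage, writeup_stage]
    if improvement:
        stages.append(improvement_stage)  # optional citation + review pass
    stages.append(compile_stage)          # single final LaTeX compile point
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```

Note that compile_stage always runs last, so there is exactly one compile point regardless of whether the improvement pass is enabled.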

Experiment execution is now real-benchmark-first:

  • classify a seed idea into a task family
  • discover public benchmark candidates from Hugging Face, GitHub, and Papers with Code
  • prefer real_template execution when a public dataset + public metric are available
  • fall back to proxy execution only after strict downgrade conditions
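The real-vs-proxy decision can be pictured with a small helper. This is a hypothetical simplification: the actual logic in ai_scientist/benchmark_discovery.py considers more conditions, and the field names below are assumptions.

```python
# Hypothetical decision helper; the real downgrade logic in
# ai_scientist/benchmark_discovery.py is stricter and more involved.
def choose_experiment_kind(candidates):
    """Prefer real_template when any benchmark candidate has both a
    public dataset and a public metric; otherwise downgrade to proxy
    and record a downgrade_reason."""
    for c in candidates:
        if c.get("public_dataset") and c.get("public_metric"):
            return "real_template", None
    return "proxy", "no_public_dataset_and_metric"
```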

Core experiment modules:

  • ai_scientist/benchmark_discovery.py
  • ai_scientist/benchmark_adapters.py
  • ai_scientist/experiment_runtime.py
  • ai_scientist/perform_experiments.py

Experiment Flow

The experiment stage no longer treats a single run_1 as the sole source of truth. It now:

  1. Builds or coerces experiment_spec.json
  2. Discovers benchmark candidates and decides:
     • experiment_kind=real_template
     • or experiment_kind=proxy
  3. Builds an execution matrix
  4. Tries direct experiment_impl.py
  5. Repairs direct code when possible
  6. If needed, executes through a stable system path:
     • real_benchmark_adapter
     • or generic_proxy_fallback
  7. Aggregates raw seeded matrix runs into compatibility outputs and structured result bundles

Default matrix policy:

  • real_template
    • arms: baseline, main, ablation_1, ablation_2, optional ablation_3
    • 3 seeds per arm
    • at least 2 stress settings when supported
  • proxy
    • arms: baseline, main, at least 1 ablation
    • 3 seeds per arm
    • at least 2 stress settings
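The policy above expands into a run matrix of arm × seed combinations. A minimal sketch, assuming a dict-based policy (the actual structures in ai_scientist/experiment_runtime.py may differ):

```python
from itertools import product

# Illustrative encoding of the default matrix policy; names are
# assumptions, not the exact structures used by experiment_runtime.py.
MATRIX_POLICY = {
    "real_template": {"arms": ["baseline", "main", "ablation_1", "ablation_2"],
                      "seeds": 3, "min_stress_settings": 2},
    "proxy":         {"arms": ["baseline", "main", "ablation_1"],
                      "seeds": 3, "min_stress_settings": 2},
}

def build_matrix(kind):
    """Expand arms x seeds into individual run specs."""
    policy = MATRIX_POLICY[kind]
    return [{"arm": arm, "seed": seed}
            for arm, seed in product(policy["arms"], range(policy["seeds"]))]
```

For example, the proxy policy yields 3 arms × 3 seeds = 9 runs before stress settings are layered on.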

Raw runs are stored under matrix_runs/. Compatibility outputs are still written to:

  • run_0/final_info.json
  • run_1/final_info.json

Structured Result Contracts

Experiment outputs now have three layers:

Compatibility artifacts:

  • run_0/final_info.json
  • run_1/final_info.json
  • required keys: metrics, means, stds, notes
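A downstream consumer can validate the required keys before trusting a compatibility artifact. This is an illustrative check, not code from the repository:

```python
import json

REQUIRED_KEYS = {"metrics", "means", "stds", "notes"}

def validate_final_info(data):
    """Check that a parsed run_*/final_info.json payload carries all
    required top-level keys. Illustrative only; real validation may
    check value shapes as well."""
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"final_info.json missing keys: {sorted(missing)}")
    return data

# Typical usage (path shown for illustration):
#   with open("run_1/final_info.json", encoding="utf-8") as f:
#       info = validate_final_info(json.load(f))
```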

Structured experiment truth:

  • results_bundle.json
  • writeup_context.json
  • execution_manifest.json

results_bundle.json is the authoritative machine-readable experiment summary. It includes:

  • execution_mode
  • experiment_kind
  • fidelity_level
  • benchmark_summary
  • matrix_summary
  • comparisons
  • downgrade_reason
  • real_benchmark_sources

writeup_context.json is the authoritative writeup constraint file. It includes:

  • allowed_numbers
  • allowed_comparisons
  • mandatory_disclosures
  • forbidden_claims
  • chapter4_facts
  • chapter5_facts

Writeup must use results_bundle.json and writeup_context.json as the only factual source for experiment claims. notes.txt is not a source of truth for results.

Writeup and Compile Guardrails

Writeup is constrained by structured results:

  • experimental facts must come from results_bundle.json / writeup_context.json
  • mandatory_disclosures must be preserved verbatim
  • unsupported comparisons or numbers are not allowed

Compile stage performs a writeup audit before LaTeX compile:

  • missing disclosure -> fail
  • unauthorized number -> fail
  • forbidden claim token -> fail
  • unsupported comparison sentence -> fail

In quality mode, citation issues are also strict-fail.
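The audit can be pictured as a pure function over the LaTeX source and writeup_context.json. A minimal sketch under assumed field names; the real compile-stage checks are stricter (for example, they also cover unsupported comparison sentences):

```python
import re

# Hypothetical sketch of the pre-compile writeup audit; the real
# checks in the compile stage are stricter than this.
def audit_writeup(tex, context):
    failures = []
    for disclosure in context.get("mandatory_disclosures", []):
        if disclosure not in tex:                      # missing disclosure -> fail
            failures.append(("missing_disclosure", disclosure))
    allowed = set(context.get("allowed_numbers", []))
    for num in re.findall(r"\d+\.\d+", tex):           # unauthorized number -> fail
        if num not in allowed:
            failures.append(("unauthorized_number", num))
    for token in context.get("forbidden_claims", []):  # forbidden claim token -> fail
        if token in tex:
            failures.append(("forbidden_claim", token))
    return failures
```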

Ops Profiles

Use --ops-profile {throttle,quality,custom}:

  • throttle
    • forced model: deepseek-chat
    • minimal runnable chain, low cost
    • improvement/citation improvement disabled by default
    • experiment LLM edits skipped by default
  • quality
    • forced model: deepseek-reasoner
    • full chain enabled, quality-first defaults
    • strict citation policy
    • thesis review refinement enabled
    • experiment repair budget enabled
  • custom
    • no model forcing
    • all flags user-controlled

| Mode | Launcher preset args | Effective key settings |
| --- | --- | --- |
| throttle (default) | `--template thesis --experiment thesis --engine openalex --ops-profile throttle` | model=deepseek-chat, cost_profile=throttle, citation_quality_profile=light, citation_rounds<=1, citation_result_limit<=6, citation_max_selected<=2, llm_max_calls<=20, llm_retry_max>=3, llm_timeout_sec=20..60, review_reflections=1, improvement=False, improvement_citations=False, paper_template=False, AI_SCIENTIST_SKIP_EXPERIMENT_LLM=1 |
| quality | `--template thesis --experiment thesis --engine openalex --ops-profile quality` | model=deepseek-reasoner, cost_profile=quality, citation_quality_profile=strict, citation_rounds>=5, citation_result_limit>=15, citation_max_selected>=5, llm_max_calls>=120, llm_retry_max>=8, llm_timeout_sec>=120, review_reflections=3, improvement=True, improvement_citations=True, paper_template=True, experiment_retries>=3, experiment_fallback=layered, strict_writeup_results=True, AI_SCIENTIST_SKIP_REFINEMENT=0, AI_SCIENTIST_SKIP_REVIEW_REFINEMENT=0 |
| custom | `--ops-profile custom` + interactive/custom inputs | No profile-level forcing. User-defined values are used (review_reflections is clamped to minimum 1). |

Notes:

  • throttle and quality launcher presets now start from idea generation.
  • Interactive launcher default selection is throttle.

Quick Start (Windows, Conda Env)

Throttle run:

D:/miniconda/envs/aiscientist/python.exe main.py --template thesis --experiment thesis --ops-profile throttle --engine openalex

Quality run:

D:/miniconda/envs/aiscientist/python.exe main.py --template thesis --experiment thesis --ops-profile quality --engine openalex

Launcher:

D:/miniconda/envs/aiscientist/python.exe launcher.py

Common Commands

Dry-run:

python main.py --template thesis --experiment thesis --dry-run

Idea-only:

python main.py --mode ideas --experiment thesis --ops-profile throttle --engine openalex

Writeup-only compatibility mode:

python main.py --mode writeup --template thesis --experiment thesis --ops-profile throttle

Run tests:

python -m pytest tests -q

Required Environment

Required:

  • DEEPSEEK_API_KEY

Optional:

  • OPENALEX_MAIL_ADDRESS
  • S2_API_KEY
  • GITHUB_TOKEN

See .env.example for runtime switches.
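A startup check for these variables can look like the following. This is an illustrative sketch; the variable names match the list above, but the project may perform its own validation elsewhere:

```python
import os

# Illustrative environment check; variable names match the lists above.
REQUIRED = ["DEEPSEEK_API_KEY"]
OPTIONAL = ["OPENALEX_MAIL_ADDRESS", "S2_API_KEY", "GITHUB_TOKEN"]

def check_env(environ=os.environ):
    """Fail fast on missing required variables; report which optional
    ones are set."""
    missing = [k for k in REQUIRED if not environ.get(k)]
    if missing:
        raise RuntimeError(f"missing required env vars: {missing}")
    return {k: bool(environ.get(k)) for k in OPTIONAL}
```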

Core Flags

  • --mode {full,ideas,writeup}
  • --ops-profile {throttle,quality,custom}
  • --model
  • --paper-template / --no-paper-template
  • --experiment-retries
  • --experiment-fallback {layered,direct_only}
  • --strict-writeup-results / --no-strict-writeup-results
  • --improvement / --no-improvement
  • --improvement-citations / --no-improvement-citations
  • --citation-rounds
  • --citation-quality-profile {balanced,strict,light}
  • --llm-max-calls
  • --llm-retry-max
  • --llm-timeout-sec
  • --log-level {DEBUG,INFO,WARNING,ERROR}

Notes

  • Do not commit generated files in results/.
  • Runtime defaults and profile policy are centralized in ai_scientist/config.py.
  • Any behavior change should be reflected in tests under tests/.

About

An AI-driven research workflow for generating ideas, running experiments, and writing scientific papers.
