Chinese README: README.zh-CN.md
This repository is a teaching and research project for AI-assisted graduation-thesis drafting workflows.
Use and further development must stay within:
- school graduation-thesis requirements
- academic integrity and research ethics requirements
- mandatory human review before any generated content is reused externally
Important boundaries:
- generated text, experiments, citations, and LaTeX outputs are drafts, not submission-ready truth
- users must manually verify facts, citations, results, and compliance before any thesis, report, or publication use
- the project must not be presented as a one-click replacement for student authorship, advisor review, or academic evaluation
- if local rules, school policy, or advisor requirements conflict with generated output, those rules take precedence
This repository runs an automated research pipeline:
- idea generation
- experiment planning and execution
- paper writeup
- optional improvement (citation + review)
- single final LaTeX compile
Canonical entrypoints:
- `main.py` (primary)
- `launcher.py` (interactive)
- `main_idea.py`, `main_idea_ch.py`, `main_write.py` (legacy compatibility wrappers)
Runtime flow is orchestrated in `ai_scientist/pipeline/runner.py`:

`idea_stage` -> `experiment_stage` -> `writeup_stage` -> `improvement_stage` (optional) -> `compile_stage` (single compile point)
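The staged flow can be sketched as a simple function chain. This is a hypothetical simplification (the stage functions and state dict here are illustrative only); the real orchestration in `ai_scientist/pipeline/runner.py` carries far more state and error handling:

```python
# Hypothetical sketch of the staged pipeline: each stage takes and returns
# a shared state dict, improvement is optional, and compile runs exactly once.
def idea_stage(state):
    return {**state, "idea": "seed idea"}

def experiment_stage(state):
    return {**state, "results": {"metric": 1.0}}

def writeup_stage(state):
    return {**state, "draft": "draft.tex"}

def improvement_stage(state):
    return {**state, "improved": True}

def compile_stage(state):
    return {**state, "pdf": "thesis.pdf"}

def run_pipeline(run_improvement=False):
    stages = [idea_stage, experiment_stage, writeup_stage]
    if run_improvement:
        stages.append(improvement_stage)
    stages.append(compile_stage)  # single compile point
    state = {}
    for stage in stages:
        state = stage(state)
    return state
```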
Experiment execution is now real-benchmark-first:
- classify a seed idea into a task family
- discover public benchmark candidates from Hugging Face, GitHub, and Papers with Code
- prefer `real_template` execution when a public dataset + public metric are available
- fall back to `proxy` execution only after strict downgrade conditions
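The real-benchmark-first decision can be sketched as follows. The helper name and candidate fields are hypothetical; the actual logic lives in `ai_scientist/benchmark_discovery.py`:

```python
# Hypothetical sketch: prefer real_template only when a candidate offers
# both a public dataset and a public metric; otherwise downgrade to proxy
# execution and record why.
def choose_experiment_kind(candidates):
    for cand in candidates:
        if cand.get("public_dataset") and cand.get("public_metric"):
            return {"experiment_kind": "real_template",
                    "benchmark": cand["name"]}
    return {"experiment_kind": "proxy",
            "downgrade_reason": "no candidate with public dataset + metric"}
```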
Core experiment modules:
- `ai_scientist/benchmark_discovery.py`
- `ai_scientist/benchmark_adapters.py`
- `ai_scientist/experiment_runtime.py`
- `ai_scientist/perform_experiments.py`
The experiment stage no longer treats a single `run_1` as the whole truth. It now does the following:
- Build or coerce `experiment_spec.json`
- Discover benchmark candidates and decide `experiment_kind=real_template` or `experiment_kind=proxy`
- Build an execution matrix
- Try direct `experiment_impl.py`
- Repair direct code when possible
- If needed, execute through a stable system path: `real_benchmark_adapter` or `generic_proxy_fallback`
- Aggregate raw seeded matrix runs into compatibility outputs and structured result bundles
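The layered execution path can be sketched like this. The function names are hypothetical placeholders; the stable system paths live in `ai_scientist/benchmark_adapters.py` and `ai_scientist/experiment_runtime.py`:

```python
# Hypothetical sketch of layered fallback: try direct code, then one
# repair attempt, then a stable system path chosen by experiment kind.
def execute_with_fallback(run_direct, run_repaired, run_adapter, run_proxy,
                          experiment_kind):
    try:
        return "direct", run_direct()
    except Exception:  # broad on purpose: any direct failure triggers repair
        pass
    try:
        return "repaired_direct", run_repaired()
    except Exception:
        pass
    if experiment_kind == "real_template":
        return "real_benchmark_adapter", run_adapter()
    return "generic_proxy_fallback", run_proxy()
```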
Default matrix policy:
- `real_template`: arms `baseline`, `main`, `ablation_1`, `ablation_2`, plus optional `ablation_3`; 3 seeds per arm; at least 2 stress settings when supported
- `proxy`: arms `baseline`, `main`, plus at least 1 ablation; 3 seeds; at least 2 stress settings
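The default policy can be sketched as a small matrix builder. This is a hypothetical helper that omits the optional `ablation_3` arm and models stress settings as a simple count:

```python
# Hypothetical sketch: expand arms x seeds into a seeded execution matrix.
def build_matrix(experiment_kind, seeds=3, stress_settings=2):
    if experiment_kind == "real_template":
        arms = ["baseline", "main", "ablation_1", "ablation_2"]
    else:  # proxy: baseline, main, and at least one ablation
        arms = ["baseline", "main", "ablation_1"]
    return [{"arm": arm, "seed": seed, "stress_settings": stress_settings}
            for arm in arms for seed in range(seeds)]
```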
Raw runs are stored under `matrix_runs/`. Compatibility outputs are still written to `run_0/final_info.json` and `run_1/final_info.json`.
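A compatibility artifact can be sanity-checked with a small validator. This is a hypothetical helper, assuming only the four required keys listed below:

```python
import json

# Required keys for run_0/final_info.json and run_1/final_info.json.
REQUIRED_KEYS = {"metrics", "means", "stds", "notes"}

def validate_final_info(raw_json):
    """Parse a final_info.json payload and fail loudly on missing keys."""
    data = json.loads(raw_json)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"final_info.json missing keys: {sorted(missing)}")
    return data
```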
Experiment outputs now have three layers:

1. Raw matrix runs under `matrix_runs/`.
2. Compatibility artifacts: `run_0/final_info.json` and `run_1/final_info.json`, with required keys `metrics`, `means`, `stds`, `notes`.
3. Structured experiment truth: `results_bundle.json`, `writeup_context.json`, `execution_manifest.json`.

`results_bundle.json` is the authoritative machine-readable experiment summary. It includes `execution_mode`, `experiment_kind`, `fidelity_level`, `benchmark_summary`, `matrix_summary`, `comparisons`, `downgrade_reason`, and `real_benchmark_sources`.

`writeup_context.json` is the authoritative writeup constraint file. It includes `allowed_numbers`, `allowed_comparisons`, `mandatory_disclosures`, `forbidden_claims`, `chapter4_facts`, and `chapter5_facts`.
Writeup must use `results_bundle.json` and `writeup_context.json` as the only factual source for experiment claims. `notes.txt` is not a source of truth for results.

Writeup is constrained by structured results:
- experimental facts must come from `results_bundle.json` / `writeup_context.json`
- `mandatory_disclosures` must be preserved verbatim
- unsupported comparisons or numbers are not allowed
Compile stage performs a writeup audit before LaTeX compile:
- missing disclosure -> fail
- unauthorized number -> fail
- forbidden claim token -> fail
- unsupported comparison sentence -> fail
In quality mode, citation issues are also strict-fail.
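The audit rules can be sketched as one checker. This is a hypothetical, heavily simplified version: the number check only looks at decimals and the comparison check is a keyword heuristic; the field names follow `writeup_context.json`:

```python
import re

def audit_writeup(draft, ctx, quality_mode=False, citation_issues=()):
    """Return a list of (kind, detail) failures; any failure blocks compile."""
    failures = []
    # missing disclosure -> fail (must appear verbatim)
    for disc in ctx.get("mandatory_disclosures", []):
        if disc not in draft:
            failures.append(("missing_disclosure", disc))
    # unauthorized number -> fail (decimals only, to keep the sketch simple)
    allowed = {str(n) for n in ctx.get("allowed_numbers", [])}
    for num in set(re.findall(r"\d+\.\d+", draft)):
        if num not in allowed:
            failures.append(("unauthorized_number", num))
    # forbidden claim token -> fail
    for token in ctx.get("forbidden_claims", []):
        if token in draft:
            failures.append(("forbidden_claim", token))
    # unsupported comparison sentence -> fail (keyword heuristic)
    for sent in re.split(r"(?<=[.!?])\s+", draft):
        if "outperform" in sent and sent not in ctx.get("allowed_comparisons", []):
            failures.append(("unsupported_comparison", sent))
    # in quality mode, citation issues are also strict-fail
    if quality_mode:
        failures.extend(("citation_issue", c) for c in citation_issues)
    return failures
```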
Use `--ops-profile {throttle,quality,custom}`:

- `throttle`: forced model `deepseek-chat`; minimal runnable chain, low cost; improvement and citation improvement disabled by default; experiment LLM edits skipped by default
- `quality`: forced model `deepseek-reasoner`; full chain enabled with quality-first defaults; strict citation policy; thesis review refinement enabled; experiment repair budget enabled
- `custom`: no model forcing; all flags user-controlled
| Mode | Launcher preset args | Effective key settings |
|---|---|---|
| `throttle` (default) | `--template thesis --experiment thesis --engine openalex --ops-profile throttle` | model=deepseek-chat, cost_profile=throttle, citation_quality_profile=light, citation_rounds<=1, citation_result_limit<=6, citation_max_selected<=2, llm_max_calls<=20, llm_retry_max>=3, llm_timeout_sec=20..60, review_reflections=1, improvement=False, improvement_citations=False, paper_template=False, AI_SCIENTIST_SKIP_EXPERIMENT_LLM=1 |
| `quality` | `--template thesis --experiment thesis --engine openalex --ops-profile quality` | model=deepseek-reasoner, cost_profile=quality, citation_quality_profile=strict, citation_rounds>=5, citation_result_limit>=15, citation_max_selected>=5, llm_max_calls>=120, llm_retry_max>=8, llm_timeout_sec>=120, review_reflections=3, improvement=True, improvement_citations=True, paper_template=True, experiment_retries>=3, experiment_fallback=layered, strict_writeup_results=True, AI_SCIENTIST_SKIP_REFINEMENT=0, AI_SCIENTIST_SKIP_REVIEW_REFINEMENT=0 |
| `custom` | `--ops-profile custom` + interactive/custom inputs | No profile-level forcing. User-defined values are used (review_reflections is clamped to a minimum of 1). |
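Profile forcing can be sketched as a settings overlay. This hypothetical helper uses only a few representative values from the table; the real policy is centralized in `ai_scientist/config.py`:

```python
# Hypothetical sketch of profile-level forcing: throttle and quality pin
# settings, custom leaves user input alone; review_reflections is always
# clamped to a minimum of 1.
def apply_ops_profile(profile, user_settings):
    settings = dict(user_settings)
    if profile == "throttle":
        settings.update(model="deepseek-chat", llm_max_calls=20,
                        improvement=False)
    elif profile == "quality":
        settings.update(model="deepseek-reasoner", llm_max_calls=120,
                        improvement=True)
    # custom: no profile-level forcing
    settings["review_reflections"] = max(1, settings.get("review_reflections", 1))
    return settings
```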
Notes:
- `throttle` and `quality` launcher presets now start from idea generation.
- Interactive launcher default selection is `throttle`.
Throttle run:

```
D:/miniconda/envs/aiscientist/python.exe main.py --template thesis --experiment thesis --ops-profile throttle --engine openalex
```

Quality run:

```
D:/miniconda/envs/aiscientist/python.exe main.py --template thesis --experiment thesis --ops-profile quality --engine openalex
```

Launcher:

```
D:/miniconda/envs/aiscientist/python.exe launcher.py
```

Dry-run:

```
python main.py --template thesis --experiment thesis --dry-run
```

Idea-only:

```
python main.py --mode ideas --experiment thesis --ops-profile throttle --engine openalex
```

Writeup-only compatibility mode:

```
python main.py --mode writeup --template thesis --experiment thesis --ops-profile throttle
```

Run tests:

```
python -m pytest tests -q
```

Required: `DEEPSEEK_API_KEY`

Optional: `OPENALEX_MAIL_ADDRESS`, `S2_API_KEY`, `GITHUB_TOKEN`
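A startup check for these variables might look like this. The helper is hypothetical; only `DEEPSEEK_API_KEY` is treated as fatal, and missing optional variables are merely reported:

```python
import os

OPTIONAL_VARS = ("OPENALEX_MAIL_ADDRESS", "S2_API_KEY", "GITHUB_TOKEN")

def check_env(env=None):
    """Fail if the required key is absent; return missing optional vars."""
    env = os.environ if env is None else env
    if not env.get("DEEPSEEK_API_KEY"):
        raise RuntimeError("DEEPSEEK_API_KEY is required")
    return [name for name in OPTIONAL_VARS if not env.get(name)]
```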
See .env.example for runtime switches.
- `--mode {full,ideas,writeup}`
- `--ops-profile {throttle,quality,custom}`
- `--model`
- `--paper-template` / `--no-paper-template`
- `--experiment-retries`
- `--experiment-fallback {layered,direct_only}`
- `--strict-writeup-results` / `--no-strict-writeup-results`
- `--improvement` / `--no-improvement`
- `--improvement-citations` / `--no-improvement-citations`
- `--citation-rounds`
- `--citation-quality-profile {balanced,strict,light}`
- `--llm-max-calls`
- `--llm-retry-max`
- `--llm-timeout-sec`
- `--log-level {DEBUG,INFO,WARNING,ERROR}`
- Do not commit generated files in `results/`.
- Runtime defaults and profile policy are centralized in `ai_scientist/config.py`.
- Any behavior change should be reflected in tests under `tests/`.