Public runtime repository for reproducing CLBC experiments.
This repo intentionally excludes private drafting and internal report files (for example `docs/`, `plans/`, `paper/`) via `.gitignore`.
Everything referenced in this README exists in the tracked Git repo.
- macOS or Linux
- Python 3.11+
- `bash`, `curl`, `rg`
- `ollama` for non-mock empirical evaluation lanes
- CUDA GPU recommended for full 8.3/8.5/9.5/11.2/11.4 runs
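The tool requirements above can be checked up front before starting a long run. A minimal sketch (the `missing_tools` helper is illustrative, not a script in this repo):

```python
import shutil

def missing_tools(required):
    """Return the subset of `required` executables not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]

if __name__ == "__main__":
    # `ollama` is only needed for the non-mock empirical evaluation lanes.
    missing = missing_tools(["bash", "curl", "rg", "ollama"])
    print("missing tools:", ", ".join(missing) if missing else "none")
```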
Optional for ZK-heavy paths:

```bash
cargo install rzup
rzup install rust
rzup install r0vm
```

Set up the Python environment:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install pytest matplotlib
```

Tracked prereg/evidence files used by scripts live in `prereg/`:
- `prereg/9.5_preregistered_analysis_plan.md`
- `prereg/11.2_preregistered_analysis_plan.md`
- `prereg/11.3_preregistered_analysis_plan.md`
- `prereg/9.5_*.md` supporting 9.5 gate completeness checks
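Since the evaluation scripts read these plans, it can be worth confirming they are present before kicking off a long run. A small sketch (the file list mirrors the bullets above; the helper name is illustrative):

```python
from pathlib import Path

PREREG_PLANS = [
    "prereg/9.5_preregistered_analysis_plan.md",
    "prereg/11.2_preregistered_analysis_plan.md",
    "prereg/11.3_preregistered_analysis_plan.md",
]

def check_prereg(root="."):
    """Map each expected prereg plan to whether it exists under `root`."""
    base = Path(root)
    return {rel: (base / rel).is_file() for rel in PREREG_PLANS}
```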
CPU smoke tests:

```bash
bash scripts/run_9_5_smoke_cpu.sh
bash scripts/run_11_2_smoke_cpu.sh
bash scripts/run_11_4_smoke_cpu.sh
```

Use these for full runs:

```bash
bash scripts/run_8_3_gpu.sh
bash scripts/run_8_5_gpu.sh
bash scripts/run_9_5_gpu.sh
bash scripts/run_11_2_gpu.sh
bash scripts/run_11_4_gpu.sh
bash scripts/run_11_3_prereqs.sh
python scripts/run_12_experiment_protocol.py --out experiments/results
```
Lower-level theorem/metric experiments are available via:

```bash
python bench/metrics/run_metrics.py --t2/--t3/--t4/--t5/--t81/--t83/--t84/--t85/--t94/--t101/--t102/--t103/--t104/--t105/--t111 ...
python bench/semantic_slack_gate/run_teps_baselines.py ...
python scripts/run_10_5_env_residuals.py ...
```
```bash
# 8.x
bash scripts/run_8_3_gpu.sh
bash scripts/run_8_5_gpu.sh

# 9.5 strong-accept battery + gate
bash scripts/run_9_5_gpu.sh

# 11.2 adaptive attacker
bash scripts/run_11_2_gpu.sh

# 11.4 baseline suite
bash scripts/run_11_4_gpu.sh

# 11.3 prerequisite artifacts (10.1-10.4 probes, 11.1 slack metrics, perf log)
bash scripts/run_11_3_prereqs.sh

# 11.3 report card
mkdir -p artifacts/11.3
python scripts/hash_11_3_prereg.py \
  --prereg prereg/11.3_preregistered_analysis_plan.md \
  --thresholds spec/report_card_thresholds.json \
  --out artifacts/11.3/prereg_hash.txt \
  --manifest-out artifacts/11.3/prereg_manifest.json
python scripts/run_11_3_report_card.py \
  --prereg prereg/11.3_preregistered_analysis_plan.md \
  --thresholds spec/report_card_thresholds.json \
  --prereg-hash artifacts/11.3/prereg_hash.txt \
  --manifest-out artifacts/11.3/runtime_manifest.json \
  --out artifacts/11.3/t113_report_card.json | tee artifacts/11.3/metrics.log
```
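The `hash_11_3_prereg.py` step pins the preregistration plan to a digest before the report card runs. The repo script's exact format may differ, but the core operation is plausibly a plain file hash; a sketch assuming SHA-256:

```python
import hashlib

def file_sha256(path):
    """Hex SHA-256 digest of a file's bytes, e.g. for pinning a prereg plan.

    NOTE: a sketch of the general technique, not the repo's actual script.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files don't need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```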
```bash
# 12.x tables/plots/report
python scripts/run_12_experiment_protocol.py --out experiments/results
```

Key output artifacts:

- `artifacts/neurips_strong_accept/`
- `artifacts/11.2/t112_attacker_metrics.json`
- `artifacts/11.3/t113_report_card.json`
- `artifacts/11.4/t114_empirical_baselines_summary.json`
- `experiments/results/tables/`
- `experiments/results/plots/`
- `experiments/results/report.md`
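The JSON artifacts among these outputs can be inspected without any special tooling, assuming they are ordinary JSON on disk:

```python
import json
from pathlib import Path

def load_artifact(path):
    """Load a generated JSON artifact (e.g. a report card) for quick inspection."""
    return json.loads(Path(path).read_text())
```

For example, `load_artifact("artifacts/11.3/t113_report_card.json")` after a full run (the structure of the contents is defined by the generating script, not shown here).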
- Most long-running scripts support resume behavior.
- Generated artifacts are intentionally Git-ignored.
- `scripts/run_1_12_paper_gate.py` expects a private docs pack and is not part of the public-only reproduction path.
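The resume behavior noted above commonly amounts to skipping any step whose output artifact already exists. A minimal sketch of that pattern (an assumption about the general mechanism, not this repo's actual implementation):

```python
from pathlib import Path

def run_step(out_path, produce):
    """Write the result of `produce()` to `out_path`, unless it already exists.

    Re-running after an interruption skips completed steps, since their
    artifacts are already on disk.
    """
    out = Path(out_path)
    if out.exists():
        return "skipped"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(produce())
    return "ran"
```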