
docs: restructure documentation and align eval/batch backends#88

Merged
andylizf merged 50 commits into main from docs/restructure-and-align-backend on Feb 5, 2026


@andylizf andylizf commented Feb 3, 2026

Summary

  • Restructure SUBMIT.md with clear 3-way evaluation workflow (solutions dir / custom dir / pairs file)
  • Add --bucket-url, --backend, --sync-bucket documentation
  • Move technical details (config.yaml, uv_overrides) to CONTRIBUTING.md
  • Simplify track READMEs for users
  • Align frontier eval defaults with frontier batch:
    • research → skypilot by default
    • algorithmic → docker by default
  • Replace --skypilot flag with --backend docker/skypilot
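
For reviewers, the backend defaults above can be sketched roughly as follows. This is only an illustration of the described behavior, not the code in this PR: the parser layout, `DEFAULT_BACKEND`, and `resolve_backend` are hypothetical names, and only the `--backend docker/skypilot` flag and the per-track defaults come from the summary.

```python
import argparse

# Per-track defaults as listed in the summary (the dict name is hypothetical).
DEFAULT_BACKEND = {"research": "skypilot", "algorithmic": "docker"}


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real CLI; only --backend is taken from the PR.
    parser = argparse.ArgumentParser(prog="frontier-eval-sketch")
    parser.add_argument("track", choices=sorted(DEFAULT_BACKEND))
    parser.add_argument(
        "--backend",
        choices=["docker", "skypilot"],
        default=None,  # None means "use the track's default"
    )
    return parser


def resolve_backend(argv: list[str]) -> str:
    """Return the explicit --backend value, else the default for the track."""
    args = build_parser().parse_args(argv)
    return args.backend or DEFAULT_BACKEND[args.track]


if __name__ == "__main__":
    print(resolve_backend(["research"]))
    print(resolve_backend(["research", "--backend", "docker"]))
```

Leaving the flag's default as `None` keeps "user did not pass `--backend`" distinguishable from an explicit choice, so the two tracks can have different defaults behind one flag.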

Test plan

  • Verify frontier eval research defaults to skypilot
  • Verify frontier eval algorithmic defaults to docker
  • Verify --backend flag works for both tracks

- Replace deprecated frontier-eval with frontier batch
- Simplify research/README.md Batch Evaluation (reference SUBMIT.md)
- Fix Python API example to show backend="docker" as override
- Fix "judge sever" → "judge server"
- Fix "1 -- 3 public test case" → "1-3 public test cases"
- Remove confusing --problems example in research/README.md
- Remove uv run prefix for consistency (frontier works after pip install)
- Fix problems.txt link path in SUBMIT.md
- Remove outdated variant count (numbers change)
- Add Solution Requirements section to research/README.md
- Remove contributor-focused note about evaluator.py detection
- Add language field to RuntimeConfig for per-problem language config
- Add LanguageConfig dataclass and registry (python, cpp) in config.py
- Update generate_solutions.py for language-aware code extraction
- Update check_solutions.py to scan all file types
- Update batch evaluator to use per-problem extensions
- Fix deepseek provider detection to use DEEPSEEK_API_KEY
- Add language: cpp to nbody_simulation configs
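
A minimal sketch of what a language registry with language-aware code extraction could look like, assuming a `LanguageConfig` dataclass keyed by language name. Only the `LanguageConfig` name and the two registered languages (python, cpp) come from the notes above; the field names, `LANGUAGES`, `get_language`, and `extract_code` are hypothetical and may not match config.py or generate_solutions.py.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this sketch fence-safe


@dataclass(frozen=True)
class LanguageConfig:
    # Field names are assumptions for illustration only.
    name: str
    extension: str
    fence_tags: tuple[str, ...]


# Hypothetical registry covering the two languages mentioned in the notes.
LANGUAGES = {
    "python": LanguageConfig("python", ".py", ("python", "py")),
    "cpp": LanguageConfig("cpp", ".cpp", ("cpp", "c++")),
}


def get_language(name: str) -> LanguageConfig:
    """Look up a language, failing loudly on unknown names."""
    try:
        return LANGUAGES[name]
    except KeyError:
        raise ValueError(f"unsupported language: {name!r}") from None


def extract_code(response: str, language: str) -> str:
    """Return the first fenced block tagged with the language, else the whole text."""
    cfg = get_language(language)
    tags = "|".join(re.escape(tag) for tag in cfg.fence_tags)
    pattern = re.escape(FENCE) + rf"(?:{tags})[ \t]*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()


if __name__ == "__main__":
    reply = "Here is the solution:\n" + FENCE + "cpp\nint main() { return 0; }\n" + FENCE
    print(extract_code(reply, "cpp"))
    print(get_language("cpp").extension)
```

Keeping the file extension in the same record as the fence tags is one way to let the generator, the solution checker, and the batch evaluator agree on per-problem extensions from a single lookup.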
@andylizf andylizf closed this Feb 5, 2026
@andylizf andylizf reopened this Feb 5, 2026
@andylizf andylizf closed this Feb 5, 2026
@andylizf andylizf reopened this Feb 5, 2026
@andylizf andylizf merged commit 427cfdd into main Feb 5, 2026
3 checks passed
