docs: restructure documentation and align eval/batch backends #88
Merged
- Replace deprecated frontier-eval with frontier batch
- Simplify research/README.md Batch Evaluation (reference SUBMIT.md)
- Fix Python API example to show backend="docker" as override
- Fix "judge sever" → "judge server"
- Fix "1 -- 3 public test case" → "1-3 public test cases"
- Remove confusing --problems example in research/README.md
- Remove uv run prefix for consistency (frontier works after pip install)
- Fix problems.txt link path in SUBMIT.md
- Remove outdated variant count (numbers change)
- Add Solution Requirements section to research/README.md
- Remove contributor-focused note about evaluator.py detection
… algorithmic->docker)
- Add language field to RuntimeConfig for per-problem language config
- Add LanguageConfig dataclass and registry (python, cpp) in config.py
- Update generate_solutions.py for language-aware code extraction
- Update check_solutions.py to scan all file types
- Update batch evaluator to use per-problem extensions
- Fix deepseek provider detection to use DEEPSEEK_API_KEY
- Add language: cpp to nbody_simulation configs
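The language-aware extraction described in the commit above might look roughly like the following. This is a hypothetical sketch, not the actual config.py or generate_solutions.py: the field names (`extension`, `fence_tags`), the `LANGUAGES` registry layout, the Python fallback in `get_language`, and the `extract_code` helper are all assumptions made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Built programmatically so the pattern below is easy to read inside Markdown.
FENCE = "`" * 3


@dataclass(frozen=True)
class LanguageConfig:
    """Hypothetical per-language settings; field names are illustrative."""
    name: str
    extension: str  # file extension used when writing a solution file
    fence_tags: tuple[str, ...]  # fence info strings accepted during extraction


# Hypothetical registry mirroring the "python, cpp" entries in the commit.
LANGUAGES: dict[str, LanguageConfig] = {
    "python": LanguageConfig("python", ".py", ("python", "py")),
    "cpp": LanguageConfig("cpp", ".cpp", ("cpp", "c++", "cc")),
}


def get_language(name: str | None) -> LanguageConfig:
    """Look up a language, assuming Python when a problem declares none."""
    key = (name or "python").lower()
    if key not in LANGUAGES:
        raise ValueError(f"unsupported language: {key!r}")
    return LANGUAGES[key]


def extract_code(response: str, lang: LanguageConfig) -> str | None:
    """Return the body of the first fenced block tagged with one of lang's tags."""
    pattern = re.compile(FENCE + r"([^\n`]*)\n(.*?)" + FENCE, re.DOTALL)
    for match in pattern.finditer(response):
        tag = match.group(1).strip().lower()
        if tag in lang.fence_tags:
            return match.group(2)
    return None
```

A batch evaluator could then name each solution file `f"solution{lang.extension}"`, which is one plausible way to get the per-problem extensions mentioned above; the repository's actual file naming may differ.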
- Fix JSON parsing to work regardless of exit code
- Add SSH key generation step for SkyPilot
- Increase timeout and add verbose flag for debugging
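The first item in the commit above (parse JSON regardless of exit code) can be illustrated with a small sketch. It assumes, hypothetically, that the evaluator prints its result as a JSON object on its own stdout line, possibly followed by other log output; `parse_result`, `run_and_parse`, and that output convention are illustrative and not taken from the repository.

```python
from __future__ import annotations

import json
import subprocess
import sys


def parse_result(stdout: str) -> dict | None:
    """Scan stdout bottom-up and return the last line that parses as a JSON object."""
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if not line:
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


def run_and_parse(cmd: list[str], timeout: float = 600.0) -> tuple[int, dict | None]:
    """Run cmd and parse its JSON output without gating on the exit code."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    # Deliberately no check=True: a non-zero exit may still carry a usable result.
    return proc.returncode, parse_result(proc.stdout)


if __name__ == "__main__":
    # Demo: a child process that prints a score but exits with status 1.
    demo = "import json, sys; print(json.dumps({'score': 0.5})); sys.exit(1)"
    print(run_and_parse([sys.executable, "-c", demo]))
```

The exit code is still returned so callers can log it, but it no longer decides whether the result is read.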
…atibility"
This reverts commit 9262040.
andylizf added a commit that referenced this pull request on Feb 5, 2026
Summary
- `--bucket-url`, `--backend`, `--sync-bucket` documentation
- `frontier eval` defaults with `frontier batch`:
- `--skypilot` flag with `--backend docker/skypilot`

Test plan
- `frontier eval research` defaults to skypilot
- `frontier eval algorithmic` defaults to docker
- `--backend` flag works for both tracks
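The backend defaults in the test plan (research → skypilot, algorithmic → docker, with `--backend` overriding either) amount to a small lookup. The sketch below is hypothetical: `DEFAULT_BACKENDS`, `resolve_backend`, and `build_parser` are illustrative names rather than the CLI's actual internals, and the toy argparse parser only mimics the track argument and `--backend` flag named in the test plan.

```python
from __future__ import annotations

import argparse

# Hypothetical per-track defaults matching the test plan.
DEFAULT_BACKENDS = {"research": "skypilot", "algorithmic": "docker"}
BACKENDS = ("docker", "skypilot")


def resolve_backend(track: str, override: str | None = None) -> str:
    """Return the explicit override if given, otherwise the track's default."""
    if track not in DEFAULT_BACKENDS:
        raise ValueError(f"unknown track: {track!r}")
    if override is None:
        return DEFAULT_BACKENDS[track]
    if override not in BACKENDS:
        raise ValueError(f"unknown backend: {override!r}")
    return override


def build_parser() -> argparse.ArgumentParser:
    """Toy parser mimicking `frontier eval <track> [--backend ...]`."""
    parser = argparse.ArgumentParser(prog="frontier eval")
    parser.add_argument("track", choices=sorted(DEFAULT_BACKENDS))
    parser.add_argument("--backend", choices=BACKENDS, default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_backend(args.track, args.backend))
```

Keeping `default=None` on the flag lets the code distinguish "not given" from an explicit choice, so each track's default applies only when the user passes nothing.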