feat(ci): add minimaxai/minimax-m2 to nightly benchmark #795
smfirmin wants to merge 8 commits into lightseekorg:main from
Conversation
Signed-off-by: Sydney Firmin <sydney.firmin@oracle.com>
📝 Walkthrough
Adds the minimaxai/minimax-m2 model to the nightly benchmark configuration, test matrix, and model specifications (including tp, features, and runtime args).

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 passed
Code Review
This pull request adds the minimaxai/minimax-m2 model to the nightly benchmark suite. The changes correctly configure the model specification and register it for benchmarking. My review includes one suggestion to update the model's feature list in e2e_test/infra/model_specs.py to more accurately reflect its capabilities, ensuring it is included in all relevant tests.
Actionable comments posted: 2
Inline comments:
In @.github/workflows/nightly-benchmark.yml:
- Line 124: Remove the spaces just inside the curly braces of the minimaxai-minimax-m2 matrix entry, changing "- { id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2, test_class: TestNightlyMinimaxM2Single }" to "- {id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2, test_class: TestNightlyMinimaxM2Single}". This satisfies the yamllint `braces` rule while keeping the same keys and values.
- Line 124: The multi-worker matrix is missing the generated TestNightlyMinimaxM2Multi entry. Add an item with the same identifiers (id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2) but test_class: TestNightlyMinimaxM2Multi so the multi-worker nightly path runs.
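Taken together, the two fixes above would leave the matrix entries looking roughly like this (a sketch only — the surrounding matrix keys and indentation are assumed from context, not quoted from the workflow file):

```yaml
# Single-worker matrix: flow-style braces with no inner padding (yamllint `braces` rule)
- {id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2, test_class: TestNightlyMinimaxM2Single}

# Multi-worker matrix: add the missing entry so the multi-worker nightly path runs
- {id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2, test_class: TestNightlyMinimaxM2Multi}
```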
📒 Files selected for processing (3)
- .github/workflows/nightly-benchmark.yml
- e2e_test/benchmarks/test_nightly_perf.py
- e2e_test/infra/model_specs.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8941f815f2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Signed-off-by: Sydney Firmin <sydney.firmin@oracle.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48fd9d607e
```yaml
- { id: Qwen/Qwen2.5-7B-Instruct, slug: Qwen-Qwen2.5-7B-Instruct, test_class: TestNightlyQwen7bMulti }
- { id: Qwen/Qwen3-30B-A3B, slug: Qwen-Qwen3-30B-A3B, test_class: TestNightlyQwen30bMulti }
- { id: openai/gpt-oss-20b, slug: openai-gpt-oss-20b, test_class: TestNightlyGptOss20bMulti }
- { id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2, test_class: TestNightlyMinimaxM2Multi }
```
Move Minimax multi-worker benchmark off 4-GPU runners
In the multi-worker workflow matrix, this new entry schedules TestNightlyMinimaxM2Multi on runs-on: 4-gpu-h100. However, the generated test class uses workers(count=2) from e2e_test/benchmarks/test_nightly_perf.py, and the new model spec sets tp=4 in e2e_test/infra/model_specs.py. Since start_workers allocates GPUs sequentially by tp (see e2e_test/infra/worker.py), the second worker is assigned GPUs 4-7, which exceeds a 4-GPU host and causes that matrix leg to fail consistently instead of producing benchmark data.
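The capacity problem reduces to simple arithmetic. The sketch below (hypothetical names, not the repo's actual `start_workers` implementation) mirrors the sequential-by-tp allocation described above:

```python
def assign_gpus(worker_count: int, tp: int) -> list[list[int]]:
    """Assign GPU ids sequentially, tp GPUs per worker, mirroring
    the sequential-by-tp allocation described in the comment above."""
    return [list(range(i * tp, (i + 1) * tp)) for i in range(worker_count)]

HOST_GPUS = 4  # a 4-gpu-h100 runner

assignments = assign_gpus(worker_count=2, tp=4)
print(assignments)  # [[0, 1, 2, 3], [4, 5, 6, 7]]

# The highest GPU id requested by the second worker exceeds the host's supply:
print(max(assignments[-1]) >= HOST_GPUS)  # True
```

With workers(count=2) and tp=4, the run needs 8 GPUs total, so the leg would need an 8-GPU runner (or a smaller tp/worker count) to pass.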
Actionable comments posted: 1
Inline comments:
In `@e2e_test/infra/model_specs.py`:
- Around lines 87-88: Update the model mapping to use the canonical HuggingFace identifier: replace the string "minimaxai/minimax-m2" with "MiniMaxAI/MiniMax-M2" in the dict entry that calls _resolve_model_path, so the call becomes _resolve_model_path("MiniMaxAI/MiniMax-M2"). Correct both the dictionary key (if used) and the argument passed to _resolve_model_path to the exact capitalization shown.
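The casing fix amounts to a one-line change in the mapping. In this sketch, `_resolve_model_path` is a stand-in stub (the real helper's behavior is not shown in the review), since only the identifier string is at issue:

```python
def _resolve_model_path(model_id: str) -> str:
    # Stand-in for the repo's helper: just echo the identifier back.
    return model_id

# Before (lowercase, may not match the canonical HF repo):
#   "minimaxai/minimax-m2": _resolve_model_path("minimaxai/minimax-m2"),
# After (canonical capitalization in both the key and the argument):
MODEL_PATHS = {
    "MiniMaxAI/MiniMax-M2": _resolve_model_path("MiniMaxAI/MiniMax-M2"),
}
print(MODEL_PATHS["MiniMaxAI/MiniMax-M2"])  # MiniMaxAI/MiniMax-M2
```

Hugging Face repo identifiers are case-sensitive in the general case, so matching the canonical capitalization avoids resolution surprises.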
📒 Files selected for processing (2)
- .github/workflows/nightly-benchmark.yml
- e2e_test/infra/model_specs.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eeee7ac1b0
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 23b3a5537f
```python
("Qwen/Qwen2.5-7B-Instruct", "Qwen7b", 4, ["http", "grpc"], {}),
("Qwen/Qwen3-30B-A3B", "Qwen30b", 4, ["http", "grpc"], {}),
("openai/gpt-oss-20b", "GptOss20b", 1, ["http", "grpc"], {}),
("minimaxai/minimax-m2", "MinimaxM2", 2, ["http", "grpc"], {}),
```
Enable the MiniMax reasoning parser for nightly runs
MiniMax-M2 is an interleaved-thinking model, and this repo already has dedicated MiniMax reasoning support: crates/reasoning_parser/src/factory.rs maps minimax-m2, and the reasoning E2Es only enable these models behind --reasoning-parser. The new nightly class still uses the generic gateway config from test_nightly_perf.py, so these benchmark jobs will return raw <think>…</think> text in message.content instead of separating it from the final answer. Since genai-bench measures token/latency on the API response, the MiniMax artifacts will overcount hidden reasoning tokens and won't be comparable to the rest of the nightly matrix.
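To see why the metrics skew: without a reasoning parser, the hidden reasoning stays inline in message.content, so any token or length count over the response includes it. A minimal illustration — the tag format follows the <think>…</think> convention mentioned above, and the splitting logic is a sketch, not the repo's actual parser:

```python
import re

def split_reasoning(content: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the final answer."""
    match = re.search(r"<think>(.*?)</think>", content, flags=re.DOTALL)
    if not match:
        return "", content
    reasoning = match.group(1).strip()
    answer = (content[: match.start()] + content[match.end():]).strip()
    return reasoning, answer

raw = "<think>2+2 is 4, trivially.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.

# A naive count over the raw content overcounts relative to the answer alone:
print(len(raw.split()) > len(answer.split()))  # True
```

A benchmark that measures the raw response is therefore measuring answer plus reasoning, while the other nightly models report answer-only numbers — the two are not comparable.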
## Description

### Problem
`minimaxai/minimax-m2` needs to be benchmarked.

### Solution
Add `minimaxai/minimax-m2` to the nightly benchmark path end to end: define its E2E model spec, register nightly single/multi benchmark test classes, and include it in the nightly benchmark workflow matrix.

### Changes
- Add `minimaxai/minimax-m2` to `e2e_test/infra/model_specs.py`, with `tp=4` and `--trust-remote-code` for both worker and vLLM startup.
- Add `("minimaxai/minimax-m2", "MinimaxM2", 2, ["http", "grpc"], {})` to the nightly benchmark model list in `e2e_test/benchmarks/test_nightly_perf.py`.
- Add `minimaxai/minimax-m2` to the single-worker matrix in `.github/workflows/nightly-benchmark.yml`.