
feat(ci): add minimaxai/minimax-m2 to nightly benchmark#795

Open
smfirmin wants to merge 8 commits into lightseekorg:main from smfirmin:sfirmin/add-minimax-to-nightly-bench

Conversation


@smfirmin smfirmin commented Mar 17, 2026

Description

Problem

minimaxai/minimax-m2 needs to be benchmarked.

Solution

Add minimaxai/minimax-m2 to the nightly benchmark path end to end: define its E2E model spec, register nightly single/multi benchmark test classes, and include it in the nightly benchmark workflow matrix.

Changes

  • Added minimaxai/minimax-m2 to e2e_test/infra/model_specs.py.
  • Configured the model with tp=4 and --trust-remote-code for both worker and vLLM startup.
  • Added ("minimaxai/minimax-m2", "MinimaxM2", 2, ["http", "grpc"], {}) to the nightly benchmark model list in e2e_test/benchmarks/test_nightly_perf.py.
  • Added minimaxai/minimax-m2 to the single-worker matrix in .github/workflows/nightly-benchmark.yml.
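Taken together, the spec and registration described above might look roughly like the sketch below. The field names (`tp`, `features`, `worker_args`, `vllm_args`) and the tuple layout are inferred from the bullets and bot summaries in this PR, not copied from the actual `e2e_test/infra/model_specs.py`, so treat them as illustrative:

```python
# Illustrative sketch of the two additions; the real structures in
# e2e_test/infra/model_specs.py and test_nightly_perf.py may differ.
MODEL_SPECS = {
    "minimaxai/minimax-m2": {
        "tp": 4,  # tensor parallelism across 4 GPUs
        "features": ["chat", "streaming", "function_calling", "reasoning"],
        "worker_args": ["--trust-remote-code"],
        "vllm_args": ["--trust-remote-code"],
    },
}

# Nightly registration: (model id, class suffix, worker count, backends, overrides)
NIGHTLY_MODELS = [
    ("minimaxai/minimax-m2", "MinimaxM2", 2, ["http", "grpc"], {}),
]
```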

Summary by CodeRabbit

  • Tests
    • Nightly benchmark testing now includes the minimax-m2 model. It is exercised in single-worker GPU and multi-worker configurations, using both HTTP and gRPC backends. The model is validated for chat, streaming, function-calling, and reasoning capabilities and is configured for multi-worker parallelism and runtime optimizations.

Signed-off-by: Sydney Firmin <sydney.firmin@oracle.com>
@github-actions github-actions bot added the ci (CI/CD configuration changes) and tests (Test changes) labels Mar 17, 2026

coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Adds the minimaxai/minimax-m2 model to nightly benchmark configuration, test matrix, and model specifications (including tp, features, and runtime args).

Changes

  • CI workflow (.github/workflows/nightly-benchmark.yml): Added minimax-m2 to the nightly benchmark matrix with slug and test_class entries.
  • Nightly benchmark tests (e2e_test/benchmarks/test_nightly_perf.py): Extended _NIGHTLY_MODELS to include minimaxai/minimax-m2 with a 2-worker setup testing both http and grpc backends.
  • Model specifications (e2e_test/infra/model_specs.py): Added a minimaxai/minimax-m2 MODEL_SPECS entry: resolved model path, tp=4, features ["chat", "streaming", "function_calling", "reasoning"], and worker_args/vllm_args including --trust-remote-code.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • key4ng
  • CatherineSue
  • slin1237

Poem

🐰 Hopping through CI at night,
A new model joins the flight,
Minimax M2, brave and spry,
Tests and specs now side by side,
Pipelines hum — a carrot-sweet delight! 🥕

🚥 Pre-merge checks: 3 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the check.
  • Title check: ✅ Passed. The title 'feat(ci): add minimaxai/minimax-m2 to nightly benchmark' accurately and concisely describes the main change: adding a new model to the nightly benchmark infrastructure.


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the minimaxai/minimax-m2 model into the existing nightly benchmarking system. This ensures that the performance and stability of this new model are regularly monitored, providing crucial data for its ongoing development and deployment. The changes involve defining the model's operational parameters and registering it within the automated testing framework.

Highlights

  • Model Specification: Defined the minimaxai/minimax-m2 model's end-to-end specification, including tp=4 and --trust-remote-code for both worker and vLLM startup configurations.
  • Nightly Benchmark Inclusion: Added minimaxai/minimax-m2 to the list of models designated for nightly performance benchmarks.
  • Workflow Integration: Included minimaxai/minimax-m2 in the single-worker matrix of the nightly benchmark workflow, ensuring its regular evaluation.
Changelog
  • e2e_test/benchmarks/test_nightly_perf.py
    • Added ("minimaxai/minimax-m2", "MinimaxM2", 2, ["http", "grpc"], {}) to the _NIGHTLY_MODELS list.
  • e2e_test/infra/model_specs.py
    • Introduced a new entry for minimaxai/minimax-m2 with its model path, tp=4, features=["chat", "streaming"], and worker_args/vllm_args set to ["--trust-remote-code"].
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/nightly-benchmark.yml
Activity
  • No review comments or other human activity have been recorded for this pull request yet.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds the minimaxai/minimax-m2 model to the nightly benchmark suite. The changes correctly configure the model specification and register it for benchmarking. My review includes one suggestion to update the model's feature list in e2e_test/infra/model_specs.py to more accurately reflect its capabilities, ensuring it's included in all relevant tests.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In .github/workflows/nightly-benchmark.yml:
- Line 124: Remove the spaces just inside the curly braces of the minimaxai-minimax-m2 matrix entry to satisfy the yamllint `braces` rule: change "- { id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2, test_class: TestNightlyMinimaxM2Single }" to "- {id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2, test_class: TestNightlyMinimaxM2Single}", keeping the same keys and values.
- Line 124: The multi-worker matrix is missing the generated TestNightlyMinimaxM2Multi entry. Add an item with the same id and slug but test_class: TestNightlyMinimaxM2Multi so the multi-worker nightly path runs.
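Applying both findings, the two matrix entries would look roughly like this (a sketch; indentation and surrounding keys in the actual workflow file may differ):

```yaml
# Single-worker matrix: braces tightened to satisfy the yamllint `braces` rule.
- {id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2, test_class: TestNightlyMinimaxM2Single}

# Multi-worker matrix: the previously missing generated entry.
- {id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2, test_class: TestNightlyMinimaxM2Multi}
```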


📥 Commits

Reviewing files that changed from the base of the PR and between 976f1a6 and dfa8ad1.

📒 Files selected for processing (3)
  • .github/workflows/nightly-benchmark.yml
  • e2e_test/benchmarks/test_nightly_perf.py
  • e2e_test/infra/model_specs.py


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8941f815f2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Signed-off-by: Sydney Firmin <sydney.firmin@oracle.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48fd9d607e


- { id: Qwen/Qwen2.5-7B-Instruct, slug: Qwen-Qwen2.5-7B-Instruct, test_class: TestNightlyQwen7bMulti }
- { id: Qwen/Qwen3-30B-A3B, slug: Qwen-Qwen3-30B-A3B, test_class: TestNightlyQwen30bMulti }
- { id: openai/gpt-oss-20b, slug: openai-gpt-oss-20b, test_class: TestNightlyGptOss20bMulti }
- { id: minimaxai/minimax-m2, slug: minimaxai-minimax-m2, test_class: TestNightlyMinimaxM2Multi }


P1: Move Minimax multi-worker benchmark off 4-GPU runners

In the multi-worker workflow matrix, this new entry schedules TestNightlyMinimaxM2Multi on runs-on: 4-gpu-h100. However, the generated test class uses workers(count=2) from e2e_test/benchmarks/test_nightly_perf.py, and the new model spec sets tp=4 in e2e_test/infra/model_specs.py. Because start_workers allocates GPUs sequentially by tp, the second worker is assigned GPUs 4-7 (see e2e_test/infra/worker.py), which exceeds a 4-GPU host and causes that matrix leg to fail consistently instead of producing benchmark data.
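The overrun follows from simple arithmetic. The sketch below models the sequential-by-tp allocation this comment attributes to start_workers; it is an illustration of the described behavior, not the actual code in e2e_test/infra/worker.py:

```python
def assign_gpus(num_workers: int, tp: int) -> list[list[int]]:
    """Sequentially assign `tp` GPUs to each worker, mimicking the
    allocation behavior described for start_workers (illustrative)."""
    return [list(range(w * tp, (w + 1) * tp)) for w in range(num_workers)]

# Two workers at tp=4 need eight GPUs in total: the second worker is
# handed GPUs 4-7, which do not exist on a 4-GPU runner.
assignments = assign_gpus(num_workers=2, tp=4)
print(assignments)  # [[0, 1, 2, 3], [4, 5, 6, 7]]

fits_on_4_gpu_host = max(assignments[-1]) < 4
print(fits_on_4_gpu_host)  # False
```

This is why the suggestion is to move the leg to an 8-GPU runner (or lower tp/worker count), rather than to tweak the matrix entry itself.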



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In e2e_test/infra/model_specs.py:
- Around lines 87-88: Update the mapping to use the canonical Hugging Face identifier: replace "minimaxai/minimax-m2" with "MiniMaxAI/MiniMax-M2" in both the dictionary key and the argument passed to _resolve_model_path, using that exact capitalization.
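One reason the capitalization matters even beyond Hub resolution: Python dict lookups are exact-match, so a spec keyed under one spelling is invisible under another. A minimal illustration (the `specs` dict here is hypothetical, not the real MODEL_SPECS):

```python
# Hypothetical spec mapping keyed by the canonical Hugging Face id.
specs = {"MiniMaxAI/MiniMax-M2": {"tp": 4}}

# Lookups are case-sensitive, so the lowercase id misses the entry.
print("minimaxai/minimax-m2" in specs)  # False
print("MiniMaxAI/MiniMax-M2" in specs)  # True
```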


📥 Commits

Reviewing files that changed from the base of the PR and between dfa8ad1 and 48fd9d6.

📒 Files selected for processing (2)
  • .github/workflows/nightly-benchmark.yml
  • e2e_test/infra/model_specs.py

@smfirmin smfirmin changed the title add minimaxai/minimax-m2 to nightly benchmark (feat)add minimaxai/minimax-m2 to nightly benchmark Mar 18, 2026
@smfirmin smfirmin changed the title (feat)add minimaxai/minimax-m2 to nightly benchmark feat(ci) add minimaxai/minimax-m2 to nightly benchmark Mar 18, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eeee7ac1b0


@smfirmin smfirmin changed the title feat(ci) add minimaxai/minimax-m2 to nightly benchmark feat(ci): add minimaxai/minimax-m2 to nightly benchmark Mar 18, 2026
@key4ng key4ng self-assigned this Mar 18, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 23b3a5537f


("Qwen/Qwen2.5-7B-Instruct", "Qwen7b", 4, ["http", "grpc"], {}),
("Qwen/Qwen3-30B-A3B", "Qwen30b", 4, ["http", "grpc"], {}),
("openai/gpt-oss-20b", "GptOss20b", 1, ["http", "grpc"], {}),
("minimaxai/minimax-m2", "MinimaxM2", 2, ["http", "grpc"], {}),


P2: Enable the MiniMax reasoning parser for nightly runs

MiniMax-M2 is an interleaved-thinking model, and this repo already has dedicated MiniMax reasoning support: crates/reasoning_parser/src/factory.rs maps minimax-m2, while the reasoning E2E tests only enable these models behind --reasoning-parser. The new nightly class still uses the generic gateway config from test_nightly_perf.py, so these benchmark jobs will return raw <think>…</think> text in message.content instead of separating it from the final answer. Since genai-bench measures token counts and latency on the API response, the MiniMax artifacts will overcount hidden reasoning tokens and won't be comparable to the rest of the nightly matrix.
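The overcount can be sketched concretely. The think-tag format and the whitespace token counting below are purely illustrative (genai-bench uses a real tokenizer, and the parser lives in crates/reasoning_parser, not in Python):

```python
import re

# Raw API response when no reasoning parser is configured: hidden
# reasoning arrives inline in message.content.
raw = "<think>Weighing the options step by step...</think>The answer is 42."

# Whitespace token counting stands in for a real tokenizer here.
raw_tokens = len(raw.split())

# With a reasoning parser enabled, the <think>...</think> span would be
# separated out and only the final answer would count as response text.
answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
answer_tokens = len(answer.split())

print(answer)                      # The answer is 42.
print(raw_tokens > answer_tokens)  # True: raw overcounts hidden reasoning
```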

