
Conversation


@toulzx toulzx commented Oct 29, 2025

run with:

# vllm==0.10.2
# benchmark_moe.py is from https://github.com/vllm-project/vllm/blob/v0.10.2/benchmarks/kernels/benchmark_moe.py
uv run benchmark_moe.py --model Qwen/Qwen3-Next-80B-A3B-Instruct -tp 4  --tune  --trust-remote-code
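
For context, once tuning completes the script writes the winning configs to a JSON file in the current directory. A minimal sketch of the follow-up step, with an illustrative filename (the destination is the directory where vLLM's fused MoE configs live in the source tree):

# Copy the tuned config into vLLM's fused MoE config directory so it is
# picked up at runtime (filename shown for illustration only):
cp "E=512,N=128,device_name=NVIDIA_A100-SXM4-80GB.json" \
   vllm/model_executor/layers/fused_moe/configs/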

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small but essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the qwen (Related to Qwen models) label on Oct 29, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a tuned MoE kernel configuration for the Qwen3-Next model on A100 GPUs with TP=4. While the intent is to improve performance, a critical issue in the naming of the configuration file will prevent it from being used. The N=128 parameter in the filename implies an intermediate_size of 512, which is incorrect for a model of this size. This must be corrected for the performance benefits to be realized.

@@ -0,0 +1,146 @@
{
Contributor

critical

The filename E=512,N=128,device_name=NVIDIA_A100-SXM4-80GB.json is incorrect. The N=128 parameter is derived from intermediate_size / tp_size. With tp_size=4 as specified in the pull request, this implies intermediate_size = 128 * 4 = 512. This value is exceptionally small for a model of Qwen3-Next-80B-A3B-Instruct's scale.

An incorrect filename will prevent vLLM from loading this tuned configuration at runtime. The system will fall back to default, suboptimal settings, defeating the purpose of this contribution.

The moe_intermediate_size from the model's HuggingFace configuration should be used to generate the filename: the correct N value is moe_intermediate_size / tp_size. The file must be renamed accordingly for the system to use it.
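
For illustration, a minimal sketch of that derivation run from the shell (the attribute names num_experts and moe_intermediate_size follow the HuggingFace Qwen3 MoE convention and should be verified against the actual model config):

# Hypothetical check: print the expected tuned-config filename for TP=4
python3 -c "
from transformers import AutoConfig
cfg = AutoConfig.from_pretrained('Qwen/Qwen3-Next-80B-A3B-Instruct', trust_remote_code=True)
tp_size = 4
print(f'E={cfg.num_experts},N={cfg.moe_intermediate_size // tp_size},device_name=NVIDIA_A100-SXM4-80GB.json')
"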

@jeejeelee
Collaborator

Could you tune using the latest main branch? On one hand, the tuned config will include Triton version information; on the other, the tuning results might differ with the latest Triton version.
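
For context, a tuned MoE config is a JSON file mapping batch sizes to Triton kernel parameters. A hypothetical excerpt is sketched below; the triton_version field is an assumption based on the comment above, and all values are illustrative rather than actual tuned results:

{
  "triton_version": "3.4.0",
  "1": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 64,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 4
  }
}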

@toulzx
Author

toulzx commented Oct 30, 2025

OK. I'll rerun it with vLLM==0.11.0 when my machine is less busy.

---
vllm 0.10.2
`uv run benchmark_moe.py --model Qwen/Qwen3-Next-80B-A3B-Instruct -tp 4  --tune  --trust-remote-code`
---
Signed-off-by: tou <57480529+toulzx@users.noreply.github.com>
@toulzx toulzx force-pushed the qwen3-next_moe-config_a100-sxm4-80gb branch from e4ad3eb to 7572b7a on October 30, 2025 08:44
---
vllm 0.11.0
`python3  benchmark_moe.py  --model Qwen/Qwen3-Next-80B-A3B-Instruct  -tp 8  --dtype  auto   --tune  --trust-remote-code`
---
Signed-off-by: tou <57480529+toulzx@users.noreply.github.com>
@toulzx
Author

toulzx commented Oct 31, 2025

I have added A100-SXM4-80GB TP4 and TP8 configs for Qwen3-Next-80B-A3B-Instruct.

I failed to run the script for H100_80GB_HBM3 TP8 with Qwen3-Next-80B-A3B-Instruct and H100_80GB_HBM3 TP4 with Qwen3-Next-80B-A3B-Instruct-FP8, because it unexpectedly got stuck while starting a local Ray instance.

---
vllm 0.11.0
`python3  benchmark_moe.py  --model Qwen/Qwen3-Next-80B-A3B-Instruct  -tp 4  --dtype  auto   --tune  --trust-remote-code`
---
Signed-off-by: tou <57480529+toulzx@users.noreply.github.com>
@toulzx toulzx changed the title from [Qwen3-Next] MOE config for A100-SXM4-80GB TP4 to [Qwen3-Next] MOE config for A100-SXM4-80GB TP4 TP8 on Oct 31, 2025
@toulzx toulzx changed the title from [Qwen3-Next] MOE config for A100-SXM4-80GB TP4 TP8 to [Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 on Oct 31, 2025
@toulzx
Author

toulzx commented Oct 31, 2025

@jeejeelee The Qwen3-Next tuned MoE configs for A100, generated with vllm==0.11.0, are ready now :)

@jeejeelee jeejeelee added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Oct 31, 2025