[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 #27740
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default; instead, they only run a small and essential subset of CI tests, and you can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request introduces a tuned MoE kernel configuration for the Qwen3-Next model on A100 GPUs with TP=4. While the intent is to improve performance, a critical issue in the naming of the configuration file will prevent it from being used. The N=128 parameter in the filename implies an intermediate_size of 512, which is incorrect for a model of this size. This must be corrected for the performance benefits to be realized.
@@ -0,0 +1,146 @@
{
The filename `E=512,N=128,device_name=NVIDIA_A100-SXM4-80GB.json` is incorrect. The `N=128` parameter is derived as `intermediate_size / tp_size`; with `tp_size=4` as specified in this pull request, it implies `intermediate_size = 128 * 4 = 512`, which is exceptionally small for a model of Qwen3-Next-80B-A3B-Instruct's scale.
An incorrect filename will prevent vLLM from loading this tuned configuration at runtime: the system will fall back to default, suboptimal settings, defeating the purpose of this contribution.
The `moe_intermediate_size` from the model's HuggingFace configuration must be used to generate the correct filename, with `N` computed as `moe_intermediate_size / tp_size`. The file must be renamed with the correct `N` value to be picked up by the system.
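The arithmetic behind this review comment can be sketched as follows. Note that `moe_config_filename` is a hypothetical helper written for illustration, not a vLLM API; only the `E=…,N=…,device_name=….json` naming pattern is taken from the discussion above.

```python
# Illustrative sketch of the tuned-config filename arithmetic discussed above.
# moe_config_filename is a hypothetical helper, NOT part of vLLM's API.
def moe_config_filename(num_experts: int, moe_intermediate_size: int,
                        tp_size: int, device_name: str) -> str:
    # N in the filename is the per-GPU shard of the MoE intermediate size.
    shard_n = moe_intermediate_size // tp_size
    return f"E={num_experts},N={shard_n},device_name={device_name}.json"

# Working backwards from the filename under review: N=128 with tp_size=4
# implies moe_intermediate_size = 128 * 4 = 512, which is what the
# reviewer is flagging as suspiciously small.
implied_intermediate_size = 128 * 4
print(implied_intermediate_size)  # 512
```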
Could you tune using the latest main branch? On one hand, the tuned config will include Triton version information; on the other hand, the tuning results might differ with the latest Triton version.
OK. I'll rerun it with vLLM==0.11.0 when my machine is less busy.
vllm 0.10.2
`uv run benchmark_moe.py --model Qwen/Qwen3-Next-80B-A3B-Instruct -tp 4 --tune --trust-remote-code`
Signed-off-by: tou <57480529+toulzx@users.noreply.github.com>
Force-pushed from e4ad3eb to 7572b7a
vllm 0.11.0
`python3 benchmark_moe.py --model Qwen/Qwen3-Next-80B-A3B-Instruct -tp 8 --dtype auto --tune --trust-remote-code`
Signed-off-by: tou <57480529+toulzx@users.noreply.github.com>
I have added I failed to run script with
vllm 0.11.0
`python3 benchmark_moe.py --model Qwen/Qwen3-Next-80B-A3B-Instruct -tp 4 --dtype auto --tune --trust-remote-code`
Signed-off-by: tou <57480529+toulzx@users.noreply.github.com>
@jeejeelee The Qwen3-Next tuned MoE configs PR for A100 with vllm==0.11.0 is ready now :)
run with: