Conversation

@MrZ20 MrZ20 commented Oct 28, 2025

What this PR does / why we need it?

Add accuracy test for multiple models:

  • Meta_Llama_3.1_8B_Instruct
  • Qwen2.5-Omni-7B
  • Qwen3-VL-8B-Instruct

Does this PR introduce any user-facing change?

How was this patch tested?

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds accuracy test configurations for seven new models. The configurations are mostly well-defined. However, I have identified significant security concerns in two of the new model configurations (Mistral-7B-Instruct-v0.1.yaml and Phi-4-mini-instruct.yaml). Both use re-uploaded models from third-party sources (AI-ModelScope and LLM-Research) while also enabling trust_remote_code: True. This practice poses a security risk by allowing the execution of arbitrary code from un-vetted repositories. My review comments highlight these issues and recommend using official model sources to mitigate the risks. Please address these high-severity security concerns.

Comment on lines 1 to 15
model_name: "AI-ModelScope/Mistral-7B-Instruct-v0.1"
runner: "linux-aarch64-a2-1"
hardware: "Atlas A2 Series"
tasks:
- name: "gsm8k"
  metrics:
  - name: "exact_match,strict-match"
    value: 0.35
  - name: "exact_match,flexible-extract"
    value: 0.38
trust_remote_code: True
Contributor
high

The configuration for Mistral-7B-Instruct-v0.1 uses a model from AI-ModelScope and sets trust_remote_code: True. The original mistralai/Mistral-7B-Instruct-v0.1 model does not require remote code execution. Using a third-party copy of the model with trust_remote_code enabled introduces a security risk, as it allows arbitrary code from the model repository to be executed.

I suggest using the official model and removing trust_remote_code. Please note that you may need to re-evaluate and update the expected accuracy values after this change.

model_name: "mistralai/Mistral-7B-Instruct-v0.1"
runner: "linux-aarch64-a2-1"
hardware: "Atlas A2 Series"
tasks:
- name: "gsm8k"
  metrics:
  - name: "exact_match,strict-match"
    value: 0.35
  - name: "exact_match,flexible-extract"
    value: 0.38
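To make this class of issue easy to catch in CI, a minimal lint could flag any config that combines a non-official model source with `trust_remote_code`. This is only a sketch: the `OFFICIAL_ORGS` allowlist and the `check_config` helper are illustrative assumptions, not part of the vllm-ascend repository.

```python
# Hypothetical lint for accuracy-test configs: warn when a config enables
# trust_remote_code for a model re-uploaded outside its official org.
# The allowlist below is illustrative only.
OFFICIAL_ORGS = {"mistralai", "microsoft", "meta-llama", "Qwen"}

def check_config(cfg: dict) -> list[str]:
    """Return warnings for a parsed model-config mapping."""
    warnings = []
    org = cfg.get("model_name", "").split("/")[0]
    if cfg.get("trust_remote_code") and org not in OFFICIAL_ORGS:
        warnings.append(
            f"{cfg['model_name']}: trust_remote_code is enabled for a "
            f"non-official source ({org}); arbitrary repo code may execute"
        )
    return warnings

# The flagged config from this PR triggers a warning; the official
# mistralai source without trust_remote_code passes cleanly.
print(check_config({
    "model_name": "AI-ModelScope/Mistral-7B-Instruct-v0.1",
    "trust_remote_code": True,
}))
print(check_config({"model_name": "mistralai/Mistral-7B-Instruct-v0.1"}))
```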

Comment on lines 1 to 14
model_name: "LLM-Research/Phi-4-mini-instruct"
runner: "linux-aarch64-a2-1"
hardware: "Atlas A2 Series"
tasks:
- name: "gsm8k"
  metrics:
  - name: "exact_match,strict-match"
    value: 0.81
  - name: "exact_match,flexible-extract"
    value: 0.81
trust_remote_code: True
num_fewshot: 5
batch_size: 32
gpu_memory_utilization: 0.8
Contributor
high

This configuration uses a re-uploaded model LLM-Research/Phi-4-mini-instruct and sets trust_remote_code: True. While official Microsoft Phi models often require trust_remote_code, using a third-party repository introduces a security risk from executing un-vetted code. It is highly recommended to use the official model from the microsoft organization on Hugging Face (e.g., microsoft/Phi-3-mini-4k-instruct or the correct official name for Phi-4 if available) to ensure code integrity and security. If this specific re-upload is necessary, please document the reason.
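For context on what keys like `num_fewshot`, `batch_size`, and `gpu_memory_utilization` drive, the config plausibly maps onto an lm-evaluation-harness run with the vLLM backend. The exact mapping used by the vllm-ascend CI is an assumption here; `build_lm_eval_cmd` is a hypothetical helper, not repo code.

```python
# Illustrative sketch: map an accuracy-test config onto an lm_eval
# command line (vLLM backend). The mapping is assumed, not taken from
# the vllm-ascend CI scripts.
def build_lm_eval_cmd(cfg: dict) -> list[str]:
    model_args = [f"pretrained={cfg['model_name']}"]
    if cfg.get("trust_remote_code"):
        model_args.append("trust_remote_code=True")
    if "gpu_memory_utilization" in cfg:
        model_args.append(f"gpu_memory_utilization={cfg['gpu_memory_utilization']}")
    cmd = [
        "lm_eval", "--model", "vllm",
        "--model_args", ",".join(model_args),
        "--tasks", ",".join(t["name"] for t in cfg["tasks"]),
    ]
    if "num_fewshot" in cfg:
        cmd += ["--num_fewshot", str(cfg["num_fewshot"])]
    if "batch_size" in cfg:
        cmd += ["--batch_size", str(cfg["batch_size"])]
    return cmd

cfg = {
    "model_name": "LLM-Research/Phi-4-mini-instruct",
    "trust_remote_code": True,
    "num_fewshot": 5,
    "batch_size": 32,
    "gpu_memory_utilization": 0.8,
    "tasks": [{"name": "gsm8k"}],
}
print(" ".join(build_lm_eval_cmd(cfg)))
```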

@github-actions
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@vllm-ascend-ci added the accuracy-test (enable all accuracy test for PR) and ready-for-test (start test by label for PR) labels Oct 28, 2025
@MrZ20 force-pushed the n_model_acc_test branch 2 times, most recently from 9936499 to a6f8422 on October 30, 2025 07:18
model_name: Qwen3-VL-30B-A3B-Instruct
# TODO: this model has a bug that needs to be fixed and re-added
# - runner: a2-2
#   model_name: Qwen3-VL-30B-A3B-Instruct
Collaborator
Why remove this test?

num_fewshot: 5
tensor_parallel_size: 2
batch_size: 16
gpu_memory_utilization: 0.6
Collaborator

Just curious: why set gpu_memory_utilization to 0.6 here?

Contributor Author
There are issues with the accuracy testing of this model, and it has been cancelled.

@@ -0,0 +1,11 @@
model_name: "LLM-Research/Meta-Llama-3.1-8B-Instruct"
hardware: "Atlas A2 Series"
Collaborator

I noticed that some of the YAML files specify the hardware and some don't. I think there is no need to specify it? cc @zhangxinyuehfad plz also take a look

Contributor Author

Done, the configs have been unified.

- name: "gsm8k"
metrics:
- name: "exact_match,strict-match"
value: 0.35
Collaborator

Does this accuracy value indicate that there are accuracy issues with this model?

Contributor Author
There are issues with the accuracy testing of this model, and it has been cancelled.

- name: "mmmu_val"
metrics:
- name: "acc,none"
value: 0.52
Collaborator

ditto

Contributor Author
MMMU is an extremely challenging benchmark for multidisciplinary and multimodal reasoning, and this test value falls within a reasonable range.

- name: "mmmu_val"
metrics:
- name: "acc,none"
value: 0.55
Collaborator

ditto

@MrZ20 force-pushed the n_model_acc_test branch 2 times, most recently from 435f1ab to 817603c on November 3, 2025 09:22
model_name: Qwen3-VL-30B-A3B-Instruct
# TODO: this model has a bug that needs to be fixed and re-added
# - runner: a2-2
#   model_name: Qwen3-VL-30B-A3B-Instruct
Collaborator
Let's rebase your code and revert this change after #3897 is merged

@github-actions bot commented Nov 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

MrZ20 added 4 commits November 4, 2025 10:21
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: MrZ20 <2609716663@qq.com>
@MrZ20 force-pushed the n_model_acc_test branch 2 times, most recently from 4eb667a to c683981 on November 4, 2025 02:31
Signed-off-by: MrZ20 <2609716663@qq.com>
@MengqingCao MengqingCao left a comment


LGTM, Thanks for your work!

@MengqingCao
Collaborator

plz also take a look and help merge @wangxiyuan

@wangxiyuan wangxiyuan merged commit dc1a6cb into vllm-project:main Nov 4, 2025
30 checks passed
Pz1116 pushed a commit to Pz1116/vllm-ascend that referenced this pull request Nov 5, 2025
### What this PR does / why we need it?
Add accuracy test for multiple models:
- Meta_Llama_3.1_8B_Instruct
- Qwen2.5-Omni-7B
- Qwen3-VL-8B-Instruct

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@83f478b

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
@MrZ20 MrZ20 deleted the n_model_acc_test branch November 18, 2025 08:23
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: hwhaokun <haokun0405@163.com>
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
Labels

accuracy-test (enable all accuracy test for PR), module:tests, ready-for-test (start test by label for PR)
