Commit a69b4ff

MrZ20 and hwhaokun authored and committed
[Test]Add accuracy test for multiple models (vllm-project#3823)
### What this PR does / why we need it?

Add accuracy test for multiple models:

- Meta_Llama_3.1_8B_Instruct
- Qwen2.5-Omni-7B
- Qwen3-VL-8B-Instruct

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: hwhaokun <haokun0405@163.com>

1 parent 7cfe0a6 commit a69b4ff

9 files changed: +46 -8 lines changed

.github/workflows/accuracy_test.yaml

Lines changed: 9 additions & 2 deletions
@@ -49,8 +49,9 @@ jobs:
   model_name: Qwen3-8B
 - runner: a2-1
   model_name: Qwen2.5-VL-7B-Instruct
-- runner: a2-1
-  model_name: Qwen2-Audio-7B-Instruct
+# To do: This model has a bug that needs to be fixed and readded
+# - runner: a2-1
+#   model_name: Qwen2-Audio-7B-Instruct
 - runner: a2-2
   model_name: Qwen3-30B-A3B
 - runner: a2-2
@@ -61,6 +62,12 @@ jobs:
   model_name: Qwen3-Next-80B-A3B-Instruct
 - runner: a2-1
   model_name: Qwen3-8B-W8A8
+- runner: a2-1
+  model_name: Qwen3-VL-8B-Instruct
+- runner: a2-1
+  model_name: Qwen2.5-Omni-7B
+- runner: a2-1
+  model_name: Meta-Llama-3.1-8B-Instruct
 fail-fast: false
 # test will be triggered when tag 'accuracy-test' & 'ready-for-test'
 if: >-

tests/e2e/models/configs/DeepSeek-V2-Lite.yaml

Lines changed: 0 additions & 1 deletion
@@ -1,5 +1,4 @@
 model_name: "deepseek-ai/DeepSeek-V2-Lite"
-runner: "linux-aarch64-a2-2"
 hardware: "Atlas A2 Series"
 tasks:
 - name: "gsm8k"

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+model_name: "LLM-Research/Meta-Llama-3.1-8B-Instruct"
+hardware: "Atlas A2 Series"
+tasks:
+- name: "gsm8k"
+  metrics:
+  - name: "exact_match,strict-match"
+    value: 0.82
+  - name: "exact_match,flexible-extract"
+    value: 0.84
+
+num_fewshot: 5

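
The `value` fields in these configs act as reference scores for each metric. As a minimal sketch of how a measured score might be compared against them, the snippet below uses a relative tolerance. The function name `check_metrics` and the tolerance `RTOL = 0.05` are hypothetical and not taken from this commit; the real test harness may compare scores differently.

```python
import math

# Expected values copied from the gsm8k config added in this commit.
EXPECTED = {
    "exact_match,strict-match": 0.82,
    "exact_match,flexible-extract": 0.84,
}

# Hypothetical relative tolerance; the real harness may use another value.
RTOL = 0.05


def check_metrics(measured, expected=EXPECTED, rtol=RTOL):
    """Return (metric, expected, measured) tuples that fall outside tolerance."""
    failures = []
    for name, want in expected.items():
        got = measured.get(name)
        # A missing metric counts as a failure, as does a score too far off.
        if got is None or not math.isclose(got, want, rel_tol=rtol):
            failures.append((name, want, got))
    return failures


if __name__ == "__main__":
    # Hypothetical measured scores, for illustration only.
    sample = {"exact_match,strict-match": 0.81,
              "exact_match,flexible-extract": 0.70}
    print(check_metrics(sample))
```

A relative tolerance keeps the check meaningful across metrics with different baselines, but an absolute tolerance would work just as well for scores that all sit in a similar range.
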
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+model_name: "Qwen/Qwen2.5-Omni-7B"
+hardware: "Atlas A2 Series"
+model: "vllm-vlm"
+tasks:
+- name: "mmmu_val"
+  metrics:
+  - name: "acc,none"
+    value: 0.52
+max_model_len: 8192
+gpu_memory_utilization: 0.7

Lines changed: 1 addition & 2 deletions
@@ -1,10 +1,9 @@
 model_name: "Qwen/Qwen2.5-VL-7B-Instruct"
-runner: "linux-aarch64-a2-1"
 hardware: "Atlas A2 Series"
 model: "vllm-vlm"
 tasks:
 - name: "mmmu_val"
   metrics:
   - name: "acc,none"
     value: 0.51
-max_model_len: 8192
+max_model_len: 8192

tests/e2e/models/configs/Qwen3-30B-A3B.yaml

Lines changed: 1 addition & 2 deletions
@@ -1,5 +1,4 @@
 model_name: "Qwen/Qwen3-30B-A3B"
-runner: "linux-aarch64-a2-2"
 hardware: "Atlas A2 Series"
 tasks:
 - name: "gsm8k"
@@ -17,4 +16,4 @@ gpu_memory_utilization: 0.6
 enable_expert_parallel: True
 tensor_parallel_size: 2
 apply_chat_template: False
-fewshot_as_multiturn: False
+fewshot_as_multiturn: False

tests/e2e/models/configs/Qwen3-8B-Base.yaml

Lines changed: 0 additions & 1 deletion
@@ -1,5 +1,4 @@
 model_name: "Qwen/Qwen3-8B-Base"
-runner: "linux-aarch64-a2-1"
 hardware: "Atlas A2 Series"
 tasks:
 - name: "gsm8k"

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+model_name: "Qwen/Qwen3-VL-8B-Instruct"
+hardware: "Atlas A2 Series"
+model: "vllm-vlm"
+tasks:
+- name: "mmmu_val"
+  metrics:
+  - name: "acc,none"
+    value: 0.55
+max_model_len: 8192
+batch_size: 32
+gpu_memory_utilization: 0.7

tests/e2e/models/configs/accuracy.txt

Lines changed: 3 additions & 0 deletions
@@ -6,3 +6,6 @@ Qwen2-7B.yaml
 Qwen2-VL-7B-Instruct.yaml
 Qwen2-Audio-7B-Instruct.yaml
 Qwen3-VL-30B-A3B-Instruct.yaml
+Qwen3-VL-8B-Instruct.yaml
+Qwen2.5-Omni-7B.yaml
+Meta-Llama-3.1-8B-Instruct.yaml
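
The model names added to the workflow matrix and the entries added to `accuracy.txt` appear to follow a `<model_name>.yaml` naming pattern. As a rough sketch under that assumption, the snippet below checks that each newly added matrix entry has a matching line in the listing. The helper `missing_configs` is hypothetical and not part of this commit, and the data is inlined from the hunks shown here rather than read from the repository.

```python
# Model names added to the workflow matrix in this commit.
WORKFLOW_MODELS = [
    "Qwen3-VL-8B-Instruct",
    "Qwen2.5-Omni-7B",
    "Meta-Llama-3.1-8B-Instruct",
]

# Lines of accuracy.txt visible in the hunk (context plus additions).
ACCURACY_TXT = """\
Qwen2-VL-7B-Instruct.yaml
Qwen2-Audio-7B-Instruct.yaml
Qwen3-VL-30B-A3B-Instruct.yaml
Qwen3-VL-8B-Instruct.yaml
Qwen2.5-Omni-7B.yaml
Meta-Llama-3.1-8B-Instruct.yaml
"""


def missing_configs(models, listing):
    """Return models that have no '<model>.yaml' line in the listing."""
    listed = {line.strip() for line in listing.splitlines() if line.strip()}
    return [m for m in models if f"{m}.yaml" not in listed]


if __name__ == "__main__":
    print(missing_configs(WORKFLOW_MODELS, ACCURACY_TXT))
```

A check in the opposite direction would flag `Qwen2-Audio-7B-Instruct.yaml`, which, as far as this diff shows, stays in the listing while its workflow entry is commented out.
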
