Merged
11 changes: 9 additions & 2 deletions .github/workflows/accuracy_test.yaml
@@ -49,8 +49,9 @@ jobs:
model_name: Qwen3-8B
- runner: a2-1
model_name: Qwen2.5-VL-7B-Instruct
- runner: a2-1
model_name: Qwen2-Audio-7B-Instruct
# To do: This model has a bug that needs to be fixed and readded
# - runner: a2-1
# model_name: Qwen2-Audio-7B-Instruct
- runner: a2-2
model_name: Qwen3-30B-A3B
- runner: a2-2
@@ -61,6 +62,12 @@ jobs:
model_name: Qwen3-Next-80B-A3B-Instruct
- runner: a2-1
model_name: Qwen3-8B-W8A8
- runner: a2-1
model_name: Qwen3-VL-8B-Instruct
- runner: a2-1
model_name: Qwen2.5-Omni-7B
- runner: a2-1
model_name: Meta-Llama-3.1-8B-Instruct
fail-fast: false
# test will be triggered when tag 'accuracy-test' & 'ready-for-test'
if: >-
1 change: 0 additions & 1 deletion tests/e2e/models/configs/DeepSeek-V2-Lite.yaml
@@ -1,5 +1,4 @@
model_name: "deepseek-ai/DeepSeek-V2-Lite"
runner: "linux-aarch64-a2-2"
hardware: "Atlas A2 Series"
tasks:
- name: "gsm8k"
11 changes: 11 additions & 0 deletions tests/e2e/models/configs/Meta-Llama-3.1-8B-Instruct.yaml
@@ -0,0 +1,11 @@
model_name: "LLM-Research/Meta-Llama-3.1-8B-Instruct"
hardware: "Atlas A2 Series"
Collaborator

I noticed that some of the YAML configs specify the hardware and some don't. I don't think it needs to be specified? cc @zhangxinyuehfad, please also take a look.

Contributor Author

The configs have been updated so that they are consistent.

tasks:
- name: "gsm8k"
metrics:
- name: "exact_match,strict-match"
value: 0.82
- name: "exact_match,flexible-extract"
value: 0.84

num_fewshot: 5
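The review comment above notes that some configs specify `hardware` and some do not. A minimal sketch of how that inconsistency could be detected follows. It is not part of this PR: the helper names `top_level_keys` and `configs_missing` are hypothetical, the two sample configs are made up for illustration, and the regex only handles simple unindented `key:` lines rather than full YAML.

```python
import re

# Matches an unindented "key:" line, i.e. a top-level key in simple YAML.
# Assumption: the configs are flat enough that a full YAML parser is not needed.
_TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*)\s*:")


def top_level_keys(yaml_text: str) -> set[str]:
    """Return the set of top-level keys found in a simple YAML document."""
    keys = set()
    for line in yaml_text.splitlines():
        match = _TOP_LEVEL_KEY.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def configs_missing(field: str, configs: dict[str, str]) -> list[str]:
    """Return the sorted names of configs that lack the given top-level field."""
    return sorted(
        name for name, text in configs.items()
        if field not in top_level_keys(text)
    )


# Hypothetical sample configs, keyed by file name.
configs = {
    "with_hardware.yaml": (
        'model_name: "example/model-a"\n'
        'hardware: "Atlas A2 Series"\n'
        "tasks:\n"
        '- name: "gsm8k"\n'
    ),
    "without_hardware.yaml": (
        'model_name: "example/model-b"\n'
        "tasks:\n"
        '- name: "gsm8k"\n'
    ),
}

missing = configs_missing("hardware", configs)
print(missing)
```

Running the same check with `"runner"` as the field would flag configs that still carry a per-file runner entry.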
10 changes: 10 additions & 0 deletions tests/e2e/models/configs/Qwen2.5-Omni-7B.yaml
@@ -0,0 +1,10 @@
model_name: "Qwen/Qwen2.5-Omni-7B"
hardware: "Atlas A2 Series"
model: "vllm-vlm"
tasks:
- name: "mmmu_val"
metrics:
- name: "acc,none"
value: 0.52
Collaborator

ditto

Contributor Author

MMMU is an extremely challenging benchmark for multidisciplinary and multimodal reasoning, and this test value falls within a reasonable range.

max_model_len: 8192
gpu_memory_utilization: 0.7
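The `value` fields in these configs act as expected accuracy baselines. As a rough sketch of how a measured score might be compared against such a baseline, the snippet below uses a relative tolerance. Both the 5% tolerance and the helper names `within_tolerance` and `check_metrics` are assumptions for illustration; the comparison rule used by the repository's actual test harness may differ.

```python
def within_tolerance(measured: float, expected: float, rtol: float = 0.05) -> bool:
    """True if measured is within a relative tolerance of expected (rtol is assumed)."""
    return abs(measured - expected) <= rtol * abs(expected)


def check_metrics(config: dict, measured: dict, rtol: float = 0.05) -> list:
    """Return (task, metric) pairs whose measured score is missing or out of tolerance."""
    failures = []
    for task in config["tasks"]:
        for metric in task["metrics"]:
            key = (task["name"], metric["name"])
            got = measured.get(key)
            if got is None or not within_tolerance(got, metric["value"], rtol):
                failures.append(key)
    return failures


# Mirrors the task section of the Qwen2.5-Omni-7B config in this diff.
config = {
    "model_name": "Qwen/Qwen2.5-Omni-7B",
    "tasks": [
        {"name": "mmmu_val", "metrics": [{"name": "acc,none", "value": 0.52}]},
    ],
}

# Hypothetical measured scores, not real results.
passing = check_metrics(config, {("mmmu_val", "acc,none"): 0.53})
failing = check_metrics(config, {("mmmu_val", "acc,none"): 0.40})
print(passing, failing)
```

A relative tolerance keeps the check meaningful for both low-scoring benchmarks such as MMMU and high-scoring ones such as GSM8K, though an absolute tolerance would be an equally reasonable choice.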
3 changes: 1 addition & 2 deletions tests/e2e/models/configs/Qwen2.5-VL-7B-Instruct.yaml
@@ -1,10 +1,9 @@
model_name: "Qwen/Qwen2.5-VL-7B-Instruct"
runner: "linux-aarch64-a2-1"
hardware: "Atlas A2 Series"
model: "vllm-vlm"
tasks:
- name: "mmmu_val"
metrics:
- name: "acc,none"
value: 0.51
max_model_len: 8192
max_model_len: 8192
3 changes: 1 addition & 2 deletions tests/e2e/models/configs/Qwen3-30B-A3B.yaml
@@ -1,5 +1,4 @@
model_name: "Qwen/Qwen3-30B-A3B"
runner: "linux-aarch64-a2-2"
hardware: "Atlas A2 Series"
tasks:
- name: "gsm8k"
@@ -17,4 +16,4 @@ gpu_memory_utilization: 0.6
enable_expert_parallel: True
tensor_parallel_size: 2
apply_chat_template: False
fewshot_as_multiturn: False
fewshot_as_multiturn: False
1 change: 0 additions & 1 deletion tests/e2e/models/configs/Qwen3-8B-Base.yaml
@@ -1,5 +1,4 @@
model_name: "Qwen/Qwen3-8B-Base"
runner: "linux-aarch64-a2-1"
hardware: "Atlas A2 Series"
tasks:
- name: "gsm8k"
11 changes: 11 additions & 0 deletions tests/e2e/models/configs/Qwen3-VL-8B-Instruct.yaml
@@ -0,0 +1,11 @@
model_name: "Qwen/Qwen3-VL-8B-Instruct"
hardware: "Atlas A2 Series"
model: "vllm-vlm"
tasks:
- name: "mmmu_val"
metrics:
- name: "acc,none"
value: 0.55
Collaborator

ditto

max_model_len: 8192
batch_size: 32
gpu_memory_utilization: 0.7
3 changes: 3 additions & 0 deletions tests/e2e/models/configs/accuracy.txt
@@ -6,3 +6,6 @@ Qwen2-7B.yaml
Qwen2-VL-7B-Instruct.yaml
Qwen2-Audio-7B-Instruct.yaml
Qwen3-VL-30B-A3B-Instruct.yaml
Qwen3-VL-8B-Instruct.yaml
Qwen2.5-Omni-7B.yaml
Meta-Llama-3.1-8B-Instruct.yaml
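Since `accuracy.txt` lists config file names one per line, a new config can be added to the directory without being listed, or listed without existing. The sketch below shows one way to cross-check the two. The helper name `compare_listing` is hypothetical, and the `present` list is an illustrative stand-in for a real directory listing.

```python
def compare_listing(listing_text: str, present_files: list[str]) -> dict[str, list[str]]:
    """Compare names listed in accuracy.txt against config files that exist."""
    listed = {line.strip() for line in listing_text.splitlines() if line.strip()}
    present = set(present_files)
    return {
        "missing_on_disk": sorted(listed - present),  # listed, but no such file
        "not_listed": sorted(present - listed),       # file exists, but not listed
    }


# The three entries this PR adds to accuracy.txt.
listing = (
    "Qwen3-VL-8B-Instruct.yaml\n"
    "Qwen2.5-Omni-7B.yaml\n"
    "Meta-Llama-3.1-8B-Instruct.yaml\n"
)

# Illustrative directory contents (an assumption, not the real directory).
present = [
    "Qwen3-VL-8B-Instruct.yaml",
    "Qwen2.5-Omni-7B.yaml",
    "DeepSeek-V2-Lite.yaml",
]

report = compare_listing(listing, present)
print(report)
```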