Question about achievable XDNA2 NPU throughput with IRON #55

@yichao-yuan-99

Hi, I'm experimenting with IRON (devel branch) on an AI Max+ 395. When I run the example gemm pytest with 8 columns, I only see ~700 GFLOP/s, as shown below.

operators/gemm/test.py::test_gemm[iter0-gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0_0] 
Latency (us): 24679.3
Effective Bandwidth: 1.019712e+00 GB/s
Throughput: 6.961237e+02 GFLOP/s

PASSED
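
As a sanity check on the reported figures, here is a minimal sketch that recomputes them from the latency. It assumes the usual 2*M*N*K FLOP count for a GEMM and that A, B and C are each moved once with 2-byte (bf16) elements; the function name `gemm_metrics` is only illustrative and is not part of IRON.

```python
def gemm_metrics(m, n, k, latency_us, bytes_per_element=2):
    """Return (GFLOP/s, GB/s) for an (m x k) by (k x n) GEMM."""
    seconds = latency_us * 1e-6
    # One multiply and one add per inner-product term.
    flops = 2 * m * n * k
    # Assumption: A, B and C each cross memory once, 2 bytes per element (bf16).
    bytes_moved = (m * k + k * n + m * n) * bytes_per_element
    return flops / seconds / 1e9, bytes_moved / seconds / 1e9


# Problem size and latency taken from the test output above.
gflops, gbps = gemm_metrics(2048, 2048, 2048, latency_us=24679.3)
print(f"{gflops:.1f} GFLOP/s, {gbps:.3f} GB/s")
```

Under these assumptions the result agrees with the reported throughput and effective bandwidth to about three significant digits, so the test's numbers appear to be internally consistent.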

Here is the output of xrt-smi validate on my machine:

Validate Device           : [0000:c6:00.1]
    Platform              : NPU Strix Halo
    Power Mode            : Default
-------------------------------------------------------------------------------
Test 1 [0000:c6:00.1]     : gemm                                                
    Details               : TOPS: 51.0
    Test Status           : [PASSED]
-------------------------------------------------------------------------------
Test 2 [0000:c6:00.1]     : latency                                             
    Details               : Average latency: 52.0 us
    Test Status           : [PASSED]
-------------------------------------------------------------------------------
Test 3 [0000:c6:00.1]     : throughput                                          
    Details               : Average throughput: 78771.0 op/s
    Test Status           : [PASSED]
-------------------------------------------------------------------------------

Based on the AIE2p core spec, which lists BF16 throughput as half of INT8, I expected a throughput on the order of 50/2 = 25 TOPS.

The measured ~0.7 TOPS is far below that, so I wonder whether this is caused by a configuration problem, a misunderstanding on my part, or something else.
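
To put a number on the gap, here is a rough sketch of the arithmetic. It assumes the 51.0 TOPS reported by the xrt-smi gemm test is an INT8 figure and that BF16 peak is half of it (my reading of the spec, not something I have confirmed); the variable names are only illustrative.

```python
# Values copied from the outputs above.
int8_tops = 51.0               # xrt-smi validate, gemm test (assumed to be INT8)
measured_tflops = 696.1237e-3  # IRON gemm pytest throughput, converted to TFLOP/s

# Assumption: BF16 peak is half of the INT8 peak.
bf16_to_int8_ratio = 0.5
expected_tflops = int8_tops * bf16_to_int8_ratio

fraction = measured_tflops / expected_tflops
gap = expected_tflops / measured_tflops
print(f"expected ~{expected_tflops:.1f} TFLOP/s, "
      f"measured {measured_tflops:.3f} TFLOP/s "
      f"({fraction:.1%} of expected, ~{gap:.0f}x gap)")
```

With these inputs the measured throughput is under 3% of the estimate, a gap of well over an order of magnitude.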

Thanks!
