Question about achievable XDNA2 NPU throughput with IRON

Hi, I'm experimenting with IRON on a Ryzen AI Max+ 395 and am only seeing ~700 GFLOP/s when running the example GEMM `pytest` with 8 columns (output below).
I am using the `devel` branch.
```
operators/gemm/test.py::test_gemm[iter0-gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0_0] 
Latency (us): 24679.3
Effective Bandwidth: 1.019712e+00 GB/s
Throughput: 6.961237e+02 GFLOP/s

PASSED
```
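As a sanity check, the logged throughput and bandwidth can be reproduced from the latency alone. The sketch below assumes that the test counts 2*M*N*K FLOPs (one multiply and one add per MAC) and moves A, B and C once as 2-byte bf16 elements. These assumptions are mine, not taken from the IRON test code, but they match the printed ~696 GFLOP/s and ~1.02 GB/s.

```python
# Recompute the reported GEMM metrics from the measured latency.
# Assumption: FLOPs = 2*M*N*K and bytes = (|A| + |B| + |C|) * 2 (bf16).
# These reproduce the printed numbers but are not copied from the IRON source.

M = N = K = 2048            # from gemm_2048x2048x2048_...
BYTES_PER_ELEM = 2          # bf16 (assumed)
latency_s = 24679.3e-6      # "Latency (us): 24679.3"

flops = 2 * M * N * K
bytes_moved = (M * K + K * N + M * N) * BYTES_PER_ELEM

gflops = flops / latency_s / 1e9        # GFLOP/s
gbps = bytes_moved / latency_s / 1e9    # GB/s (1 GB = 1e9 bytes)

print(f"Throughput: {gflops:.1f} GFLOP/s")
print(f"Effective bandwidth: {gbps:.3f} GB/s")
```

The low effective bandwidth (~1 GB/s) suggests the run is not limited by memory traffic for this problem size.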
Here is the output of `xrt-smi validate` on my machine:
```
Validate Device           : [0000:c6:00.1]
    Platform              : NPU Strix Halo
    Power Mode            : Default
-------------------------------------------------------------------------------
Test 1 [0000:c6:00.1]     : gemm                                                
    Details               : TOPS: 51.0
    Test Status           : [PASSED]
-------------------------------------------------------------------------------
Test 2 [0000:c6:00.1]     : latency                                             
    Details               : Average latency: 52.0 us
    Test Status           : [PASSED]
-------------------------------------------------------------------------------
Test 3 [0000:c6:00.1]     : throughput                                          
    Details               : Average throughput: 78771.0 op/s
    Test Status           : [PASSED]
-------------------------------------------------------------------------------
```
Based on the [spec](https://www.amd.com/en/products/adaptive-socs-and-fpgas/technologies/ai-engine.html) for AIE2p cores, which lists BF16 throughput as half of INT8, and the ~50 TOPS reported by the `xrt-smi` gemm test above, I expected BF16 GEMM throughput on the order of 50/2 = 25 TOPS.


At ~0.7 TOPS, what I measure is far below that expectation. Is this due to a configuration problem on my side, a misunderstanding of what the hardware can achieve, or something else?
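To put the gap in numbers, the sketch below compares the measured result with a BF16 peak of half the 51 TOPS from `xrt-smi`. It assumes (unverified) that the `xrt-smi` gemm test measures INT8 and that BF16 peak is exactly half of INT8 peak.

```python
# Compare measured GEMM throughput with the expected BF16 peak.
# Assumptions: xrt-smi "gemm" reports INT8 TOPS; BF16 peak = INT8 peak / 2.

int8_tops = 51.0                      # "TOPS: 51.0" from xrt-smi validate
bf16_peak_tops = int8_tops / 2        # expected BF16 peak (~25.5 TOPS)
measured_tops = 696.1237 / 1000       # 6.961237e+02 GFLOP/s -> TOPS

flops = 2 * 2048 ** 3                 # 2*M*N*K for the 2048^3 GEMM
ideal_latency_us = flops / (bf16_peak_tops * 1e12) * 1e6
measured_latency_us = 24679.3         # from the pytest log

fraction_of_peak = measured_tops / bf16_peak_tops
slowdown = measured_latency_us / ideal_latency_us

print(f"fraction of BF16 peak: {fraction_of_peak:.1%}")
print(f"ideal latency at peak: {ideal_latency_us:.0f} us")
print(f"measured run is {slowdown:.0f}x slower than peak")
```

Under these assumptions the run reaches under 3% of the expected peak, roughly 37x slower than an ideal ~674 us.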

Thanks!