Add nvidia-sdpa workaround for aarch64 #1
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,22 @@ |
| | 1 | FROM nvcr.io/nvidia/pytorch:25.09-py3 |
| | 2 | |
| | 3 | RUN pip install --upgrade pip && \ |
| | 4 | pip install seaborn |
| | 5 | |
| | 6 | RUN apt-get update && \ |
| | 7 | wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/cuda-keyring_1.1-1_all.deb && \ |
| | 8 | dpkg -i cuda-keyring_1.1-1_all.deb && \ |
| | 9 | apt-get update && \ |
| | 10 | apt-get -y install cudnn9-cuda-13 |
| | 11 | |
| | 12 | RUN pip uninstall -y cudnn |
| | 13 | |
| | 14 | COPY benchmark_bf16_sdpa.py . |
| | 15 | |
| | 16 | COPY benchmark_fp8_sdpa.py . |
| | 17 | |
| | 18 | COPY benchmark_single_sdpa.py . |
| | 19 | |
| | 20 | ENV LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/:$LD_LIBRARY_PATH |
| | 21 | |
| | 22 | WORKDIR /workspace |

Comment on lines +6 to +10

To optimize the Docker image size, it's important to clean up temporary files and caches within the same

Comment on lines +14 to +18
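
The review comment on lines +6 to +10 asks for temporary files and caches to be cleaned up. A minimal sketch of what that could look like follows. It repeats the commands from the diff unchanged and appends two cleanup steps, which are assumptions rather than the reviewer's exact suggestion: deleting the downloaded keyring package and the apt package lists.

```dockerfile
# Sketch only: same commands as lines +6 to +10 of the diff.
# The last two steps are assumed cleanup; they run in the same RUN
# instruction because files deleted in a later instruction would
# still be stored in this layer.
RUN apt-get update && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/cuda-keyring_1.1-1_all.deb && \
    dpkg -i cuda-keyring_1.1-1_all.deb && \
    apt-get update && \
    apt-get -y install cudnn9-cuda-13 && \
    rm -f cuda-keyring_1.1-1_all.deb && \
    rm -rf /var/lib/apt/lists/*
```
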
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,13 @@ |
| | 1 | ## Scaled Dot Product Attention Benchmark |
| | 2 | |
| | 3 | The upstream NVIDIA benchmark, which is part of the cudnn-frontend packages (found at https://github.com/NVIDIA/cudnn-frontend/tree/main/benchmark/sdpa_benchmark_training) is using x86_64 specific packages, which doesn't work on GB300 as Grace CPUs are arm (aarch64). |
| | 4 | |
| | 5 | In this repository you'll find a simple fixed Dockerfile which can be used on Nvidia Grace based systems. |
| | 6 | |
| | 7 | Steps: |
| | 8 | 1. Clone the repository |
| | 9 | - `git clone https://github.com/NVIDIA/cudnn-frontend` |
| | 10 | 2. Replace the Dockerfile at `cudnn-frontend/benchmark/sdpa_benchmark_training/Dockerfile` with the one from this repo. |
| | 11 | 3. Follow the instructions as normal after this |
| | 12 | - `docker build -t cudnn_attention_benchmark .` |
| | 13 | - `docker run -it --gpus all --rm -v $(pwd):/workspace cudnn_attention_benchmark` |

To improve Docker image efficiency and reduce the number of layers, it's a good practice to combine `RUN` instructions. The two `pip install` commands can be merged into one. Using `--no-cache-dir` with `pip` prevents caching and further reduces the final image size.
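
One way the suggestion about `pip` might be applied is sketched below. It is an illustration, not the change that was merged: it folds the two `pip install` commands from lines +3 to +4 of the Dockerfile diff into a single invocation and adds `--no-cache-dir`.

```dockerfile
# Sketch only: a single pip invocation that upgrades pip and installs seaborn.
# --no-cache-dir keeps pip's download cache out of the image layer.
RUN pip install --no-cache-dir --upgrade pip seaborn
```

Note that in this form seaborn is installed by the pip version already in the base image, since the upgraded pip only takes effect on the next invocation. If that ordering matters, keeping two chained commands in one `RUN` instruction, each with `--no-cache-dir`, preserves the original behavior.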