Description:
After upgrading the NVIDIA driver from 555 to 570 on an Azure Standard NC96ads_A100_v4 Linux VM (4 × NVIDIA A100), my cuQuantum-based workload (qsimcirq) fails at startup with the following error:
RuntimeError: Peer-to-peer device memory access is not supported
This regression appears only after moving to driver 570; the same workload ran fine under 555.
Repro details:
Environment: Ubuntu VM on Azure
VM SKU: Standard NC96ads_A100_v4 (4 × NVIDIA A100)
Driver: upgraded from 555 → 570
cuQuantum: cuQuantum Appliance 25.06 (FROM nvcr.io/nvidia/cuquantum-appliance:25.06-x86_64)
Library: qsimcirq
Trace snippet:
RuntimeError: Peer-to-peer device memory access is not supported
File "qsimcirq/qsim_simulator.py", line 262, in __init__
qsim_mgpu.qsim_initialize_devices(gpu_mode)
Topology info:
nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity    GPU NUMA ID
GPU0    X       NV12    SYS     SYS     0-23            0                N/A
GPU1    NV12    X       SYS     SYS     24-47           1                N/A
GPU2    SYS     SYS     X       NV12    48-71           2                N/A
GPU3    SYS     SYS     NV12    X       72-95           3                N/A
nvidia-smi topo -p2p w
        GPU0    GPU1    GPU2    GPU3
GPU0    X       OK      NS      NS
GPU1    OK      X       NS      NS
GPU2    NS      NS      X       OK
GPU3    NS      NS      OK      X
Question:
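Is there a way to work around this issue without downgrading the NVIDIA driver? It looks like with driver 570, P2P over PCIe is no longer enabled when there is no NVLink connection between a GPU pair: in the topology output above, only the NVLink-connected pairs (GPU0-GPU1 and GPU2-GPU3, shown as NV12) report OK, while every cross pair (shown as SYS) reports NS. As a result, cuQuantum (via qsimcirq) fails to initialize, apparently because it assumes P2P support across all devices.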
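
To make the pattern in the P2P matrix explicit, here is a minimal Python sketch that parses the `nvidia-smi topo -p2p w` output pasted above and lists which GPU pairs report OK and which do not. The helper names (`parse_p2p_matrix`, `p2p_peers`, `unsupported_pairs`) are illustrative only and not part of any NVIDIA tool, and the parser assumes the matrix layout shown in this post.

```python
# Matrix copied from the `nvidia-smi topo -p2p w` output in this post.
P2P_WRITE_MATRIX = """
        GPU0    GPU1    GPU2    GPU3
GPU0    X       OK      NS      NS
GPU1    OK      X       NS      NS
GPU2    NS      NS      X       OK
GPU3    NS      NS      OK      X
"""


def parse_p2p_matrix(text):
    """Return {(row_gpu, col_gpu): status} for every cell of the matrix."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The header is the first line made up only of GPU names.
    header = next(r for r in rows if all(tok.startswith("GPU") for tok in r))
    status = {}
    for r in rows:
        # Data rows are "<GPU name> <one cell per header column>";
        # anything else (header, legend text) is skipped.
        if not r[0].startswith("GPU") or len(r) != len(header) + 1:
            continue
        for col, cell in zip(header, r[1:]):
            status[(r[0], col)] = cell
    return status


def p2p_peers(status):
    """Map each GPU to the list of GPUs it reports 'OK' for."""
    peers = {}
    for (src, dst), cell in status.items():
        peers.setdefault(src, [])
        if cell == "OK":
            peers[src].append(dst)
    return peers


def unsupported_pairs(status):
    """List each unordered GPU pair whose status is neither 'OK' nor 'X' (self)."""
    return sorted(
        (src, dst)
        for (src, dst), cell in status.items()
        if src < dst and cell not in ("OK", "X")
    )


if __name__ == "__main__":
    status = parse_p2p_matrix(P2P_WRITE_MATRIX)
    print("P2P peers:", p2p_peers(status))
    print("Pairs without P2P:", unsupported_pairs(status))
```

If restricting a run to a single NVLink pair is acceptable, the peer lists suggest which device IDs could be exposed together through `CUDA_VISIBLE_DEVICES` (for example `0,1`). Whether qsimcirq then initializes under driver 570 is an assumption, not something verified here.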