WSL2 2.7.0 enables CUDA graph capture on RTX 5090 (Blackwell sm_120) — community finding
Just wanted to document a positive finding for the WSL2 + Blackwell community.
Short version: CUDA graph capture (previously crashing with cudaErrorUnknown on RTX 5090 + WSL2) now works correctly on WSL2 2.7.0 with the right system configuration. This unlocks full vLLM performance on Blackwell under WSL2.
Hardware: RTX 5090 32GB (sm_120, Blackwell), Windows 11, WSL2 2.7.0, CUDA 12.8
What was crashing: Any attempt to run vLLM (or any other workload that captures CUDA graphs) failed with cudaErrorUnknown during graph capture. The workaround was --enforce-eager, which disables CUDA graphs and costs roughly 8x in throughput.
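For anyone still on an older WSL2 build, here is a rough sketch of how the eager-mode fallback can be toggled. The helper name vllm_launch_cmd and its "eager" argument are hypothetical, made up for illustration; only vllm serve and --enforce-eager are actual vLLM options. It prints the command instead of running it so you can inspect it first.

```shell
# Hypothetical helper (not part of vLLM): print the launch command for a model.
# Pass "eager" as the second argument to add --enforce-eager, which disables
# CUDA graph capture. Any other value leaves CUDA graphs enabled.
vllm_launch_cmd() {
    model="$1"
    mode="${2:-graphs}"
    if [ "$mode" = "eager" ]; then
        printf 'vllm serve %s --enforce-eager\n' "$model"
    else
        printf 'vllm serve %s\n' "$model"
    fi
}
```

Calling vllm_launch_cmd with your model ID and "eager" prints the fallback command; leave the second argument off once graph capture works on your setup.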
What fixed it (beyond 2.7.0 itself):
Two system-level issues were causing instability that compounded the dxgkrnl issues:
- nvidia-cdi-refresh probes CUDA devices at boot ~11 seconds in, racing with Blackwell driver initialization. Hard-masking it resolves the instability:
ln -sf /dev/null /etc/systemd/system/nvidia-cdi-refresh.path
ln -sf /dev/null /etc/systemd/system/nvidia-cdi-refresh.service
systemctl daemon-reload
- Boot timing: CUDA services (Ollama, etc.) need an ExecStartPre=/bin/sleep 45 delay to avoid racing with dxgkrnl initialization on Blackwell.
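One way to apply the startup delay without editing a vendor unit file is a systemd drop-in. The sketch below is a minimal version under some assumptions: the function name write_delay_dropin, the file name 10-wsl-cuda-delay.conf, and the optional root-prefix argument (handy for a dry run in a scratch directory) are all my own choices, and the unit name you pass depends on how the service was installed.

```shell
# Sketch: write a systemd drop-in that delays a unit's start.
# Arguments: <unit name> <delay in seconds> [root prefix, default empty]
# The drop-in file name below is a hypothetical choice; any *.conf name works.
write_delay_dropin() {
    unit="$1"
    delay="$2"
    root="${3:-}"
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir" || return 1
    # ExecStartPre runs before the main process, so the sleep holds the
    # service back until the delay has elapsed.
    printf '[Service]\nExecStartPre=/bin/sleep %s\n' "$delay" \
        > "$dir/10-wsl-cuda-delay.conf"
}
```

As root, something like write_delay_dropin ollama.service 45 followed by systemctl daemon-reload should pick up the override (ollama.service is assumed here; check your actual unit name with systemctl list-units). The 45-second value is the one that worked on this machine and may need tuning on others.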
Result with 2.7.0 + above fixes: vLLM runs with full CUDA graphs, ~140 tok/s on Qwen3-14B-AWQ. Stable across reboots. No enforce-eager needed.
Still not working: FP8 quantization falls back to an emulated path (3x slower than INT4 AWQ). Blackwell FP8 tensor cores appear not yet exposed through dxgkrnl.
Full benchmark writeup: vllm-project/vllm#37242
Thanks to the WSL team for the 2.7.0 improvements — this is a meaningful unlock for the AI/ML community on Windows + Blackwell.