SpawnDev.ILGPU runs ILGPU across both Blazor WebAssembly and desktop environments. Each platform introduces specific constraints. This page documents all known limitations and their workarounds.
The #1 rule of Blazor WASM: Never block the main thread.
Blazor WebAssembly runs on a single thread. Blocking that thread prevents JavaScript promises from resolving, causing a permanent deadlock.

Calls that block and deadlock:

```csharp
buffer.GetAsArray1D();           // DEADLOCKS: calls synchronous readback internally
var result = task.Result;        // DEADLOCKS: blocks on async result
task.Wait();                     // DEADLOCKS
task.GetAwaiter().GetResult();   // DEADLOCKS
```

Safe alternatives:

```csharp
// Synchronize() flushes commands to the backend (non-blocking, safe in WASM)
accelerator.Synchronize();
// SynchronizeAsync() flushes AND waits for completion
await accelerator.SynchronizeAsync();
// CopyToHostAsync() is the only way to read GPU data back to CPU
var results = await buffer.CopyToHostAsync<float>();
```

Rule: Always propagate `async`/`await` through your entire call stack. Never use `.Result`, `.Wait()`, or `.GetAwaiter().GetResult()` on the main thread.

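
As a minimal sketch of that rule, the example below keeps every layer of a call chain asynchronous. `FakeReadbackAsync` is a hypothetical stand-in for an asynchronous GPU readback such as `CopyToHostAsync`; it is not part of SpawnDev.ILGPU, and the class and method names are illustrative only.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class AsyncPropagationSketch
{
    // Hypothetical stand-in for an asynchronous GPU readback.
    static async Task<float[]> FakeReadbackAsync(int count)
    {
        await Task.Yield(); // hand control back, as a real readback would
        return Enumerable.Range(0, count).Select(i => (float)i).ToArray();
    }

    // Middle layer: awaits instead of calling .Result or .Wait().
    static async Task<float> SumAsync(int count)
    {
        float[] data = await FakeReadbackAsync(count);
        return data.Sum();
    }

    // Top layer (for example a Blazor event handler): still async, still awaiting.
    public static async Task<string> OnClickAsync()
    {
        float sum = await SumAsync(4);
        return $"sum = {sum}";
    }
}
```

No method in the chain blocks, so the single WASM thread stays free to resolve the underlying JavaScript promise.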
The WGSL and GLSL transpilers cannot translate the IL throw instruction. If any code in your kernel (or methods it calls) contains a throw, compilation will fail.
Many System.Math methods contain implicit argument validation with throw. All browser backends (WebGPU, WebGL, Wasm) include throw-free redirects that handle the most common cases automatically:

| Method | Contains throw? | Auto-redirected? | Notes |
|---|---|---|---|
| `Math.Clamp(val, min, max)` | ✅ Yes | ✅ Yes | Redirected to `Min(Max(val, min), max)` |
| `Math.Round(x)` | ✅ Yes | ✅ Yes | Redirected to throw-free wrapper |
| `Math.Truncate(x)` | ✅ Yes | ✅ Yes | Redirected to throw-free wrapper |
| `Math.Sign(x)` | ✅ Yes | ✅ Yes | Redirected to throw-free wrapper |
| `MathF.FusedMultiplyAdd` | ✅ Yes | ✅ Yes | Redirected to throw-free wrapper |
| `XMath.Rsqrt(x)` | ✅ Yes | ✅ Yes | Redirected to throw-free wrapper |
| `XMath.Rcp(x)` | ✅ Yes | ✅ Yes | Redirected to throw-free wrapper |
| `MathF.Sin(x)` | ❌ No | N/A | Safe to use directly |
| `MathF.Sqrt(x)` | ❌ No | N/A | Safe to use directly |
| `Math.Min(a, b)` | ❌ No | N/A | Safe to use directly |
| `Math.Max(a, b)` | ❌ No | N/A | Safe to use directly |

Auto-redirects: The `RegisterMathIntrinsics()` infrastructure in each browser backend automatically intercepts calls to problematic .NET methods and replaces them with throw-free equivalents at compile time. You can use `Math.Clamp`, `Math.Round`, `Math.Truncate`, and `Math.Sign` directly in kernels; they will work on all backends.
Avoid calling any helper method that might throw exceptions. If you're not sure, check the .NET source for the method — if it contains throw new ArgumentException(...) or similar, it won't work unless a redirect is registered for it.
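
When a helper you need has no registered redirect, one option is to write a throw-free version yourself. The sketch below is illustrative: `ThrowFreeMath`, `ClampNoThrow`, and `SqrtOrZero` are hypothetical names, not part of SpawnDev.ILGPU. The clamp follows the same `Min(Max(val, min), max)` shape as the redirect described above.

```csharp
using System;

public static class ThrowFreeMath
{
    // Clamps without the min > max argument check (and its throw)
    // that Math.Clamp performs.
    public static float ClampNoThrow(float value, float min, float max)
        => MathF.Min(MathF.Max(value, min), max);

    // Instead of validating with
    //   if (x < 0) throw new ArgumentOutOfRangeException(...);
    // return a sentinel value for invalid input.
    public static float SqrtOrZero(float x)
        => x < 0f ? 0f : MathF.Sqrt(x);
}
```

The trade-off is that invalid input no longer fails loudly, so the caller has to treat the sentinel value as a possible result.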
Kernels can only work with value types (structs, primitives). Reference types (class, string, arrays) are not supported.

| ❌ Not Allowed | ✅ Allowed |
|---|---|
| `string` | `int`, `float`, `double`, `long` |
| `class` instances | `struct` instances |
| `int[]` (managed array) | `ArrayView<int>` |
| `object` | Primitives and value-type structs |

Kernel parameters are passed by value. Use ArrayView<T> for output:

```csharp
// ❌ Won't work
static void Bad(Index1D i, ref int result) { }
// ✅ Use a buffer
static void Good(Index1D i, ArrayView<int> result) { result[0] = 42; }
```

GPU hardware doesn't support call stacks. Recursive functions must be rewritten as iterative loops.
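
A small sketch of that rewrite, using a digit-sum function as an arbitrary example. The class and method names are hypothetical. The recursive form is shown only for comparison, since only the loop form is suitable for kernel code.

```csharp
public static class IterativeSketch
{
    // Recursive form: fine on the CPU, but not usable inside a kernel.
    public static int SumDigitsRecursive(int n)
        => n == 0 ? 0 : (n % 10) + SumDigitsRecursive(n / 10);

    // Iterative form: same result, no call stack needed.
    public static int SumDigitsIterative(int n)
    {
        int sum = 0;
        while (n != 0)
        {
            sum += n % 10;
            n /= 10;
        }
        return sum;
    }
}
```

Both forms return 10 for the input 1234. The loop keeps its state in local variables, where the recursive version keeps it in stack frames.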
WGSL and GLSL natively support 32-bit floats (f32 / float). Using float and MathF in kernels gives native GPU precision and performance.
double is not natively supported on most GPU hardware. Both GPU backends provide software emulation, controlled by F64EmulationMode:

- Dekker (default): `vec2<f32>`, ~48–53 bits mantissa, fast
- Ozaki: `vec4<f32>`, full IEEE 754, ~2x slower
- Disabled: `double` promoted to `float`, max performance, loses precision

Emulated doubles work well for many use cases (fractals, scientific compute) but have performance overhead. For rendering and visual applications, use F64EmulationMode.Disabled.
Deep zoom limitation: f32 precision limits useful Mandelbrot zoom to ~10⁶× magnification. Emulated f64 extends this significantly.
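
To give a feel for how a pair of `f32` values can carry more precision than a single one, here is a sketch of Knuth's TwoSum, a branch-free relative of the Fast2Sum step used in Dekker-style double-float arithmetic. It is written in C# for illustration and is not the backend's generated shader code. The names are hypothetical.

```csharp
public static class DoubleFloatSketch
{
    // Knuth's TwoSum: Hi is the rounded f32 sum, Lo is the rounding error,
    // so that Hi + Lo equals a + b exactly (barring overflow).
    // The explicit (float) casts force each intermediate to f32 precision.
    public static (float Hi, float Lo) TwoSum(float a, float b)
    {
        float hi = (float)(a + b);
        float bVirtual = (float)(hi - a);
        float aVirtual = (float)(hi - bVirtual);
        float bError = (float)(b - bVirtual);
        float aError = (float)(a - aVirtual);
        return (hi, (float)(aError + bError));
    }
}
```

For example, `TwoSum(1e8f, 1f)` yields `Hi = 1e8` and `Lo = 1`: a single `float` would drop the added 1, but the pair keeps it.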
GPU shader compilers may optimize away val != val self-comparisons or flush NaN during arithmetic. The WebGPU backend uses bit-level detection via bitcast<u32>() for reliable float.IsNaN() and float.IsInfinity() support. In kernels, always use float.IsNaN(val) and float.IsInfinity(val) — do not rely on manual patterns like val != val or val * 0.0f != 0.0f.
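
The idea behind bit-level detection can be sketched on the CPU in C#. This illustrates the IEEE 754 bit test only and is not the backend's actual WGSL. The class and method names are hypothetical, and kernel code should keep using `float.IsNaN` and `float.IsInfinity`.

```csharp
using System;

public static class FloatBitsSketch
{
    // NaN: exponent bits all ones and a non-zero mantissa.
    public static bool IsNaNBits(float value)
    {
        uint bits = unchecked((uint)BitConverter.SingleToInt32Bits(value));
        return (bits & 0x7FFFFFFFu) > 0x7F800000u;
    }

    // Infinity: exponent bits all ones and a zero mantissa (either sign).
    public static bool IsInfinityBits(float value)
    {
        uint bits = unchecked((uint)BitConverter.SingleToInt32Bits(value));
        return (bits & 0x7FFFFFFFu) == 0x7F800000u;
    }
}
```

Because the test works on the integer bit pattern, it does not depend on floating-point comparisons that a shader compiler might fold away.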
long and ulong are always emulated as vec2<u32>. This cannot be disabled — ILGPU's IR uses Int64 for ArrayView.Length and indices, so i64 emulation is required for correctness.
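
As a rough illustration of what emulating a 64-bit integer with two 32-bit halves involves, the sketch below adds two values stored as (low, high) pairs and propagates the carry by hand. The names and the low-first component order are assumptions made for this example and are not taken from the backend.

```csharp
public static class U64EmulationSketch
{
    // Adds two 64-bit values, each stored as (low, high) 32-bit halves,
    // similar in spirit to carrying a 64-bit integer in a vec2<u32>.
    public static (uint Lo, uint Hi) Add(uint aLo, uint aHi, uint bLo, uint bHi)
    {
        unchecked
        {
            uint lo = aLo + bLo;
            uint carry = lo < aLo ? 1u : 0u; // low half wrapped, so carry out
            uint hi = aHi + bHi + carry;
            return (lo, hi);
        }
    }
}
```

Adding 1 to `0x00000000_FFFFFFFF` this way gives `(Lo: 0, Hi: 1)`. Each 64-bit operation costs several 32-bit operations.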
ILGPU compiles kernels at runtime by reading .NET IL (Intermediate Language). Both trimming and AOT compilation will break this:

```xml
<PropertyGroup>
  <!-- REQUIRED: ILGPU needs IL reflection at runtime -->
  <PublishTrimmed>false</PublishTrimmed>
  <RunAOTCompilation>false</RunAOTCompilation>
</PropertyGroup>
```

The Wasm backend uses Web Workers for parallel dispatch. SharedArrayBuffer is required for zero-copy data sharing between workers.
The page must be served with these HTTP headers:

```http
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```

The demo includes coi-serviceworker.js which auto-injects these headers via a service worker — no server configuration needed for development.
Without SharedArrayBuffer, the Wasm backend still works but falls back to a single off-thread worker (no multi-worker parallelism).
Not all ILGPU features work on all backends:
| Feature | WebGPU | WebGL | Wasm | Cuda | OpenCL | CPU (Desktop) |
|---|---|---|---|---|---|---|
| Basic kernels | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| 1D/2D/3D index | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Scalar params | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Struct params | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SharedMemory | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Group.Barrier() | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Dynamic SharedMemory | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Group.Broadcast | ✅ | ❌ | ✅ | ✅ | ✅¹ | ✅ |
| Atomics | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Warp/Subgroup ops | ✅² | ❌ | ❌ | ✅ | ✅¹ | ✅ |
| f64 emulation | ✅ | ✅ | N/A (native) | N/A (native) | N/A (native) | N/A (native) |
| i64 emulation | ✅ | ✅ | N/A (native) | N/A (native) | N/A (native) | N/A (native) |
| ILGPU Algorithms | ✅⁴ | ❌³ | ✅⁵ | ✅ | ✅ | ✅ |

¹ Requires device subgroup support. OpenCL shuffle needs `cl_intel_subgroups` (Intel) or `cl_khr_subgroup_shuffle` + `cl_khr_subgroup_shuffle_relative` (NVIDIA/AMD). Dynamically detected.

² Requires the `subgroups` WebGPU extension.

³ Most algorithms require shared memory or atomics.

⁴ WebGPU: RadixSort, Scan, Reduce, Histogram fully supported and tested.

⁵ Wasm: RadixSort, Scan, Reduce, Histogram fully supported with fiber-based phase dispatch and pure spin barriers at full `hardwareConcurrency` (v4.6.0). 249 pass / 0 fail / 3 skip.

| Browser | WebGPU | WebGL | Wasm |
|---|---|---|---|
| Chrome 113+ | ✅ | ✅ | ✅ |
| Edge 113+ | ✅ | ✅ | ✅ |
| Firefox 128+ | ✅ (Nightly) | ✅ | ✅ |
| Safari 18+ | 🧪 Experimental | ✅ | ✅ |
| Mobile Chrome | ✅ (Android) | ✅ | ✅ |
| Mobile Safari | ❌ | ✅ | ✅ |
SpawnDev.ILGPU uses the SpawnDev.ILGPU namespace, which can collide with the ILGPU namespace from the forked ILGPU library. When both are in scope, use the global:: prefix:

```csharp
using SpawnDev.ILGPU;        // SpawnDev extensions
using global::ILGPU;         // Original ILGPU types
using global::ILGPU.Runtime; // Original ILGPU runtime
```

This is a known issue preserved for backward compatibility.
ILGPU supports up to ~19 kernel parameters. If you approach this limit, pack related values into structs:

```csharp
// ❌ Too many parameters
static void Bad(Index1D i, ArrayView<float> d, float a, float b, float c, float d2 /* , ... */) { }
// ✅ Pack into a struct
public struct Config { public float A; public float B; public float C; public float D; }
static void Good(Index1D i, ArrayView<float> data, Config config) { }
```

If you see an "Unknown hard error" dialog when running ILGPU kernels on the CPU backend, this is caused by the Windows Error Reporting service being disabled.
Fix: Enable the Windows Error Reporting service:

- Open Services (Win+R, type `services.msc`)
- Find "Windows Error Reporting Service"
- Set startup type to "Manual" or "Automatic"
- Start the service

This allows Windows to handle process crashes properly instead of showing the raw error dialog. The underlying issue is typically an assertion failure in the CPU backend from out-of-bounds array access — check your kernel index calculations and buffer sizes.