SPH Simulation Kernel #81
Description
Motivation
FunGT has a GPU rigid body physics pipeline with spatial hashing, radix sort, and Jacobi impulse solving. A fluid simulation system extends the engine into a new domain (continuum mechanics) while reusing the same spatial infrastructure. The SPH kernel is the simulation backbone that all rendering and coupling work depends on. Nothing else in the fluid pipeline can proceed without it.
Summary
Implement a GPU-accelerated SPH (Smoothed Particle Hydrodynamics) simulation kernel targeting 50K to 100K particles at interactive frame rates on Intel Arc via SYCL. The kernel follows the same architectural patterns as PhysicsKernel (SoA device data, host staging, bulk transfer) but with a fundamentally different simulation pipeline: density estimation, pressure solve, force accumulation, integration. No constraint solver. Forces are computed directly from the Navier-Stokes equations approximated through smoothing kernels.
Implementation Tasks
- Define `SPHDeviceData` struct with per-particle SoA arrays: position (`x, y, z`), velocity (`vx, vy, vz`), velocity half-step (`vx_half, vy_half, vz_half`), force (`fx, fy, fz`), density, pressure. No orientation, inertia tensor, or angular velocity. The half-step velocity arrays are required by leapfrog integration.
- Define `SPHHostData` struct with matching `std::vector` fields and a `resize(int capacity)` method, mirroring the rigid body `HostData` pattern for batch initialization and `sendToDevice` bulk transfer.
- Instantiate a `SpatialGrid` (extracted in the prerequisite task) with cell size equal to the smoothing radius `h`. This is separate from the rigid body grid. Pass `nullptr` for `bodyMode` since all fluid particles participate in neighbor search.
- Implement the `computeDensity` kernel. For each particle, iterate the 27 neighbor cells via `SpatialGrid` and accumulate density contributions using the Poly6 smoothing kernel: `W_poly6(r, h) = (315 / (64 * pi * h^9)) * (h^2 - r^2)^3` for `0 <= r <= h`, zero otherwise. Each particle also contributes to its own density (self-contribution). Write the result to the density array.
- Implement the `computePressure` kernel. Convert density to pressure using the Tait equation of state: `P = B * ((rho / rho_rest)^gamma - 1)` where `B = rho_rest * c_s^2 / gamma`, `gamma = 7`, `c_s` is the numerical speed of sound, and `rho_rest` is the rest density. The Tait EOS penalizes compression nonlinearly, providing visual incompressibility at lower stiffness values than the linear form `P = k * (rho - rho_rest)`, which relaxes the CFL timestep constraint.
- Implement the `computeForces` kernel. For each particle, iterate the 27 neighbor cells and accumulate three force contributions. The pressure force uses the Spiky kernel gradient: `grad_W_spiky(r, h) = -(45 / (pi * h^6)) * (h - r)^2 * (r_vec / r)`. The viscosity force uses the viscosity kernel Laplacian: `lap_W_visc(r, h) = (45 / (pi * h^6)) * (h - r)`. Gravity is applied as a constant body force. The pressure force is antisymmetric (Newton's third law): use the averaged-pressure `(P_i + P_j) / (2 * rho_j)` formulation to ensure momentum conservation.
- Implement the `integrate` kernel using leapfrog (velocity Verlet) integration. The scheme is: `v(t + dt/2) = v(t - dt/2) + a(t) * dt`, then `x(t + dt) = x(t) + v(t + dt/2) * dt`. The first timestep requires a half-step kickoff: `v(dt/2) = v(0) + a(0) * dt/2`. Leapfrog is symplectic (it conserves energy over long runs) and prevents the visible energy drift that semi-implicit Euler introduces in free-surface flows.
- Implement boundary handling with penalty forces. Define an axis-aligned box container. For each particle within a boundary margin of a wall, apply a repulsive force proportional to penetration depth: `F_boundary = k_boundary * (margin - d) * n_wall`, where `d` is the distance to the wall and `n_wall` is the inward normal. This keeps fluid contained without requiring special boundary particles.
- Expose configurable simulation parameters through an `SPHParams` struct: smoothing radius `h`, rest density `rho_0`, speed of sound `c_s` (determines Tait stiffness `B`), viscosity coefficient `mu`, gravity vector, timestep `dt`, boundary extents, boundary stiffness `k_boundary`. Default values for water-like behavior: `h = 0.1`, `rho_0 = 1000.0`, `c_s = 80.0` (for Tait with `gamma = 7`, this gives `B = rho_0 * c_s^2 / 7 ≈ 914286`), `mu = 0.1`, `gravity = (0, -9.81, 0)`, `dt = 0.0005`. Particle mass is computed from `rho_0 * (particle_spacing)^3` where `particle_spacing = h / 2` or similar.
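As a minimal sketch, the parameter struct above could look like the following. Field names follow the issue text; the boundary extents, margin, and `k_boundary` defaults are illustrative assumptions, not values from the issue.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the SPHParams struct with the water-like defaults listed above.
// boundary_min/boundary_max, margin, and k_boundary defaults are assumed.
struct SPHParams {
    float h = 0.1f;                  // smoothing radius
    float rho_0 = 1000.0f;           // rest density
    float c_s = 80.0f;               // numerical speed of sound
    float gamma = 7.0f;              // Tait exponent
    float mu = 0.1f;                 // viscosity coefficient
    float gravity[3] = {0.0f, -9.81f, 0.0f};
    float dt = 0.0005f;              // simulation timestep
    float boundary_min[3] = {0.0f, 0.0f, 0.0f};  // assumed box extents
    float boundary_max[3] = {1.0f, 1.0f, 1.0f};
    float k_boundary = 10000.0f;     // assumed boundary stiffness
    float margin = 0.05f;            // assumed boundary margin (h / 2)

    // Derived Tait stiffness: B = rho_0 * c_s^2 / gamma
    float taitB() const { return rho_0 * c_s * c_s / gamma; }

    // Particle mass from rest density and spacing (spacing = h / 2)
    float particleMass() const {
        float spacing = h / 2.0f;
        return rho_0 * spacing * spacing * spacing;
    }
};
```

With the defaults above, `taitB()` evaluates to roughly 9.14e5 and `particleMass()` to 0.125, matching the numbers in the task description.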
Technical Considerations
The SPHKernel class follows the same patterns as PhysicsKernel: SoA device data, staging buffers for initialization, bulk sendToDevice, SYCL kernel dispatch with DeviceData struct capture. The smoothing kernels (Poly6, Spiky gradient, Viscosity Laplacian) are implemented as inline device functions in a gpu_sph_smoothing_kernels.hpp header, same pattern as gpu_impulse_solver.hpp.
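A sketch of those inline functions, written as plain C++ for illustration (the real versions would live in `gpu_sph_smoothing_kernels.hpp` and be callable from SYCL device code):

```cpp
#include <cassert>
#include <cmath>

constexpr float PI = 3.14159265358979f;

// Poly6 kernel, used for density:
// W(r, h) = 315 / (64 * pi * h^9) * (h^2 - r^2)^3 for 0 <= r <= h
inline float W_poly6(float r, float h) {
    if (r < 0.0f || r > h) return 0.0f;
    float d = h * h - r * r;
    return (315.0f / (64.0f * PI * std::pow(h, 9))) * d * d * d;
}

// Magnitude of the Spiky kernel gradient, used for pressure forces.
// The full gradient is this magnitude times the unit vector r_vec / r,
// which is why the r = 0 case must be skipped by the caller.
inline float gradW_spiky_mag(float r, float h) {
    if (r <= 0.0f || r > h) return 0.0f;
    float d = h - r;
    return -(45.0f / (PI * std::pow(h, 6))) * d * d;
}

// Viscosity kernel Laplacian: 45 / (pi * h^6) * (h - r)
inline float lapW_visc(float r, float h) {
    if (r < 0.0f || r > h) return 0.0f;
    return (45.0f / (PI * std::pow(h, 6))) * (h - r);
}
```

All three vanish outside the support radius `h`, which is what makes the 27-cell neighbor scan sufficient.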
The neighbor search reuses SpatialGrid with cell size equal to h. All particles within the kernel support radius are guaranteed to be in the current cell or its 26 neighbors. The bodyMode pointer is passed as nullptr since all fluid particles are dynamic.
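The coverage guarantee can be illustrated with a small sketch; `cellCoord` is a hypothetical helper standing in for `SpatialGrid`'s hashing, just to show why cell size equal to `h` bounds neighbors to one cell offset per axis:

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

struct Cell { int x, y, z; };

// Hypothetical stand-in for SpatialGrid's cell indexing: with cell size == h,
// two particles closer than h differ by at most 1 in each cell coordinate.
inline Cell cellCoord(float px, float py, float pz, float cellSize) {
    return { (int)std::floor(px / cellSize),
             (int)std::floor(py / cellSize),
             (int)std::floor(pz / cellSize) };
}

// True if cell b lies in the 3x3x3 (27-cell) neighborhood of cell a.
inline bool inNeighborhood(Cell a, Cell b) {
    return std::abs(a.x - b.x) <= 1 &&
           std::abs(a.y - b.y) <= 1 &&
           std::abs(a.z - b.z) <= 1;
}
```

For example, a particle at x = 0.05 and a neighbor at x = 0.14 (distance 0.09 < h = 0.1) land in adjacent cells, while a particle at x = 0.35 is three cells away and correctly excluded from the scan.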
The Tait EOS with gamma = 7 and c_s = 80 gives a CFL limit of dt < h / c_s = 0.1 / 80 = 0.00125. The chosen default timestep of 0.0005 provides a safety margin of 2.5x. Multiple substeps per frame may be required depending on the rendering framerate.
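To make the substep arithmetic concrete, here is a small helper (illustrative, not part of the issue's API) that converts a render frame interval into a substep count:

```cpp
#include <cassert>
#include <cmath>

// Number of simulation substeps needed to cover one render frame
// at the fixed simulation timestep sim_dt.
inline int substepsPerFrame(float frame_dt, float sim_dt) {
    return (int)std::ceil(frame_dt / sim_dt);
}
```

At 60 FPS (frame interval ~0.0167 s) with `dt = 0.0005`, this comes out to 34 substeps per frame, so the per-substep budget matters for the performance target below.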
Leapfrog integration requires storing the half-step velocity. This adds three float arrays to SPHDeviceData compared to Euler. The memory overhead is negligible relative to the energy conservation benefit.
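A 1D sketch of the scheme, assuming a drift-then-kick ordering (the real kernel applies this per axis on the SoA arrays, with acceleration recomputed from forces each step):

```cpp
#include <cassert>
#include <cmath>

struct Particle1D { float x, v_half; };  // v_half stores v(t + dt/2)

// First-timestep kickoff: v(dt/2) = v(0) + a(0) * dt / 2
inline void kickoff(Particle1D& p, float v0, float a, float dt) {
    p.v_half = v0 + a * dt * 0.5f;
}

// One step after kickoff: x(t+dt) = x(t) + v(t+dt/2) * dt,
// then v(t+3dt/2) = v(t+dt/2) + a * dt (a constant here, e.g. gravity).
inline void step(Particle1D& p, float a, float dt) {
    p.x += p.v_half * dt;
    p.v_half += a * dt;
}
```

For constant acceleration, leapfrog reproduces the exact trajectory `x0 + v0*t + a*t^2/2`, which is a convenient sanity check: a particle dropped from height 10 under g = 9.81 sits near 5.095 after one second of 0.01 s steps.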
The force accumulation kernel must handle the r = 0 case (self-interaction) by skipping particles where i == j. The Spiky gradient is undefined at r = 0 (division by zero in the r_vec / r term).
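The asymmetry between the two passes can be shown with a brute-force stand-in for the density kernel (O(n^2) over all particles instead of grid cells, purely for illustration): the density loop keeps the `i == j` term as the self-contribution, while the force loop must skip it because the Spiky gradient divides by `r`.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct P3 { float x, y, z; };

inline float W_poly6(float r, float h) {
    if (r > h) return 0.0f;
    float d = h * h - r * r;
    return (315.0f / (64.0f * 3.14159265f * std::pow(h, 9))) * d * d * d;
}

// Brute-force density pass. Note j == i is NOT skipped: each particle
// contributes mass * W(0, h) to its own density. The i == j skip belongs
// only in the force pass, where the Spiky gradient term divides by r.
inline std::vector<float> computeDensity(const std::vector<P3>& ps,
                                         float mass, float h) {
    std::vector<float> rho(ps.size(), 0.0f);
    for (size_t i = 0; i < ps.size(); ++i) {
        for (size_t j = 0; j < ps.size(); ++j) {
            float dx = ps[i].x - ps[j].x;
            float dy = ps[i].y - ps[j].y;
            float dz = ps[i].z - ps[j].z;
            float r = std::sqrt(dx * dx + dy * dy + dz * dz);
            rho[i] += mass * W_poly6(r, h);
        }
    }
    return rho;
}
```

An isolated particle's density is exactly `mass * W_poly6(0, h)`, never zero, which is what keeps the `rho_j` divisions in the pressure formulation finite.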
Acceptance Criteria
- `SPHKernel` compiles and links with the existing FunGT build system
- 50K particles initialized in a block pattern, transferred to device via `sendToDevice`, and simulated for 1000 timesteps without NaN or Inf in any array
- Density computation produces values within 10% of `rho_rest` for particles in the interior of a uniform block (verifies Poly6 kernel correctness and neighbor search completeness)
- A dam break test (a block of particles released under gravity in a box) shows visually correct behavior: particles fall, hit the floor, spread laterally, and splash against the walls
- Energy does not visibly drift over 10 seconds of simulation time (leapfrog validation)
- All parameters in `SPHParams` can be modified between frames without reinitialization
- Performance: 50K particles complete one full timestep (grid build + density + pressure + forces + integrate) in under 5 ms on Intel Arc
Dependencies
- `SpatialGrid` extraction (must be completed first so `SPHKernel` can instantiate its own grid without duplicating spatial hashing code)