SPH Simulation Kernel #81

@juanchuletas

Description

Motivation

FunGT has a GPU rigid body physics pipeline with spatial hashing, radix sort, and Jacobi impulse solving. A fluid simulation system extends the engine into a new domain (continuum mechanics) while reusing the same spatial infrastructure. The SPH kernel is the simulation backbone that all rendering and coupling work depends on. Nothing else in the fluid pipeline can proceed without it.

Summary

Implement a GPU-accelerated SPH (Smoothed Particle Hydrodynamics) simulation kernel targeting 50K to 100K particles at interactive frame rates on Intel Arc via SYCL. The kernel follows the same architectural patterns as PhysicsKernel (SoA device data, host staging, bulk transfer) but with a fundamentally different simulation pipeline: density estimation, pressure solve, force accumulation, integration. No constraint solver. Forces are computed directly from the Navier-Stokes equations approximated through smoothing kernels.

Implementation Tasks

  1. Define SPHDeviceData struct with per-particle SoA arrays: position (x, y, z), velocity (vx, vy, vz), velocity half-step (vx_half, vy_half, vz_half), force (fx, fy, fz), density, pressure. No orientation, inertia tensor, or angular velocity. The half-step velocity arrays are required by leapfrog integration.

  2. Define SPHHostData struct with matching std::vector fields and a resize(int capacity) method, mirroring the rigid body HostData pattern for batch initialization and sendToDevice bulk transfer.

  3. Instantiate a SpatialGrid (extracted in the prerequisite task) with cell size equal to the smoothing radius h. This is separate from the rigid body grid. Pass nullptr for bodyMode since all fluid particles participate in neighbor search.

  4. Implement computeDensity kernel. For each particle, iterate 27 neighbor cells via SpatialGrid, accumulate density contributions using the Poly6 smoothing kernel: W_poly6(r, h) = (315 / (64 * pi * h^9)) * (h^2 - r^2)^3 for 0 <= r <= h, zero otherwise. Each particle also contributes to its own density (self-contribution). Write result to the density array.

  5. Implement computePressure kernel. Convert density to pressure using the Tait equation of state: P = B * ((rho / rho_rest)^gamma - 1) where B = rho_rest * c_s^2 / gamma, gamma = 7, c_s is the numerical speed of sound, and rho_rest is the rest density. The Tait EOS penalizes compression nonlinearly, providing visual incompressibility at lower stiffness values than the linear form P = k * (rho - rho_rest), which relaxes the CFL timestep constraint.

  6. Implement computeForces kernel. For each particle, iterate 27 neighbor cells and accumulate three force contributions. Pressure force uses the Spiky kernel gradient: grad_W_spiky(r, h) = -(45 / (pi * h^6)) * (h - r)^2 * (r_vec / r). Viscosity force uses the viscosity kernel Laplacian: lap_W_visc(r, h) = (45 / (pi * h^6)) * (h - r). Gravity is applied as a constant body force. Pressure force is antisymmetric (Newton's third law): use the average pressure (P_i + P_j) / (2 * rho_j) formulation to ensure momentum conservation.

  7. Implement integrate kernel using leapfrog integration (algebraically equivalent to velocity Verlet). The scheme is: v(t + dt/2) = v(t - dt/2) + a(t) * dt, then x(t + dt) = x(t) + v(t + dt/2) * dt. The first timestep requires a half-step kickoff: v(dt/2) = v(0) + a(0) * dt/2. Leapfrog is symplectic (conserves energy over long runs) and prevents the visible energy drift that semi-implicit Euler introduces in free-surface flows.

  8. Implement boundary handling with penalty forces. Define an axis-aligned box container. For each particle within a boundary margin of a wall, apply a repulsive force proportional to penetration depth: F_boundary = k_boundary * (margin - d) * n_wall where d is the distance to the wall and n_wall is the inward normal. This keeps fluid contained without requiring special boundary particles.

  9. Expose configurable simulation parameters through a SPHParams struct: smoothing radius h, rest density rho_0, speed of sound c_s (determines Tait stiffness B), viscosity coefficient mu, gravity vector, timestep dt, boundary extents, boundary stiffness k_boundary. Default values for water-like behavior: h = 0.1, rho_0 = 1000.0, c_s = 80.0 (for Tait with gamma=7, this gives B = rho_0 * c_s^2 / 7 ≈ 914285), mu = 0.1, gravity = (0, -9.81, 0), dt = 0.0005. Particle mass is computed from rho_0 * (particle_spacing)^3 where particle_spacing = h / 2 or similar.
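
Taken together, tasks 1, 2, and 9 suggest host-side layouts along these lines. This is a sketch: only the field names listed above come from the tasks, the method and member names beyond those are illustrative, and the device-side arrays would be raw pointers backed by sycl::malloc_device in the real kernel.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side mirror of the per-particle SoA arrays (task 2).
struct SPHHostData {
    std::vector<float> x, y, z;                    // position
    std::vector<float> vx, vy, vz;                 // velocity
    std::vector<float> vx_half, vy_half, vz_half;  // leapfrog half-step velocity
    std::vector<float> fx, fy, fz;                 // accumulated force
    std::vector<float> density, pressure;

    void resize(int capacity) {
        for (auto *v : {&x, &y, &z, &vx, &vy, &vz, &vx_half, &vy_half,
                        &vz_half, &fx, &fy, &fz, &density, &pressure})
            v->resize(static_cast<std::size_t>(capacity));
    }
};

// Simulation parameters with the water-like defaults from task 9.
// kBoundary and the boundary extents are illustrative values.
struct SPHParams {
    float h = 0.1f;          // smoothing radius
    float rho0 = 1000.0f;    // rest density
    float cs = 80.0f;        // numerical speed of sound
    float gamma = 7.0f;      // Tait exponent
    float mu = 0.1f;         // viscosity coefficient
    float gravity[3] = {0.0f, -9.81f, 0.0f};
    float dt = 0.0005f;
    float kBoundary = 10000.0f;  // boundary stiffness (illustrative)
    float spacing = 0.05f;       // particle spacing, h / 2

    // B = rho_0 * c_s^2 / gamma (~914285.7 with these defaults)
    float taitB() const { return rho0 * cs * cs / gamma; }
    // mass = rho_0 * spacing^3
    float particleMass() const { return rho0 * spacing * spacing * spacing; }
};
```

With the defaults above, taitB() reproduces the B ≈ 914285 quoted in task 9 and particleMass() gives 0.125 kg per particle.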
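
The penalty boundary of task 8 reduces, per axis, to something like the sketch below (function name and signature are illustrative; the real kernel would apply this along all three axes of the box).

```cpp
// Per-axis penalty force for an axis-aligned box (task 8): each wall within
// `margin` of the particle pushes it back inward, proportional to how far
// inside the margin the particle sits: F = k * (margin - d) * n_wall.
inline float boundaryForce1D(float pos, float lo, float hi,
                             float margin, float kBoundary) {
    float f = 0.0f;
    float dLo = pos - lo;                               // distance to lower wall
    if (dLo < margin) f += kBoundary * (margin - dLo);  // inward normal is +1
    float dHi = hi - pos;                               // distance to upper wall
    if (dHi < margin) f -= kBoundary * (margin - dHi);  // inward normal is -1
    return f;
}
```

Particles in the interior (farther than `margin` from every wall) receive zero force, so the penalty only activates in a thin shell near the walls.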

Technical Considerations

The SPHKernel class follows the same patterns as PhysicsKernel: SoA device data, staging buffers for initialization, bulk sendToDevice, SYCL kernel dispatch with DeviceData struct capture. The smoothing kernels (Poly6, Spiky gradient, Viscosity Laplacian) are implemented as inline device functions in a gpu_sph_smoothing_kernels.hpp header, same pattern as gpu_impulse_solver.hpp.
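
A gpu_sph_smoothing_kernels.hpp along those lines might look like the following. This is a plain-C++ sketch under the formulas given in tasks 4 and 6; the real header would mark these for SYCL device use, and the function names are illustrative.

```cpp
#include <cmath>

constexpr float kPi = 3.14159265358979f;

// Poly6 kernel for density (task 4):
// W(r, h) = 315 / (64 * pi * h^9) * (h^2 - r^2)^3 for 0 <= r <= h.
inline float wPoly6(float r, float h) {
    if (r < 0.0f || r > h) return 0.0f;
    float d = h * h - r * r;
    return (315.0f / (64.0f * kPi * std::pow(h, 9.0f))) * d * d * d;
}

// Magnitude of the Spiky kernel gradient (task 6); multiply by r_vec / r for
// the vector form. Undefined at r = 0, so that case returns 0 here -- it pairs
// with the i == j skip in the force kernel.
inline float gradWSpikyMag(float r, float h) {
    if (r <= 0.0f || r > h) return 0.0f;
    float hr = h - r;
    return -(45.0f / (kPi * std::pow(h, 6.0f))) * hr * hr;
}

// Viscosity kernel Laplacian (task 6): 45 / (pi * h^6) * (h - r).
inline float lapWVisc(float r, float h) {
    if (r < 0.0f || r > h) return 0.0f;
    return (45.0f / (kPi * std::pow(h, 6.0f))) * (h - r);
}
```

All three vanish at the support boundary r = h, which keeps density and force contributions continuous as neighbors enter and leave the smoothing radius.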

The neighbor search reuses SpatialGrid with cell size equal to h. All particles within the kernel support radius are guaranteed to be in the current cell or its 26 neighbors. The bodyMode pointer is passed as nullptr since all fluid particles are dynamic.

The Tait EOS with gamma = 7 and c_s = 80 gives a CFL limit of dt < h / c_s = 0.1 / 80 = 0.00125. The chosen default timestep of 0.0005 provides a safety margin of 2.5x. Multiple substeps per frame may be required depending on the rendering framerate.
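
That stiffness/timestep relationship can be checked numerically with a sketch like this (function names are illustrative; the formulas are the ones from task 5 and the CFL bound above).

```cpp
#include <cmath>

// Tait equation of state (task 5): P = B * ((rho / rho_rest)^gamma - 1),
// with B = rho_rest * c_s^2 / gamma. Negative for rho < rho_rest (tension).
inline float taitPressure(float rho, float rhoRest, float cs, float gamma) {
    float B = rhoRest * cs * cs / gamma;
    return B * (std::pow(rho / rhoRest, gamma) - 1.0f);
}

// Acoustic CFL limit quoted above: dt must stay below h / c_s.
inline float cflLimit(float h, float cs) { return h / cs; }
```

With the defaults (h = 0.1, c_s = 80), the limit is 0.00125 and the default dt = 0.0005 sits safely under it; even a 1% compression at rest density 1000 already produces a pressure on the order of 6.6e4, which is the nonlinear stiffening the paragraph above describes.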

Leapfrog integration requires storing the half-step velocity. This adds three float arrays to SPHDeviceData compared to Euler. The memory overhead is negligible relative to the energy conservation benefit.
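
One particle's update in that scheme looks like this in 1-D (struct and function names are illustrative; in the real kernel the half-step velocity lives in the vx_half/vy_half/vz_half arrays and the acceleration comes from the force arrays).

```cpp
// Leapfrog state: position at full steps, velocity at half steps (task 7).
struct LeapState { float x; float vHalf; };

// Kickoff for the first timestep: v(dt/2) = v(0) + a(0) * dt / 2.
inline LeapState leapfrogInit(float x0, float v0, float a0, float dt) {
    return {x0, v0 + a0 * 0.5f * dt};
}

// One step: drift with the stored half-step velocity, then kick with the
// acceleration evaluated at the new position (caller recomputes forces first).
inline void leapfrogStep(LeapState &s, float a, float dt) {
    s.x     += s.vHalf * dt;  // x(t + dt)    = x(t) + v(t + dt/2) * dt
    s.vHalf += a * dt;        // v(t + 3dt/2) = v(t + dt/2) + a(t + dt) * dt
}
```

For constant acceleration the trajectory is exact (free fall from rest over 1 s lands at -g/2 = -4.905 m regardless of dt), which makes a convenient sanity check for the integrator.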

The force accumulation kernel must handle the r = 0 case (self-interaction) by skipping particles where i == j. The Spiky gradient is undefined at r = 0 (division by zero in the r_vec / r term).
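
Inside the per-neighbor loop, the guard might look like the sketch below. Names and the flat neighbor-list signature are illustrative (the real kernel iterates 27 cells via SpatialGrid, which is elided here); the symmetric pressure formulation is the one from task 6.

```cpp
#include <cmath>

// Pressure-force accumulation over a neighbor list, skipping the self-pair so
// the Spiky gradient's r_vec / r term is never evaluated at r = 0. Uses the
// momentum-conserving (P_i + P_j) / (2 * rho_j) formulation from task 6.
inline void accumulatePressureForce(
    int i, const float *px, const float *py, const float *pz,
    const float *pressure, const float *density,
    const int *neighbors, int count, float mass, float h,
    float &fx, float &fy, float &fz) {
    const float pi = 3.14159265358979f;
    const float coeff = -45.0f / (pi * std::pow(h, 6.0f));
    for (int n = 0; n < count; ++n) {
        int j = neighbors[n];
        if (j == i) continue;               // self-interaction: skip
        float dx = px[i] - px[j], dy = py[i] - py[j], dz = pz[i] - pz[j];
        float r = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (r <= 0.0f || r > h) continue;   // coincident or outside support
        float hr = h - r;
        float gradMag = coeff * hr * hr;    // Spiky gradient magnitude
        // F_i += -m * (P_i + P_j) / (2 * rho_j) * gradW * (r_vec / r)
        float scale = -mass * (pressure[i] + pressure[j])
                      / (2.0f * density[j]) * gradMag / r;
        fx += scale * dx; fy += scale * dy; fz += scale * dz;
    }
}
```

Note the sign bookkeeping: the Spiky gradient magnitude is negative, the leading minus flips it back, so positive pressures push particles apart along r_vec = x_i - x_j, as expected.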

Acceptance Criteria

  1. SPHKernel compiles and links with the existing FunGT build system
  2. 50K particles initialized in a block pattern, transferred to device via sendToDevice, and simulated for 1000 timesteps without NaN or Inf in any array
  3. Density computation produces values within 10% of rho_rest for particles in the interior of a uniform block (verifies Poly6 kernel correctness and neighbor search completeness)
  4. A dam break test (block of particles released under gravity in a box) shows visually correct behavior: particles fall, hit the floor, spread laterally, splash against walls
  5. Energy does not visibly drift over 10 seconds of simulation time (leapfrog validation)
  6. All parameters in SPHParams can be modified between frames without reinitialization
  7. Performance: 50K particles complete one full timestep (grid build + density + pressure + forces + integrate) in under 5ms on Intel Arc

Dependencies

  1. SpatialGrid extraction (must be completed first so SPHKernel can instantiate its own grid without duplicating spatial hashing code)
