SPH Simulation Kernel #81
Description
Motivation
FunGT has a GPU rigid body physics pipeline with spatial hashing, radix sort, and Jacobi impulse solving. A fluid simulation system extends the engine into a new domain (continuum mechanics) while reusing the same spatial infrastructure. The SPH kernel is the simulation backbone that all rendering and coupling work depends on. Nothing else in the fluid pipeline can proceed without it.
Summary
Implement a GPU-accelerated SPH (Smoothed Particle Hydrodynamics) simulation kernel targeting 50K to 100K particles at interactive frame rates on Intel Arc via SYCL. The kernel follows the same architectural patterns as PhysicsKernel (SoA device data, host staging, bulk transfer) but with a fundamentally different simulation pipeline: density estimation, pressure solve, force accumulation, integration. No constraint solver. Forces are computed directly from the Navier-Stokes equations approximated through smoothing kernels.
Implementation Tasks
- Define `SPHDeviceData` struct with per-particle SoA arrays: position (`x, y, z`), velocity (`vx, vy, vz`), velocity half-step (`vx_half, vy_half, vz_half`), force (`fx, fy, fz`), density, pressure. No orientation, inertia tensor, or angular velocity. The half-step velocity arrays are required by leapfrog integration.
- Define `SPHHostData` struct with matching `std::vector` fields and a `resize(int capacity)` method, mirroring the rigid body `HostData` pattern for batch initialization and `sendToDevice` bulk transfer.
- Instantiate a `SpatialGrid` (extracted in the prerequisite task) with cell size equal to the smoothing radius `h`. This is separate from the rigid body grid. Pass `nullptr` for `bodyMode` since all fluid particles participate in neighbor search.
- Implement the `computeDensity` kernel. For each particle, iterate the 27 neighbor cells via `SpatialGrid` and accumulate density contributions using the Poly6 smoothing kernel: `W_poly6(r, h) = (315 / (64 * pi * h^9)) * (h^2 - r^2)^3` for `0 <= r <= h`, zero otherwise. Each particle also contributes to its own density (self-contribution). Write the result to the density array.
- Implement the `computePressure` kernel. Convert density to pressure using the Tait equation of state: `P = B * ((rho / rho_rest)^gamma - 1)` where `B = rho_rest * c_s^2 / gamma`, `gamma = 7`, `c_s` is the numerical speed of sound, and `rho_rest` is the rest density. The Tait EOS penalizes compression nonlinearly, providing visual incompressibility at lower stiffness values than the linear form `P = k * (rho - rho_rest)`, which relaxes the CFL timestep constraint.
- Implement the `computeForces` kernel. For each particle, iterate the 27 neighbor cells and accumulate three force contributions. The pressure force uses the Spiky kernel gradient: `grad_W_spiky(r, h) = -(45 / (pi * h^6)) * (h - r)^2 * (r_vec / r)`. The viscosity force uses the viscosity kernel Laplacian: `lap_W_visc(r, h) = (45 / (pi * h^6)) * (h - r)`. Gravity is applied as a constant body force. The pressure force is antisymmetric (Newton's third law): use the averaged-pressure `(P_i + P_j) / (2 * rho_j)` formulation to ensure momentum conservation.
- Implement the `integrate` kernel using leapfrog (velocity Verlet) integration. The scheme is: `v(t + dt/2) = v(t - dt/2) + a(t) * dt`, then `x(t + dt) = x(t) + v(t + dt/2) * dt`. The first timestep requires a half-step kickoff: `v(dt/2) = v(0) + a(0) * dt/2`. Leapfrog is symplectic (it conserves energy over long runs) and prevents the visible energy drift that semi-implicit Euler introduces in free-surface flows.
- Implement boundary handling with penalty forces. Define an axis-aligned box container. For each particle within a boundary margin of a wall, apply a repulsive force proportional to penetration depth: `F_boundary = k_boundary * (margin - d) * n_wall`, where `d` is the distance to the wall and `n_wall` is the inward normal. This keeps fluid contained without requiring special boundary particles.
- Expose configurable simulation parameters through an `SPHParams` struct: smoothing radius `h`, rest density `rho_0`, speed of sound `c_s` (determines Tait stiffness `B`), viscosity coefficient `mu`, gravity vector, timestep `dt`, boundary extents, boundary stiffness `k_boundary`. Default values for water-like behavior: `h = 0.1`, `rho_0 = 1000.0`, `c_s = 80.0` (for Tait with `gamma = 7`, this gives `B = rho_0 * c_s^2 / 7 ≈ 914286`), `mu = 0.1`, `gravity = (0, -9.81, 0)`, `dt = 0.0005`. Particle mass is computed from `rho_0 * (particle_spacing)^3` where `particle_spacing = h / 2` or similar.
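As a minimal sketch, the parameter struct above could look like the following. Field names follow the issue text; the boundary extents, margin, and `k_boundary` defaults are illustrative assumptions, not values from the issue.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the SPHParams struct with the water-like defaults listed above.
// boundary_min/boundary_max, margin, and k_boundary defaults are assumed.
struct SPHParams {
    float h = 0.1f;                  // smoothing radius
    float rho_0 = 1000.0f;           // rest density
    float c_s = 80.0f;               // numerical speed of sound
    float gamma = 7.0f;              // Tait exponent
    float mu = 0.1f;                 // viscosity coefficient
    float gravity[3] = {0.0f, -9.81f, 0.0f};
    float dt = 0.0005f;              // simulation timestep
    float boundary_min[3] = {0.0f, 0.0f, 0.0f};  // assumed box extents
    float boundary_max[3] = {1.0f, 1.0f, 1.0f};
    float k_boundary = 10000.0f;     // assumed boundary stiffness
    float margin = 0.05f;            // assumed boundary margin (h / 2)

    // Derived Tait stiffness: B = rho_0 * c_s^2 / gamma
    float taitB() const { return rho_0 * c_s * c_s / gamma; }

    // Particle mass from rest density and spacing (spacing = h / 2)
    float particleMass() const {
        float spacing = h / 2.0f;
        return rho_0 * spacing * spacing * spacing;
    }
};
```

With the defaults above, `taitB()` evaluates to roughly 9.14e5 and `particleMass()` to 0.125, matching the numbers in the task description.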
Technical Considerations
The SPHKernel class follows the same patterns as PhysicsKernel: SoA device data, staging buffers for initialization, bulk sendToDevice, SYCL kernel dispatch with DeviceData struct capture. The smoothing kernels (Poly6, Spiky gradient, Viscosity Laplacian) are implemented as inline device functions in a gpu_sph_smoothing_kernels.hpp header, same pattern as gpu_impulse_solver.hpp.
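A sketch of those inline functions, written as plain C++ for illustration (the real versions would live in `gpu_sph_smoothing_kernels.hpp` and be callable from SYCL device code):

```cpp
#include <cassert>
#include <cmath>

constexpr float PI = 3.14159265358979f;

// Poly6 kernel, used for density:
// W(r, h) = 315 / (64 * pi * h^9) * (h^2 - r^2)^3 for 0 <= r <= h
inline float W_poly6(float r, float h) {
    if (r < 0.0f || r > h) return 0.0f;
    float d = h * h - r * r;
    return (315.0f / (64.0f * PI * std::pow(h, 9))) * d * d * d;
}

// Magnitude of the Spiky kernel gradient, used for pressure forces.
// The full gradient is this magnitude times the unit vector r_vec / r,
// which is why the r = 0 case must be skipped by the caller.
inline float gradW_spiky_mag(float r, float h) {
    if (r <= 0.0f || r > h) return 0.0f;
    float d = h - r;
    return -(45.0f / (PI * std::pow(h, 6))) * d * d;
}

// Viscosity kernel Laplacian: 45 / (pi * h^6) * (h - r)
inline float lapW_visc(float r, float h) {
    if (r < 0.0f || r > h) return 0.0f;
    return (45.0f / (PI * std::pow(h, 6))) * (h - r);
}
```

All three vanish outside the support radius `h`, which is what makes the 27-cell neighbor scan sufficient.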
The neighbor search reuses SpatialGrid with cell size equal to h. All particles within the kernel support radius are guaranteed to be in the current cell or its 26 neighbors. The bodyMode pointer is passed as nullptr since all fluid particles are dynamic.
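The coverage guarantee can be illustrated with a small sketch; `cellCoord` is a hypothetical helper standing in for `SpatialGrid`'s hashing, just to show why cell size equal to `h` bounds neighbors to one cell offset per axis:

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

struct Cell { int x, y, z; };

// Hypothetical stand-in for SpatialGrid's cell indexing: with cell size == h,
// two particles closer than h differ by at most 1 in each cell coordinate.
inline Cell cellCoord(float px, float py, float pz, float cellSize) {
    return { (int)std::floor(px / cellSize),
             (int)std::floor(py / cellSize),
             (int)std::floor(pz / cellSize) };
}

// True if cell b lies in the 3x3x3 (27-cell) neighborhood of cell a.
inline bool inNeighborhood(Cell a, Cell b) {
    return std::abs(a.x - b.x) <= 1 &&
           std::abs(a.y - b.y) <= 1 &&
           std::abs(a.z - b.z) <= 1;
}
```

For example, a particle at x = 0.05 and a neighbor at x = 0.14 (distance 0.09 < h = 0.1) land in adjacent cells, while a particle at x = 0.35 is three cells away and correctly excluded from the scan.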
The Tait EOS with gamma = 7 and c_s = 80 gives a CFL limit of dt < h / c_s = 0.1 / 80 = 0.00125. The chosen default timestep of 0.0005 provides a safety margin of 2.5x. Multiple substeps per frame may be required depending on the rendering framerate.
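To make the substep arithmetic concrete, here is a small helper (illustrative, not part of the issue's API) that converts a render frame interval into a substep count:

```cpp
#include <cassert>
#include <cmath>

// Number of simulation substeps needed to cover one render frame
// at the fixed simulation timestep sim_dt.
inline int substepsPerFrame(float frame_dt, float sim_dt) {
    return (int)std::ceil(frame_dt / sim_dt);
}
```

At 60 FPS (frame interval ~0.0167 s) with `dt = 0.0005`, this comes out to 34 substeps per frame, so the per-substep budget matters for the performance target below.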
Leapfrog integration requires storing the half-step velocity. This adds three float arrays to SPHDeviceData compared to Euler. The memory overhead is negligible relative to the energy conservation benefit.
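A 1D sketch of the scheme, assuming a drift-then-kick ordering (the real kernel applies this per axis on the SoA arrays, with acceleration recomputed from forces each step):

```cpp
#include <cassert>
#include <cmath>

struct Particle1D { float x, v_half; };  // v_half stores v(t + dt/2)

// First-timestep kickoff: v(dt/2) = v(0) + a(0) * dt / 2
inline void kickoff(Particle1D& p, float v0, float a, float dt) {
    p.v_half = v0 + a * dt * 0.5f;
}

// One step after kickoff: x(t+dt) = x(t) + v(t+dt/2) * dt,
// then v(t+3dt/2) = v(t+dt/2) + a * dt (a constant here, e.g. gravity).
inline void step(Particle1D& p, float a, float dt) {
    p.x += p.v_half * dt;
    p.v_half += a * dt;
}
```

For constant acceleration, leapfrog reproduces the exact trajectory `x0 + v0*t + a*t^2/2`, which is a convenient sanity check: a particle dropped from height 10 under g = 9.81 sits near 5.095 after one second of 0.01 s steps.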
The force accumulation kernel must handle the r = 0 case (self-interaction) by skipping particles where i == j. The Spiky gradient is undefined at r = 0 (division by zero in the r_vec / r term).
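The asymmetry between the two passes can be shown with a brute-force stand-in for the density kernel (O(n^2) over all particles instead of grid cells, purely for illustration): the density loop keeps the `i == j` term as the self-contribution, while the force loop must skip it because the Spiky gradient divides by `r`.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct P3 { float x, y, z; };

inline float W_poly6(float r, float h) {
    if (r > h) return 0.0f;
    float d = h * h - r * r;
    return (315.0f / (64.0f * 3.14159265f * std::pow(h, 9))) * d * d * d;
}

// Brute-force density pass. Note j == i is NOT skipped: each particle
// contributes mass * W(0, h) to its own density. The i == j skip belongs
// only in the force pass, where the Spiky gradient term divides by r.
inline std::vector<float> computeDensity(const std::vector<P3>& ps,
                                         float mass, float h) {
    std::vector<float> rho(ps.size(), 0.0f);
    for (size_t i = 0; i < ps.size(); ++i) {
        for (size_t j = 0; j < ps.size(); ++j) {
            float dx = ps[i].x - ps[j].x;
            float dy = ps[i].y - ps[j].y;
            float dz = ps[i].z - ps[j].z;
            float r = std::sqrt(dx * dx + dy * dy + dz * dz);
            rho[i] += mass * W_poly6(r, h);
        }
    }
    return rho;
}
```

An isolated particle's density is exactly `mass * W_poly6(0, h)`, never zero, which is what keeps the `rho_j` divisions in the pressure formulation finite.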
Acceptance Criteria
- `SPHKernel` compiles and links with the existing FunGT build system
- 50K particles initialized in a block pattern, transferred to device via `sendToDevice`, and simulated for 1000 timesteps without NaN or Inf in any array
- Density computation produces values within 10% of `rho_rest` for particles in the interior of a uniform block (verifies Poly6 kernel correctness and neighbor search completeness)
- A dam break test (a block of particles released under gravity in a box) shows visually correct behavior: particles fall, hit the floor, spread laterally, and splash against the walls
- Energy does not visibly drift over 10 seconds of simulation time (leapfrog validation)
- All parameters in `SPHParams` can be modified between frames without reinitialization
- Performance: 50K particles complete one full timestep (grid build + density + pressure + forces + integrate) in under 5 ms on Intel Arc
Dependencies
- `SpatialGrid` extraction (must be completed first so `SPHKernel` can instantiate its own grid without duplicating spatial hashing code)