ENH: Face-neighbor filter OOC dispatch (Direct/Scanline) #1561

Open
joeykleingers wants to merge 5 commits into BlueQuartzSoftware:develop from joeykleingers:worktree-OptimizeGroupB

@joeykleingers joeykleingers commented Mar 5, 2026

Summary

Split 4 single-pass face-neighbor filters into Direct (in-core) and Scanline (OOC) algorithm classes using DispatchAlgorithm:

  • ComputeBoundaryCells — Direct/Scanline dispatch for boundary cell counting
  • ComputeSurfaceFeatures — Direct/Scanline dispatch with 2D/3D coordinate remapping
  • ComputeFeatureNeighbors — Direct/Scanline dispatch (Phase 1 voxel iteration only)
  • ComputeSurfaceAreaToVolume — Direct/Scanline dispatch for surface area ratio computation

Each filter's original algorithm is preserved exactly in a *Direct class (zero in-core regression). The *Scanline class wraps the same logic in chunk-sequential iteration for OOC storage backends.
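
The Direct/Scanline split described above can be pictured with a minimal sketch. Every name here (`CountNonZeroDirect`, `CountNonZeroScanline`, `dispatchByStorage`) is hypothetical and only illustrates the shape of the pattern; the actual `DispatchAlgorithm` signature is not shown in this PR description, and a `std::vector` stands in for the data store.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-core variant: one flat pass over the whole array.
struct CountNonZeroDirect
{
  const std::vector<int>& ids;

  std::size_t operator()() const
  {
    std::size_t count = 0;
    for(int id : ids)
    {
      count += (id != 0) ? 1 : 0;
    }
    return count;
  }
};

// Hypothetical OOC variant: identical logic, visited one chunk-sized slab at a time.
struct CountNonZeroScanline
{
  const std::vector<int>& ids;
  std::size_t chunkSize;

  std::size_t operator()() const
  {
    assert(chunkSize > 0);
    std::size_t count = 0;
    for(std::size_t start = 0; start < ids.size(); start += chunkSize)
    {
      const std::size_t end = std::min(start + chunkSize, ids.size());
      for(std::size_t i = start; i < end; i++)
      {
        count += (ids[i] != 0) ? 1 : 0;
      }
    }
    return count;
  }
};

// Hypothetical dispatcher: picks one variant based on where the data lives.
template <class DirectT, class ScanlineT>
std::size_t dispatchByStorage(bool isOutOfCore, const DirectT& direct, const ScanlineT& scanline)
{
  return isOutOfCore ? scanline() : direct();
}
```

Both variants must produce the same answer for any input; only the traversal order differs, which is why the dual-path tests in this PR compare them against the same expected output.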

Algorithm Decision

These filters already have naturally sequential Z→Y→X access patterns. The ZarrStore FIFO cache handles this efficiently, so the Scanline optimization provides structural guarantees (chunk-sequential access) rather than measurable speed improvements. The OOC bottleneck is ZarrStore's per-element operator[] overhead (~55-75ns per call: mutex lock + chunk lookup + cache check), which no algorithm-level change can avoid.

BadDataNeighborOrientationCheck (included from a previous commit on this branch) achieves 1.9x OOC speedup because its bottleneck was algorithmic redundancy (full-volume rescans), not access pattern. ComputeFeatureNeighbors achieves 1.4x OOC speedup from eliminating a redundant std::minmax_element full-scan that performed an extra OOC traversal before the main loop.
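
As a rough illustration of the deferred max tracking idea, the sketch below folds the maximum-ID check into the main loop instead of running a separate pre-pass. It is not the filter's code: `ScanOutcome`, `scanWithDeferredMax`, and the per-feature voxel count used as the "main loop work" are assumptions made for the example, and feature IDs and `numFeatures` are assumed to be non-negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for the example.
struct ScanOutcome
{
  std::vector<std::size_t> voxelsPerFeature;
  std::int32_t maxFeatureId;
  bool valid;
};

// Single traversal: the largest feature ID is tracked while the main work runs,
// and the validity check is deferred until the loop has finished.
// Assumes numFeatures >= 0.
inline ScanOutcome scanWithDeferredMax(const std::vector<std::int32_t>& featureIds, std::int32_t numFeatures)
{
  ScanOutcome out{std::vector<std::size_t>(static_cast<std::size_t>(numFeatures), 0), 0, true};
  for(std::int32_t id : featureIds)
  {
    out.maxFeatureId = std::max(out.maxFeatureId, id);
    // Guard the index so an out-of-range ID cannot write past the end
    // before the deferred check reports it.
    if(id >= 0 && id < numFeatures)
    {
      out.voxelsPerFeature[static_cast<std::size_t>(id)]++;
    }
  }
  out.valid = out.maxFeatureId < numFeatures;
  return out;
}
```

The trade-off is that an invalid ID is only reported after the loop completes, so the loop body has to tolerate bad IDs on its own. In exchange, an out-of-core store is traversed once rather than twice.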

PR Review Fixes

  • ComputeSurfaceFeaturesDirect: Added missing return before MakeErrorResult for 1D geometry error path (pre-existing bug)
  • ComputeSurfaceFeaturesScanline: Added the same 1D geometry error path for consistency
  • ComputeFeatureNeighborsDirect/Scanline: Replaced std::minmax_element full-scan with deferred max tracking during the main loop (1.4x OOC speedup)
  • All 8 new algorithm classes: Added Doxygen @brief comments to operator()() implementations
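
The missing-return bug in the first bullet is easy to reproduce in isolation. The sketch below uses a hypothetical `Result` type, `makeError` helper, and error code rather than the library's own `MakeErrorResult`, whose signature is not shown here; the dimension check is likewise only a stand-in for the 1D geometry test.

```cpp
#include <string>
#include <utility>

// Hypothetical result type: negative codes mean failure.
struct Result
{
  int code;
  std::string message;
  bool ok() const { return code >= 0; }
};

// Hypothetical helper standing in for an error-result factory.
inline Result makeError(int code, std::string message)
{
  return Result{code, std::move(message)};
}

// Buggy shape: the error object is built and immediately discarded,
// so execution falls through and the caller sees success.
inline Result checkGeometryBuggy(int nonUnitDims)
{
  if(nonUnitDims < 2)
  {
    makeError(-1000, "1D geometry is not supported");
  }
  return Result{0, {}};
}

// Fixed shape: the error is actually returned to the caller.
inline Result checkGeometryFixed(int nonUnitDims)
{
  if(nonUnitDims < 2)
  {
    return makeError(-1000, "1D geometry is not supported");
  }
  return Result{0, {}};
}
```

Marking such a result type or factory `[[nodiscard]]` (C++17) is one way to have the compiler flag this kind of dropped error, though whether that fits the codebase is outside the scope of this PR.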

Benchmark Results (200x200x200 programmatic datasets)

| Filter | IC Before | IC After | OOC Before | OOC After |
|---|---|---|---|---|
| ComputeBoundaryCells | 0.19s | 0.13s (1.5x) | 6.69s | 6.67s |
| ComputeSurfaceFeatures | 0.10s | 0.11s | 4.01s | 4.11s |
| ComputeFeatureNeighbors | 0.25s | 0.23s | 8.93s | 6.44s (1.4x) |
| ComputeSurfaceAreaToVolume | 0.14s | 0.14s | 8.59s | 8.67s |
| BadDataNeighborOrientationCheck | 1.78s | 0.68s (2.6x) | 97.1s | 51.01s (1.9x) |

The 1.5x in-core improvement in ComputeBoundaryCells comes from fixing an ImageGeom copy-by-value bug (const ImageGeom → const auto&). The 1.4x OOC improvement in ComputeFeatureNeighbors comes from replacing the std::minmax_element pre-scan with deferred validation.
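
The copy-by-value issue can be shown with a small stand-in type that counts its own copies. `FakeGeom`, `Holder`, `sizeByValue`, and `sizeByRef` are hypothetical names for this sketch and are not part of the library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a geometry object that is expensive to copy.
struct FakeGeom
{
  static int& copyCount()
  {
    static int count = 0;
    return count;
  }

  std::vector<double> payload;

  FakeGeom() : payload(1024) {}
  FakeGeom(const FakeGeom& other) : payload(other.payload) { ++copyCount(); }
};

// Hypothetical owner that hands out a const reference, like a data structure getter.
struct Holder
{
  FakeGeom geom;
  const FakeGeom& get() const { return geom; }
};

// Declaring the local as a value type copies the whole object.
inline std::size_t sizeByValue(const Holder& holder)
{
  const FakeGeom geom = holder.get();
  return geom.payload.size();
}

// Binding to a reference (here via const auto&) avoids the copy.
inline std::size_t sizeByRef(const Holder& holder)
{
  const auto& geom = holder.get();
  return geom.payload.size();
}
```

Note that plain `const auto geom = holder.get();` would still copy, because `auto` drops the reference; the `&` is what matters.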

ZarrStore Per-Element Overhead (Optimization Ceiling)

Every getValue()/setValue() call on ZarrStore incurs:

  • Mutex lock/unlock: ~20ns
  • Chunk lookup (flat→N-D, FIFO scan): ~30-50ns
  • Data read/write: ~5ns

Total: ~55-75ns per element vs ~1ns for in-core DataStore. This 55-75x overhead applies regardless of access pattern and cannot be reduced by filter-level changes. Infrastructure improvements needed:

  1. Bulk read/write API on AbstractDataStore (single mutex lock around batch copy)
  2. Chunk-level bulk transfer in FileCore (bypass per-element chunk lookup)
  3. Raw pointer/span API for loaded chunks
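
To make the first proposal concrete, here is a minimal sketch of what a bulk read could look like next to per-element access. `LockedStore` and `readRange` are hypothetical; they are not existing AbstractDataStore or ZarrStore members, and the lock counter exists only to make the difference observable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical store that guards every access with a mutex.
class LockedStore
{
public:
  explicit LockedStore(std::vector<float> data)
  : m_Data(std::move(data))
  {
  }

  // Per-element access: one lock acquisition per value read.
  float getValue(std::size_t index) const
  {
    std::lock_guard<std::mutex> lock(m_Mutex);
    ++m_LockCount;
    return m_Data[index];
  }

  // Bulk access: one lock acquisition around a batch copy.
  void readRange(std::size_t start, std::size_t count, float* out) const
  {
    std::lock_guard<std::mutex> lock(m_Mutex);
    ++m_LockCount;
    assert(start + count <= m_Data.size());
    std::copy_n(m_Data.data() + start, count, out);
  }

  std::size_t lockCount() const
  {
    return m_LockCount;
  }

private:
  std::vector<float> m_Data;
  mutable std::mutex m_Mutex;
  mutable std::size_t m_LockCount = 0;
};
```

Reading 100 values through `getValue` takes the lock 100 times, while a single `readRange` call takes it once. A real implementation would also need to handle ranges that span chunk boundaries, which this sketch ignores.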

Test Changes

  • Added PreferencesSentinel with computed byte thresholds to all correctness tests
  • Added ForceOocAlgorithmGuard + GENERATE(false, true) for dual-path test coverage
  • Added 200x200x200 benchmark TEST_CASE for each filter
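
A guard such as ForceOocAlgorithmGuard is presumably an RAII object that flips a switch for the duration of a test and restores it afterwards; its real implementation is not shown in this PR description. The sketch below is an assumption-laden illustration of that idea using hypothetical names (`forceOocFlag`, `ScopedForceOoc`, `selectedPath`).

```cpp
#include <string>

// Hypothetical process-wide switch consulted by the dispatcher.
inline bool& forceOocFlag()
{
  static bool flag = false;
  return flag;
}

// Hypothetical RAII guard: sets the switch on construction and
// restores the previous value on destruction, even if a test throws.
class ScopedForceOoc
{
public:
  explicit ScopedForceOoc(bool force)
  : m_Previous(forceOocFlag())
  {
    forceOocFlag() = force;
  }

  ~ScopedForceOoc()
  {
    forceOocFlag() = m_Previous;
  }

  ScopedForceOoc(const ScopedForceOoc&) = delete;
  ScopedForceOoc& operator=(const ScopedForceOoc&) = delete;

private:
  bool m_Previous;
};

// Hypothetical helper reporting which algorithm path would be chosen.
inline std::string selectedPath()
{
  return forceOocFlag() ? "Scanline" : "Direct";
}
```

Paired with Catch2's `GENERATE(false, true)`, which re-runs the test case once per generated value, a guard like this lets a single in-core build exercise both algorithm paths.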

Test Plan

  • All 12 correctness tests pass (in-core config)
  • All 12 correctness tests pass (OOC config with ZarrStore)
  • All 4 benchmark tests pass (both configs)
  • Confirmed the "chunk shape:" line in OOC verbose output (ZarrStore active)

@joeykleingers joeykleingers changed the title from "ENH: Group B face-neighbor filter OOC dispatch (Direct/Scanline)" to "ENH: Face-neighbor filter OOC dispatch (Direct/Scanline)" Mar 5, 2026
…ithms

Split BadDataNeighborOrientationCheck into two dispatched algorithms
using DispatchAlgorithm for optimal performance in both configurations:

- Worklist (in-core): Uses std::deque worklist for Phase 2 to process
  only eligible voxels with fast random access. ~5x speedup vs original.

- Scanline (OOC): Uses chunk-sequential multi-pass scans for Phase 2
  to avoid random access chunk thrashing. Includes chunk-skip
  optimization that checks in-memory neighborCount before loading
  chunks, skipping those with no eligible voxels. ~1.8x speedup vs
  original.

Both algorithms share Phase 1 (chunk-sequential neighbor counting)
and use only a single neighborCount vector (4 bytes/voxel) with no
additional large allocations.

Updated tests with GENERATE + ForceOocAlgorithmGuard to exercise both
algorithm paths in in-core builds. Added 200x200x200 benchmark test.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
…patch

Split ComputeBoundaryCells, ComputeSurfaceFeatures, ComputeFeatureNeighbors,
and ComputeSurfaceAreaToVolume into Direct (in-core) and Scanline (OOC)
algorithm classes using DispatchAlgorithm pattern. Direct classes preserve
original code for zero in-core regression. Scanline classes add
chunk-sequential iteration for OOC storage backends. Added benchmark tests
(200x200x200) and OOC test coverage with PreferencesSentinel +
ForceOocAlgorithmGuard for all four filters.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

The 4 base algorithm .cpp files (ComputeBoundaryCells, ComputeSurfaceFeatures,
ComputeFeatureNeighbors, ComputeSurfaceAreaToVolume) still contained the
original algorithm code instead of dispatching to the Direct/Scanline classes.
This replaces each operator()() body with a DispatchAlgorithm call.

Also fixes ImageGeom copy-by-value in ComputeBoundaryCellsDirect/Scanline
(const ImageGeom -> const auto&) and adds Doxygen to all 8 new headers.

- Fix missing return before MakeErrorResult in ComputeSurfaceFeaturesDirect
  for 1D geometry error path (pre-existing bug). Add same error path to
  ComputeSurfaceFeaturesScanline.
- Replace std::minmax_element full-scan validation with deferred max tracking
  during the main loop in ComputeFeatureNeighborsDirect and Scanline,
  eliminating a redundant OOC full-scan (1.4x OOC speedup).
- Add Doxygen @brief comments to operator()() in all 8 new algorithm classes.
@joeykleingers joeykleingers force-pushed the worktree-OptimizeGroupB branch from e505193 to 0aabf6d Compare March 5, 2026 19:58