ENH: Face-neighbor filter OOC dispatch (Direct/Scanline)#1561
Open
joeykleingers wants to merge 5 commits intoBlueQuartzSoftware:developfrom
Open
ENH: Face-neighbor filter OOC dispatch (Direct/Scanline)#1561joeykleingers wants to merge 5 commits intoBlueQuartzSoftware:developfrom
joeykleingers wants to merge 5 commits intoBlueQuartzSoftware:developfrom
Conversation
…ithms Split BadDataNeighborOrientationCheck into two dispatched algorithms using DispatchAlgorithm for optimal performance in both configurations: - Worklist (in-core): Uses std::deque worklist for Phase 2 to process only eligible voxels with fast random access. ~5x speedup vs original. - Scanline (OOC): Uses chunk-sequential multi-pass scans for Phase 2 to avoid random access chunk thrashing. Includes chunk-skip optimization that checks in-memory neighborCount before loading chunks, skipping those with no eligible voxels. ~1.8x speedup vs original. Both algorithms share Phase 1 (chunk-sequential neighbor counting) and use only a single neighborCount vector (4 bytes/voxel) with no additional large allocations. Updated tests with GENERATE + ForceOocAlgorithmGuard to exercise both algorithm paths in in-core builds. Added 200x200x200 benchmark test. Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
…patch Split ComputeBoundaryCells, ComputeSurfaceFeatures, ComputeFeatureNeighbors, and ComputeSurfaceAreaToVolume into Direct (in-core) and Scanline (OOC) algorithm classes using DispatchAlgorithm pattern. Direct classes preserve original code for zero in-core regression. Scanline classes add chunk-sequential iteration for OOC storage backends. Added benchmark tests (200x200x200) and OOC test coverage with PreferencesSentinel + ForceOocAlgorithmGuard for all four filters. Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
The 4 base algorithm .cpp files (ComputeBoundaryCells, ComputeSurfaceFeatures, ComputeFeatureNeighbors, ComputeSurfaceAreaToVolume) still contained the original algorithm code instead of dispatching to the Direct/Scanline classes. This replaces each operator()() body with a DispatchAlgorithm call. Also fixes ImageGeom copy-by-value in ComputeBoundaryCellsDirect/Scanline (const ImageGeom -> const auto&) and adds Doxygen to all 8 new headers.
- Fix missing return before MakeErrorResult in ComputeSurfaceFeaturesDirect for 1D geometry error path (pre-existing bug). Add same error path to ComputeSurfaceFeaturesScanline. - Replace std::minmax_element full-scan validation with deferred max tracking during the main loop in ComputeFeatureNeighborsDirect and Scanline, eliminating a redundant OOC full-scan (1.4x OOC speedup). - Add Doxygen @brief comments to operator()() in all 8 new algorithm classes.
e505193 to
0aabf6d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Split 4 single-pass face-neighbor filters into Direct (in-core) and Scanline (OOC) algorithm classes using
DispatchAlgorithm:Each filter's original algorithm is preserved exactly in a
*Directclass (zero in-core regression). The*Scanlineclass wraps the same logic in chunk-sequential iteration for OOC storage backends.Algorithm Decision
These filters already have naturally sequential Z→Y→X access patterns. The ZarrStore FIFO cache handles this efficiently, so the Scanline optimization provides structural guarantees (chunk-sequential access) rather than measurable speed improvements. The OOC bottleneck is ZarrStore's per-element
operator[]overhead (~55-75ns per call: mutex lock + chunk lookup + cache check), which no algorithm-level change can avoid.BadDataNeighborOrientationCheck (included from a previous commit on this branch) achieves 1.9x OOC speedup because its bottleneck was algorithmic redundancy (full-volume rescans), not access pattern. ComputeFeatureNeighbors achieves 1.4x OOC speedup from eliminating a redundant
std::minmax_elementfull-scan that performed an extra OOC traversal before the main loop.PR Review Fixes
returnbeforeMakeErrorResultfor 1D geometry error path (pre-existing bug)std::minmax_elementfull-scan with deferred max tracking during the main loop (1.4x OOC speedup)@briefcomments tooperator()()implementationsBenchmark Results (200x200x200 programmatic datasets)
ComputeBoundaryCells 1.5x in-core improvement is from fixing an ImageGeom copy-by-value bug (
const ImageGeom→const auto&). ComputeFeatureNeighbors 1.4x OOC improvement is from replacingstd::minmax_elementwith deferred validation.ZarrStore Per-Element Overhead (Optimization Ceiling)
Every
getValue()/setValue()call on ZarrStore incurs:Total: ~55-75ns per element vs ~1ns for in-core DataStore. This 55-75x overhead applies regardless of access pattern and cannot be reduced by filter-level changes. Infrastructure improvements needed:
AbstractDataStore(single mutex lock around batch copy)Test Changes
PreferencesSentinelwith computed byte thresholds to all correctness testsForceOocAlgorithmGuard+GENERATE(false, true)for dual-path test coverageTEST_CASEfor each filterTest Plan
chunk shape:confirmed in OOC verbose output (ZarrStore active)