
ENH: Segmentation Filters OoC Optimization #1559

Draft
joeykleingers wants to merge 8 commits into BlueQuartzSoftware:develop from joeykleingers:worktree-OptimizeGroupD

@joeykleingers commented Mar 5, 2026

Summary

  • Optimizes 5 Group D filters (IdentifySample, ScalarSegmentFeatures, EBSDSegmentFeatures, CAxisSegmentFeatures, FillBadData) for out-of-core (ZarrStore) performance
  • Adds chunk-sequential Connected Component Labeling (CCL) with Union-Find to SegmentFeatures base class, replacing DFS flood-fill for OOC paths
  • Runtime dispatch selects BFS/DFS for in-core and CCL for OOC based on storage type
  • Splits FillBadData and IdentifySample into separate BFS/CCL algorithm files
  • Adds UnionFind.hpp utility class with path-halving compression and union-by-rank
  • Adds 200x200x200 benchmark tests and ForceOocAlgorithmGuard correctness tests for all 5 filters
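For readers unfamiliar with the two Union-Find optimizations named above, here is a minimal sketch. It is not the UnionFind.hpp added in this PR; the class name UnionFindSketch and its methods are hypothetical, and elements are assumed to be dense 0-based provisional labels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical disjoint-set sketch (not the PR's UnionFind.hpp).
class UnionFindSketch
{
public:
  // Adds a new singleton set and returns its ID.
  std::size_t makeSet()
  {
    m_Parent.push_back(m_Parent.size());
    m_Rank.push_back(0);
    return m_Parent.size() - 1;
  }

  // Path halving: each visited node is re-pointed at its grandparent,
  // shortening the path on every lookup without recursion.
  std::size_t find(std::size_t x)
  {
    assert(x < m_Parent.size());
    while(m_Parent[x] != x)
    {
      m_Parent[x] = m_Parent[m_Parent[x]];
      x = m_Parent[x];
    }
    return x;
  }

  // Union by rank: attach the root of the shallower tree under the deeper one.
  void unite(std::size_t a, std::size_t b)
  {
    std::size_t ra = find(a);
    std::size_t rb = find(b);
    if(ra == rb)
    {
      return;
    }
    if(m_Rank[ra] < m_Rank[rb])
    {
      std::swap(ra, rb);
    }
    m_Parent[rb] = ra;
    if(m_Rank[ra] == m_Rank[rb])
    {
      ++m_Rank[ra];
    }
  }

  std::size_t size() const
  {
    return m_Parent.size();
  }

private:
  std::vector<std::size_t> m_Parent;
  std::vector<std::uint8_t> m_Rank; // rank never exceeds log2(size), so 8 bits suffice
};
```

Path halving keeps find() iterative, which avoids deep recursion on long label chains; combined with union by rank it gives near-constant amortized cost per operation.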

Algorithm Decision

Problem

The original DFS flood-fill in SegmentFeatures has a random access pattern: each stack pop reads an arbitrary voxel and then checks its 6 face neighbors, which may lie in different chunks. This is fast for an in-core DataStore, but for ZarrStore-backed arrays each random access can trigger a chunk load/evict cycle, causing 50x–621x slowdowns.

Solution: Chunk-Sequential CCL with Union-Find

A two-phase scanline algorithm that processes the grid in strict Z-Y-X order:

Phase 1 — Forward Labeling: Iterate every voxel in scanline order. For each valid voxel, check only backward neighbors (-X, -Y, -Z). Assign provisional labels and union equivalences in a Union-Find structure. A 2-slice rolling buffer (current + previous Z-slice) keeps all neighbor reads in RAM.

Phase 2 — Resolution and Relabeling: Flatten the Union-Find, then two sequential passes remap provisional labels to final feature IDs (assigned in seed-discovery order for compatibility with the original DFS numbering).
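The two phases can be sketched on a toy scalar volume as below. This is an illustration under stated assumptions, not the PR's executeCCL(): the function name scanlineCcl, the "value < 0 means invalid" rule, and the tolerance test are hypothetical stand-ins for isValidVoxel() and areNeighborsSimilar(), and a std::vector stands in for the chunked featureIds store. Input is read and labels are written strictly in Z-Y-X order, and every neighbor lookup goes to one of the two slice buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

// Compact disjoint set so the sketch stands alone (union keeps the smaller label as root).
struct DisjointSet
{
  std::vector<std::int32_t> parent;

  std::int32_t make()
  {
    parent.push_back(static_cast<std::int32_t>(parent.size()));
    return parent.back();
  }

  std::int32_t find(std::int32_t x)
  {
    while(parent[x] != x)
    {
      parent[x] = parent[parent[x]]; // path halving
      x = parent[x];
    }
    return x;
  }

  void unite(std::int32_t a, std::int32_t b)
  {
    a = find(a);
    b = find(b);
    if(a != b) { parent[std::max(a, b)] = std::min(a, b); }
  }
};

// Hypothetical scanline CCL over an X-fastest scalar volume. Voxels with value < 0
// are invalid (feature ID 0); 6-connected neighbors join when |a - b| <= tolerance.
std::vector<std::int32_t> scanlineCcl(const std::vector<int>& values, std::size_t dimX, std::size_t dimY, std::size_t dimZ, int tolerance)
{
  const std::size_t sliceSize = dimX * dimY;
  assert(values.size() == sliceSize * dimZ);
  std::vector<std::int32_t> labels(values.size(), -1); // stand-in for the chunked featureIds store
  DisjointSet ds;

  // Rolling buffers: [0] = previous Z slice, [1] = current Z slice.
  std::vector<int> valBuf[2] = {std::vector<int>(sliceSize), std::vector<int>(sliceSize)};
  std::vector<std::int32_t> labBuf[2] = {std::vector<std::int32_t>(sliceSize, -1), std::vector<std::int32_t>(sliceSize, -1)};

  // Phase 1: forward labeling; input is read and labels are written in Z-Y-X order.
  for(std::size_t z = 0; z < dimZ; ++z)
  {
    for(std::size_t s = 0; s < sliceSize; ++s)
    {
      const std::size_t x = s % dimX;
      const std::size_t y = s / dimX;
      const int v = values[z * sliceSize + s];
      std::int32_t label = -1;
      if(v >= 0)
      {
        // Join an already-visited (backward) neighbor, merging classes if several match.
        auto visit = [&](int nbValue, std::int32_t nbLabel) {
          if(nbLabel < 0 || std::abs(v - nbValue) > tolerance) { return; }
          if(label < 0) { label = nbLabel; }
          else { ds.unite(label, nbLabel); }
        };
        if(x > 0) { visit(valBuf[1][s - 1], labBuf[1][s - 1]); }       // -X
        if(y > 0) { visit(valBuf[1][s - dimX], labBuf[1][s - dimX]); } // -Y
        if(z > 0) { visit(valBuf[0][s], labBuf[0][s]); }               // -Z
        if(label < 0) { label = ds.make(); }
      }
      valBuf[1][s] = v;
      labBuf[1][s] = label;
      labels[z * sliceSize + s] = label;
    }
    std::swap(valBuf[0], valBuf[1]); // current slice becomes the previous one
    std::swap(labBuf[0], labBuf[1]);
  }

  // Phase 2: flatten, then two sequential passes (number roots, then relabel).
  for(std::size_t i = 0; i < ds.parent.size(); ++i)
  {
    ds.parent[i] = ds.find(static_cast<std::int32_t>(i));
  }
  std::vector<std::int32_t> finalId(ds.parent.size(), 0);
  std::int32_t nextId = 0;
  for(std::int32_t l : labels)
  {
    if(l >= 0 && finalId[ds.parent[l]] == 0) { finalId[ds.parent[l]] = ++nextId; }
  }
  for(std::int32_t& l : labels)
  {
    l = (l < 0) ? 0 : finalId[ds.parent[l]];
  }
  return labels;
}
```

Assuming the similarity test is symmetric, numbering roots in the order their first voxel appears in scanline order matches the DFS numbering, since the DFS also seeds each feature at its first voxel in scanline order. For a 200x200x200 volume the two slice buffers hold 2 × 200 × 200 = 80,000 labels, versus 8,000,000 voxels in the full volume.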

Why This Algorithm

  1. Eliminates random access entirely — every read is from the rolling buffer (RAM) or a sequential chunk load
  2. Memory scales with slice size, not volume: O(dimX × dimY) for the buffer plus O(number of provisional labels) for the Union-Find (at least as many as the final features)
  3. Preserves feature numbering order — Phase 2 remaps to match original DFS seed-discovery order
  4. Shared base class: executeCCL() in SegmentFeatures serves all 4 subclasses via the isValidVoxel() and areNeighborsSimilar() virtual methods
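The shared-hook design in item 4 is the template-method pattern. A rough sketch follows; SegmentFeaturesSketch, ScalarSegmentSketch, and the hook signatures are assumptions, and executeCCL() is reduced to a single row (where components are simply runs of similar voxels, so no Union-Find is needed) to keep the focus on how subclasses plug in. The /** */ blocks follow the Doxygen style adopted in the later commits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

/**
 * @brief Hypothetical stand-in for the SegmentFeatures base class; only the
 * hook pattern mirrors the PR, the signatures are assumptions.
 */
class SegmentFeaturesSketch
{
public:
  virtual ~SegmentFeaturesSketch() = default;

  /**
   * @brief Labels a single row of voxels (X only) so the example stays short.
   * @param numVoxels Number of voxels in the row.
   * @return 1-based feature IDs, 0 for invalid voxels.
   */
  std::vector<std::int32_t> executeCCL(std::size_t numVoxels) const
  {
    std::vector<std::int32_t> ids(numVoxels, 0);
    std::int32_t nextId = 0;
    for(std::size_t i = 0; i < numVoxels; ++i)
    {
      if(!isValidVoxel(i))
      {
        continue;
      }
      // In 1-D the only backward neighbor is -X.
      const bool joinsPrevious = i > 0 && ids[i - 1] != 0 && areNeighborsSimilar(i - 1, i);
      ids[i] = joinsPrevious ? ids[i - 1] : ++nextId;
    }
    return ids;
  }

protected:
  /**
   * @brief Whether a voxel may belong to any feature (e.g. it passes a mask).
   * @param index Voxel index.
   * @return true if the voxel can be labeled.
   */
  virtual bool isValidVoxel(std::size_t index) const = 0;

  /**
   * @brief Whether two adjacent valid voxels belong to the same feature.
   * @param a Index of the already-visited neighbor.
   * @param b Index of the current voxel.
   * @return true if the voxels should share a feature ID.
   */
  virtual bool areNeighborsSimilar(std::size_t a, std::size_t b) const = 0;
};

/// Hypothetical scalar subclass: masked-in voxels are valid, and neighbors are
/// similar when their values differ by at most a tolerance.
class ScalarSegmentSketch : public SegmentFeaturesSketch
{
public:
  ScalarSegmentSketch(std::vector<int> values, std::vector<bool> mask, int tolerance)
  : m_Values(std::move(values))
  , m_Mask(std::move(mask))
  , m_Tolerance(tolerance)
  {
    assert(m_Values.size() == m_Mask.size());
  }

protected:
  bool isValidVoxel(std::size_t index) const override
  {
    return m_Mask[index];
  }

  bool areNeighborsSimilar(std::size_t a, std::size_t b) const override
  {
    return std::abs(m_Values[a] - m_Values[b]) <= m_Tolerance;
  }

private:
  std::vector<int> m_Values;
  std::vector<bool> m_Mask;
  int m_Tolerance;
};
```

Each subclass only answers two local questions, so the scan order, buffering, and Union-Find logic can live in one place.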

Dispatch

Each filter checks IsOutOfCore(*featureIdsArray) || ForceOocAlgorithm():

  • True: executeCCL() (chunk-sequential CCL)
  • False: execute() (original DFS flood-fill)
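A minimal sketch of this runtime dispatch, in the BFS/CCL split style the commits describe for AlgorithmDispatch.hpp, is below. Every name here (StorageInfoSketch, forceOocAlgorithmFlag, useOocPath, dispatchSketch, and the two stub functors) is hypothetical; the real IsOutOfCore() and ForceOocAlgorithm() signatures are not reproduced.

```cpp
#include <cassert>
#include <utility>

// Hypothetical description of how the featureIds array is stored.
struct StorageInfoSketch
{
  bool zarrBacked = false;
};

// Hypothetical process-wide test override standing in for ForceOocAlgorithm().
inline bool& forceOocAlgorithmFlag()
{
  static bool flag = false;
  return flag;
}

// Mirrors the condition IsOutOfCore(*featureIdsArray) || ForceOocAlgorithm().
inline bool useOocPath(const StorageInfoSketch& featureIds)
{
  return featureIds.zarrBacked || forceOocAlgorithmFlag();
}

// Runs exactly one of two algorithm functors that share a call signature.
template <class InCoreAlgorithm, class OocAlgorithm, class... Args>
auto dispatchSketch(bool useOoc, Args&&... args)
{
  if(useOoc)
  {
    return OocAlgorithm{}(std::forward<Args>(args)...);
  }
  return InCoreAlgorithm{}(std::forward<Args>(args)...);
}

// Stub algorithms that only report which path ran.
enum class PathTaken
{
  FloodFill,
  ChunkSequentialCcl
};

struct FloodFillAlgo
{
  PathTaken operator()() const { return PathTaken::FloodFill; }
};

struct CclAlgo
{
  PathTaken operator()() const { return PathTaken::ChunkSequentialCcl; }
};
```

Because both algorithm objects share one call signature, the calling filter code stays the same for in-core and OOC runs; only the functor type that executes changes.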

FillBadData (special case)

FillBadData uses its own 4-phase CCL (not the base class executeCCL()):

  1. CCL with negative labels for bad-data regions
  2. Flatten Union-Find and accumulate region sizes
  3. Classify regions by size (small → fill, large → preserve as voids)
  4. Iterative morphological dilation to fill marked regions
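The size accumulation in phase 2 and phases 3 and 4 can be sketched on a small volume as follows. Assumptions, all hypothetical: phase 1 has already written negative region labels into the same int32 array (features are > 0), a preserved void is encoded as 0, and each marked voxel takes the most frequent neighboring feature ID (ties go to the smaller ID). Region sizes use 64-bit counters, matching the uint64 region-size fix made later in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical encoding: ids > 0 are features, ids < 0 are bad-data region labels
// written by an earlier CCL pass. Small regions are filled from neighboring
// features; large regions (and unreachable small ones) end up as voids (0).
void fillSmallDefectsSketch(std::vector<std::int32_t>& ids, std::size_t dimX, std::size_t dimY, std::size_t dimZ, std::uint64_t minDefectSize)
{
  assert(ids.size() == dimX * dimY * dimZ);
  constexpr std::int32_t k_Fill = -1; // marker for "fill this voxel"

  // Phase 2 (size part): accumulate region sizes in 64-bit counters.
  std::map<std::int32_t, std::uint64_t> regionSize;
  for(std::int32_t id : ids)
  {
    if(id < 0) { ++regionSize[id]; }
  }

  // Phase 3: classify regions by size.
  for(std::int32_t& id : ids)
  {
    if(id < 0) { id = regionSize[id] < minDefectSize ? k_Fill : 0; }
  }

  // Phase 4: iterative dilation; each pass reads only the previous pass's state.
  const std::size_t sliceSize = dimX * dimY;
  bool changed = true;
  while(changed)
  {
    changed = false;
    std::vector<std::int32_t> next = ids;
    for(std::size_t i = 0; i < ids.size(); ++i)
    {
      if(ids[i] != k_Fill) { continue; }
      const std::size_t x = i % dimX;
      const std::size_t y = (i / dimX) % dimY;
      const std::size_t z = i / sliceSize;
      std::map<std::int32_t, int> votes; // ordered, so ties resolve to the smaller ID
      auto vote = [&](std::size_t n) {
        if(ids[n] > 0) { ++votes[ids[n]]; }
      };
      if(x > 0) { vote(i - 1); }
      if(x + 1 < dimX) { vote(i + 1); }
      if(y > 0) { vote(i - dimX); }
      if(y + 1 < dimY) { vote(i + dimX); }
      if(z > 0) { vote(i - sliceSize); }
      if(z + 1 < dimZ) { vote(i + sliceSize); }
      std::int32_t best = 0;
      int bestCount = 0;
      for(const auto& [feature, count] : votes)
      {
        if(count > bestCount) { best = feature; bestCount = count; }
      }
      if(bestCount > 0) { next[i] = best; changed = true; }
    }
    ids.swap(next);
  }

  // Marked voxels that never touched a feature become voids as well.
  for(std::int32_t& id : ids)
  {
    if(id == k_Fill) { id = 0; }
  }
}
```

Using a copy of the array per pass means a voxel filled in the current pass does not propagate further until the next pass, so the fill grows one voxel layer at a time inward from the region boundary.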

The algorithm is split into FillBadDataBFS.cpp (in-core) and FillBadDataCCL.cpp (OOC), with dispatch in the main FillBadData algorithm class.

Performance Results (200x200x200)

| Filter | In-Core Speedup | OOC Speedup |
|---|---|---|
| IdentifySample | 1.6x faster | >367x faster |
| ScalarSegmentFeatures | 1.5x faster | >116x faster |
| EBSDSegmentFeatures | 1.8x faster | >42x faster |
| CAxisSegmentFeatures | 1.7x faster | >46x faster |
| FillBadData | 3.2x faster | 3.0x faster |

OOC baselines for 4 of the 5 filters (DFS on ZarrStore) timed out at 1500 s, so their OOC speedups are lower bounds. FillBadData's baseline already used CCL from PR #1515; this PR refactors it into separate BFS/CCL dispatch files.

Test plan

  • All existing IdentifySample, ScalarSegmentFeatures, EBSDSegmentFeatures, CAxisSegmentFeatures, and FillBadData unit tests pass with both in-core and OOC algorithm paths (GENERATE(false, true) + ForceOocAlgorithmGuard)
  • New 200x200x200 benchmark tests pass for all 5 filters in both in-core and OOC configurations
  • In-core benchmarks show no regression (all filters same speed or faster)
  • Builds clean on both simplnx-Rel and simplnx-ooc-Rel presets
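A guard-style helper such as ForceOocAlgorithmGuard is typically a scoped (RAII) override. A sketch of that pattern is below; the guard class and the process-wide flag are hypothetical stand-ins, not the simplnx test utilities.

```cpp
#include <cassert>

// Hypothetical process-wide switch standing in for ForceOocAlgorithm().
inline bool& forceOocFlag()
{
  static bool flag = false;
  return flag;
}

// Sets the override for the lifetime of the guard and restores the previous
// value on scope exit, including when the scope is left by an exception.
class ForceOocGuardSketch
{
public:
  explicit ForceOocGuardSketch(bool force)
  : m_Previous(forceOocFlag())
  {
    forceOocFlag() = force;
  }

  ~ForceOocGuardSketch()
  {
    forceOocFlag() = m_Previous;
  }

  // Copying would restore the flag twice, so the guard is non-copyable.
  ForceOocGuardSketch(const ForceOocGuardSketch&) = delete;
  ForceOocGuardSketch& operator=(const ForceOocGuardSketch&) = delete;

private:
  bool m_Previous;
};
```

In a Catch2 test case the guard would be constructed from a GENERATE(false, true) value at the top of the test, so each generated run exercises one path and the override is undone even if a failing REQUIRE ends the test case early.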

@joeykleingers force-pushed the worktree-OptimizeGroupD branch 2 times, most recently from 48f1eea to 93b4565 on March 5, 2026 14:57
Consolidate OOC filter optimizations from identify-sample-optimizations worktree:

- Add AlgorithmDispatch.hpp and UnionFind.hpp utilities
- SegmentFeatures: Add executeCCL() with 2-slice rolling buffer + Union-Find
- ScalarSegmentFeatures: CCL dispatch + CompareFunctor::compare()
- EBSDSegmentFeatures: CCL dispatch + isValidVoxel/areNeighborsSimilar
- CAxisSegmentFeatures: CCL dispatch + isValidVoxel/areNeighborsSimilar
- Tests: PreferencesSentinel, ForceOocAlgorithmGuard, 200^3 benchmarks
Update IdentifySample and FillBadData to use the AlgorithmDispatch
BFS/CCL split pattern instead of monolithic inlined algorithms.
Add BFS/CCL split files, update tests with ForceOocAlgorithmGuard,
PreferencesSentinel, and 200x200x200 benchmark test cases.
Benchmark tests should let the dispatch happen naturally based on
storage type, not force the OOC algorithm path. ForceOocAlgorithmGuard
remains in correctness tests to exercise both code paths.
@joeykleingers force-pushed the worktree-OptimizeGroupD branch from 93b4565 to 935f5de on March 5, 2026 14:58
@imikejackson changed the title from "ENH: OOC optimization for Group D (CCL/Segmentation)" to "ENH: OoC optimization for Segmentation Filters" on Mar 5, 2026
- Fix incorrect global index in IdentifySampleCommon hole-fill for
  XZ/YZ planes (was using flat local index instead of stride-based
  global index)
- Move dp1/dp2 arrays to static constexpr class members in
  IdentifySampleSliceBySliceFunctor
- Fix stale temp file reads in FillBadDataCCL phase 4 by tracking
  write position (rewind doesn't truncate)
- Fix int32 truncation of uint64 region size in FillBadDataCCL
  phase 3 small-defect comparison
- Propagate tmpfile() failure as Result<> error instead of silent
  message-only failure
- Remove const from FillBadDataCCL::operator()() to match non-const
  member usage
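The XZ/YZ hole-fill index fix comes down to mapping in-plane coordinates through the volume's strides rather than reusing the flat in-plane index. A sketch, assuming an X-fastest layout; PlaneSketch, Dims3, and planeToGlobalIndex are hypothetical names, not IdentifySampleCommon's code.

```cpp
#include <cassert>
#include <cstddef>

enum class PlaneSketch
{
  XY,
  XZ,
  YZ
};

struct Dims3
{
  std::size_t x;
  std::size_t y;
  std::size_t z;
};

// Maps in-plane coordinates (u, v) of a plane at position w along its normal
// to a global index in an X-fastest volume, using per-axis strides.
std::size_t planeToGlobalIndex(PlaneSketch plane, const Dims3& d, std::size_t u, std::size_t v, std::size_t w)
{
  const std::size_t strideX = 1;
  const std::size_t strideY = d.x;
  const std::size_t strideZ = d.x * d.y;
  switch(plane)
  {
  case PlaneSketch::XY:
    return u * strideX + v * strideY + w * strideZ; // u = x, v = y, w = z
  case PlaneSketch::XZ:
    return u * strideX + v * strideZ + w * strideY; // u = x, v = z, w = y
  case PlaneSketch::YZ:
    return u * strideY + v * strideZ + w * strideX; // u = y, v = z, w = x
  }
  assert(false && "unknown plane");
  return 0;
}
```

For dims (4, 3, 2), the XZ-plane point u = 2, v = 1 at y = 1 maps to global index 2 + 1*12 + 1*4 = 18, whereas the flat in-plane index u + v*4 = 6 addresses a different voxel (x = 2, y = 1, z = 0).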
… bugs

- Fix k_ prefix on benchmark test constants (kDimX → k_DimX, etc.) in 5 test files
- Add const to FillBadDataInputValues* across FillBadData/BFS/CCL headers and sources
- Add ftell error check in FillBadDataCCL phase 4
- Replace // comment with /// @copydoc Doxygen on CCL virtual method overrides
- Hoist GetAllChildDataPaths() out of iterative fill loop in FillBadDataBFS (pre-existing)
- Fix float32 boundary checks to int64 in FillBadDataBFS iterative fill (pre-existing)
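The temp-file fixes in FillBadDataCCL phase 4 (tracking the write position, checking ftell, propagating tmpfile() failure) rely on a C stdio detail: rewind() moves the file position back to the start but does not truncate the file, so a pass that writes fewer records than the previous one would read stale records if it read until EOF. A minimal sketch, assuming fixed-size int64 records; roundTripPass is a hypothetical name, and errors are returned as bool instead of the PR's Result<>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Writes this pass's records to a reused scratch file, then reads back only
// what this pass wrote by remembering the end position from ftell().
// Returns false on any stdio failure, including ftell() returning -1.
bool roundTripPass(std::FILE* scratch, const std::vector<std::int64_t>& records, std::vector<std::int64_t>& out)
{
  assert(scratch != nullptr);
  std::rewind(scratch); // back to the start; older, longer contents remain on disk
  if(!records.empty() && std::fwrite(records.data(), sizeof(std::int64_t), records.size(), scratch) != records.size())
  {
    return false;
  }
  const long endPos = std::ftell(scratch); // bytes written by this pass
  if(endPos < 0)
  {
    return false;
  }
  std::rewind(scratch);
  const std::size_t count = static_cast<std::size_t>(endPos) / sizeof(std::int64_t);
  out.assign(count, 0);
  if(count > 0 && std::fread(out.data(), sizeof(std::int64_t), count, scratch) != count)
  {
    return false;
  }
  return true;
}
```

rewind() also satisfies the C requirement for a file-positioning call between writing and reading on an update stream, so no separate fflush() is needed here. A null return from tmpfile() should be reported to the caller before the first pass.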


Replace /// @copydoc shorthand with proper multi-line /** */ Doxygen
blocks including @brief, @param, and @return for isValidVoxel() and
areNeighborsSimilar() overrides.
- ScalarSegmentFeatures.hpp: Replace // comment with proper /** */ blocks
  for isValidVoxel() and areNeighborsSimilar() overrides
- IdentifySampleCommon.hpp: Add @class/@struct tags, add @brief/@param/@return
  to VectorUnionFind public methods and IdentifySampleSliceBySliceFunctor
- SegmentFeatures.hpp: Fill in empty @param/@return descriptions on executeCCL()
…asses

Add @brief, @param, and @return documentation to public constructors
and operator()() methods on FillBadData, FillBadDataBFS, FillBadDataCCL,
IdentifySampleBFS, and IdentifySampleCCL per doxygen-comments skill.
@joeykleingers force-pushed the worktree-OptimizeGroupD branch 2 times, most recently from 0007b93 to ca4ef6a on March 5, 2026 17:53
@imikejackson changed the title from "ENH: OoC optimization for Segmentation Filters" to "ENH: Segmentation Filters OoC Optimization" on Mar 9, 2026
@joeykleingers marked this pull request as draft on March 10, 2026 01:16