ENH: OoC optimizations for CCL/Segmentation filters by joeykleingers · Pull Request #1557 · BlueQuartzSoftware/simplnx

joeykleingers · 2026-03-04T19:40:12Z

Depends on: #1545 (AlgorithmDispatch infrastructure) — merge #1545 first, then rebase this PR onto develop.

Summary

Optimize 5 CCL/Segmentation filters for out-of-core performance using BFS/CCL algorithm dispatch:

IdentifySample: Split into BFS flood fill (in-core) and scanline CCL with union-find (OOC, 2-slice rolling buffer)
FillBadData: Split into BFS (in-core) and CCL with on-disk deferred fill (OOC, O(slice) memory)
ScalarSegmentFeatures: Add executeCCL() to shared SegmentFeatures base with 2-slice rolling buffer + union-find
EBSDSegmentFeatures: CCL dispatch via isValidVoxel/areNeighborsSimilar overrides
CAxisSegmentFeatures: CCL dispatch via isValidVoxel/areNeighborsSimilar overrides

All filters use DispatchAlgorithm from #1545 to select the optimal path at runtime. Tests were updated with ForceOocAlgorithmGuard + GENERATE(false, true) to exercise both algorithm paths, and 200x200x200 benchmark test cases were added.

Algorithm Details

Original Algorithm: DFS Flood-Fill

All five filters used a depth-first search (DFS) flood-fill to find connected components:

Scan forward to find the next unlabeled valid voxel (the "seed")
Push the seed onto a stack, pop voxels, check 6 neighbors via determineGrouping()
If a neighbor matches, push it onto the stack
Repeat until the stack is empty (entire component labeled)
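For reference, here is a compact C++ sketch of those four steps. It is not the simplnx code: the `Grid` type is a hypothetical toy, and "equal nonzero values" stands in for whatever `determineGrouping()` tests in each filter.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy grid: 0 marks an invalid voxel; two valid voxels are grouped
// when their values are equal (stand-in for determineGrouping()).
struct Grid
{
  std::size_t dimX = 0, dimY = 0, dimZ = 0;
  std::vector<int> values; // X varies fastest, then Y, then Z
};

// Stack-based DFS flood fill over the 6 face neighbors.
// Returns 0 for invalid voxels and feature IDs 1..N in seed-discovery order.
std::vector<int32_t> dfsFloodFill(const Grid& g)
{
  const std::array<std::array<long, 3>, 6> offsets{{{-1, 0, 0}, {1, 0, 0}, {0, -1, 0}, {0, 1, 0}, {0, 0, -1}, {0, 0, 1}}};
  std::vector<int32_t> labels(g.values.size(), 0);
  std::vector<std::size_t> stack;
  int32_t nextLabel = 0;
  for(std::size_t seed = 0; seed < g.values.size(); ++seed)
  {
    // 1. Scan forward for the next unlabeled valid voxel (the seed).
    if(g.values[seed] == 0 || labels[seed] != 0)
    {
      continue;
    }
    labels[seed] = ++nextLabel;
    stack.push_back(seed);
    // 4. Repeat until the stack is empty (component fully labeled).
    while(!stack.empty())
    {
      // 2. Pop a voxel and check its 6 neighbors.
      const std::size_t v = stack.back();
      stack.pop_back();
      const long x = static_cast<long>(v % g.dimX);
      const long y = static_cast<long>((v / g.dimX) % g.dimY);
      const long z = static_cast<long>(v / (g.dimX * g.dimY));
      for(const auto& o : offsets)
      {
        const long nx = x + o[0], ny = y + o[1], nz = z + o[2];
        if(nx < 0 || ny < 0 || nz < 0 || nx >= static_cast<long>(g.dimX) || ny >= static_cast<long>(g.dimY) || nz >= static_cast<long>(g.dimZ))
        {
          continue;
        }
        const auto n = static_cast<std::size_t>((nz * static_cast<long>(g.dimY) + ny) * static_cast<long>(g.dimX) + nx);
        // 3. A matching, unlabeled neighbor joins the component and is pushed.
        if(labels[n] == 0 && g.values[n] == g.values[v])
        {
          labels[n] = nextLabel;
          stack.push_back(n);
        }
      }
    }
  }
  return labels;
}
```

Note that every pop jumps to wherever the stack points, which is the access pattern discussed next.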

Why it's slow OOC: Each stack pop accesses an arbitrary voxel, then reads 6 scattered neighbors. With chunked storage, each jump may evict the current chunk and load a new one from disk, causing 50x-621x slowdown.
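The locality argument can be illustrated with a toy model. This is a sketch under stated assumptions, not a model of the simplnx chunk store: chunks are whole Z-slices, the cache holds exactly one chunk, and only the order in which voxels are visited is counted (neighbor reads are ignored). All function names are hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Toy model (assumption): chunks are whole Z-slices and the cache holds one
// chunk, so every change of chunk between consecutive accesses is a disk load.
std::size_t countChunkLoads(const std::vector<std::size_t>& accesses, std::size_t voxelsPerChunk)
{
  std::size_t loads = 0;
  bool haveChunk = false;
  std::size_t current = 0;
  for(const std::size_t v : accesses)
  {
    const std::size_t chunk = v / voxelsPerChunk;
    if(!haveChunk || chunk != current)
    {
      ++loads;
      current = chunk;
      haveChunk = true;
    }
  }
  return loads;
}

// Order in which a stack-based DFS pops the voxels of a fully valid dx*dy*dz grid.
std::vector<std::size_t> dfsVisitOrder(std::size_t dx, std::size_t dy, std::size_t dz)
{
  const std::size_t slice = dx * dy;
  std::vector<bool> seen(slice * dz, false);
  std::vector<std::size_t> stack{0};
  std::vector<std::size_t> order;
  seen[0] = true;
  while(!stack.empty())
  {
    const std::size_t v = stack.back();
    stack.pop_back();
    order.push_back(v);
    const std::size_t x = v % dx, y = (v / dx) % dy, z = v / slice;
    std::vector<std::size_t> nbrs; // -X, +X, -Y, +Y, -Z, +Z when in bounds
    if(x > 0) nbrs.push_back(v - 1);
    if(x + 1 < dx) nbrs.push_back(v + 1);
    if(y > 0) nbrs.push_back(v - dx);
    if(y + 1 < dy) nbrs.push_back(v + dx);
    if(z > 0) nbrs.push_back(v - slice);
    if(z + 1 < dz) nbrs.push_back(v + slice);
    for(const std::size_t n : nbrs)
    {
      if(!seen[n])
      {
        seen[n] = true;
        stack.push_back(n);
      }
    }
  }
  return order;
}

// Scanline (Z-Y-X) order is simply 0, 1, 2, ...
std::vector<std::size_t> scanlineOrder(std::size_t count)
{
  std::vector<std::size_t> order(count);
  std::iota(order.begin(), order.end(), std::size_t{0});
  return order;
}
```

On an 8x8x8 fully valid grid, the scanline order loads each slice exactly once (8 loads), while the DFS pop order keeps hopping between slices and loads more. Real chunk caches hold more than one chunk, so this only shows the direction of the effect, not its magnitude.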

Optimized Algorithm: Chunk-Sequential CCL with Union-Find

A two-phase scanline algorithm that processes the grid in strict Z-Y-X order, never accessing data out of sequence.

Phase 1 — Forward Labeling with Rolling Buffer

Iterate every voxel in Z-Y-X order, checking only backward neighbors (-X, -Y, -Z)
A rolling 2-slice buffer (size = 2 × dimX × dimY) holds labels for current and previous Z-slices
Union-Find tracks label equivalences with path-halving compression and union-by-rank
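Phase 1 can be sketched as follows. This is an illustrative C++ version, not the PR's `executeCCL()`: `valid` and `similar` are hypothetical callables playing the roles of `isValidVoxel()` and `areNeighborsSimilar()`, and provisional labels are written to a plain vector where an OOC filter would presumably stream them into its chunked feature-ID store.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Union-Find over provisional labels: path halving in find(), union by rank.
class UnionFind
{
public:
  int32_t makeSet()
  {
    const auto id = static_cast<int32_t>(m_Parent.size());
    m_Parent.push_back(id);
    m_Rank.push_back(0);
    return id;
  }
  int32_t find(int32_t a)
  {
    while(m_Parent[a] != a)
    {
      m_Parent[a] = m_Parent[m_Parent[a]]; // path halving: jump to grandparent
      a = m_Parent[a];
    }
    return a;
  }
  void unite(int32_t a, int32_t b)
  {
    a = find(a);
    b = find(b);
    if(a == b)
      return;
    if(m_Rank[a] < m_Rank[b])
      std::swap(a, b);
    m_Parent[b] = a; // lower-rank root goes under the higher-rank root
    if(m_Rank[a] == m_Rank[b])
      ++m_Rank[a];
  }
  std::size_t size() const { return m_Parent.size(); }

private:
  std::vector<int32_t> m_Parent;
  std::vector<int> m_Rank;
};

// Phase 1: visit voxels in Z-Y-X order, look only at the -X, -Y and -Z
// neighbors, and keep labels for just two Z-slices in a rolling buffer.
// Provisional labels are written to `out` in index order (-1 = invalid).
template <typename Valid, typename Similar>
void forwardLabel(std::size_t dx, std::size_t dy, std::size_t dz, Valid valid, Similar similar, UnionFind& uf, std::vector<int32_t>& out)
{
  const std::size_t slice = dx * dy;
  std::vector<int32_t> buffer(2 * slice, -1); // size = 2 x dimX x dimY
  out.assign(slice * dz, -1);
  for(std::size_t z = 0; z < dz; ++z)
  {
    int32_t* cur = buffer.data() + (z % 2) * slice;             // current slice
    const int32_t* prev = buffer.data() + ((z + 1) % 2) * slice; // previous slice
    for(std::size_t y = 0; y < dy; ++y)
    {
      for(std::size_t x = 0; x < dx; ++x)
      {
        const std::size_t i = y * dx + x;    // index within the slice
        const std::size_t v = z * slice + i; // global voxel index
        cur[i] = -1;
        if(!valid(v))
          continue;
        auto link = [&](int32_t neighborLabel, std::size_t neighbor) {
          if(neighborLabel < 0 || !similar(v, neighbor))
            return;
          if(cur[i] < 0)
            cur[i] = neighborLabel; // adopt the first matching label
          else
            uf.unite(cur[i], neighborLabel); // record the equivalence
        };
        if(x > 0) link(cur[i - 1], v - 1);
        if(y > 0) link(cur[i - dx], v - dx);
        if(z > 0) link(prev[i], v - slice);
        if(cur[i] < 0)
          cur[i] = uf.makeSet(); // no matching backward neighbor: new label
        out[v] = cur[i];
      }
    }
  }
}
```

On a 3x2 "U" shape, the two arms first receive different provisional labels and are merged by `unite()` only when the bottom row connects them. That merge-after-the-fact case is exactly what the Union-Find exists to handle, since the scan can never look forward.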

Phase 2 — Resolution and Relabeling

Flatten the Union-Find so every provisional label points directly at its root (single O(K) pass over the K provisional labels)
Sequential pass to build provisional-to-final label mapping (preserves seed-discovery order)
Sequential pass to replace provisional labels with final IDs
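A sketch of Phase 2 in C++. The reason first-appearance numbering matches DFS output is that a DFS seed is always the lowest-index voxel of its component, so numbering roots in the order the scan first meets them reproduces the same IDs. Assumptions: the Union-Find is passed as a plain parent array, and flattening here resolves each label with path halving (near-linear); the PR's exact flattening may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Phase 2 sketch. `provisional` holds one provisional label per voxel in Z-Y-X
// order (-1 = invalid); `parent` is the Union-Find parent array over those
// labels. Returns final feature IDs (0 = invalid, 1..N otherwise).
std::vector<int32_t> resolveLabels(const std::vector<int32_t>& provisional, std::vector<int32_t> parent)
{
  // 1. Flatten: point every provisional label directly at its root.
  for(std::size_t k = 0; k < parent.size(); ++k)
  {
    auto a = static_cast<int32_t>(k);
    while(parent[a] != a)
    {
      parent[a] = parent[parent[a]]; // path halving
      a = parent[a];
    }
    parent[k] = a;
  }
  // 2. Number roots 1..N in order of first appearance along the scan,
  //    i.e. the order in which a forward seed search reaches each component.
  std::vector<int32_t> finalId(parent.size(), 0);
  int32_t next = 0;
  for(const int32_t p : provisional)
  {
    if(p >= 0 && finalId[parent[p]] == 0)
    {
      finalId[parent[p]] = ++next;
    }
  }
  // 3. Relabel every voxel with its final ID.
  std::vector<int32_t> result(provisional.size(), 0);
  for(std::size_t v = 0; v < provisional.size(); ++v)
  {
    if(provisional[v] >= 0)
    {
      result[v] = finalId[parent[provisional[v]]];
    }
  }
  return result;
}
```

Steps 2 and 3 each read the provisional labels strictly in index order, so in an OOC setting they can stream through the chunked array just like Phase 1.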

Dispatch Strategy

Each filter checks IsOutOfCore(*featureIdsArray) || ForceOocAlgorithm():

True → executeCCL() (chunk-sequential CCL)
False → execute() (original DFS flood-fill)
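The dispatch rule fits in a few lines. The names below (`sketch::dispatch`, `sketch::ForceOocGuard`, `sketch::forceOocFlag`) are hypothetical stand-ins for DispatchAlgorithm, ForceOocAlgorithmGuard and ForceOocAlgorithm() from #1545, whose real signatures may differ; the `bool` argument plays the role of `IsOutOfCore(*featureIdsArray)`.

```cpp
#include <utility>

namespace sketch
{
// Process-wide override used by tests to force the OOC path (hypothetical).
inline bool& forceOocFlag()
{
  static bool flag = false;
  return flag;
}

// RAII guard: sets the override for its lifetime, then restores the old value.
class ForceOocGuard
{
public:
  explicit ForceOocGuard(bool force)
  : m_Previous(forceOocFlag())
  {
    forceOocFlag() = force;
  }
  ~ForceOocGuard() { forceOocFlag() = m_Previous; }
  ForceOocGuard(const ForceOocGuard&) = delete;
  ForceOocGuard& operator=(const ForceOocGuard&) = delete;

private:
  bool m_Previous;
};

// Runs the OOC algorithm when the data is out-of-core or the override is set,
// otherwise the in-core algorithm. Both callables must return the same type.
template <typename InCoreFn, typename OocFn>
auto dispatch(bool dataIsOutOfCore, InCoreFn&& inCore, OocFn&& ooc)
{
  if(dataIsOutOfCore || forceOocFlag())
  {
    return std::forward<OocFn>(ooc)();
  }
  return std::forward<InCoreFn>(inCore)();
}
} // namespace sketch
```

Restoring the previous value in the destructor keeps one test case from leaking the override into the next; running each case once with the guard set to `false` and once to `true` is the role GENERATE(false, true) plays in the PR's tests.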

Per-Filter Notes

Filter	`isValidVoxel()`	`areNeighborsSimilar()`
ScalarSegmentFeatures	Mask check	Type-dispatched `CompareFunctor::compare()` (11 data types)
EBSDSegmentFeatures	Mask + phase > 0	Same phase + quaternion misorientation via `LaueOps`
CAxisSegmentFeatures	Mask + phase > 0	Same phase + c-axis angle (handles directional ambiguity)
IdentifySample	Uses base CCL + optional hole-filling phase
FillBadData	Own 4-phase CCL: negative labels for bad-data, Union-Find, size classification, iterative morphological dilation with on-disk deferred fill
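The two hooks suggest a template-method design: the shared CCL loop owns the traversal, and each filter only answers "is this voxel valid?" and "do these two neighbors belong together?". Below is a sketch with a hypothetical `SegmentHooks` interface and a c-axis-style implementation. Assumptions: c-axes are precomputed unit vectors (the real filter derives them from orientation data), the signatures are illustrative rather than those of SegmentFeatures, and taking `|dot|` is one common way to treat c and -c as the same axis.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical hook interface: the shared CCL pass only asks these two questions.
class SegmentHooks
{
public:
  virtual ~SegmentHooks() = default;
  virtual bool isValidVoxel(std::size_t v) const = 0;
  virtual bool areNeighborsSimilar(std::size_t a, std::size_t b) const = 0;
};

// C-axis-style hooks (illustrative): valid = mask set and phase > 0; similar =
// same phase and c-axes within a tolerance. Because c and -c describe the same
// axis, the angle is computed from |dot| and never exceeds 90 degrees.
class CAxisHooksSketch : public SegmentHooks
{
public:
  using Vec3 = std::array<double, 3>;

  CAxisHooksSketch(std::vector<uint8_t> mask, std::vector<int32_t> phases, std::vector<Vec3> cAxes, double toleranceDeg)
  : m_Mask(std::move(mask))
  , m_Phases(std::move(phases))
  , m_CAxes(std::move(cAxes))
  , m_ToleranceDeg(toleranceDeg)
  {
  }

  bool isValidVoxel(std::size_t v) const override
  {
    return m_Mask[v] != 0 && m_Phases[v] > 0;
  }

  bool areNeighborsSimilar(std::size_t a, std::size_t b) const override
  {
    if(m_Phases[a] != m_Phases[b])
    {
      return false;
    }
    const Vec3& u = m_CAxes[a]; // assumed unit length
    const Vec3& w = m_CAxes[b];
    // Clamp to 1 so rounding never pushes acos outside its domain.
    const double absDot = std::min(1.0, std::abs(u[0] * w[0] + u[1] * w[1] + u[2] * w[2]));
    const double angleDeg = std::acos(absDot) * 180.0 / 3.14159265358979323846;
    return angleDeg <= m_ToleranceDeg;
  }

private:
  std::vector<uint8_t> m_Mask;
  std::vector<int32_t> m_Phases;
  std::vector<Vec3> m_CAxes;
  double m_ToleranceDeg;
};
```

With this split, the traversal code never needs to know whether it is comparing scalars, misorientations or c-axis angles, which is what lets Scalar, EBSD and CAxis share one CCL implementation.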

Tradeoffs

Aspect	Original DFS	Optimized CCL
In-core speed	Excellent (good cache locality)	Good (~5-10% overhead from buffer management)
OOC speed	Catastrophic (50x-621x slowdown)	Excellent (strictly sequential I/O)
RAM usage	All arrays must fit in RAM	Rolling buffer = O(2 slices) + Union-Find = O(features)
Code complexity	Simple DFS loop (~70 lines)	Three-phase algorithm + Union-Find (~300 lines in base class)
Feature ID ordering	Deterministic seed-discovery order	Matches DFS ordering via Phase 2 remapping
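For a sense of scale on the 200x200x200 benchmark grid, assuming 4-byte (32-bit) labels in the buffer (an assumption; the element type is not stated above), the rolling buffer compares with a fully resident label array as follows:

$$
\underbrace{2 \times 200 \times 200 \times 4\ \text{B}}_{\text{rolling buffer}} = 320{,}000\ \text{B} \approx 0.32\ \text{MB},
\qquad
\underbrace{200^3 \times 4\ \text{B}}_{\text{whole label array}} = 32{,}000{,}000\ \text{B} = 32\ \text{MB}
$$

The buffer is therefore dimZ / 2 = 100 times smaller here. The Union-Find term comes on top of this and scales with the number of labels, not with the grid volume.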

Performance (200x200x200 programmatic datasets)

Per-Filter Results

IdentifySample — BFS (in-core) / scanline CCL with union-find (OOC)

Config	Before	After	Speedup
In-core	0.23s	0.16s (BFS)	1.4x
OOC	841s	4.14s (CCL)	203x

ScalarSegmentFeatures — base executeCCL() with type-dispatched comparator

Config	Before (DFS)	After (CCL)	Speedup
In-core	0.36s	0.23s	1.6x
OOC	>1500s (timeout)	12.9s	>115x

EBSDSegmentFeatures — base executeCCL() with quaternion misorientation

Config	Before (DFS)	After (CCL)	Speedup
In-core	0.77s	0.62s	1.2x
OOC	>1500s (timeout)	35.9s	>42x

CAxisSegmentFeatures — base executeCCL() with c-axis angle

Config	Before (DFS)	After (CCL)	Speedup
In-core	0.60s	0.55s	~1.1x
OOC	>1500s (timeout)	32.8s	>46x

FillBadData — BFS (in-core) / 4-phase CCL with on-disk deferred fill (OOC)

Config	Before (BFS)	After (CCL)	Speedup	Notes
In-core	0.18s	0.28s	0.6x	CCL adds ~0.1s overhead; BFS still used for in-core
OOC	6.02s	6.05s	~1.0x	Equivalent speed, O(slice) RAM instead of O(N)

FillBadData's OOC baseline was already fast (6s), so the optimization is primarily RAM reduction (O(N) → O(slice)) rather than speed.

Group Summary

Filter	OOC Speedup	In-Core Impact	Key Benefit
IdentifySample	203x	1.4x faster	Eliminated random access flood-fill
ScalarSegmentFeatures	>115x	1.6x faster	Chunk-sequential CCL replaces random DFS
EBSDSegmentFeatures	>42x	1.2x faster	Same CCL base, misorientation math dominates
CAxisSegmentFeatures	>46x	~1.1x faster	Same CCL base, c-axis angle math dominates
FillBadData	~1.0x	None (BFS still used in-core; CCL path alone is 0.6x)	RAM: O(N) → O(slice)

Test Plan

All existing correctness tests pass on both in-core and OOC configurations
Both BFS and CCL algorithm paths tested via ForceOocAlgorithmGuard + GENERATE(false, true)
200x200x200 benchmark tests pass on both configurations
OOC verified via "chunk shape:" printouts in verbose test output

Add reusable AlgorithmDispatch.hpp utility with IsOutOfCore(), AnyOutOfCore(), ForceOocAlgorithm(), ForceOocAlgorithmGuard, and DispatchAlgorithm<InCore, OOC>() so filters can dispatch to separate in-core and out-of-core algorithm implementations at runtime. Includes documentation in docs/AlgorithmDispatch.md. No filters are using this infrastructure yet — it is provided as reusable scaffolding for future OOC optimization work.

Consolidate OOC filter optimizations from identify-sample-optimizations worktree:
- Add AlgorithmDispatch.hpp and UnionFind.hpp utilities
- SegmentFeatures: Add executeCCL() with 2-slice rolling buffer + Union-Find
- ScalarSegmentFeatures: CCL dispatch + CompareFunctor::compare()
- EBSDSegmentFeatures: CCL dispatch + isValidVoxel/areNeighborsSimilar
- CAxisSegmentFeatures: CCL dispatch + isValidVoxel/areNeighborsSimilar
- Tests: PreferencesSentinel, ForceOocAlgorithmGuard, 200^3 benchmarks

Update IdentifySample and FillBadData to use the AlgorithmDispatch BFS/CCL split pattern instead of monolithic inlined algorithms. Add BFS/CCL split files, update tests with ForceOocAlgorithmGuard, PreferencesSentinel, and 200x200x200 benchmark test cases.

joeykleingers · 2026-03-05T01:11:47Z

Reopening this under a different PR.

joeykleingers added 3 commits March 4, 2026 14:30

joeykleingers added the Out-of-Core label Mar 4, 2026

joeykleingers requested a review from imikejackson March 4, 2026 19:40

imikejackson changed the title from "ENH: OOC optimizations for Group D (CCL/Segmentation filters)" to "ENH: OOC optimizations for CCL/Segmentation filters" Mar 4, 2026

imikejackson changed the title from "ENH: OOC optimizations for CCL/Segmentation filters" to "ENH: OoC optimizations for CCL/Segmentation filters" Mar 4, 2026

joeykleingers closed this Mar 5, 2026

joeykleingers deleted the worktree-FilterOOCOptimizations branch March 5, 2026 01:10

joeykleingers wants to merge 3 commits into BlueQuartzSoftware:develop from joeykleingers:worktree-FilterOOCOptimizations
