Hey yall,
So I was digging into the cell_boundaries function in src/cajal/sample_seg.py recently and noticed a logic issue that seems to block the usage of cajal.utilities.avg_shape_spt.
I checked the 3D sampling logic (sample_mesh.py and sample_swc.py) and those appear to handle their structures differently, so this seems isolated to applications of CAJAL to 2D masks.
Since the code uses skimage.measure.find_contours, the returned boundaries are closed loops where the first and last points are identical (pts[0] == pts[-1]), but the current sampling line includes both endpoints:
# src/cajal/sample_seg.py
indices = np.linspace(0, boundary_pts.shape[0] - 1, n_sample)
Because np.linspace defaults to endpoint=True, this ends up sampling the geometric "start/end" point twice (once at index 0 and once at the last index), which leads to:
- Sampling Bias: it creates a non-uniform distribution of points with an artificial cluster at the "seam" of the polygon, skewing GW distances
- Runtime Errors: This is what I ran into and what led me here. Since both endpoints are included, the output distance matrices contain a pairwise distance of 0.0, which breaks cajal.utilities.avg_shape because the step_size function returns 0, causing a ZeroDivisionError when the code attempts to normalize the matrix (medoid_matrix / step_size)
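Here is a minimal sketch that reproduces the duplicate seam point without needing CAJAL or skimage. The circle is a synthetic stand-in for one find_contours boundary, and truncating the float indices with astype(int) is my assumption about how the indices get used, not a copy of the library code:

```python
import numpy as np

# Synthetic closed contour (assumption: stands in for a find_contours boundary).
# 101 points on a unit circle, with the last point repeating the first.
theta = np.linspace(0.0, 2.0 * np.pi, 101)
boundary_pts = np.column_stack([np.cos(theta), np.sin(theta)])
boundary_pts[-1] = boundary_pts[0]  # make the closure exact: pts[0] == pts[-1]

n_sample = 20
# Current sampling line: endpoint=True (the default) includes index 0 and index N-1.
indices = np.linspace(0, boundary_pts.shape[0] - 1, n_sample)
sampled = boundary_pts[indices.astype(int)]  # assumption: indices are truncated to ints

# Pairwise Euclidean distances between the sampled points.
diff = sampled[:, None, :] - sampled[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Ignore the diagonal (distance of each point to itself).
off_diagonal = dist[~np.eye(n_sample, dtype=bool)]
duplicate_seam = np.array_equal(sampled[0], sampled[-1])
min_off_diagonal = off_diagonal.min()
print(duplicate_seam, min_off_diagonal)
```

The first and last sampled points coincide, so the distance matrix has a zero off the diagonal, which is the kind of entry that can drive a step size down to 0.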
I patched my version of CAJAL by treating the boundary as a periodic cycle and excluding the endpoint:
# Fixed line 83 in src/cajal/sample_seg.py
indices = np.linspace(0, boundary_pts.shape[0] - 1, n_sample, endpoint=False)
This fixed the runtime issue for me, but let me know what yall think.
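For anyone who wants to sanity check the patched line in isolation, here is a rough sketch on a synthetic closed loop. The circle and the astype(int) indexing are assumptions for illustration only, not CAJAL code:

```python
import numpy as np

# Synthetic closed contour (assumption: stands in for a find_contours boundary).
theta = np.linspace(0.0, 2.0 * np.pi, 101)
boundary_pts = np.column_stack([np.cos(theta), np.sin(theta)])
boundary_pts[-1] = boundary_pts[0]  # closed loop: pts[0] == pts[-1]

n_sample = 20
# Patched sampling line: the duplicated closing index N-1 is never selected.
indices = np.linspace(0, boundary_pts.shape[0] - 1, n_sample, endpoint=False)
int_indices = indices.astype(int)  # assumption: indices are truncated to ints
sampled = boundary_pts[int_indices]

# Pairwise Euclidean distances between the sampled points.
diff = sampled[:, None, :] - sampled[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
min_off_diagonal = dist[~np.eye(n_sample, dtype=bool)].min()

# Gap between each sampled point and the next, wrapping around the seam.
gaps = np.linalg.norm(np.roll(sampled, -1, axis=0) - sampled, axis=1)
print(min_off_diagonal > 0, np.allclose(gaps, gaps[0]))
```

On this toy contour every sampled point is distinct and the spacing stays even across the seam, which matches treating the boundary as a periodic cycle.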