Sampling bias in cell_boundaries causing division by zero in avg_shape #20

@mrjholt

Hey y'all,

I was digging into the cell_boundaries function in src/cajal/sample_seg.py recently and noticed a logic issue that seems to block the use of cajal.utilities.avg_shape_spt.

I checked the 3D sampling logic (sample_mesh.py and sample_swc.py), and those appear to handle their structures differently, so this seems isolated to applications of CAJAL to 2D masks.

Since the code uses skimage.measure.find_contours, the returned boundaries are closed loops where the first and last points are identical (pts[0] == pts[-1]). The current sampling line, however, includes both endpoints:

# src/cajal/sample_seg.py
indices = np.linspace(0, boundary_pts.shape[0] - 1, n_sample)

Because np.linspace defaults to endpoint=True, we end up sampling the geometric "start/end" point twice (once at index 0 and once at the last index), which leads to:

  1. Sampling bias: the sampled points are no longer uniformly distributed. There is an artificial cluster at the "seam" of the polygon, which skews GW distances.
  2. Runtime errors: this is what I ran into and what led me to this. Since both endpoints are included, the output distance matrices contain an off-diagonal pairwise distance of 0.0. That breaks cajal.utilities.avg_shape, because the step_size function returns 0, causing a ZeroDivisionError when the code attempts to normalize the matrix (medoid_matrix / step_size).
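
For anyone who wants to see the duplicate without running a full segmentation, here is a minimal sketch. It uses a synthetic closed circle as a stand-in for a find_contours boundary. The closed_circle and min_offdiag_distance helpers are hypothetical (not part of CAJAL), and casting the indices with astype(int) is my assumption about how they get turned into array indices:

```python
import numpy as np

def closed_circle(n_vertices: int) -> np.ndarray:
    """Hypothetical stand-in for a find_contours boundary:
    a closed loop whose last row repeats the first row."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_vertices, endpoint=False)
    pts = np.column_stack([np.cos(theta), np.sin(theta)])
    return np.vstack([pts, pts[:1]])  # append the start point to close the loop

def min_offdiag_distance(points: np.ndarray) -> float:
    """Smallest Euclidean distance between two different sampled points."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    off_diagonal = dist[~np.eye(len(points), dtype=bool)]
    return float(off_diagonal.min())

boundary_pts = closed_circle(200)  # shape (201, 2), first row == last row
n_sample = 50

# Same sampling expression as the current code (endpoint=True by default).
# The integer cast is an assumption made for this sketch.
indices = np.linspace(0, boundary_pts.shape[0] - 1, n_sample).astype(int)
sampled = boundary_pts[indices]

# Index 0 and index 200 refer to the same geometric point.
print(indices[0], indices[-1])
print(min_offdiag_distance(sampled))
```

The first and last sampled rows are the same point, so the smallest off-diagonal distance comes out as exactly zero, which is the value that would end up as the divisor in the normalization step.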

I patched my version of CAJAL by treating the boundary as a periodic cycle and excluding the endpoint:

# Fixed line 83 in src/cajal/sample_seg.py
indices = np.linspace(0, boundary_pts.shape[0] - 1, n_sample, endpoint=False)
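
Here is a rough sketch of why endpoint=False behaves the way I want on a closed loop. It uses a synthetic circle rather than a real mask. The closed_circle helper is hypothetical (not part of CAJAL), and the astype(int) cast is my assumption about how the indices are converted:

```python
import numpy as np

def closed_circle(n_vertices: int) -> np.ndarray:
    """Hypothetical stand-in for a find_contours boundary:
    a closed loop whose last row repeats the first row."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_vertices, endpoint=False)
    pts = np.column_stack([np.cos(theta), np.sin(theta)])
    return np.vstack([pts, pts[:1]])  # append the start point to close the loop

boundary_pts = closed_circle(200)  # shape (201, 2), first row == last row
n_sample = 50
period = boundary_pts.shape[0] - 1  # number of distinct vertices on the loop

# Patched sampling: treat the boundary as periodic and skip the repeated endpoint.
fixed_indices = np.linspace(0, period, n_sample, endpoint=False).astype(int)
fixed_sampled = boundary_pts[fixed_indices]

# Gaps between consecutive samples, including the wrap-around gap across the seam.
gaps = np.diff(np.append(fixed_indices, fixed_indices[0] + period))

# Smallest distance between two different sampled points.
diff = fixed_sampled[:, None, :] - fixed_sampled[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
min_dist = float(dist[~np.eye(n_sample, dtype=bool)].min())

print(fixed_indices[-1], np.unique(gaps), min_dist > 0.0)
```

In this example the gap across the seam is the same as every other gap, and no two sampled points coincide, so the zero distance goes away. When the number of distinct vertices is not a multiple of n_sample, the integer cast makes the gaps differ by at most one index, but the repeated endpoint is still never sampled.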

This fixed the runtime issue for me, but let me know what y'all think.
