Skip to content

Conversation

@ethanbb
Copy link
Contributor

@ethanbb ethanbb commented Nov 14, 2024

Description

This fixes an indexing error in the "Eliminating empty and nan components" step that occurs when using seeded CNMF. The error I was getting is as follows:

Traceback (most recent call last):
  File "/u/ethan/mesmerize-core/mesmerize_core/algorithms/cnmf.py", line 94, in run_algo
    cnm = cnm.fit(images)
          ^^^^^^^^^^^^^^^
  File "/u/ethan/CaImAn/caiman/source_extraction/cnmf/cnmf.py", line 523, in fit
    self.update_spatial(Yr, use_init=True)
  File "/u/ethan/CaImAn/caiman/source_extraction/cnmf/cnmf.py", line 915, in update_spatial
    update_spatial_components(Y, C=self.estimates.C, f=self.estimates.f, A_in=self.estimates.A,
  File "/u/ethan/CaImAn/caiman/source_extraction/cnmf/spatial.py", line 189, in update_spatial_components
    ind2_ = [ind_list[np.setdiff1d(a,ff)] if len(a) else a for a in ind2_]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/ethan/CaImAn/caiman/source_extraction/cnmf/spatial.py", line 189, in <listcomp>
    ind2_ = [ind_list[np.setdiff1d(a,ff)] if len(a) else a for a in ind2_]
             ~~~~~~~~^^^^^^^^^^^^^^^^^^^^
IndexError: index 1637 is out of bounds for axis 0 with size 1637

It looks like this is happening because when A_in is provided and has type bool, the following lines add indices corresponding to the spatial components (>= nr) to non-empty entries of ind2_:

ind2_ = [np.hstack((np.where(iid_)[0], nr + np.arange(f.shape[0])))
if np.size(np.where(iid_)[0]) > 0 else [] for iid_ in dist_indicator]

However, the code to remove empty components assumes that the highest original component number is nr-1:

ind_list = list(range(nr-np.size(ff)))
for i in ff:
ind_list.insert(i, 0)
ind_list = np.array(ind_list, dtype=int)
ind2_ = [ind_list[np.setdiff1d(a,ff)] if len(a) else a for a in ind2_]

To fix this, I changed this to use nr + nb. However, I'm not 100% sure that the behavior of adding indices for background components is correct in the first place, since it seems inconsistent with what happens with no masks passed in.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Has your PR been tested?

The CNMF run I that was failing for me before now succeeds. Since there wasn't a failing test before, I'm guessing this part of the code (under the condition of seeded CNMF) wasn't being tested, which could be changed.

@pgunn
Copy link
Member

pgunn commented Dec 18, 2024

We're likely to merge this after the next release; if you can test this the best you can with a variety of settings in the meantime that'd be great. Sorry for the delay.

@ethanbb
Copy link
Contributor Author

ethanbb commented Nov 6, 2025

However, I'm not 100% sure that the behavior of adding indices for background components is correct in the first place, since it seems inconsistent with what happens with no masks passed in.

I'm pretty sure now that this is correct and the code works as designed with this fix. It's pretty clearly intentional later in update_spatial_components that the background components are included in the regression step:

# we create a memory map file if not already the case, we send Cf, a
# matrix that include background components
C_name, Y_name, folder = creatememmap(Y, np.vstack((C, f)), dview)

And this makes sense based on the problem statement in regression_ipyparallel, which fits each pixel's component weights including background components:

for each pixel i solve the problem
[A(i,:),b(i)] = argmin sum(A(i,:))
subject to
|| Y(i,:) - A(i,:)*C + b(i)*f || <= sn(i)*sqrt(T);

Furthermore, in the case where A_in is not boolean, the input background spatial components b are also included when computing the indices (if non-None), so this fix is probably sometimes necessary for this case as well. (This could probably be simplified to just adding all ones for the background components though, since I think each background component can have a nonzero weight on any pixel):

if b is None:
dist_indicator = determine_search_location(
A_in, dims, method=method, min_size=min_size, max_size=max_size, dist=dist, expandCore=expandCore, dview=dview)
else:
dist_indicator = determine_search_location(
scipy.sparse.hstack([A_in, scipy.sparse.coo_matrix(b)]), dims, method=method, min_size=min_size, max_size=max_size, dist=dist, expandCore=expandCore,
dview=dview)

I usually don't supply b_in when I use a boolean A_in, but it looks like for non-boolean A_in, b_in is required or else it's assumed that none of the background components load onto any of the pixels. I believe b and f will come in as None unless provided by the user because the initialization step is skipped, and neither is initialized in the non-bool case. Overall it seems like handling of the background components in the non-bool case when only A_in is given is not well-tested/potentially broken and should probably work more like the bool case.

By the way, in #1539, @zyyk78 you state that this line in computing_indicator removes indices >= nr for the non-bool case:

ind2_ = [iid_ if (np.size(iid_) > 0) and (np.min(iid_) < nr) else [] for iid_ in ind2_]

However, this isn't true. Note that the condition is np.min(iid_) < nr - the minimum index can't be one of the background components, which means the component can't be all background, but it can include background, as it would be expected to if a reasonable b_in is provided.

@zyyk78
Copy link
Contributor

zyyk78 commented Nov 6, 2025

However, this isn't true. Note that the condition is np.min(iid_) < nr - the minimum index can't be one of the background components, which means the component can't be all background, but it can include background, as it would be expected to if a reasonable b_in is provided.

To be honest, this ind2_ may include background components, and this may cause errors like it .

But i don't understand why i didn't meet this while using without binary masks (maybe removing empty component is very rare in that case ?)

@ethanbb
Copy link
Contributor Author

ethanbb commented Nov 6, 2025

But i don't understand why i didn't meet this while using without binary masks (maybe removing empty component is very rare in that case ?)

I think because if you pass A_in but not b_in, it does this:

 if b is None: 
     dist_indicator = determine_search_location( 
         A_in, dims, method=method, min_size=min_size, max_size=max_size, dist=dist, expandCore=expandCore, dview=dview) 

which just looks at the spatial components and doesn't try to add any indices for the background components. But as I said above, this doesn't seem correct to me.

@pgunn
Copy link
Member

pgunn commented Nov 6, 2025

FYI, I'll keep an eye on the discussion and most likely will merge this after Manuel's current PR lands, before the next release, unless the discussion moves towards a conclusion to fix it a different way.

@ethanbb
Copy link
Contributor Author

ethanbb commented Nov 7, 2025

I think what I would like to do at some point is add tests for both the boolean and non-boolean seed cases, confirm that it currently fails for the boolean case, and see whether the results are also weird (or erroring) for the non-boolean case due to not starting with the background components included. I suspect the code for that case can be simplified to make it more like the boolean case, aside from binarizing the masks, but I want to see whether it's actually currently broken first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants