STACAS semi-supervised integration completes but during anchor finding throws Error in if (totalCols == 0) return(NULL) : argument is of length zero #34

@lcoletto

Hi,
I am running STACAS on a large Seurat object (≈281k cells) with many batches of very unbalanced size, and I would like to clarify whether the behaviour I observe is expected or indicates a problem with the integration.
Below I summarize the three approaches I tried. All of them run to completion, but in the semi-supervised runs STACAS prints an error during anchor finding, even though the integrated object and the downstream UMAP are still produced.

Dataset / setup
Seurat / SeuratObject recently updated to SeuratV5, RNA assay, log-normalized
Total cells of the object: ~281,600
Number of samples (orig.ident, used as batch): 57
Highly unbalanced batches
11/57 samples have < 1,000 cells
Smallest sample: 265 cells
Largest sample: 13,669 cells
anchor.features = 1000

Cell labels used for semi-supervised mode
"clusters" metadata contains ~38 annotated clusters
Annotated cells: ~38,800
Unannotated cells (NA): 242,828 (majority of the dataset)
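
To make the label imbalance concrete, the labels can be tabulated per sample. This is a minimal sketch in base R that uses a toy data frame in place of All1@meta.data. The toy values are invented; only the column names orig.ident and clusters match my object.

```r
# Toy stand-in for All1@meta.data; the values are made up for illustration.
meta <- data.frame(
  orig.ident = c("S1", "S1", "S1", "S2", "S2", "S3"),
  clusters   = c("A", NA, "B", NA, NA, "A"),
  stringsAsFactors = FALSE
)

# Cells per sample, and how many of them carry a (non-NA) label.
n_cells   <- table(meta$orig.ident)
n_labeled <- tapply(!is.na(meta$clusters), meta$orig.ident, sum)

label_summary <- data.frame(
  sample    = names(n_cells),
  n_cells   = as.integer(n_cells),
  n_labeled = as.integer(n_labeled[names(n_cells)]),
  stringsAsFactors = FALSE
)
label_summary$frac_labeled <- label_summary$n_labeled / label_summary$n_cells

# Samples that contain no labeled cells at all.
unlabeled_samples <- label_summary$sample[label_summary$n_labeled == 0]
print(label_summary)
print(unlabeled_samples)
```

On the real object, meta would be All1@meta.data. Samples listed in unlabeled_samples contribute no label information in semi-supervised mode.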

Method 1: Stepwise STACAS, semi-supervised, ndim = 28
ndim was chosen based on PCA variance (95% cumulative variance).
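
For reference, a cutoff of this kind can be computed roughly as follows. This is a sketch on a random toy matrix using prcomp, not the exact code I ran. With a Seurat object, the standard deviations would presumably come from Stdev(All1, reduction = "pca") instead of pca$sdev, and the 95% is then relative to the PCs that were computed, not to the total variance of the data.

```r
set.seed(1)

# Toy matrix standing in for scaled expression data (cells x features).
x <- matrix(rnorm(200 * 30), nrow = 200)
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Share of variance per PC, relative to the PCs that were computed.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of PCs whose cumulative share reaches 95%.
ndim_95 <- which(cum_var >= 0.95)[1]
print(ndim_95)
```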

library(STACAS)
nfeatures = 1000
ndim = 28
obj.list <- SplitObject(All1, split.by = "orig.ident")
for (n in seq_along(obj.list)) {
  print(n)
  print(obj.list[[n]])
  Idents(obj.list[[n]]) <- "clusters"
}

stacas_anchors <- FindAnchors.STACAS(obj.list, 
                                     anchor.features = nfeatures,
                                     dims = 1:ndim, 
                                     cell.labels = "clusters")

st1 <- SampleTree.STACAS(
  anchorset = stacas_anchors,
  obj.names = names(obj.list))
object_integrated <- IntegrateData.STACAS(stacas_anchors,
                                          sample.tree = st1,
                                          dims=1:ndim)

object_integrated <- object_integrated %>% ScaleData() %>%
  RunPCA(npcs = ndim) %>% RunUMAP(dims = 1:ndim)

This finishes successfully, but during FindAnchors.STACAS I observe errors like:
Error in if (totalCols == 0) return(NULL) : argument is of length zero
The pipeline does not stop and produces an integrated object.
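
I have not traced where totalCols is set inside STACAS or its dependencies. The sketch below only shows how base R produces this message in general: the if() condition ends up with length zero, for example because a column count came back as NULL. The variable x is a toy value, not anything taken from STACAS.

```r
# A plain vector has no column count, so ncol() returns NULL.
x <- c(1, 2, 3)
totalCols <- ncol(x)

# NULL == 0 evaluates to logical(0), which if() cannot use as a condition.
msg <- tryCatch(
  {
    if (totalCols == 0) "empty" else "non-empty"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is what happens, totalCols is not 0 when the error is raised. It is missing or NULL, which would suggest that an empty or malformed object reached that check.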

Method 2: One-liner Run.STACAS, semi-supervised, ndim = 20

library(STACAS)
nfeatures = 1000
ndim = 20
Idents(All1) = "clusters"
object_integrated1 <- All1 %>%
  SplitObject(split.by = "orig.ident") %>%
  Run.STACAS(dims = 1:ndim, anchor.features = nfeatures, cell.labels = "clusters") %>%
  RunUMAP(dims = 1:ndim)

This also runs to completion, but I again see the same error message during the run:
Warning: sparse->dense coercion: allocating vector of size 1.0 GiB
Warning: pseudoinverse used at -2.2162
Warning: neighborhood radius 0.30103
Warning: reciprocal condition number 1.4523e-14
Preparing PCA embeddings for objects...
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=44s
|+++++++ | 13% ~07h 26m 18s
Error in if (totalCols == 0) return(NULL) : argument is of length zero
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=06h 03m 47s
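
An error that is printed while the progress bar keeps running is what I would expect if each per-pair call were wrapped in try() or something similar. That is only my guess; I have not confirmed that STACAS does this. The sketch below shows the general base R behaviour with a hypothetical worker function, find_pair, that fails for one input.

```r
# Hypothetical per-pair worker; it fails for one input to mimic a bad pair.
find_pair <- function(i) {
  if (i == 2) stop("simulated failure for pair ", i)
  i * 10
}

# try() prints the error to the console but lets the loop continue.
results <- lapply(1:3, function(i) try(find_pair(i), silent = FALSE))

# Failed entries come back as "try-error" objects and can be located afterwards.
failed <- which(vapply(results, inherits, logical(1), what = "try-error"))
print(failed)
```

If the anchor finding works like this, the failing pair would be silently missing from the result rather than stopping the run.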

Method 3: One-liner Run.STACAS, unsupervised, ndim = 20

object_integrated2 <- All1 %>%
  SplitObject(split.by = "orig.ident") %>%
  Run.STACAS(dims = 1:ndim, anchor.features = nfeatures) %>%
  RunUMAP(dims = 1:ndim)

This version:
finishes without explicit errors
produces an integrated object and UMAP

Questions
Is it expected that STACAS completes even when FindAnchors.STACAS throws the totalCols error shown above?
When using cell.labels, does this error indicate that some batch pairs have no compatible anchors and are effectively skipped?
Is there a recommended way to diagnose which datasets or batch pairs fail to form anchors?
For large, heterogeneous datasets, is semi-supervised STACAS still recommended, or should the unsupervised mode be preferred?
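
For the diagnostic question, one thing I could try myself is to count anchors per dataset pair and look for pairs with none. The sketch below uses a toy table. It assumes the anchor set exposes a table with dataset1 and dataset2 index columns (for my run, presumably stacas_anchors@anchors), which I have not verified for STACAS.

```r
# Toy anchor table; the real one would presumably be stacas_anchors@anchors.
anchors <- data.frame(
  dataset1 = c(1, 1, 2, 3, 3),
  dataset2 = c(2, 2, 1, 1, 1)
)
n_datasets <- 3  # would be length(obj.list) in the real run

# Count anchors for every ordered pair, including pairs with none.
pair_counts <- table(
  factor(anchors$dataset1, levels = seq_len(n_datasets)),
  factor(anchors$dataset2, levels = seq_len(n_datasets))
)

# Unordered pairs with zero anchors in both directions.
sym_counts <- pair_counts + t(pair_counts)
missing <- which(sym_counts == 0 & upper.tri(sym_counts), arr.ind = TRUE)
print(missing)
```

In the toy data only the pair (2, 3) has no anchors. On the real anchor set the same table would show whether the failing pairs involve the smallest samples or the samples without labeled cells.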

Thank you very much for your help, and for developing STACAS!
