Error in quality control barplots, if sample number is divisible by sample.batch.size #61

@FranziskaKimmig

Dear RnBeads developers,

Thank you so much for all the work you put into this package!

I have encountered the following error during a preprocessing run of my newest dataset. (The full log file is linked below.)

> ##### Run Analysis
> rnb.run.analysis(dir.reports=report.dir,
+                  sample.sheet=sample.annotation,
+                  data.dir=idat.dir,
+                  data.type=data.type)
2025-01-23 13:26:24     1.7  STATUS STARTED RnBeads Pipeline
2025-01-23 13:26:24     1.7    INFO     Analysis Title: xxx
2025-01-23 13:26:24     1.7    INFO     Initialized report index and saved to index.html
2025-01-23 13:26:24     1.7  STATUS     STARTED Loading Data
2025-01-23 13:26:24     1.7    INFO         Number of cores: 1
2025-01-23 13:26:24     1.7    INFO         Loading data of type "idat.dir"
2025-01-23 13:26:24     1.7  STATUS         STARTED Loading Data from IDAT Files
2025-01-23 13:26:24     1.7    INFO             Added column barcode to the provided sample annotation table
2025-01-23 13:26:27     1.7    INFO             Detected platform: MethylationEPIC
2025-01-23 13:47:12     2.3  STATUS         COMPLETED Loading Data from IDAT Files
2025-01-23 14:46:12     3.2  STATUS         Loaded data from /xxx/
2025-01-23 14:46:28     9.7  STATUS         Added data loading section to the report
2025-01-23 14:46:28     9.7  STATUS         Loaded 500 samples and 866895 sites
2025-01-23 14:46:28     9.7    INFO         Output object is of type RnBeadRawSet
2025-01-23 14:46:28     9.7  STATUS     COMPLETED Loading Data
2025-01-23 15:21:37     3.2    INFO     Initialized report index and saved to index.html
2025-01-23 15:21:54     3.2  STATUS     STARTED Quality Control
2025-01-23 15:21:54     3.2    INFO         Number of cores: 1
2025-01-23 15:21:54     3.2  STATUS         STARTED Quality Control Section
2025-01-23 15:22:21     3.2  STATUS             Added quality control box plots
Error in qc(rnb.set)$Cy3[, sample.subset, drop = FALSE] : 
  subscript out of bounds
Calls: rnb.run.analysis ... FUN -> lapply -> lapply -> FUN -> lapply -> lapply -> FUN
Execution halted

I believe that this error occurs when add.qc.barplots() calls rnb.plot.control.barplot() in qualityControl.R (L412-L456). My dataset consists of exactly 500 samples. With the default sample.batch.size == 50, add.qc.barplots() then computes one sample subset too many, and this last subset lies beyond the range of my samples.

RnBeads/R/qualityControl.R

Lines 412 to 422 in 3ab7825

nsamp<-length(samples(object))
plot.names<-NULL
if(nsamp %% sample.batch.size==1){
  sample.batch.size<-sample.batch.size-5
}
portion.starts<-0:(nsamp %/% sample.batch.size)*sample.batch.size+1
portion.ends<-portion.starts+sample.batch.size-1
portion.ends[length(portion.ends)]<-nsamp
portions<-paste(portion.starts, portion.ends, sep="-")

# nsamp==500,
# sample.batch.size==50
> portions
[1] "1-50"    "51-100"  "101-150" "151-200" "201-250" "251-300" "301-350" "351-400" "401-450" "451-500" "501-500"
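The output above can be reproduced without RnBeads. The following is a minimal sketch: `compute.portions` is a hypothetical helper name, not a function of the package, and it only wraps the quoted lines so that the sample count can be passed in directly.

```r
# Hypothetical stand-alone wrapper around the quoted lines from qualityControl.R.
# nsamp replaces length(samples(object)) so no RnBeads object is needed.
compute.portions <- function(nsamp, sample.batch.size = 50) {
  if (nsamp %% sample.batch.size == 1) {
    sample.batch.size <- sample.batch.size - 5
  }
  portion.starts <- 0:(nsamp %/% sample.batch.size) * sample.batch.size + 1
  portion.ends <- portion.starts + sample.batch.size - 1
  portion.ends[length(portion.ends)] <- nsamp
  paste(portion.starts, portion.ends, sep = "-")
}

tail(compute.portions(499), 1)  # "451-499": last portion is valid
tail(compute.portions(500), 1)  # "501-500": start lies beyond the last sample
tail(compute.portions(250), 1)  # "251-250": same problem for 250 samples
```

The extra portion appears whenever nsamp %/% sample.batch.size has no remainder, because 0:(nsamp %/% sample.batch.size) then contains one more element than there are full batches.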

In the following code (lines 424 to 456), rnb.plot.control.barplot() is called for each portion in turn. For the sample.subset that corresponds to portions[11] ("501-500"), the start index 501 exceeds the 500 available samples, which gives the subscript out of bounds error above:

green<-qc(rnb.set)$Cy3[,sample.subset, drop=FALSE]
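The failing subscript can be illustrated with a plain matrix. This is a sketch under assumptions: `cy3` is a stand-in for qc(rnb.set)$Cy3 with one column per sample, and `sample.subset` is assumed to be the index range derived from the portion "501-500".

```r
# Stand-in for qc(rnb.set)$Cy3: rows are control probes, columns are samples.
cy3 <- matrix(0, nrow = 3, ncol = 500)

# Assumed subset for the portion "501-500"; in R, 501:500 evaluates to c(501, 500).
sample.subset <- 501:500

# Column 501 does not exist, so the subscript fails; capture the message.
res <- tryCatch(
  cy3[, sample.subset, drop = FALSE],
  error = function(e) conditionMessage(e)
)
res  # "subscript out of bounds"
```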

Looking at the code, this error would occur for every sample count that is divisible by sample.batch.size (50 in this case). My suggestion would be to change that part, for example, to

nsamp<-length(samples(object))
plot.names<-NULL

# If nsamp is an exact multiple of sample.batch.size, use one portion start fewer
if(nsamp %% sample.batch.size == 0){
  portion.starts<-0:(nsamp %/% sample.batch.size - 1)*sample.batch.size+1
}else{
  portion.starts<-0:(nsamp %/% sample.batch.size)*sample.batch.size+1
}
portion.ends<-portion.starts+sample.batch.size-1
# The last portion always ends at the last sample
portion.ends[length(portion.ends)]<-nsamp
portions<-paste(portion.starts, portion.ends, sep="-")
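An alternative that avoids the case distinction might generate the starts with seq() and clip the ends with pmin(). This is only a sketch, not the package's implementation: `compute.portions.fixed` is a hypothetical name, nsamp is passed in directly, and the adjustment for a remainder of 1 is carried over from the quoted RnBeads code.

```r
# Hypothetical stand-alone sketch; nsamp replaces length(samples(object)).
compute.portions.fixed <- function(nsamp, sample.batch.size = 50) {
  stopifnot(nsamp >= 1, sample.batch.size >= 1)
  # Carried over from the quoted code: shrink the batch size when exactly one
  # sample would be left over.
  if (nsamp %% sample.batch.size == 1) {
    sample.batch.size <- sample.batch.size - 5
  }
  # seq() never produces a start beyond nsamp, so no empty portion can arise.
  portion.starts <- seq(1, nsamp, by = sample.batch.size)
  # Clip every end to nsamp; only the last portion is affected.
  portion.ends <- pmin(portion.starts + sample.batch.size - 1, nsamp)
  paste(portion.starts, portion.ends, sep = "-")
}

tail(compute.portions.fixed(500), 1)  # "451-500"
tail(compute.portions.fixed(499), 1)  # "451-499"
length(compute.portions.fixed(500))   # 10
```

Because the starts are bounded by nsamp from the outset, the divisible and non-divisible cases follow the same code path.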

I have checked that all IDAT files are present and load properly (expected dimensions and no NAs in qc(rnb.set)$Cy3 in rnbSet_unnormalized). The error also occurs when I split my samples into two batches of 250 samples each for preprocessing, but the run completes without error when I use 499 of my 500 samples, so I am confident that none of the raw data files is faulty. As you can see in sessionInfo.txt, I used the older version RnBeads 2.20.0, but the code snippets are taken from the most recent version.

Thank you in advance for your help!

sessionInfo.txt
Log_RnBeads_preprocessing.txt
