Skip to content

Update with alterantive read.bam.tags function#43

Open
fferrari wants to merge 1 commit intohms-dbmi:masterfrom
fferrari:master
Open

Update with alterantive read.bam.tags function#43
fferrari wants to merge 1 commit intohms-dbmi:masterfrom
fferrari:master

Conversation

@fferrari
Copy link
Contributor

@fferrari fferrari commented Mar 2, 2021

Hi @pkharchenko, we came across an issue with BAM files containing paired end reads.
The current read.bam.tags() function has an "if" conditional statement checking for paired end reads in the input BAM
if(any(bitwAnd(bam$flag,0x1))) { # paired-end data

However, there are 2 problems with this

  1. the code inside the if statement is taking only reads mapped on the positive strand
    posvi <- which(strm==1); rl <- list(tags=tapply(posvi,bam$rname[posvi],function(ii) as.numeric(bam$pos[ii])), flen=tapply(posvi,bam$rname[posvi],function(ii) as.numeric(abs(bam$isize[ii]))))
    and this will not be compliant with cross-correlation and peak calling algorithms which expect instead reads distributed on both strands

  2. the code handling quality scores is outside the if statement
    rl <- c(rl,list(quality=tapply(1:length(bam$pos),bam$rname,function(ii) bam$mapq[ii])))
    thus resulting in an output taglist object with twice as many quality scores as compared to actual reads positions

I'm proposing an alternative function in a pull request.
The alternative function will randomly pick only one of the reads in the pair when both are aligned, thus resulting in a taglist object with characteristics similar to a single end BAM file. however, I am not sure this was the intended purpose of the original code handling BAM files with paired end reads.

The alternative function will randomly pick only one of the reads in the pair when both are aligned, thus resulting in a taglist object with characteristics similar to a single end BAM file. however, I am not sure this was the intended purpose of the original code handling BAM files with paired end reads.
@MateuszChilinski
Copy link

Hey,

in version 1.16, when parsing pair-read data, we encountered an error where tag.scc was returning -inf and inf from some max/min functions because all tags were positive. This pull request fixes that, so I'd suggest merging to master. I think the fix presented is also appropriate. Good job! :)

For compatibility, I'd rename f_read.bam.tags to read.bam.tags though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants