find.overlapping.alignments.R

Simon Crameri edited this page Apr 3, 2022 · 4 revisions

Description

Identify target regions with physical overlap based on BLAST+ hits among alignment consensus sequences.

Usage

find.overlapping.alignments.R <file> <BOOLEAN> <string> <string> \
                              <BOOLEAN> <positive integer> <BOOLEAN> \
                              <positive integer> <positive fraction>

Dependencies


Arguments

# Required
1) <bf|CHR>:                Path to filtered blast vs. self results (*.blast.filtered).
     
# Optional [DEFAULT] (if one or more are given, they need to be given in this order)
2) <check.lg|BOOLEAN>:      If TRUE, checks linkage group (LG) conformity based on query and subject
                            identifiers, string.lg.before and string.lg.after [DEFAULT: TRUE].
3) <string.lg.before|CHR>:  Unique string in locus name that comes just BEFORE the linkage group ID
                            [DEFAULT: 'LG_'].
4) <string.lg.after|CHR>:   String in locus name that comes just AFTER the linkage group ID
                            [DEFAULT: '_'].
5) <check.overlap|BOOLEAN>: If TRUE, checks whether hits are at alignment ends based on q/s.start/end
                            and query/subject.length and <tol>/<both.ends> [DEFAULT: TRUE].
6) <tol|NUM>:               Alignments ending within <tol> basepairs of a query/subject start/end
                            will still be considered as being at an alignment end [DEFAULT: 5].
7) <both.ends|BOOLEAN>:     If TRUE, both query and subject alignments need to be at an alignment end;
                            if FALSE, one is enough [DEFAULT: FALSE].
8) <min.alnlen|NUM>:        Minimum alignment length for hit to be considered [DEFAULT: 0].
9) <min.percid|NUM>:        Minimum percent identity for hit to be considered [DEFAULT: 0].
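The interplay of arguments 5) to 9) can be pictured with a small sketch. It is written in Python for illustration only and is not the script's actual R code. The dictionary keys (`qstart`, `qend`, `qlen`, `sstart`, `send`, `slen`, `length`, `pident`) are hypothetical names for the corresponding columns of the filtered BLAST table. The sketch also assumes that the distance to an alignment end is compared inclusively (`<= tol`) and that `pident` is on the same scale as `<min.percid>`.

```python
def at_end(start, end, length, tol=5):
    """True if a hit lies within `tol` bp of either end of a sequence.

    Coordinates are 1-based; start and end may be swapped for
    minus-strand hits, so they are ordered first.
    """
    lo, hi = min(start, end), max(start, end)
    return (lo - 1) <= tol or (length - hi) <= tol


def keep_hit(hit, tol=5, both_ends=False, min_alnlen=0, min_percid=0):
    """Apply the length, identity and alignment-end filters to one hit."""
    # Hypothetical field names; the real table layout may differ.
    if hit["length"] < min_alnlen or hit["pident"] < min_percid:
        return False
    q_ok = at_end(hit["qstart"], hit["qend"], hit["qlen"], tol)
    s_ok = at_end(hit["sstart"], hit["send"], hit["slen"], tol)
    # both_ends=True requires query AND subject at an end; otherwise one suffices.
    return (q_ok and s_ok) if both_ends else (q_ok or s_ok)


# Made-up hit: the query part ends 2 bp before the query end,
# the subject part sits in the middle of the subject.
hit = {"qstart": 60, "qend": 98, "qlen": 100,
       "sstart": 50, "send": 88, "slen": 220,
       "length": 39, "pident": 97.4}

print(keep_hit(hit))                  # True: the query side is at an end
print(keep_hit(hit, both_ends=True))  # False: the subject side is not
```

Under these assumptions, raising `<tol>` or leaving `<both.ends>` at FALSE makes the filter more permissive, while `<min.alnlen>` and `<min.percid>` discard short or divergent hits before the end check is applied.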

Details

Arguments 2) to 7) restrict the analysis to hits between alignment ends located in the same genomic region (LG = linkage group). The genomic region is determined from the target region name. If the names do not contain this information, argument 2) needs to be set to FALSE.
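As an illustration of how a linkage group ID could be read from a target region name, the following Python sketch takes the text between `string.lg.before` and the next occurrence of `string.lg.after`. It is a minimal sketch under that assumption, not the script's actual R code; the function names `extract_lg` and `same_lg` are hypothetical.

```python
import re


def extract_lg(name, before="LG_", after="_"):
    """Return the text between `before` and the next `after`, or None."""
    # re.escape keeps characters such as '.' in the delimiters literal.
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    match = re.search(pattern, name)
    return match.group(1) if match else None


def same_lg(query, subject, before="LG_", after="_"):
    """True if both names carry an LG ID and the two IDs are identical."""
    q = extract_lg(query, before, after)
    s = extract_lg(subject, before, after)
    return q is not None and q == s


query = "NC_033804.1_LG_01_10292880_10292979_ID_658.merged"
subject = "NC_033804.1_LG_01_12382290_12382509_ID_1154"

print(extract_lg(query))        # 01
print(same_lg(query, subject))  # True
```

A name without the `before` string yields no LG ID in this sketch, which mirrors the case where argument 2) has to be set to FALSE.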

Value

A file *.overlap, detailing the overlaps between target regions.

A file *.list, listing overlapping target regions to be merged.

Examples

## identify consensus sequence (alignment) overlaps assuming that these two sequences are 
## on the same linkage group (01):
# '>NC_033804.1_LG_01_10292880_10292979_ID_658.merged'
# '>NC_033804.1_LG_01_12382290_12382509_ID_1154'

find.overlapping.alignments.R consensus.vs.self.blast.filtered TRUE 'LG_' '_'
