find.overlapping.alignments.R
Simon Crameri edited this page Apr 3, 2022 · 4 revisions
Identify target regions with physical overlap based on BLAST+ hits among alignment consensus sequences.
find.overlapping.alignments.R <file> <BOOLEAN> <string> <string> \
<BOOLEAN> <positive integer> <BOOLEAN> \
<positive integer> <positive fraction>
# Required
1) <bf|CHR>: Path to filtered blast vs. self results (*.blast.filtered).
# Optional [DEFAULT] (if one or more are given, they need to be given in this order)
2) <check.lg|BOOLEAN>: If TRUE, checks that query and subject are on the same linkage group (LG), parsing
the LG ID from the query and subject identifiers with string.lg.before and string.lg.after [DEFAULT: TRUE].
3) <string.lg.before|CHR>: Unique string in locus name that comes just BEFORE the linkage group ID
[DEFAULT: 'LG_'].
4) <string.lg.after|CHR>: String in locus name that comes just AFTER the linkage group ID
[DEFAULT: '_'].
5) <check.overlap|BOOLEAN>: If TRUE, checks whether hits are at alignment ends based on the hit coordinates (q/s.start/end),
the query/subject lengths (query/subject.length), <tol> and <both.ends> [DEFAULT: TRUE].
6) <tol|NUM>: Hits ending within <tol> base pairs of a query/subject start or end
will still be considered as being at an alignment end [DEFAULT: 5].
7) <both.ends|BOOLEAN>: If TRUE, both the query and the subject alignment need to be at an alignment end;
if FALSE, one is enough [DEFAULT: FALSE].
8) <min.alnlen|NUM>: Minimum alignment length for hit to be considered [DEFAULT: 0].
9) <min.percid|NUM>: Minimum percent identity for hit to be considered [DEFAULT: 0].
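The end check controlled by <tol> and <both.ends>, together with the <min.alnlen> and <min.percid> filters, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script's R code: the Hit record, its BLAST+-style field names and the functions at_alignment_end and keep_hit are hypothetical, and min.percid is assumed to be on the same scale as the percent identity value it is compared with.

```python
# Hypothetical sketch of the hit filters described by arguments 5) to 9).
# Field names (qstart, qend, sstart, send, qlen, slen, length, pident) are
# assumptions modelled on BLAST+ tabular output, not the script's own code.
from dataclasses import dataclass


@dataclass
class Hit:
    qstart: int
    qend: int
    qlen: int        # query (alignment consensus) length
    sstart: int
    send: int
    slen: int        # subject (alignment consensus) length
    length: int      # alignment length of the hit
    pident: float    # percent identity of the hit


def at_alignment_end(start: int, end: int, seqlen: int, tol: int) -> bool:
    """True if the hit reaches within `tol` bp of either end of the sequence."""
    lo, hi = min(start, end), max(start, end)  # subject coords may be reversed
    return lo <= 1 + tol or hi >= seqlen - tol


def keep_hit(hit: Hit, tol: int = 5, both_ends: bool = False,
             min_alnlen: int = 0, min_percid: float = 0) -> bool:
    """Apply the length/identity filters, then the alignment-end check."""
    if hit.length < min_alnlen or hit.pident < min_percid:
        return False
    q_end = at_alignment_end(hit.qstart, hit.qend, hit.qlen, tol)
    s_end = at_alignment_end(hit.sstart, hit.send, hit.slen, tol)
    # both.ends = TRUE requires query AND subject ends; FALSE accepts either
    return (q_end and s_end) if both_ends else (q_end or s_end)
```

For example, a hit covering the last 20 bp of a 100 bp query and the first 20 bp of a 220 bp subject passes with either setting of both_ends, whereas a hit that is internal to both sequences is rejected.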
Arguments 2) to 7) restrict the analysis to hits between alignment ends located on the same genomic region (LG = linkage group). The genomic region is parsed from the target region name; if the names contain no such information, set argument 2) to FALSE.
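The LG check can be sketched as below, assuming the LG ID is the text between the first occurrence of string.lg.before and the next occurrence of string.lg.after. The functions lg_id and same_lg are hypothetical illustrations, not part of CaptureAl, and the script may parse names differently. With the default delimiters, both example names given further down this page yield the LG ID '01'.

```python
# Hypothetical sketch of parsing a linkage-group (LG) ID from a target region
# name, given string.lg.before and string.lg.after (arguments 3 and 4).
# Assumption: the LG ID is the text between the first `before` and the next
# `after`; this is an illustration, not the script's own parsing code.
from typing import Optional


def lg_id(name: str, before: str = "LG_", after: str = "_") -> Optional[str]:
    start = name.find(before)
    if start == -1:
        return None  # no LG information in this name
    start += len(before)
    stop = name.find(after, start)
    return name[start:] if stop == -1 else name[start:stop]


def same_lg(query: str, subject: str,
            before: str = "LG_", after: str = "_") -> bool:
    """True only if both names carry an LG ID and the IDs are identical."""
    q = lg_id(query, before, after)
    s = lg_id(subject, before, after)
    return q is not None and q == s
```

Returning None for names without the delimiter reflects the caveat above: such names carry no LG to compare, which is the case where check.lg should be set to FALSE.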
Output:
A file *.overlap, detailing the overlaps between target regions.
A file *.list, listing overlapping target regions to be merged.
## identify consensus sequence (alignment) overlaps assuming that these two sequences are
## on the same linkage group (01):
# '>NC_033804.1_LG_01_10292880_10292979_ID_658.merged'
# '>NC_033804.1_LG_01_12382290_12382509_ID_1154'
find.overlapping.alignments.R consensus.vs.self.blast.filtered TRUE 'LG_' '_'
CaptureAl v0.1 Documentation