- asm.fa - initial Flye assembly
- l.fq.gz - long reads file
- r1.fq.gz - short reads file
- r2.fq.gz - short reads file
- shorts.txt - text file containing the paths to r1.fq.gz and r2.fq.gz
- l.bam - (optional) l.fq.gz aligned to asm.fa
- Python library requirements: pysam and biopython
- KMC3, minimap2 and samtools must be in PATH
- Run build_all.sh
- All executables used will be in the run_all/ directory
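Before building, it can help to confirm that the external tools are reachable. The following is a minimal sketch and not part of this repository: `missing_tools` is a hypothetical helper that prints every command name it cannot find in PATH. The executable names to pass are an assumption (KMC3 is normally installed as `kmc`); adjust them to your installation.

```shell
# Hypothetical helper (not part of this repository): print each command
# name given as an argument that cannot be found in PATH, one per line.
# Returns 0 if every command was found, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `missing_tools kmc minimap2 samtools` prints nothing when all three are installed. The Python libraries can be checked in a similar way with `python3 -c 'import pysam, Bio'` (biopython is imported as `Bio`).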
Compile the C++ code in each of misjoin/, overlap/ and scaffold/:
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
Put each resulting binary (e.g. find_overlap, find_scaffold) in its base directory (e.g. overlap/, scaffold/), e.g. run cp find_overlap ../ in overlap/build and cp find_scaffold ../ in scaffold/build.
Build the hypo polisher:
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
The executable will be at build/bin/hypo
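The per-directory build for misjoin/, overlap/ and scaffold/ can be wrapped in one function. This is a sketch of the manual steps described above, not the contents of build_all.sh; `build_component` is a hypothetical name, and the binary name passed for each directory is whatever that directory's build produces.

```shell
# Hypothetical helper: configure and build one component in <dir>/build,
# then copy the named binary up into <dir>. The function body runs in a
# subshell, so the caller's working directory is left unchanged.
# usage: build_component DIR BINARY_NAME
build_component() (
    dir=$1
    bin=$2
    cd "$dir" || exit 1
    mkdir -p build || exit 1
    cd build || exit 1
    cmake .. -DCMAKE_BUILD_TYPE=Release || exit 1
    make || exit 1
    cp "$bin" ../
)
```

Usage would look like `build_component overlap find_overlap` followed by `build_component scaffold find_scaffold`, run from the repository root.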
- Run run_all.sh in run_all/
- Align l.fq.gz to asm.fa and sort
minimap2 -ax map-ont -t 40 asm.fa l.fq.gz | samtools view -bS | samtools sort -@ 10 -m 10G -o long_read_align.bam
Output: long_read_align.bam
- Run suk on shorts.txt
./suk -k 17 -i @shorts.txt -t 40 -e
Output: SUK_k17.bv
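The `@shorts.txt` argument points suk at a list of short-read files. A minimal sketch for generating that list follows, assuming one path per line (the convention KMC uses for `@` file lists; the exact format suk expects is not stated here). `write_read_list` is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper: write the given read files to a list file,
# one path per line. ASSUMPTION: this is the list format suk expects.
# Fails without writing anything if a read file does not exist.
# usage: write_read_list LIST_FILE READS...
write_read_list() {
    list=$1
    shift
    [ "$#" -gt 0 ] || return 1
    for reads in "$@"; do
        if [ ! -f "$reads" ]; then
            echo "missing read file: $reads" >&2
            return 1
        fi
    done
    printf '%s\n' "$@" > "$list"
}
```

For the inputs named in this document the call would be `write_read_list shorts.txt r1.fq.gz r2.fq.gz`.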
- Run misjoin/find_misjoin
./find_misjoin asm.fa long_read_align.bam misjoin.fa
Output: misjoin.fa
- Run overlap/run_overlap.sh
./run_overlap.sh -k SUK_k17.bv -i misjoin.fa -l l.fq.gz -t 40
Output: overlap.fa
- Realign short and long reads to overlap.fa
minimap2 -ax map-ont -t 40 overlap.fa l.fq.gz | samtools view -bS | samtools sort -@ 10 -m 10G -o overlap_long.bam
minimap2 -ax sr -t 40 overlap.fa r1.fq.gz r2.fq.gz | samtools view -bS | samtools sort -@ 10 -m 10G -o overlap_short.bam
Output: overlap_long.bam and overlap_short.bam
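The alignment commands in this document differ only in the minimap2 preset, the reference, the reads and the output name, so they can be expressed through one function. This is a sketch that mirrors those commands; `align_sorted` and the `THREADS` variable are hypothetical names introduced for illustration.

```shell
# Hypothetical helper mirroring the minimap2 | samtools commands above.
# usage: align_sorted PRESET REF OUT_BAM READS...
# THREADS (default 40) sets the minimap2 thread count.
# Under bash, consider `set -o pipefail` so a minimap2 failure is not
# hidden by a successful samtools exit status.
align_sorted() {
    preset=$1
    ref=$2
    out=$3
    shift 3
    minimap2 -ax "$preset" -t "${THREADS:-40}" "$ref" "$@" \
        | samtools view -bS \
        | samtools sort -@ 10 -m 10G -o "$out"
}
```

The two realignment commands then become `align_sorted map-ont overlap.fa overlap_long.bam l.fq.gz` and `align_sorted sr overlap.fa overlap_short.bam r1.fq.gz r2.fq.gz`. Note that the `-m` option of samtools sort is a per-thread limit, so `-@ 10 -m 10G` can use roughly 100 GB of memory; lower either value on smaller machines.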
- Run hypo polisher
./hypo -d overlap.fa -s 3g -B overlap_long.bam -C 60 -b overlap_short.bam -r @shorts.txt -c 100 -t 40
Output: hypo_overlap.fa
- Run scaffold/run_scaffold.sh
./run_scaffold.sh -k SUK_k17.bv -i hypo_overlap.fa -l l.fq.gz -t 40
Output: scaffold_1.fa and scaffold_2.fa
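Each stage writes a new FASTA file, so checking that every expected output exists and how many sequences it holds can catch a failed step early. The following is a minimal sketch; `count_contigs` is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper: for each FASTA file, print
# "<file><TAB><number of records>", counting lines that start with '>'.
# Stops with a non-zero status at the first missing or empty file.
count_contigs() {
    for fasta in "$@"; do
        if [ ! -s "$fasta" ]; then
            echo "missing or empty: $fasta" >&2
            return 1
        fi
        printf '%s\t%s\n' "$fasta" "$(grep -c '^>' "$fasta")"
    done
}
```

After a full run, `count_contigs asm.fa misjoin.fa overlap.fa hypo_overlap.fa scaffold_1.fa scaffold_2.fa` lists the sequence count at each stage.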