kensung-lab/hypo_assembler_scripts

Input files:

  1. asm.fa - initial flye assembly
  2. l.fq.gz - long reads file
  3. r1.fq.gz - short reads file
  4. r2.fq.gz - short reads file
  5. shorts.txt - text file listing the paths to r1.fq.gz and r2.fq.gz
  6. l.bam - (optional) l.fq.gz aligned to asm.fa
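
A minimal sketch of how shorts.txt could be generated. It assumes the list file holds one path per line (the layout KMC accepts for `@file` inputs) and that absolute paths are wanted. The helper name `write_read_list` is hypothetical and not part of this repository.

```shell
# Hypothetical helper: write each given read file to a list file,
# one absolute path per line. Usage: write_read_list OUT FILE...
write_read_list() {
    out=$1
    shift
    : > "$out"                              # create or truncate the list file
    for f in "$@"; do
        case $f in
            /*) printf '%s\n' "$f" ;;       # already an absolute path
            *)  printf '%s\n' "$PWD/$f" ;;  # make a relative path absolute
        esac >> "$out"
    done
}

# Assumption: r1.fq.gz and r2.fq.gz sit in the current directory.
write_read_list shorts.txt r1.fq.gz r2.fq.gz
```

The resulting file is what the later steps refer to as `@shorts.txt`.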

Prereqs:

  1. Python libraries: pysam and biopython
  2. KMC3, minimap2, and samtools available on PATH
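
The prerequisites above can be checked before building with a short script like the following. This is a sketch only: it assumes KMC3 is installed under the command name `kmc` and that biopython is imported as `Bio`. The function names are hypothetical.

```shell
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Print every Python module from the argument list that python3 cannot import
# (this also reports the module when python3 itself is missing).
missing_python_modules() {
    for mod in "$@"; do
        python3 -c "import $mod" >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

# Assumption: KMC3 provides `kmc`; biopython is imported as `Bio`.
missing=$(missing_tools kmc minimap2 samtools; missing_python_modules pysam Bio)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing >&2
else
    echo "All prerequisites found."
fi
```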

Automated Compilation

  1. Run build_all.sh
  2. All required executables will be placed in the run_all/ directory

Manual Compilation

Compile the C++ programs in misjoin/, overlap/, and scaffold/:

mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

Copy each resulting binary (e.g. find_overlap and find_scaffold) into its base directory (e.g. overlap/ and scaffold/), for example with cp find_overlap ../ and cp find_scaffold ../

Build the hypo polisher:

mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

The executable will be at build/bin/hypo

Running Steps

Automated Running Steps

  1. Run run_all.sh in run_all/

  2. All executables used are located in the run_all/ directory

Manual Running Steps

  1. Align l.fq.gz to asm.fa and sort
minimap2 -ax map-ont -t 40 asm.fa l.fq.gz | samtools view -bS | samtools sort -@ 10 -m 10G -o long_read_align.bam

Output: long_read_align.bam

  2. Run suk on shorts.txt
./suk -k 17 -i @shorts.txt -t 40 -e

Output: SUK_k17.bv

  3. Run misjoin/find_misjoin
./find_misjoin asm.fa long_read_align.bam misjoin.fa

Output: misjoin.fa

  4. Run overlap/run_overlap.sh
./run_overlap.sh -k SUK_k17.bv -i misjoin.fa -l l.fq.gz -t 40

Output: overlap.fa

  5. Realign short and long reads to overlap.fa
minimap2 -ax map-ont -t 40 overlap.fa l.fq.gz | samtools view -bS | samtools sort -@ 10 -m 10G -o overlap_long.bam
minimap2 -ax sr -t 40 overlap.fa r1.fq.gz r2.fq.gz | samtools view -bS | samtools sort -@ 10 -m 10G -o overlap_short.bam

Output: overlap_long.bam and overlap_short.bam

  6. Run hypo polisher
./hypo -d overlap.fa -s 3g -B overlap_long.bam -C 60 -b overlap_short.bam -r @shorts.txt -c 100 -t 40

Output: hypo_overlap.fa

  7. Run scaffold/run_scaffold.sh
./run_scaffold.sh -k SUK_k17.bv -i hypo_overlap.fa -l l.fq.gz -t 40

Output: scaffold_1.fa and scaffold_2.fa
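
When chaining the manual steps in a script, two small helpers may be useful. Note that samtools sort applies -m per thread, so -@ 10 -m 10G can use roughly 100 GB in total. The sketch below derives a per-thread value from a total budget and checks that each step produced its documented output. Both function names are hypothetical and not part of this repository.

```shell
# Hypothetical helper: split a total memory budget (in GB) across sort threads
# and print a per-thread value for `samtools sort -m`, e.g. "4G".
sort_mem_per_thread() {
    total_gb=$1
    threads=$2
    per=$(( total_gb / threads ))
    [ "$per" -ge 1 ] || per=1        # never go below 1G per thread
    printf '%sG\n' "$per"
}

# Hypothetical helper: succeed only if every listed file exists and is non-empty.
require_output() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty output: $f" >&2
            return 1
        fi
    done
}

# Usage sketch for step 1 (commented out because it needs the real inputs):
# mem=$(sort_mem_per_thread 40 10)
# minimap2 -ax map-ont -t 40 asm.fa l.fq.gz | samtools view -bS | samtools sort -@ 10 -m "$mem" -o long_read_align.bam
# require_output long_read_align.bam || exit 1
```

The same `require_output` call can follow each later step with that step's listed output, for example `require_output overlap_long.bam overlap_short.bam` after realignment.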
