ANNEXA: Analysis of Nanopore with Nextflow for EXtended Annotation

⚠️ Code has been moved to IGDRIon/ANNEXA ⚠️

Introduction

ANNEXA is an all-in-one reproducible pipeline, written in Nextflow, which allows users to analyze long-read RNA-seq (LR-RNAseq) data from Oxford Nanopore Technologies (ONT), and to reconstruct and quantify known and novel genes and isoforms.

Pipeline summary

ANNEXA requires only three inputs (a reference genome, a reference annotation and mapping files) and provides users with an extended annotation that distinguishes novel protein-coding (mRNA) genes from novel long non-coding RNA (lncRNA) genes. All known and novel gene/transcript models are further characterized through multiple features (length, number of spliced transcripts, normalized expression levels, etc.) available as graphical outputs.

1. Check that the input annotation contains all the information needed.
2. Reconstruct and quantify the transcriptome with bambu.
3. Classify novel transcripts (mRNA vs. lncRNA) with FEELnc.
4. Retrieve information from the input annotation and format the final GTF with a 3-level structure: gene -> transcript -> exon.
5. Filter novel transcripts based on bambu and/or TransforKmers Novel Discovery Rates (NDR).
6. Perform a quality control of both the full and the filtered extended annotations (see example).
7. Optional: check gene body coverage with RSeQC.

This pipeline has been tested with reference annotations from Ensembl and NCBI-RefSeq.
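
The final GTF is organized in three levels (gene -> transcript -> exon), so a quick way to inspect any GTF is to count how many records of each level it contains. The sketch below is not part of ANNEXA and does not reproduce its own annotation check; the function name `count_gtf_features` is hypothetical. It only relies on the GTF convention that the third tab-separated column holds the feature type.

```shell
# count_gtf_features GTF: print how many gene, transcript and exon records
# the file contains. Hypothetical helper, not an ANNEXA command.
count_gtf_features() {
    awk -F '\t' '
        # Skip comment/header lines, then tally column 3 (the feature type).
        !/^#/ { n[$3]++ }
        END   { printf "gene=%d transcript=%d exon=%d\n", n["gene"], n["transcript"], n["exon"] }
    ' "$1"
}

# Example (path is a placeholder):
# count_gtf_features /path/to/ref.gtf
```

A count of zero for one of the three levels suggests that the file does not follow the gene -> transcript -> exon layout.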

Usage

Install Nextflow
Test the pipeline on a small dataset

nextflow run mlorthiois/ANNEXA \
    -profile test,conda

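Run ANNEXA on your own data (replace the --input, --gtf and --fa values with the paths to your files, and pick the profile(s) from the list in braces that match your environment).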

nextflow run mlorthiois/ANNEXA \
    -profile {test,docker,singularity,conda,slurm} \
    --input samples.txt \
    --gtf /path/to/ref.gtf \
    --fa /path/to/ref.fa

The input parameter takes a file listing the BAM files to analyze, one path per line (see example below).

/path/to/1.bam
/path/to/2.bam
/path/to/3.bam
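
When there are many BAM files, the listing can be generated rather than typed by hand. The following is a minimal sketch, assuming all BAM files sit directly inside one directory; the function name `make_sample_sheet` and the `./bams` directory are hypothetical and not part of ANNEXA.

```shell
# make_sample_sheet DIR OUT: list every *.bam directly under DIR into OUT,
# one absolute path per line, sorted for a stable order.
# Hypothetical helper, not an ANNEXA command.
make_sample_sheet() {
    dir=$1
    out=$2
    # Resolve DIR to an absolute path so the sheet works from any directory.
    abs_dir=$(cd "$dir" && pwd) || return 1
    find "$abs_dir" -maxdepth 1 -type f -name '*.bam' | sort > "$out"
    # Report an empty sheet right away instead of letting the run fail later.
    [ -s "$out" ] || { echo "no BAM files found in $dir" >&2; return 1; }
}

# Example (directory name is a placeholder):
# make_sample_sheet ./bams samples.txt
```

The resulting file can then be passed to the pipeline with --input samples.txt.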

Options

Required:
--input             : Path to file listing paths to bam files.
--fa                : Path to reference genome.
--gtf               : Path to reference annotation.


Optional:
-profile test       : Run annexa on toy dataset.
-profile slurm      : Run annexa on slurm executor.
-profile singularity: Run annexa in singularity container.
-profile conda      : Run annexa in conda environment.
-profile docker     : Run annexa in docker container.

--filter            : Whether to perform the filtering step. false by default.
--tfkmers_tokenizer : Path to TransforKmers tokenizer. Required if the filter is activated.
--tfkmers_model     : Path to TransforKmers model. Required if the filter is activated.
--bambu_threshold   : bambu NDR threshold below which new transcripts are retained.
--tfkmers_threshold : TransforKmers NDR threshold below which new transcripts are retained.
--operation         : Operation used to retain novel transcripts. "union" retains transcripts validated by either bambu or TransforKmers; "intersection" retains transcripts validated by both.

--withGeneCoverage  : Run RSeQC (can be long depending on annotation and bam sizes). False by default.

--maxCpu            : max cpu threads used by ANNEXA. 8 by default.
--maxMemory         : max memory used by ANNEXA. 40GB by default.

If the filter argument is set to true, the TransforKmers model and tokenizer paths must be given. They can either be downloaded from the official TransforKmers repository or trained in advance on your own data.

Automatic filtering step

When the filtering step is activated (--filter true), ANNEXA filters the generated extended annotation using 2 methods:

By using the NDR (Novel Discovery Rate) computed by bambu. This score combines several kinds of information, such as sequence profile, structure (mono-exonic, etc.) and quantification (number of samples, expression). Each transcript with an NDR below the bambu threshold is retained by ANNEXA.
By analysing the TSS (transcription start site) of each new transcript with TransforKmers, a deep-learning tool. Each transcript whose TSS scores below the TransforKmers threshold is retained.

The filtered annotation can be the "union" of these 2 methods, i.e. all the transcripts validated by at least one of the tools, or the "intersection", i.e. only the transcripts validated by both tools.
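
The difference between the two operations can be illustrated with a small sketch. This is not ANNEXA's implementation: the function name `filter_novel`, the three-column tab-separated input (transcript id, bambu NDR, TransforKmers NDR) and the 0.2 thresholds in the example are all assumptions made for illustration. The only rule taken from the text above is that a transcript is validated by a tool when its NDR is below that tool's threshold.

```shell
# filter_novel OPERATION BAMBU_THRESHOLD TFKMERS_THRESHOLD < scores.tsv
# Reads hypothetical rows "transcript_id<TAB>bambu_ndr<TAB>tfkmers_ndr"
# and prints the ids of the transcripts that would be retained.
filter_novel() {
    awk -F '\t' -v op="$1" -v bt="$2" -v tt="$3" '
        {
            # A tool validates a transcript when its NDR is below the threshold.
            bambu_ok   = ($2 + 0 < bt + 0)
            tfkmers_ok = ($3 + 0 < tt + 0)
            # union: at least one tool validates; intersection: both do.
            if (op == "union"        && (bambu_ok || tfkmers_ok)) print $1
            if (op == "intersection" && (bambu_ok && tfkmers_ok)) print $1
        }'
}

# Example with illustrative thresholds:
# filter_novel union 0.2 0.2 < scores.tsv
```

With the same thresholds, "intersection" always returns a subset of what "union" returns, so it is the stricter of the two choices.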

At the end, the QC steps are performed on both the full and the filtered extended annotations.
