The pipeline is inspired by the bulkRNAseq pipeline from the Bioinformatics and Biostatistics Core at Van Andel Institute.
- Create a samplesheet file to execute the pipeline. It should be a CSV file in the following format:
| sample | fq1 | fq2 |
|---|---|---|
| sampleA | sampleA_R1.fastq.gz | sampleA_R2.fastq.gz |
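A samplesheet like the one above can be generated with a short script. The sketch below is only an illustration: it assumes the pipeline expects a plain comma-separated file with a `sample,fq1,fq2` header row, as the table suggests. The helper name `write_samplesheet` is hypothetical and not part of the pipeline.

```python
import csv
import io

def write_samplesheet(rows, handle):
    """Write (sample, fq1, fq2) tuples as CSV with a sample,fq1,fq2 header."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", "fq1", "fq2"])
    for sample, fq1, fq2 in rows:
        # Reject empty fields early; an incomplete row would be hard to debug later.
        if not (sample and fq1 and fq2):
            raise ValueError(f"incomplete samplesheet row: {(sample, fq1, fq2)}")
        writer.writerow([sample, fq1, fq2])

# Written to an in-memory buffer here; pass an open file to write to disk.
buf = io.StringIO()
write_samplesheet(
    [("sampleA", "sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz")],
    buf,
)
print(buf.getvalue(), end="")
```

To write to disk, pass a file opened with `open(path, "w", newline="")` instead of the `StringIO` buffer.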
- Execute the pipeline. The following steps/tools are run:
  - `fastqc` on each sample - raw fastq files
  - `fastp` to trim adapter sequences and low quality reads
    - below options used for `fastp`: `--adapter_fasta $adapter`
  - `fastq_screen` on R1 fastq files to detect possible contaminants
  - `preseq` for library complexity
  - `qualimap` for gene body coverage plot
  - `sortMeRNA` for rRNA detection
  - `multiqc` for summarizing the output files of the QC tools
  - Read alignment using `STAR` with the `--quantMode GeneCounts` option to generate a gene count matrix
  - Transcript quantification using `Salmon`
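With `--quantMode GeneCounts`, STAR writes one `ReadsPerGene.out.tab` file per sample. It has four tab-separated columns (gene ID, unstranded counts, forward-stranded counts, reverse-stranded counts) preceded by four `N_*` summary rows. The sketch below shows one way such files could be combined into a gene-by-sample matrix. It is not necessarily how this pipeline builds its matrix; `read_star_counts` and `build_matrix` are hypothetical helpers, and the example data is made up.

```python
import io

# 0-based column in ReadsPerGene.out.tab for each library type.
STAR_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}
# Summary rows that STAR writes before the per-gene rows.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}

def read_star_counts(handle, strandedness="reverse"):
    """Return {gene_id: count} from one ReadsPerGene.out.tab file."""
    col = STAR_COLUMN[strandedness]
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] in SUMMARY_ROWS:
            continue  # skip blank lines and the N_* summary rows
        counts[fields[0]] = int(fields[col])
    return counts

def build_matrix(per_sample):
    """Combine {sample: {gene: count}} into rows: a header, then one row per gene."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    rows = [["gene"] + samples]
    for gene in genes:
        rows.append([gene] + [per_sample[s].get(gene, 0) for s in samples])
    return rows

# Made-up example content standing in for a real ReadsPerGene.out.tab file.
example = io.StringIO(
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t40\t4\n"
    "N_ambiguous\t1\t0\t1\n"
    "GENE1\t50\t2\t48\n"
    "GENE2\t7\t0\t7\n"
)
matrix = build_matrix({"sampleA": read_star_counts(example, "reverse")})
for row in matrix:
    print("\t".join(str(value) for value in row))
```

The column to read depends on the library's strandedness, so the `strandedness` argument should match the library prep.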
Adjust the configuration files, such as `bulk_rnaseq_conf/run.config` and `cluster.config`, then submit the job with `sbatch run_bulk_rnaseq.slurm`.
- cluster configuration -> `cluster.config`
- location of reference genome -> `reference.config`; `STAR` and `salmon` used.
- singularity image file path -> `processes.config`
- `run.config` for the location of the samplesheet and to turn on/off `Ribodetector` for rRNA removal, `salmon`, and `tpm calculator`
Strand info:
- try https://github.com/signalbash/how_are_we_stranded_here
- https://github.com/igordot/genomics/blob/master/notes/rna-seq-strand.md
  - reverse strand for Illumina TruSeq Stranded Total RNA
- https://dbrg77.wordpress.com/2015/03/20/library-type-option-in-the-tuxedo-suite/
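As a quick sanity check alongside the tools linked above, strandedness can also be guessed from the totals of the forward and reverse count columns, for example columns 3 and 4 of STAR's `ReadsPerGene.out.tab` summed over gene rows. This is a simple heuristic sketch, not what `how_are_we_stranded_here` does internally. The function name `guess_strandedness` and the 0.8 cutoff are assumptions chosen for illustration.

```python
def guess_strandedness(forward_total, reverse_total, cutoff=0.8):
    """Guess library type from total forward- and reverse-stranded gene counts.

    cutoff is an arbitrary illustrative threshold: if at least this fraction
    of the reads falls on one strand, the library is called stranded that way.
    """
    total = forward_total + reverse_total
    if total == 0:
        raise ValueError("no counts available to judge strandedness")
    forward_fraction = forward_total / total
    if forward_fraction >= cutoff:
        return "forward"
    if forward_fraction <= 1 - cutoff:
        return "reverse"
    return "unstranded"

# Made-up totals: almost all reads are counted on the reverse strand.
print(guess_strandedness(forward_total=2, reverse_total=55))
```

A reverse call is what would be expected for an Illumina TruSeq Stranded Total RNA library; a result near the cutoff is worth checking with a dedicated tool.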
