Long-Read Simulation Pipeline

This directory contains the scripts required to prepare reference data and simulate long-read transcriptomic data with SQANTI-SIM, with controlled Differential Gene Expression (DGE), Differential Transcript Expression (DTE), and Differential Transcript Usage (DTU). The pipeline is organized into two sequential steps.

Pipeline Steps

Step 1: Reference Preparation

Script: 1.run_ref_prep.sh

This step performs the following:

Alignment & Quantification: Aligns raw GTEx long-read data to the reference transcriptome using minimap2 and quantifies transcript abundance using salmon.
Baseline Estimation: Runs gtex_info.R to generate baselineAbundance.rds and a list of expressed transcripts (txid.txt).
Subsetting: Generates GTF, FASTA, and BED files restricted to the identified transcripts.
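The alignment, quantification and subsetting stages can be sketched as bash functions. This is a minimal sketch, not the contents of 1.run_ref_prep.sh: it assumes ONT reads (hence the minimap2 `map-ont` preset), a txid.txt with one versioned transcript ID per line matching the GENCODE IDs, and a BED file whose fourth (name) column holds the transcript ID. The function names `align_and_quant`, `subset_gtf`, `subset_fasta` and `subset_bed` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Step 1; not the contents of 1.run_ref_prep.sh.

# Align reads to the transcriptome, then quantify with salmon in alignment mode.
# Assumption: ONT reads (map-ont preset); change the preset for other platforms.
align_and_quant() {
  local tx_fa=$1 reads=$2 outdir=$3 threads=${4:-8}
  mkdir -p "$outdir"
  minimap2 -ax map-ont -t "$threads" "$tx_fa" "$reads" \
    | samtools view -b -o "$outdir/aln.bam" -
  salmon quant -t "$tx_fa" -l A -a "$outdir/aln.bam" -p "$threads" -o "$outdir/salmon"
}

# Keep GTF records whose transcript_id is listed in the ID file.
# Records without a transcript_id (e.g. "gene" lines, header comments) are dropped.
subset_gtf() {
  local ids=$1 gtf=$2
  awk 'FNR == NR { keep[$1] = 1; next }
       match($0, /transcript_id "[^"]+"/) {
         # Skip the 15-character prefix `transcript_id "` and the closing quote.
         id = substr($0, RSTART + 15, RLENGTH - 16)
         if (id in keep) print
       }' "$ids" "$gtf"
}

# Keep FASTA records whose ID (the text before the first "|" in GENCODE headers)
# is listed; sequence lines follow the keep/drop decision of their header.
subset_fasta() {
  local ids=$1 fasta=$2
  awk 'FNR == NR { keep[$1] = 1; next }
       /^>/ { id = substr($1, 2); sub(/[|].*/, "", id); p = (id in keep) }
       p' "$ids" "$fasta"
}

# Keep BED lines whose name column (4) is a listed transcript ID.
subset_bed() {
  local ids=$1 bed=$2
  awk -F'\t' 'FNR == NR { keep[$1] = 1; next } ($4 in keep)' "$ids" "$bed"
}
```

For example, `subset_gtf txid.txt gencode.v44.annotation.gtf > subset.gtf` writes a reduced annotation (the output name is illustrative). Reading the ID file first into an awk array keeps each subset to a single streaming pass over the large reference file.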

Usage:

sbatch 1.run_ref_prep.sh
# Ensure this completes successfully before running Step 2
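Instead of waiting for Step 1 by hand, both jobs can be queued at once with a Slurm job dependency. This is a minimal sketch, assuming both scripts are submitted from this directory; `submit_pipeline` is a hypothetical helper, not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: queue both steps and let Slurm enforce the order.
# "afterok" starts Step 2 only if Step 1 finishes with exit code 0.
submit_pipeline() {
  local prep_id
  # --parsable prints just the job ID (or "jobid;cluster" on multi-cluster setups).
  prep_id=$(sbatch --parsable 1.run_ref_prep.sh) || return 1
  prep_id=${prep_id%%;*}   # strip an optional ";cluster" suffix
  sbatch --dependency=afterok:"$prep_id" 2.run_simulation.sh
}
```

After `submit_pipeline`, sbatch reports the Step 2 job ID. If Step 1 fails, Step 2 normally stays pending with the reason `DependencyNeverSatisfied` (unless the cluster is configured to kill such jobs) and can be removed with `scancel`.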

Step 2: Simulation

Script: 2.run_simulation.sh

This step performs the actual simulation:

Design: Runs sqanti-sim.py design to create the simulation index.
Differential Expression: Calls get_diff.R to establish DGE/DTU/DTE ground truth lists.
Dispersion: Calls simulate_dispersion.R to simulate biological variation across replicates.
Run Simulation: Executes sqanti-sim.py sim to generate synthetic FASTQ reads for Control and DE conditions.
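The three kinds of ground truth differ in what they change: DGE scales every transcript of a gene, DTE scales individual transcripts, and DTU changes isoform proportions while leaving the gene total unchanged. The bash/awk sketch below illustrates this on a hypothetical tab-separated table `gene_id<TAB>transcript_id<TAB>TPM`. It is not how get_diff.R works (that script operates on baselineAbundance.rds), and the fold changes and the "swap the two most abundant isoforms" DTU rule are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the logic of get_diff.R.
# Input table (hypothetical): gene_id<TAB>transcript_id<TAB>TPM

# DGE (key column 1 = gene) or DTE (key column 2 = transcript):
# multiply the TPM of every row whose key is listed in the ID file by a fold change.
apply_fold_change() {
  local table=$1 ids=$2 key_col=$3 fc=$4
  awk -F'\t' -v OFS='\t' -v col="$key_col" -v fc="$fc" '
    FNR == NR { sel[$1] = 1; next }
    ($col in sel) { $3 = $3 * fc }
    { print }' "$ids" "$table"
}

# DTU: within each listed gene, swap the TPMs of its two most abundant transcripts.
# The gene total is unchanged, so the gene itself is not differentially expressed.
# The table is read twice: once to find the top two isoforms, once to rewrite it.
apply_dtu() {
  local table=$1 genes=$2
  awk -F'\t' -v OFS='\t' '
    FNR == 1 { pass++ }                 # 1 = gene list, 2 = scan, 3 = rewrite
    pass == 1 { dtu[$1] = 1; next }
    pass == 2 {
      if (!($1 in dtu)) next
      x = $3 + 0
      if (!($1 in v1)) { n1[$1] = $2; v1[$1] = x }
      else if (x > v1[$1]) { n2[$1] = n1[$1]; v2[$1] = v1[$1]; n1[$1] = $2; v1[$1] = x }
      else if (!($1 in v2) || x > v2[$1]) { n2[$1] = $2; v2[$1] = x }
      next
    }
    {
      # Genes with a single transcript have no v2 entry and pass through unchanged.
      if (($1 in v2) && $2 == n1[$1]) $3 = v2[$1]
      else if (($1 in v2) && $2 == n2[$1]) $3 = v1[$1]
      print
    }' "$genes" "$table" "$table"
}
```

For a gene with isoform TPMs 10, 30 and 60, `apply_dtu` yields 10, 60 and 30: the dominant isoform changes but the gene still sums to 100, which is what separates a DTU-only gene from a DGE gene.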

Usage:

sbatch 2.run_simulation.sh

Dependencies

Input Data: Expects raw data at ../../dataset/GTeX/long-read/sequence_data/
Reference Files:
- Transcript Reference FASTA: gencode.v44.transcripts.fa (Required for Step 1)
- Genome Reference FASTA: GRCh38.primary_assembly.genome.fa (Required for Step 2)
- Annotation GTF: gencode.v44.annotation.gtf
Software & Versions:
- minimap2 (2.26-r1175)
- salmon (1.10.2)
- samtools (1.19.2)
- seqkit (2.5.1)
- SQANTI3 (5.1.2)
- SQANTI-SIM (0.2.1)
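A quick pre-flight check can catch version drift before a long Slurm job starts. This is a minimal sketch, assuming the tools are on `PATH` and report their versions via `minimap2 --version`, `salmon --version`, `samtools --version` and `seqkit version`. The function names are hypothetical, and SQANTI3 and SQANTI-SIM are left out because their version reporting is not covered here.

```shell
#!/usr/bin/env bash
# Sketch: compare installed tool versions against the Dependencies list.
declare -A EXPECTED_VERSION=(
  [minimap2]="2.26-r1175"
  [salmon]="1.10.2"
  [samtools]="1.19.2"
  [seqkit]="2.5.1"
)

# Print the first line of a tool's self-reported version.
tool_version() {
  case "$1" in
    minimap2) minimap2 --version ;;
    salmon)   salmon --version 2>&1 ;;
    samtools) samtools --version ;;
    seqkit)   seqkit version ;;
  esac | head -n 1
}

# Report OK / MISMATCH / MISSING per tool; return the number of problems found.
check_versions() {
  local tool reported problems=0
  for tool in minimap2 salmon samtools seqkit; do
    if ! command -v "$tool" > /dev/null; then
      echo "MISSING  $tool"; problems=$((problems + 1)); continue
    fi
    reported=$(tool_version "$tool")
    # Substring match, e.g. "salmon 1.10.2" contains "1.10.2".
    if [[ $reported == *"${EXPECTED_VERSION[$tool]}"* ]]; then
      echo "OK       $tool $reported"
    else
      echo "MISMATCH $tool: found '$reported', expected ${EXPECTED_VERSION[$tool]}"
      problems=$((problems + 1))
    fi
  done
  return "$problems"
}
```

Running `check_versions` at the top of a job script and aborting on a non-zero status keeps a mismatched environment from producing a silently different simulation.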
