
tawaunl/scRNAseq_pipeline

scRNA-seq Snakemake Pipeline

This repository contains a fully reproducible pipeline for preprocessing, quality control, doublet detection, and basic downstream analysis of single-cell RNA-seq data using Snakemake and Docker.


🔧 Features

  • Environment-controlled via Docker
  • Flexible Snakemake workflow driven by a metadata.csv
  • Species-aware processing (human or mouse)
  • FASTQ → Cell Ranger → CellBender → scDblFinder → Seurat QC
  • Seurat object output, plus optional export as SingleCellExperiment or AnnData
  • Clustering and marker gene identification

📁 Directory Structure

├── config/
│   ├── config.yaml           # Pipeline configuration
│   └── metadata.csv          # Sample metadata (sample ID, species, fastq path)
├── data/                     # FASTQ files, one folder per sample (e.g. data/fastq/sample1/)
├── results/                  # All pipeline outputs
│   ├── cellranger/
│   ├── cellbender/
│   ├── fastqc/
│   ├── r_analysis/
│   └── scdblfinder/
├── scripts/                  # R scripts
├── Snakefile                 # Main Snakemake pipeline
└── Dockerfile                # Environment definition

📥 Requirements

  • Docker
  • Snakemake (not needed separately if you run the pipeline inside the container)
  • Paired-end FASTQ files named sampleID_R1.fastq.gz and sampleID_R2.fastq.gz

🧾 Input Files

1. metadata.csv

sample,species,fastq_path
sample1,human,data/fastq/sample1
sample2,mouse,data/fastq/sample2
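  • sample: unique sample ID, matching the FASTQ file prefix (sampleID_R1.fastq.gz)
  • species: either human or mouse
  • fastq_path: path to the folder containing that sample's paired-end FASTQ files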
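
Before launching a long run, it can help to check that metadata.csv follows the rules above. The following is a minimal sketch, not the pipeline's own validation code. It assumes the three columns shown, the human/mouse species values, and the sampleID_R1/R2.fastq.gz naming from Requirements. The function names load_metadata and expected_fastqs are hypothetical.

```python
import csv
import io
from pathlib import Path

ALLOWED_SPECIES = {"human", "mouse"}
REQUIRED_COLUMNS = {"sample", "species", "fastq_path"}


def load_metadata(handle):
    """Parse metadata.csv into {sample: {"species": ..., "fastq_path": Path}}."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"metadata.csv is missing columns: {sorted(missing)}")
    samples = {}
    for row in reader:
        sample = row["sample"].strip()
        species = row["species"].strip()
        if species not in ALLOWED_SPECIES:
            raise ValueError(f"{sample}: species must be 'human' or 'mouse', got {species!r}")
        if sample in samples:
            raise ValueError(f"duplicate sample ID: {sample}")
        samples[sample] = {"species": species, "fastq_path": Path(row["fastq_path"].strip())}
    return samples


def expected_fastqs(sample, fastq_path):
    """Return the R1/R2 paths implied by the sampleID_R1/R2.fastq.gz naming rule."""
    return [Path(fastq_path) / f"{sample}_R{read}.fastq.gz" for read in (1, 2)]


# In-memory copy of the example above; for the real file use
# open("config/metadata.csv", newline="") instead.
example = io.StringIO(
    "sample,species,fastq_path\n"
    "sample1,human,data/fastq/sample1\n"
    "sample2,mouse,data/fastq/sample2\n"
)
meta = load_metadata(example)
for sample, info in meta.items():
    for fq in expected_fastqs(sample, info["fastq_path"]):
        status = "ok" if fq.exists() else "MISSING"
        print(f"{sample}\t{info['species']}\t{fq}\t{status}")
```

Any row marked MISSING points to a FASTQ file that the naming rule expects but that is not on disk.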

2. config.yaml

genomes:
  human: /opt/refdata-gex-GRCh38-2020-A
  mouse: /opt/refdata-gex-mm10-2020-A

paths:
  metadata: config/metadata.csv
  output_dir: results
  r_script: scripts/scRNA_qc_plots.R
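
The genomes block is what makes processing species-aware: each sample's species value from metadata.csv selects a Cell Ranger reference directory. The sketch below shows that lookup. It assumes the Snakefile loads this file with a `configfile:` directive, in which case Snakemake exposes the parsed YAML as a plain `config` dict. Here that dict is written out by hand so the example runs on its own. The helper names and the results/cellranger/{sample} layout are hypothetical.

```python
# Hand-written mirror of config.yaml; inside a Snakefile, `config` would be
# filled in by Snakemake from the configfile instead.
config = {
    "genomes": {
        "human": "/opt/refdata-gex-GRCh38-2020-A",
        "mouse": "/opt/refdata-gex-mm10-2020-A",
    },
    "paths": {
        "metadata": "config/metadata.csv",
        "output_dir": "results",
        "r_script": "scripts/scRNA_qc_plots.R",
    },
}


def reference_for(species, cfg=config):
    """Return the Cell Ranger reference directory configured for a species."""
    try:
        return cfg["genomes"][species]
    except KeyError:
        raise ValueError(f"no genome configured for species {species!r}") from None


def cellranger_outdir(sample, cfg=config):
    """Hypothetical per-sample output folder under <output_dir>/cellranger/."""
    return f"{cfg['paths']['output_dir']}/cellranger/{sample}"


print(reference_for("mouse"))        # /opt/refdata-gex-mm10-2020-A
print(cellranger_outdir("sample1"))  # results/cellranger/sample1
```

Adding another species would then mean adding one entry under genomes. The species check in metadata.csv would also need widening.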

🚀 Running the Pipeline

Option A: Inside Docker

  1. Build the image:
docker build -t scrna_pipeline .
  2. Run the container:
docker run --rm -v "$PWD":/workspace -w /workspace -e COMBINED_OUTPUT_TYPE=sce scrna_pipeline snakemake --cores 8

Option B: Local (outside Docker)

Ensure Snakemake, FastQC, Cell Ranger, CellBender, and R are installed and on your PATH. The R packages used by the scripts (e.g. Seurat and scDblFinder) must also be installed.

export COMBINED_OUTPUT_TYPE=anndata  # or 'sce'
snakemake --cores 8

📊 Outputs

  • QC plots: results/r_analysis/qc_report.pdf
  • Clustered Seurat object: combined_seurat.rds
  • Markers table: markers.csv
  • Optional: combined.rds (SingleCellExperiment) or combined.h5ad (AnnData), selected by COMBINED_OUTPUT_TYPE

🧪 Pipeline Steps

  1. FastQC (raw)
  2. Cell Ranger alignment
  3. CellBender background removal
  4. FastQC (post-cleaning)
  5. scDblFinder doublet prediction
  6. Seurat QC + UMAP + clustering
  7. Merge & output: AnnData / SCE
  8. Clustering & marker detection
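
Snakemake does not run these steps from a fixed list. It infers the order from each rule's input and output files, and it runs jobs that do not depend on each other in parallel, up to --cores. The sketch below models that idea with Python's standard graphlib module (Python 3.9+). The dependency map is an assumption read off the numbered list: each step consumes the output of the step before it, and the two FastQC runs are side branches. It is not the actual rule graph in the Snakefile. To see the real graph, `snakemake --dag | dot -Tpng > dag.png` renders it, provided Graphviz is installed.

```python
from graphlib import TopologicalSorter

# Hypothetical step -> prerequisite-steps map, inferred from the numbered list above.
STEP_DEPENDENCIES = {
    "fastqc_raw": set(),           # 1. runs directly on the raw FASTQs
    "cellranger": set(),           # 2. also starts from the raw FASTQs
    "cellbender": {"cellranger"},  # 3. cleans the Cell Ranger count matrix
    "fastqc_post": {"cellbender"}, # 4. side branch after cleaning
    "scdblfinder": {"cellbender"}, # 5. doublet calls on the cleaned counts
    "seurat_qc": {"scdblfinder"},  # 6. QC, UMAP, clustering
    "merge_output": {"seurat_qc"}, # 7. AnnData / SCE export
    "markers": {"merge_output"},   # 8. marker detection
}

# static_order() yields every step after all of its prerequisites.
order = list(TopologicalSorter(STEP_DEPENDENCIES).static_order())
print(order)
```

Under this model, fastqc_raw and cellranger have no prerequisites, so a scheduler such as Snakemake could run them at the same time.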

🔁 Customization

To change the final output format:

export COMBINED_OUTPUT_TYPE=sce    # or "anndata"
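
The variable selects which extra file is written alongside the Seurat object. Below is a minimal sketch of how a Snakefile could translate it into a target file name, using the file names from the Outputs list. The function name combined_output_name is hypothetical. Failing when the variable is unset is also an assumption, since the actual pipeline may fall back to a default.

```python
import os

# File names from the Outputs list; the mapping logic is a hypothetical sketch.
COMBINED_OUTPUTS = {
    "sce": "combined.rds",       # SingleCellExperiment
    "anndata": "combined.h5ad",  # AnnData
}


def combined_output_name(env=os.environ):
    """Pick the combined output file from COMBINED_OUTPUT_TYPE, failing loudly on typos."""
    value = env.get("COMBINED_OUTPUT_TYPE")
    if value is None:
        raise KeyError("COMBINED_OUTPUT_TYPE is not set; export 'sce' or 'anndata'")
    value = value.strip().lower()
    if value not in COMBINED_OUTPUTS:
        raise ValueError(f"unsupported COMBINED_OUTPUT_TYPE {value!r}; use 'sce' or 'anndata'")
    return COMBINED_OUTPUTS[value]


print(combined_output_name({"COMBINED_OUTPUT_TYPE": "anndata"}))  # combined.h5ad
```

Rejecting unknown values early means a typo such as "andata" stops the run before Cell Ranger starts, instead of failing at the final merge.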

To change the clustering resolution or the genes used for PCA, edit scripts/scRNA_qc_plots.R (the script referenced by paths.r_script in config.yaml).


📫 Contact

Tawaun Lucas
Postdoctoral Fellow, Genentech
tawanl@gmail.com
