scRNA-seq Snakemake Pipeline

This repository contains a fully reproducible pipeline for preprocessing, quality control, doublet detection, and basic downstream analysis of single-cell RNA-seq data using Snakemake and Docker.

🔧 Features

Environment-controlled via Docker
Flexible Snakemake workflow driven by a metadata.csv
Species-aware processing (human or mouse)
FASTQ → Cell Ranger → CellBender → scDblFinder → Seurat QC
Optional output as Seurat, SingleCellExperiment, or AnnData
Clustering and marker gene identification

📁 Directory Structure

├── config/
│   ├── config.yaml           # Pipeline configuration
│   └── metadata.csv          # Sample metadata (sample ID, species, fastq path)
├── data/                     # FASTQ files per sample
├── results/                  # All pipeline outputs
│   ├── cellranger/
│   ├── cellbender/
│   ├── fastqc/
│   ├── r_analysis/
│   └── scdblfinder/
├── scripts/                  # R scripts
├── Snakefile                 # Main Snakemake pipeline
└── Dockerfile                # Environment definition

📥 Requirements

Docker
Snakemake (or run inside the container)
FASTQ files named: sampleID_R1.fastq.gz, sampleID_R2.fastq.gz
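
The naming convention above can be checked before a run is launched. The following is a minimal sketch, assuming both read files sit directly inside the sample's FASTQ folder. The helper name `missing_fastqs` is hypothetical and not part of the pipeline.

```python
from pathlib import Path


def missing_fastqs(sample_id: str, fastq_dir: str) -> list[str]:
    """Return the expected read files that are absent from fastq_dir."""
    # Expected names follow the sampleID_R1/_R2 convention from the README.
    expected = [f"{sample_id}_R1.fastq.gz", f"{sample_id}_R2.fastq.gz"]
    folder = Path(fastq_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    # Report anything missing for one sample (paths are illustrative).
    for name in missing_fastqs("sample1", "data/fastq/sample1"):
        print(f"missing: {name}")
```

An empty list means both files were found; anything else names the files to add or rename.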

🧾 Input Files

1. `metadata.csv`

sample,species,fastq_path
sample1,human,data/fastq/sample1
sample2,mouse,data/fastq/sample2

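sample: sample ID, used as the FASTQ filename prefix (sampleID_R1.fastq.gz, sampleID_R2.fastq.gz)
species: either human or mouse
fastq_path: path to the folder containing that sample's paired-end FASTQ files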
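
A quick sanity check of metadata.csv can catch typos before Snakemake starts. This is a sketch rather than the pipeline's own validation: the function name `check_metadata` is hypothetical, and the rule that sample IDs must be unique is an assumption not stated in this README.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample", "species", "fastq_path")
ALLOWED_SPECIES = {"human", "mouse"}


def check_metadata(handle) -> list[str]:
    """Return a list of human-readable problems found in a metadata CSV."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        sample = (row["sample"] or "").strip()
        species = (row["species"] or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sample ID")
        elif sample in seen:
            # Assumption: each sample ID should appear only once.
            problems.append(f"line {line_no}: duplicate sample '{sample}'")
        seen.add(sample)
        if species not in ALLOWED_SPECIES:
            problems.append(
                f"line {line_no}: species '{species}' is not human or mouse"
            )
    return problems


if __name__ == "__main__":
    example = (
        "sample,species,fastq_path\n"
        "sample1,human,data/fastq/sample1\n"
        "sample2,rat,data/fastq/sample2\n"
    )
    print(check_metadata(io.StringIO(example)))
```

To check the real file, pass an open handle, for example `check_metadata(open("config/metadata.csv", newline=""))`.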

2. `config.yaml`

genomes:
  human: /opt/refdata-gex-GRCh38-2020-A
  mouse: /opt/refdata-gex-mm10-2020-A

paths:
  metadata: config/metadata.csv
  output_dir: results
  r_script: scripts/scRNA_qc_plots.R
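
The keys under genomes match the species values allowed in metadata.csv, so each sample's reference can presumably be looked up by species. The sketch below illustrates that lookup with the config mirrored as a plain Python dict. The function `reference_for` is a hypothetical name, and how the Snakefile actually resolves references is not shown in this README.

```python
# Mirrors config.yaml as a plain dict; a real run would load the file
# with a YAML parser instead.
CONFIG = {
    "genomes": {
        "human": "/opt/refdata-gex-GRCh38-2020-A",
        "mouse": "/opt/refdata-gex-mm10-2020-A",
    },
    "paths": {
        "metadata": "config/metadata.csv",
        "output_dir": "results",
        "r_script": "scripts/scRNA_qc_plots.R",
    },
}


def reference_for(species: str, config: dict) -> str:
    """Return the reference path configured for a species."""
    try:
        return config["genomes"][species]
    except KeyError:
        known = ", ".join(sorted(config.get("genomes", {})))
        raise ValueError(
            f"no reference configured for species '{species}' (known: {known})"
        ) from None


if __name__ == "__main__":
    print(reference_for("mouse", CONFIG))
```

Failing early with the list of configured species makes a mismatch between metadata.csv and config.yaml easier to spot than a missing-path error later in the run.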

🚀 Running the Pipeline

Option A: Inside Docker

Build the image:

docker build -t scrna_pipeline .

Run the container:

docker run --rm -v "$PWD":/workspace -w /workspace -e COMBINED_OUTPUT_TYPE=sce scrna_pipeline snakemake --cores 8

Option B: Local (outside Docker)

Ensure R, Snakemake, FastQC, Cell Ranger, and CellBender are installed and on your PATH.

export COMBINED_OUTPUT_TYPE=anndata  # or 'sce'
snakemake --cores 8

📊 Outputs

QC plots: results/r_analysis/qc_report.pdf
Clustered Seurat object: combined_seurat.rds
Markers table: markers.csv
Optional: combined.rds (SCE) or combined.h5ad (AnnData)

🧪 Pipeline Steps

1. FastQC (raw)
2. Cell Ranger alignment
3. CellBender background removal
4. FastQC (post-cleaning)
5. scDblFinder doublet prediction
6. Seurat QC + UMAP + clustering
7. Merge & output: AnnData / SCE
8. Clustering & marker detection
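
To see where the per-sample steps would write, the result folders from the directory structure can be listed for each sample. This is only a sketch: the one-subfolder-per-sample layout under each step is an assumption, and `sample_output_dirs` is a hypothetical helper rather than something defined in the Snakefile.

```python
from pathlib import PurePosixPath

# Result subfolders named in the directory structure, in pipeline order.
PER_SAMPLE_STEPS = ("fastqc", "cellranger", "cellbender", "scdblfinder")


def sample_output_dirs(samples, output_dir="results"):
    """Map each sample ID to its assumed output folder for every step."""
    root = PurePosixPath(output_dir)
    return {
        sample: [str(root / step / sample) for step in PER_SAMPLE_STEPS]
        for sample in samples
    }


if __name__ == "__main__":
    for sample, folders in sample_output_dirs(["sample1", "sample2"]).items():
        print(sample, folders)
```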

🔁 Customization

To change final output format:

export COMBINED_OUTPUT_TYPE=sce    # or "anndata"
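
The relationship between COMBINED_OUTPUT_TYPE and the combined file listed under Outputs can be sketched as follows. The file names come from the Outputs section; treating an unset variable as "no combined file" and accepting mixed-case values are assumptions, and `combined_output_file` is a hypothetical name.

```python
import os

# File names taken from the Outputs section of this README.
COMBINED_OUTPUTS = {"sce": "combined.rds", "anndata": "combined.h5ad"}


def combined_output_file(environ=os.environ):
    """Return the combined output file name, or None when the variable is unset."""
    value = environ.get("COMBINED_OUTPUT_TYPE", "").strip().lower()
    if not value:
        # Assumption: no combined object is written when the variable is unset.
        return None
    if value not in COMBINED_OUTPUTS:
        allowed = ", ".join(sorted(COMBINED_OUTPUTS))
        raise ValueError(
            f"COMBINED_OUTPUT_TYPE must be one of: {allowed} (got '{value}')"
        )
    return COMBINED_OUTPUTS[value]


if __name__ == "__main__":
    print(combined_output_file())
```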

To change the clustering resolution or the genes used for PCA, edit scripts/scRNA_qc_plots.R.

📫 Contact

Tawaun Lucas
Postdoctoral Fellow, Genentech
tawanl@gmail.com
