This repository contains a fully reproducible pipeline for preprocessing, quality control, doublet detection, and basic downstream analysis of single-cell RNA-seq data using Snakemake and Docker.
- Environment-controlled via Docker
- Flexible Snakemake workflow driven by a `metadata.csv` sample sheet
- Species-aware processing (human or mouse)
- FASTQ → Cell Ranger → CellBender → scDblFinder → Seurat QC
- Optional output as Seurat, SingleCellExperiment, or AnnData
- Clustering and marker gene identification
Project layout:

```
├── config/
│   ├── config.yaml      # Pipeline configuration
│   └── metadata.csv     # Sample metadata (sample ID, species, FASTQ path)
├── data/                # FASTQ files per sample
├── results/             # All pipeline outputs
│   ├── cellranger/
│   ├── cellbender/
│   ├── fastqc/
│   ├── r_analysis/
│   └── scdblfinder/
├── scripts/             # R scripts
├── Snakefile            # Main Snakemake pipeline
└── Dockerfile           # Environment definition
```
Requirements:

- Docker
- Snakemake (or run it inside the container)
- Paired-end FASTQ files named `sampleID_R1.fastq.gz` and `sampleID_R2.fastq.gz`
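A quick pre-flight check of the naming convention can save a failed run. The sketch below is a hypothetical helper (`check_fastq_pair` is not part of the repository). It assumes only the `sampleID_R1.fastq.gz` / `sampleID_R2.fastq.gz` pattern stated above and returns non-zero if either read of the pair is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the pipeline: verify that a sample's
# folder holds both reads of a pair, named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
check_fastq_pair() {
  local dir="$1" sample="$2" mate missing=0
  for mate in R1 R2; do
    if [ ! -f "${dir}/${sample}_${mate}.fastq.gz" ]; then
      echo "missing: ${dir}/${sample}_${mate}.fastq.gz" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_fastq_pair data/fastq/sample1 sample1 && echo "sample1 OK"`.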
Example `config/metadata.csv`:

```csv
sample,species,fastq_path
sample1,human,data/fastq/sample1
sample2,mouse,data/fastq/sample2
```

- `species`: either `human` or `mouse`
- `fastq_path`: path to the folder containing the sample's paired-end FASTQ files
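Species-aware processing means each sample's `species` value selects a reference from the `genomes:` block of `config/config.yaml`. The sketch below illustrates that lookup under stated assumptions: `genome_for` and `resolve_genomes` are hypothetical names, the tiny `awk` parser handles only the flat two-level YAML layout shown in this README, and the real Snakefile may resolve references differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of species-aware reference selection (not the Snakefile's code).

# Print the genome path configured for species $1 in config file $2 (empty if absent).
# Minimal parser: only understands top-level keys with two-space-indented children.
genome_for() {
  awk -v sp="$1" '
    /^[^[:space:]#]/ { in_genomes = ($1 == "genomes:"); next }  # entering a top-level key
    in_genomes && $1 == (sp ":") { print $2; exit }             # e.g. "  human: /opt/..."
  ' "$2"
}

# Print "sample<TAB>genome" for each row of metadata $1; fail on an unknown species.
resolve_genomes() {
  local metadata="$1" config="$2" _header sample species fastq_path genome status=0
  {
    IFS= read -r _header  # skip "sample,species,fastq_path"
    # "|| [ -n ... ]" keeps a final row that lacks a trailing newline
    while IFS=, read -r sample species fastq_path || [ -n "$sample" ]; do
      [ -z "$sample" ] && continue
      genome=$(genome_for "$species" "$config")
      if [ -z "$genome" ]; then
        echo "${sample}: no genome configured for species '${species}'" >&2
        status=1
        continue
      fi
      printf '%s\t%s\n' "$sample" "$genome"
    done
  } < "$metadata"
  return "$status"
}
```

With the example files, `resolve_genomes config/metadata.csv config/config.yaml` would map `sample1` to the GRCh38 reference and `sample2` to the mm10 reference.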
Example `config/config.yaml`:

```yaml
genomes:
  human: /opt/refdata-gex-GRCh38-2020-A
  mouse: /opt/refdata-gex-mm10-2020-A
paths:
  metadata: config/metadata.csv
  output_dir: results
  r_script: scripts/scRNA_qc_plots.R
```

Running with Docker:

- Build the image:

```bash
docker build -t scrna_pipeline .
```

- Run the container:

```bash
docker run --rm -v "$PWD":/workspace -w /workspace -e COMBINED_OUTPUT_TYPE=sce scrna_pipeline snakemake --cores 8
```

To run without Docker, make sure R, Snakemake, Cell Ranger, and CellBender are installed and on your `PATH`, then:
```bash
export COMBINED_OUTPUT_TYPE=anndata   # or 'sce'
snakemake --cores 8
```

Outputs:

- QC plots: `results/r_analysis/qc_report.pdf`
- Clustered Seurat object: `combined_seurat.rds`
- Markers table: `markers.csv`
- Optional, selected by `COMBINED_OUTPUT_TYPE`: `combined.rds` (SCE) or `combined.h5ad` (AnnData)
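After a run, a small check can confirm that the expected files exist for the chosen output format. This is a hypothetical sketch (`check_outputs` is not part of the repository). It assumes that all four files land in a single directory passed as the first argument, although the README only gives a full path for `qc_report.pdf`, and that `COMBINED_OUTPUT_TYPE` takes only the two values shown above.

```bash
#!/usr/bin/env bash
# Hypothetical post-run check. Assumption: all R outputs share one directory ($1,
# e.g. results/r_analysis). $2 is the COMBINED_OUTPUT_TYPE value ("sce" or "anndata").
check_outputs() {
  local outdir="$1" type="$2" combined f status=0
  # Map the output type to the combined file name listed in the outputs above
  case "$type" in
    sce)     combined="combined.rds" ;;
    anndata) combined="combined.h5ad" ;;
    *) echo "unknown COMBINED_OUTPUT_TYPE: '${type}'" >&2; return 2 ;;
  esac
  for f in qc_report.pdf combined_seurat.rds markers.csv "$combined"; do
    # -s: the file exists and is non-empty
    if [ ! -s "${outdir}/${f}" ]; then
      echo "missing or empty: ${outdir}/${f}" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_outputs results/r_analysis "$COMBINED_OUTPUT_TYPE"`. The function returns 2 for an unrecognized type so that a typo in the variable is distinguishable from missing files.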
Pipeline steps:

- FastQC (raw reads)
- Cell Ranger alignment
- CellBender background removal
- FastQC (post-cleaning)
- scDblFinder doublet prediction
- Seurat QC + UMAP + clustering
- Merge & output: AnnData / SCE
- Clustering & marker detection
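The steps above write into the `results/` subdirectories listed in the project layout. A rough progress report can be sketched by checking those directories in pipeline order. This is a hypothetical helper (`report_stages` is not part of the repository). Directory contents are only a proxy. Snakemake's own dry run (`snakemake -n`) is the authoritative view of which rules still need to run.

```bash
#!/usr/bin/env bash
# Hypothetical progress report: walk the results/ subdirectories in pipeline order
# and report whether each one already contains files. `fastqc` covers both the raw
# and post-cleaning FastQC runs.
report_stages() {
  local results="${1:-results}" stage
  for stage in fastqc cellranger cellbender scdblfinder r_analysis; do
    if [ -d "${results}/${stage}" ] && [ -n "$(ls -A "${results}/${stage}")" ]; then
      echo "${stage}: has output"
    else
      echo "${stage}: no output yet"
    fi
  done
}
```

Run `report_stages` from the repository root, or pass a different output directory as the first argument.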
To change the final output format:

```bash
export COMBINED_OUTPUT_TYPE=sce   # or "anndata"
```

To change the clustering resolution or the genes used for PCA, edit `scripts/scRNA_qc_plots.R`.
Tawaun Lucas
Postdoctoral Fellow, Genentech
tawanl@gmail.com