Whole-genome analysis pipeline

This repository contains code for a full analysis of your whole genome. To run it locally, you need a Linux-based OS, an 8-core CPU with 16 threads, 32GB of RAM (64GB recommended), and 1.5TB of storage: your uncompressed genome alone is ~275GB in FASTQ format, before any analysis outputs. The complete end-to-end pipeline takes about 30 hours to run.
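A quick way to check a workstation against these minimums is a small preflight script. The sketch below is not part of the repository: the function names `meets_minimum` and `preflight` are made up for illustration, it assumes a Linux host with `/proc/meminfo`, and it compares RAM against 30 rather than 32 because the kernel reports slightly less memory than is physically installed.

```shell
#!/usr/bin/env bash
# Preflight sketch (hypothetical helper, not shipped with the repo).
# Thresholds mirror the local requirements listed above.

meets_minimum() {
    # usage: meets_minimum LABEL ACTUAL REQUIRED   (whole numbers)
    local label="$1" actual="$2" required="$3"
    if [ "$actual" -ge "$required" ]; then
        echo "OK    $label: $actual (need >= $required)"
    else
        echo "FAIL  $label: $actual (need >= $required)"
        return 1
    fi
}

preflight() {
    local status=0 threads ram_gb disk_gb
    # Logical CPUs visible to this process.
    threads=$(nproc)
    # MemTotal is reported in kB; convert to whole GiB.
    ram_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
    # Free space (in GiB) on the filesystem that holds WORK_DIR, or HOME if unset.
    disk_gb=$(df -Pk "${WORK_DIR:-$HOME}" | awk 'NR==2 {printf "%d", $4 / 1024 / 1024}')

    meets_minimum "CPU threads" "$threads" 16 || status=1
    # Assumption: 30 instead of 32, since MemTotal sits a little below installed RAM.
    meets_minimum "RAM (GB)" "$ram_gb" 30 || status=1
    meets_minimum "free disk (GB)" "$disk_gb" 1500 || status=1
    return "$status"
}
```

Calling `preflight` before starting the pipeline prints one line per resource and returns non-zero if any minimum is not met.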

To run it in the cloud, follow the Cloud Deployment steps under Quick Start. Before you start, I recommend collecting as many free AWS credits as possible: you can get $100 for signing up with a new credit card and a new email address, and an additional $100 for completing 5 small challenges on the home screen (message me if you can't find them).

Pipeline Overview

input: FASTQ files
    ↓
1. quality control (FastQC/MultiQC)
    ↓
2. alignment (BWA-MEM) → BAM
    ↓
3. mark duplicates (Picard)
    ↓
4. base quality score recalibration (GATK BQSR)
    ↓
5. variant calling (HaplotypeCaller)
    ↓
6. output VCF!
    ↓
7. variant filtering (GATK VariantFiltration)
    ↓
8. variant consequences (SnpEff)
    ↓
9. variant prioritization (bcftools + awk, QUAL>=30)
    ↓
10. structural variant detection (Delly: DEL/DUP/INV/BND)
    ↓
11. copy number variant detection (CNVkit)
    ↓
output: lots of information about your genome!
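
To make step 9 concrete, here is a minimal sketch of a QUAL>=30 filter on an uncompressed VCF. It uses awk alone so that it runs without any bioinformatics tools installed; the repository's own step combines bcftools and awk, and its exact commands may differ. The function name `filter_by_qual` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a QUAL threshold filter (hypothetical helper, not the repo's script).
# Reads a tab-separated VCF on stdin and writes the filtered VCF to stdout.

filter_by_qual() {
    local min_qual="${1:-30}"
    # Header lines start with '#' and pass through unchanged.
    # Column 6 is QUAL; records with a missing QUAL (".") are dropped.
    awk -F'\t' -v min="$min_qual" '
        /^#/ { print; next }
        $6 != "." && $6 + 0 >= min
    '
}
```

Usage would look like `filter_by_qual 30 < sample.vcf > sample.qual30.vcf`, where both file names are placeholders. With bcftools installed, `bcftools view -i 'QUAL>=30'` expresses the same threshold.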

Analysis Options

1. CPU Analysis (`cpu/`)

Complete end-to-end CPU-based genomic analysis pipeline for running on your own Linux workstation.

Time: ~21-25 hours
Requirements: Linux workstation with 8+ cores, 32GB+ RAM, 1.5TB+ storage

View CPU Pipeline Details →

2. GPU Analysis (`gpu/`)

GPU-accelerated whole-genome analysis pipeline using GPU-native programs for faster processing.

Time: ~12-16 hours
Requirements: NVIDIA GPU with CUDA support, 8GB+ GPU memory

View GPU Pipeline Details →

3. Cloud Infrastructure (`cloud/`)

AWS cloud infrastructure deployment using Terraform for HIPAA/NIST-compliant genomic analysis.

Cost: ~$300/month
Requirements: AWS account, Terraform installed

View Cloud Infrastructure Details →

Quick Start

Step 1: Configure for Your Environment

Edit CONFIG.sh:

export SAMPLE_NAME="your_sample_name"
export WORK_DIR="$HOME/wgs_data"
export THREADS=$(nproc)  # or set manually, e.g., 24
export REFERENCE_GENOME="GRCh38"
export GPU_ENABLED=true  # set to false for CPU-only
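
Since a multi-hour run can fail late on a missing or mistyped setting, it may help to validate these variables right after sourcing CONFIG.sh. The sketch below is an assumption about what such a check could look like; `validate_config` is a hypothetical name and is not provided by the repository.

```shell
#!/usr/bin/env bash
# Config validation sketch (hypothetical helper, not shipped with the repo).
# Checks only the five variables shown in CONFIG.sh above.

validate_config() {
    local status=0 name value
    for name in SAMPLE_NAME WORK_DIR THREADS REFERENCE_GENOME GPU_ENABLED; do
        # Look up the variable by name; the names come from the fixed list above.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "CONFIG error: $name is not set" >&2
            status=1
        fi
    done
    # THREADS must be a positive whole number.
    case "${THREADS:-}" in
        ''|0|*[!0-9]*)
            echo "CONFIG error: THREADS must be a positive integer" >&2
            status=1
            ;;
    esac
    # GPU_ENABLED is a literal true/false switch.
    case "${GPU_ENABLED:-}" in
        true|false) ;;
        *)
            echo "CONFIG error: GPU_ENABLED must be true or false" >&2
            status=1
            ;;
    esac
    return "$status"
}
```

A typical call would be `source CONFIG.sh && validate_config`, which returns non-zero and prints every problem it finds rather than stopping at the first one.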

Step 2: Choose Your Analysis Method

For CPU Analysis:

source CONFIG.sh
./cpu/scripts/RUN_FULL_PIPELINE_CPU.sh

For GPU Analysis:

source CONFIG.sh
./gpu/scripts/RUN_FULL_PIPELINE_GPU.sh
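
Because CONFIG.sh already carries a `GPU_ENABLED` flag, the choice between the two scripts above could also be made automatically. The wrapper below is only a sketch of that idea: `select_pipeline_script` is a hypothetical function, and it assumes the two script paths shown above and a working directory at the repository root.

```shell
#!/usr/bin/env bash
# Dispatch sketch (hypothetical helper, not shipped with the repo).
# Prints the pipeline script that matches GPU_ENABLED.

select_pipeline_script() {
    case "${GPU_ENABLED:-false}" in
        true)  echo "./gpu/scripts/RUN_FULL_PIPELINE_GPU.sh" ;;
        false) echo "./cpu/scripts/RUN_FULL_PIPELINE_CPU.sh" ;;
        *)
            echo "GPU_ENABLED must be true or false" >&2
            return 1
            ;;
    esac
}
```

After `source CONFIG.sh`, running `"$(select_pipeline_script)"` would start whichever pipeline the flag selects. An unset flag falls back to the CPU script, which is a design choice of this sketch rather than documented behavior.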

For Cloud Deployment:

cd cloud/
terraform init
terraform apply

Disclaimer

This software is provided "as is" without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.

Use at your own risk. The authors assume no responsibility for any damages, data loss, or compliance issues that may arise from the use of this configuration. Always test in a non-production environment first and consult with qualified professionals before deploying to production.

bmwoolf/genomic_template_AWS
