Yet another Snakemake workflow for ATAC-seq data processing. This pipeline builds on code and guidance from:
- Crazy Hot Tommy!'s many instructional guides
- TOBIAS ATAC-seq footprinting Snakemake workflow
For SLURM setup we reference:
- jdblischak/smk-simple-slurm repo for a simple way to submit Snakemake jobs on SLURM
- Tessa Pierce blog for example templates
Snakemake pipelines promote experimental reproducibility. For this project, you should have the following inputs customized for your analysis:
- A config.yaml that describes the run parameters and location of reference data.
- A tab-delimited sample meta file that describes the experiments to download from SRA and how to group them.
- A unique output directory.
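The three inputs above can be sanity-checked before launching a run. The following is a minimal sketch, not part of snakeATAC: the function name `check_inputs` and its argument order are hypothetical, and it only checks that the files exist, that the sample sheet contains tabs, and that the output directory is not already populated. It does not validate the contents of config.yaml.

```shell
# Hypothetical pre-flight check (illustrative only, not shipped with snakeATAC).
# Usage: check_inputs <config.yaml> <sample_sheet.tsv> <output_dir>
check_inputs() {
    local config="$1" samples="$2" outdir="$3"

    # The config file must exist and be non-empty.
    [ -s "$config" ] || { echo "missing or empty config: $config" >&2; return 1; }

    # The sample sheet must exist, be non-empty, and contain at least one tab.
    [ -s "$samples" ] || { echo "missing or empty sample sheet: $samples" >&2; return 1; }
    grep -q "$(printf '\t')" "$samples" \
        || { echo "sample sheet is not tab-delimited: $samples" >&2; return 1; }

    # The output directory should be unique: refuse one that already holds files.
    if [ -d "$outdir" ] && [ -n "$(ls -A "$outdir")" ]; then
        echo "output directory is not empty: $outdir" >&2
        return 1
    fi

    echo "inputs look OK"
}
```

For example, `check_inputs ./inputs/config.yaml ./inputs/GM12878_sample.tsv ./outputs` would print `inputs look OK` when all three checks pass, assuming `./outputs` is the output directory you chose.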
A detailed overview of the steps in the ATAC-seq data processing is available on the maxATAC wiki site.
This version of snakeATAC is geared towards use with maxATAC and TOBIAS for making TF binding predictions.
This pipeline uses Anaconda and Snakemake. Follow the Snakemake install instructions for the best experience. Below is a brief overview of how to install Snakemake.
Create a conda environment and download mamba:
    conda create -n snakeatac -c conda-forge -c bioconda mamba snakemake
Activate the snakeatac environment:
    conda activate snakeatac
In your favorite directory clone the snakeATAC repo:
    git clone https://github.com/tacazares/snakeATAC.git
If you are running this pipeline for the first time, you will need to install all the conda environments used and perform a dry-run to make sure that everything was installed correctly.
- Adjust the config.yaml and the tab-delimited sample meta file for your specific experiment.
- Change to the working directory for snakeATAC. By default, Snakemake will look for a file called Snakefile with the rules and run information. You can use a custom Snakefile with the -s flag followed by the path to the file.
    cd ./snakeATAC/
- Next, use the --conda-create-envs-only flag to create the environments.
    snakemake --cores 14 --use-conda --conda-frontend mamba --conda-create-envs-only --configfile ./inputs/config.yaml
- Test that the workflow and scripts are correctly set up by performing a dry-run with the --dry-run flag.
    snakemake --cores 14 --use-conda --conda-frontend mamba --configfile ./inputs/config.yaml --dry-run
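The environment-creation and dry-run steps above can be chained so that the dry-run only happens if the environments were created successfully. This is a rough sketch under a few assumptions: the function names `snakeatac` and `first_run_check` and the `SNAKEMAKE_BIN` override variable are hypothetical conveniences, not part of snakeATAC or Snakemake. The flags themselves are the ones used in the steps above.

```shell
# Hypothetical wrapper around the snakemake flags used in this README.
# SNAKEMAKE_BIN is an illustrative override (for example, to preview commands);
# it is left unquoted on purpose so a value like "echo snakemake" word-splits.
snakeatac() {
    ${SNAKEMAKE_BIN:-snakemake} --cores 14 --use-conda --conda-frontend mamba \
        --configfile ./inputs/config.yaml "$@"
}

# First-run check: create the conda environments, then dry-run the workflow.
# The dry-run is skipped if environment creation fails.
first_run_check() {
    snakeatac --conda-create-envs-only && snakeatac --dry-run
}
```

Run `first_run_check` from inside the snakeATAC directory. Extra arguments passed to `snakeatac` are forwarded unchanged, so a custom Snakefile can still be selected with `snakeatac -s path/to/Snakefile --dry-run`.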
The ./snakeATAC/inputs/GM12878_sample.tsv contains information for a test run to process GM12878 OMNI ATAC-seq data.
After installation, you can run the full workflow on your favorite HPC system:
    snakemake --cores 14 --use-conda --conda-frontend mamba --configfile ./inputs/config.yaml
If you want to use Snakemake to submit jobs to SLURM, you will need to follow the instructions described in the jdblischak/smk-simple-slurm repo. The directory and scripts are included in this repository, but you will need to adjust the account information. You can also adjust any defaults that you wish to use with your job submissions. NOTE: You will need to run chmod +x status-sacct.sh to make the script executable.
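Since the exact location of status-sacct.sh inside the repository is not stated here, one way to apply the chmod step is to search for the script by name. The helper below is a small sketch; the function name `make_status_script_executable` is hypothetical.

```shell
# Hypothetical helper: find status-sacct.sh under a directory (default: the
# current directory) and mark every copy found as executable.
make_status_script_executable() {
    local root="${1:-.}"
    find "$root" -type f -name 'status-sacct.sh' -exec chmod +x {} +
}
```

Calling `make_status_script_executable ./snakeATAC` from the directory where you cloned the repo has the same effect as running `chmod +x status-sacct.sh` in the directory that contains the script.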
Example .bat file to drive the snakeATAC workflow
#!/bin/bash
#SBATCH -D ./outputs
#SBATCH -J dmnd_snake
#SBATCH -t 96:00:00
#SBATCH --ntasks=8
#SBATCH --mem=16gb
#SBATCH --account={YOUR_ACCOUNT}
#SBATCH --output ./outputs/snakeatac-%j.out
#SBATCH --error ./outputs/snakeatac-%j.err
# Load modules
module load python/3.7-2019.10
# Load the snakemake/mamba env
source activate mamba
# go to a particular directory
cd ./snakeATAC
# make things fail on errors
set -o nounset
set -o errexit
set -x
### run your commands here!
# Developed from the links below
# https://bluegenes.github.io/snakemake-via-slurm/
# https://github.com/jdblischak/smk-simple-slurm
snakemake -s /snakeATAC/Snakefile \
--use-conda \
--conda-frontend mamba \
--configfile /snakeATAC/inputs/config.yaml \
--profile simple/
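Because the script above ships with a `{YOUR_ACCOUNT}` placeholder, it can help to guard the submission so the job is not sent to SLURM with the placeholder still in place. The following is a minimal sketch: the function name `submit_if_configured`, the `SBATCH_BIN` override variable, and the file name used in the usage note are hypothetical. `sbatch` is the standard SLURM submission command.

```shell
# Hypothetical guard: refuse to submit a batch script that still contains the
# literal {YOUR_ACCOUNT} placeholder. SBATCH_BIN is an illustrative override,
# left unquoted on purpose so a value like "echo sbatch" word-splits.
submit_if_configured() {
    local script="$1"

    # -F treats the pattern as a fixed string, so the braces are matched literally.
    if grep -qF '{YOUR_ACCOUNT}' "$script"; then
        echo "replace {YOUR_ACCOUNT} in $script before submitting" >&2
        return 1
    fi

    ${SBATCH_BIN:-sbatch} "$script"
}
```

Assuming the script above was saved as `run_snakeatac.bat`, `submit_if_configured run_snakeatac.bat` submits it only after the account line has been edited.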