TFXcan

This pipeline tests TF binding-GWAS trait associations using SNP-based predictors of TF binding.

Version:

TFXcan v3.0

Usage/Command:

conda activate /beagle3/haky/users/shared_software/TFXcan-pipeline-tools
snakemake -s snakefile.smk --configfile config/pipeline.yaml --profile profiles/simple/ --resources load=45

The --resources load=45 flag ensures that the PredictDB part of the pipeline runs no more than 9 jobs at a time on midway3 (i.e., 9 * 5 = 45). Any number would work; I chose a multiple of 5. If your cluster allows you to run more than 100 jobs at a time, you can raise this number.
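As a rough illustration of the arithmetic above: Snakemake treats `--resources load=45` as a global budget and only starts jobs whose declared `load` still fits within it. The sketch below assumes each PredictDB job declares a `load` of 5 (inferred from the "9 * 5" above, not confirmed from the workflow rules); the function name is hypothetical.

```python
def max_concurrent_jobs(load_budget: int, load_per_job: int = 5) -> int:
    """Return how many jobs fit within a global `load` budget.

    load_per_job=5 is an assumption inferred from the README's "9 * 5".
    """
    if load_per_job <= 0:
        raise ValueError("load_per_job must be positive")
    return load_budget // load_per_job


print(max_concurrent_jobs(45))   # budget used in the command above
print(max_concurrent_jobs(100))  # a larger budget for a less restrictive cluster
```

Under this assumption, picking a budget that is a multiple of 5 avoids leaving unused load units.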

To use screen [preferred]:

screen
conda activate <conda environment>  # see the Software section
export PATH=$PATH:/project2/haky/temi/software/homer/bin
snakemake -s snakefile.smk --configfile config/pipeline.yaml --profile profiles/simple/ --resources load=45

Software:

This pipeline depends on several software packages to do the following:

Fine-map GWAS SNPs (SuSiE); this is optional because you can skip fine-mapping and use the top SNPs per locus instead [default]
Predict with Enformer; this dependency is optional (Enformer, GPUs, PyTorch)
Train models of TF binding that are linear in SNPs (Nextflow, PredictDB)
Test TF binding-GWAS trait associations (PrediXcan, Summary-PrediXcan, MetaXcan)

We suggest the following for a hitch-free environment:

Use conda to create an environment and install the software from the environment file

All of this software is bundled in this repository. You only need to install the conda environment.

Input:

In general, the pipeline expects:

A yaml config or parameters file. Details are here
A metadata sheet of the GWAS summary statistics. Details are here
A number of files needed in the yaml config file. These can be downloaded from here. You will need to decompress this archive.

The GWAS summary statistics file should have the following columns (other columns are allowed but will be ignored):

|chrom|pos|variant_id|ref|alt|pval|zscore|beta|se|
|---|---|---|---|---|---|---|---|---|
|1|134|1_134_A_G|A|G|0.0001|0.1|0.7|0.1|

- chrom: (character or string) 1, 2, 3, etc. (autosomes only; no chromosomes X, Y, M, etc.)

- pos: (numeric) 134 (bp coordinates)

- variant_id: chrom_pos_ref_alt

- pval: (numeric) GWAS pvalues

- zscore: GWAS zscores; you can pre-calculate this from the beta and standard errors (beta/se)
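The column rules above can be sketched as a small pre-processing step. This is a minimal illustration, not the pipeline's own code: it assumes a comma-separated input with the column names listed above, keeps autosomes only, builds `variant_id` as chrom_pos_ref_alt, and fills in `zscore` as beta/se when it is missing. The function name and the toy rows are made up for the example.

```python
import csv
import io

AUTOSOMES = {str(i) for i in range(1, 23)}  # chromosomes 1-22 only


def prepare_sumstats(handle):
    """Yield cleaned rows: autosomes only, with variant_id and zscore set."""
    for row in csv.DictReader(handle):
        if row["chrom"] not in AUTOSOMES:
            continue  # drop X, Y, M and anything else that is not an autosome
        # variant_id follows the chrom_pos_ref_alt convention
        row["variant_id"] = "_".join(
            [row["chrom"], row["pos"], row["ref"], row["alt"]]
        )
        if row.get("zscore"):
            row["zscore"] = float(row["zscore"])
        else:
            # pre-calculate the z-score from the effect size and its standard error
            row["zscore"] = float(row["beta"]) / float(row["se"])
        yield row


# Toy input (invented values, no zscore column)
toy = io.StringIO(
    "chrom,pos,ref,alt,pval,beta,se\n"
    "1,134,A,G,0.0001,0.5,0.25\n"
    "X,500,C,T,0.5,0.1,0.2\n"
)
rows = list(prepare_sumstats(toy))
print(rows[0]["variant_id"], rows[0]["zscore"])
```

Real summary statistics may use a different delimiter or "chr"-prefixed chromosome names; adjust the reader accordingly.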

The framework assumes that genomic coordinates are in hg38.

Weights file: You will need to specify this in the config yaml file (see enpact_weights). The weights file is a dataframe of TF binding predictors. You can find and use examples from here.

Output:

The pipeline outputs the association results between the GWAS trait and TF binding, written to the data/.../output folder. The results are summarized in a ***.TFXcan.csv file with the following columns:

|tfbs|zscore|effect_size|pvalue|var_g|pred_perf_r2|pred_perf_pval|pred_perf_qval|n_snps_used|n_snps_in_cov|n_snps_in_model|
|---|---|---|---|---|---|---|---|---|---|---|
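One way to work with this file is to rank the tested binding sites by p-value. The sketch below is an assumption about downstream use, not part of the pipeline: it reads the `tfbs` and `pvalue` columns, skips rows whose p-value is missing, and applies a Bonferroni threshold (a choice made here for illustration). The function names and the toy rows are invented.

```python
import csv
import io
import math


def parse_pvalue(text):
    """Return the p-value as a float, or None if it is missing or not numeric."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value


def bonferroni_hits(handle, alpha=0.05):
    """Return rows with pvalue < alpha / (number of tested rows), best first."""
    tested = []
    for row in csv.DictReader(handle):
        p = parse_pvalue(row.get("pvalue"))
        if p is not None:
            tested.append((p, row))
    if not tested:
        return []
    threshold = alpha / len(tested)
    hits = [(p, row) for p, row in tested if p < threshold]
    hits.sort(key=lambda item: item[0])
    return [row for _, row in hits]


# Toy results (invented values; only the columns used here are shown)
toy = io.StringIO(
    "tfbs,zscore,pvalue\n"
    "site_a,3.29,0.001\n"
    "site_b,0.67,0.5\n"
    "site_c,-2.58,0.01\n"
    "site_d,NA,NA\n"
)
hits = bonferroni_hits(toy)
print([row["tfbs"] for row in hits])
```

For a real run, replace the toy text with `open(...)` on the ***.TFXcan.csv file from the output folder.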

Notes:

Updates:

[X] To predict TF/tissue binding, the pipeline takes in a dataframe of weights.

|feature|TF/tissue1|TF/tissue2|...|TF/tissueN|
|---|---|---|---|---|
|f1|0.1|0.2|...|0.1|
|f2|0.1|0.2|...|0.1|
|...|...|...|...|...|
|f5313|0.1|0.2|...|0.1|
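To make the shape of the weights dataframe concrete, here is a hedged sketch of how such a table could be applied. It assumes the weights are stored as tab-separated text with one row per feature and one column per TF/tissue model, and that a prediction is a plain weighted sum of feature values; the actual models may include an intercept or a link function. The function names and toy numbers are illustrative only.

```python
import csv
import io


def load_weights(handle):
    """Read a feature-by-model weights table into {model: {feature: weight}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    models = header[1:]  # first column holds the feature names
    weights = {model: {} for model in models}
    for line in reader:
        feature, values = line[0], line[1:]
        for model, value in zip(models, values):
            weights[model][feature] = float(value)
    return weights


def linear_score(feature_values, model_weights):
    """Weighted sum over the features shared by the input and the model."""
    return sum(
        model_weights[name] * value
        for name, value in feature_values.items()
        if name in model_weights
    )


# Toy table with two features and two models (invented weights)
toy = io.StringIO(
    "feature\tTF/tissue1\tTF/tissue2\n"
    "f1\t0.5\t0.25\n"
    "f2\t0.5\t0.25\n"
)
weights = load_weights(toy)
features = {"f1": 1.0, "f2": 2.0}  # e.g. per-feature values for one locus
scores = {model: linear_score(features, w) for model, w in weights.items()}
print(scores)
```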

[X] The pipeline now matches SNPs with the reference panel and uses the matched SNPs for the PredictDB training. This is to ensure that the SNPs used for the PredictDB training are the same as the SNPs used for the GWAS.
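The matching step described above amounts to intersecting the GWAS variants with the reference panel. The sketch below is a simplification under stated assumptions: both sides use the chrom_pos_ref_alt `variant_id` format, and matching is by exact identifier only. The pipeline's own matching may also handle allele flips or other harmonization; that is not shown here. The function name is hypothetical.

```python
def match_snps(gwas_ids, reference_ids):
    """Return GWAS variant IDs that also appear in the reference panel.

    The order of the GWAS list is preserved; matching is exact string equality.
    """
    reference = set(reference_ids)  # set lookup keeps this linear in input size
    return [variant for variant in gwas_ids if variant in reference]


# Toy identifiers (invented)
gwas = ["1_134_A_G", "1_250_C_T", "2_900_G_A"]
panel = ["2_900_G_A", "1_134_A_G", "3_77_T_C"]
matched = match_snps(gwas, panel)
print(matched)
```

Only the matched variants would then be carried forward, so that the SNPs used in training are also present in the GWAS.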

[X] All software necessary for TFXcan is shipped with the pipeline. You only need to install the conda environment.

hakyimlab/TFXcan-snakemake
