Bayesian LOgistic REgression
A tool for meta-analysis in GWAS using Bayesian multiple logistic regression
B-LORE is a command line tool that creates summary statistics from multiple logistic regression on GWAS data, and combines the summary statistics from multiple studies in a meta-analysis. It can also incorporate functional information about the SNPs from external sources. Several genetic regions, or loci, are preselected for analysis with B-LORE.
- Association probability: B-LORE outputs probabilities of the input genetic loci being statistically associated with the phenotype.
- Finemapping: B-LORE also outputs the probability of each SNP being statistically associated with the phenotype.
- Leverage functional genomic data as a prior probability to improve prioritization.
- Models data with logistic regression, and is suited for case/control studies.
- Combines information over all SNPs in a locus with multiple regression.
B-LORE is written in Python and C++. To run B-LORE, you will need:
- Python version 3.4 or higher,
- the Python packages for scientific computing, NumPy and SciPy,
- a C++ compiler.
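A quick way to confirm these prerequisites before building is a short check script. The following is a minimal sketch and not part of B-LORE: the function name `check_prerequisites` is hypothetical, and it assumes the compiler is available on your PATH under the name `g++`.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(compiler="g++"):
    """Return a dict mapping each prerequisite to True (found) or False."""
    return {
        # The interpreter running this script must be 3.4 or newer.
        "python>=3.4": sys.version_info >= (3, 4),
        # find_spec locates a package without importing it.
        "numpy": importlib.util.find_spec("numpy") is not None,
        "scipy": importlib.util.find_spec("scipy") is not None,
        # shutil.which searches PATH for the compiler executable.
        compiler: shutil.which(compiler) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("{:12s} {}".format(name, "found" if ok else "MISSING"))
```

If you build with a different compiler, pass its executable name as the `compiler` argument.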
To use B-LORE, you have to download the repository and compile the C++ shared libraries:
git clone https://github.com/soedinglab/b-lore.git
cd b-lore
make
The Makefile uses g++ by default, which you can change depending on the compiler available on your system.
For calculating summary statistics, it uses the following file formats as input:
- Genotype files in Oxford format, for all loci of interest (e.g. Locus001.gen, Locus002.gen, etc.).
- Sample file in Oxford format (e.g. study1.sample)
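For orientation, a line of an Oxford genotype file typically holds five leading columns (SNP ID, RSID, position, allele A, allele B) followed by three genotype probabilities (AA, AB, BB) per sample. The sketch below parses one such line and computes the expected count of allele B per sample. It is an illustration only: `parse_gen_line` is a hypothetical helper, the five-column layout is an assumption (some files carry an extra leading chromosome column), and it does not show how B-LORE itself reads these files.

```python
def parse_gen_line(line, n_meta=5):
    """Parse one line of an Oxford .gen file (assumed layout, see above).

    n_meta is the number of leading non-probability columns; use 6 if the
    file has an extra chromosome column in front.
    """
    fields = line.split()
    meta = fields[:n_meta]
    probs = [float(x) for x in fields[n_meta:]]
    if len(meta) < 5 or len(probs) % 3 != 0:
        raise ValueError("expected 5+ leading columns and 3 probabilities per sample")
    # The last five leading columns are SNP ID, RSID, position, allele A, allele B.
    snpid, rsid, pos, allele_a, allele_b = meta[-5:]
    # Expected number of B alleles per sample: P(AB) + 2 * P(BB).
    dosages = [probs[i + 1] + 2.0 * probs[i + 2] for i in range(0, len(probs), 3)]
    return {
        "snpid": snpid,
        "rsid": rsid,
        "pos": int(pos),
        "allele_a": allele_a,
        "allele_b": allele_b,
        "dosages": dosages,
    }


if __name__ == "__main__":
    # Three samples with hard calls AA, AB and BB (made-up data).
    record = parse_gen_line("SNP1 rs1 1000 A G 1 0 0 0 1 0 0 0 1")
    print(record["rsid"], record["dosages"])
```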
For meta-analysis, it uses the following input:
- Output files of the B-LORE summary statistics.
- List of loci to be analyzed. This is a single file containing 2 columns with no header. The first column lists the names of the loci (e.g. Locus001, Locus002, etc.) and the second column is a binary flag (1 or 0) indicating whether it is a SNP locus (1) or a covariate locus (0). [Note: this file is produced with the summary statistics of each study.]
- (Optional) Functional genomics data, separately for each locus.
Each feature file contains 2 parts:
(a) a header line detailing the names of the columns in the file, and
(b) a line for each SNP detailing the information for that SNP.
The columns are tab-separated.
The annotation tracks are present from column 4 onwards.
The first 3 columns are:
- RSID: must have the same SNP identifier as in the genotype files
- CHR: chromosome number
- POS: base-pair position of the SNP.
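The layout described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: `read_feature_file` is a hypothetical helper, the track names `DNase` and `H3K4me3` and all values are made up, and the annotation values are assumed to be numeric.

```python
import csv
import io


def read_feature_file(handle):
    """Read a tab-separated feature file.

    Returns (track_names, annotations), where annotations maps each RSID
    to the list of values found from column 4 onwards.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    tracks = header[3:]  # columns 1-3 are RSID, CHR, POS
    annotations = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Assumption: annotation values are numeric.
        annotations[row[0]] = [float(x) for x in row[3:]]
    return tracks, annotations


# Made-up example content with two annotation tracks.
example = io.StringIO(
    "RSID\tCHR\tPOS\tDNase\tH3K4me3\n"
    "rs123\t1\t10500\t1\t0\n"
    "rs456\t1\t10920\t0\t1\n"
)
tracks, annotations = read_feature_file(example)
print(tracks)
print(annotations["rs123"])
```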
- Clone the repository.
- Run `cd example` and then `tar -zxvf input.tar.gz`. This will create an example input folder, with genotypes at 20 loci for 3 populations, a sample file for each population and ENCODE data for the 20 loci.
- Run `./commands.sh` to run B-LORE on the 3 populations to generate summary statistics, followed by a meta-analysis.
An executable file to run B-LORE is provided as bin/blore. It can be used as follows:
blore [--help] [COMMAND] [OPTIONS]
There are 2 commands for B-LORE:
- --summary: for creating summary statistics of individual studies.
- --meta: for meta-analysis from summary statistics of multiple studies.
Each of these 2 commands takes different options, as described below.
Create summary statistics of individual studies. Valid options are:
| Option | Description | Priority | Default value |
|---|---|---|---|
| ‑‑gen filename(s) | Input genotype file(s), all loci should have separate genotype files and specified here (wildcards allowed) | Required | -- |
| ‑‑sample filename | Input sample file | Required | -- |
| ‑‑pheno string | Name of the phenotype as it appears in the header of the sample file | Optional | pheno |
| ‑‑regoptiom | If specified, the variance of the regularizer will be optimized, otherwise it will be N(0, σ²) where σ is specified by --reg | Optional | -- |
| ‑‑reg float | Value of the standard deviation (σ) of the regularizer | Optional | 0.01 |
| ‑‑pca int | Number of principal components of the genotype to be included as covariates | Optional | 0 |
| ‑‑cov string(s) | Name of covariate(s) as they appear in the header of the sample file, multiple covariates can be specified as space-separated strings | Optional | None |
| ‑‑out directory | Name of the output directory where summary statistics will be created | Optional | directory of the genotype files |
| ‑‑prefix string | Prefix for the summary statistics files | Optional | _summary |
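
To show how these options combine, the sketch below assembles a `blore --summary` command line in Python without running it. It uses only options listed in the table above. The helper `summary_command`, the file paths, and the covariate names `age` and `sex` are hypothetical and chosen for illustration.

```python
import shlex


def summary_command(gen_files, sample_file, pheno="pheno", reg=0.01,
                    pca=0, cov=None, out=None, prefix=None):
    """Build the argument list for `blore --summary` (not executed here)."""
    cmd = ["blore", "--summary", "--gen"] + list(gen_files)
    cmd += ["--sample", sample_file, "--pheno", pheno]
    cmd += ["--reg", str(reg), "--pca", str(pca)]
    if cov:
        cmd += ["--cov"] + list(cov)  # space-separated covariate names
    if out:
        cmd += ["--out", out]
    if prefix:
        cmd += ["--prefix", prefix]
    return cmd


# Hypothetical paths for one study with two loci.
cmd = summary_command(
    ["study1/Locus001.gen", "study1/Locus002.gen"],
    "study1/study1.sample",
    pca=2,
    cov=["age", "sex"],
    out="study1/summary",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list keeps each argument separate, so it could be passed to `subprocess.run` unchanged once `bin/blore` is on your PATH.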
Perform meta-analysis from summary statistics of multiple studies. Valid options are:
| Option | Description | Priority | Default value |
|---|---|---|---|
| ‑‑input filename | Input file containing list of loci to be analyzed together | Required | -- |
| ‑‑statdir filename(s) | Input directory of B-LORE summary statistics | Required | -- |
| ‑‑feature filename(s) | Input file(s) for genomic feature tracks | Optional | -- |
| ‑‑params floats | Initial values of the hyperparameters, requires 4 space-separated floats corresponding to β_π, μ, σ and σ_bg | Optional | 0.01 0.0 0.01 0.01 |
| ‑‑muvar | If specified, μ will be optimized, otherwise it will be fixed to the initial value (default 0) | Optional | -- |
| ‑‑zmax int | Maximum number of causal SNPs allowed | Optional | 2 |
| ‑‑out directory | Name of the output directory where result files will be created | Optional | current directory |
| ‑‑prefix string | Prefix for the meta-analysis output files | Optional | _meta |
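
The meta-analysis step can be sketched the same way: first format the two-column locus list, then assemble a `blore --meta` command line without running it. The helpers `format_locus_list` and `meta_command` and all paths are hypothetical. Two assumptions are made here that the tables above do not settle: that the locus list columns are tab-separated, and that `--statdir` accepts one directory per study.

```python
import shlex


def format_locus_list(loci):
    """Format (name, is_snp_locus) pairs as a two-column list, no header.

    Assumption: columns are separated by a tab.
    """
    return "".join(
        "{}\t{}\n".format(name, 1 if is_snp else 0) for name, is_snp in loci
    )


def meta_command(locus_list, statdirs, features=None,
                 params=(0.01, 0.0, 0.01, 0.01), muvar=False,
                 zmax=2, out=None, prefix=None):
    """Build the argument list for `blore --meta` (not executed here)."""
    if len(params) != 4:
        raise ValueError("--params requires exactly 4 floats")
    cmd = ["blore", "--meta", "--input", locus_list]
    cmd += ["--statdir"] + list(statdirs)
    if features:
        cmd += ["--feature"] + list(features)
    cmd += ["--params"] + [str(p) for p in params]
    if muvar:
        cmd.append("--muvar")  # flag without a value
    cmd += ["--zmax", str(zmax)]
    if out:
        cmd += ["--out", out]
    if prefix:
        cmd += ["--prefix", prefix]
    return cmd


locus_text = format_locus_list([("Locus001", True), ("Locus002", True)])
cmd = meta_command("loci.txt", ["study1/summary", "study2/summary"], muvar=True)
print(locus_text, end="")
print(" ".join(shlex.quote(arg) for arg in cmd))
```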
- Clone the repository.
- Run `cd example` and then `tar -zxvf input.tar.gz`. This will create an example input folder, with genotypes at 20 loci for 3 populations, a sample file for each population and ENCODE data for the 20 loci.
- View commands.sh in your favorite editor to see the commands, and execute `./commands.sh` to run B-LORE on the 3 populations to generate summary statistics, followed by a meta-analysis.
- Saikat Banerjee, Lingyao Zeng, Heribert Schunkert and Johannes Soeding (2017). Bayesian multiple logistic regression for case-control GWAS. bioRxiv.
B-LORE is released under the GNU General Public License version 3. See LICENSE for more details. Copyright Johannes Soeding and Saikat Banerjee.
