GP2 ❤️ Open Science 😍
Last Updated: July 2025
This repository houses the code, data workflows, and results associated with the manuscript titled “Polygenic risk scores and Parkinson’s disease in South Africa: Towards ancestry‑informed prediction.”
The study develops and evaluates a polygenic risk score (PRS) model tailored to a South African cohort. Key objectives include:
- Selecting optimal p‑value thresholds for score construction.
- Quantifying variance explained by covariates and by ancestry‑specific components.
- Identifying variants that contribute most to risk prediction and cross‑validating them with local ancestry information.
- Assessing predictive performance via AUC, balanced accuracy, sensitivity, and specificity.
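The threshold-selection objective can be illustrated with a minimal sketch of a thresholded score: for each candidate p-value cutoff, only variants at or below the cutoff contribute their effect size times the allele dosage. The variant IDs, effect sizes, p-values, and the `polygenic_score` helper below are invented for illustration. PRSice‑2 additionally performs clumping and applies its own scoring options, which are not reproduced here.

```python
from typing import Dict


def polygenic_score(
    dosages: Dict[str, float],
    weights: Dict[str, float],
    p_values: Dict[str, float],
    p_threshold: float,
) -> float:
    """Sum beta * dosage over variants whose GWAS p-value is <= p_threshold."""
    score = 0.0
    for variant, beta in weights.items():
        # Skip variants above the cutoff or missing from the genotype data.
        if p_values[variant] <= p_threshold and variant in dosages:
            score += beta * dosages[variant]
    return score


# Toy summary statistics and one individual's dosages (all values are made up).
weights = {"rs1": 0.20, "rs2": -0.10, "rs3": 0.05}
p_values = {"rs1": 1e-8, "rs2": 1e-4, "rs3": 0.30}
dosages = {"rs1": 2, "rs2": 1, "rs3": 0}

# Score the same individual at several hypothetical thresholds.
scores_by_threshold = {
    t: polygenic_score(dosages, weights, p_values, t) for t in (5e-8, 1e-3, 1.0)
}
for threshold, score in scores_by_threshold.items():
    print(f"p <= {threshold:g}: score = {score:.3f}")
```

In practice the threshold would be chosen by comparing how well each resulting score predicts case status, ideally in data not used for the final evaluation.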
- The data were obtained from the Global Parkinson’s Genetics Program (GP2) release 7 (DOI: 10.5281/zenodo.10962119). Access can be requested through the Accelerating Medicines Partnership – Parkinson’s Disease (AMP‑PD) online application (https://www.amp-pd.org/).
- Requests to access the Nama datasets should be directed to Prof. Marlo Moller (marlom@sun.ac.za).
If you use this repository or find it helpful for your research, please cite the corresponding manuscript:
Polygenic risk scores and Parkinson’s disease in South Africa: Towards ancestry‑informed prediction [Step K et al., Global Parkinson’s Genetics Program (GP2), Bardien S, 2025] (DOI: pending)
analyses/ – scripts and notebooks used throughout the project
├── analyses
│ ├── PRS_codes.sh
│ ├── extractHaploProportions.py
│ └── AUCBoot_codes.R
└── LICENSE
- File preparation: QC, liftover, and formatting for PRSice‑2 and ancestry analyses.
- PRS analysis for status prediction: threshold tuning, covariate selection, and score construction.
- Variant contribution & local ancestry cross‑validation: ranking variant weights and intersecting top hits with local ancestry segments.
- Model performance assessment: computation of AUC, balanced accuracy, sensitivity, and specificity (with bootstrapped CIs).
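The performance-assessment step can be sketched as follows. This is an illustrative Python version, not the code in `AUCBoot_codes.R`: it assumes a rank-based AUC, a fixed score cutoff for the confusion-matrix metrics, and a simple percentile bootstrap. The function names, the toy labels and scores, and the cutoff of 0.5 are all hypothetical.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a random case outscores a random control (ties count half)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def confusion_metrics(labels, scores, cutoff):
    """Sensitivity, specificity, and balanced accuracy when score >= cutoff predicts a case."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both cases and controls
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int(round(alpha / 2 * n_boot)))
    hi_idx = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    return stats[lo_idx], stats[hi_idx]


# Toy data: 1 = case, 0 = control; scores stand in for predicted risk.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

point_auc = auc(labels, scores)
metrics = confusion_metrics(labels, scores, cutoff=0.5)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(point_auc, metrics, (ci_low, ci_high))
```

The percentile interval is the simplest bootstrap variant. With a sample this small the interval is wide and mainly demonstrates the mechanics.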
| Notebook / Script | Description |
|---|---|
| extractHaploProportions.py | Processes local ancestry (LAI) outputs into per‑individual ancestry proportion tables |
| PRS_codes.sh | End‑to‑end pipeline: QC → threshold selection → PRSice‑2 execution → variance explained → top‑variant extraction |
| AUCBoot_codes.R | Functions for bootstrapped AUC and other performance metrics |
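As a rough illustration of turning local ancestry calls into per‑individual proportions, the sketch below sums segment lengths per ancestry and divides by the total assigned length. It does not reproduce `extractHaploProportions.py`: the input layout (sample, start, end, ancestry label), the ancestry labels, and the `ancestry_proportions` helper are assumptions, since LAI tools differ in output format.

```python
from collections import defaultdict


def ancestry_proportions(segments):
    """Length-weighted ancestry proportions per individual.

    `segments` is an iterable of (sample_id, start_bp, end_bp, ancestry) tuples,
    a hypothetical layout used here only for illustration.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample, start, end, ancestry in segments:
        totals[sample][ancestry] += end - start

    proportions = {}
    for sample, per_ancestry in totals.items():
        assigned = sum(per_ancestry.values())
        proportions[sample] = {
            ancestry: length / assigned for ancestry, length in per_ancestry.items()
        }
    return proportions


# Toy segments for two individuals (coordinates and labels are made up).
segments = [
    ("ind1", 0, 600, "AFR"),
    ("ind1", 600, 1000, "EUR"),
    ("ind2", 0, 250, "EUR"),
    ("ind2", 250, 1000, "AFR"),
]

props = ancestry_proportions(segments)
for sample, per_ancestry in props.items():
    print(sample, per_ancestry)
```

Weighting by base-pair length is one of several reasonable choices. Weighting by genetic distance or by number of markers would give somewhat different proportions.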
| Software | Version(s) | Resource URL | RRID | Notes |
|---|---|---|---|---|
| Python | 3.7.0 | https://www.python.org/ | RRID:SCR_008394 | File preparation & LAI parsing |
| PLINK | 1.9 / 2.0 | http://www.cog-genomics.org/plink/ | RRID:SCR_001757 | QC & recoding |
| R | 4.2.0 | https://www.r-project.org/ | RRID:SCR_001905 | Plotting & performance stats |
| PRSice‑2 | 2.3.3 | https://choishingwan.github.io/PRSice/ | RRID:SCR_017057 | Core PRS computation |
| ADMIXTURE | 1.3.0 | https://dalexander.github.io/admixture/ | RRID:SCR_001263 | Population substructure |
| AUCBoot | 1.0 | https://cran.r-project.org/package=pROC | - | Bootstrapped ROC / AUC |
For questions, please open an issue or contact the corresponding author.