GP2code/SouthAfrican_PD_PRS

Polygenic Risk Scores and Parkinson’s Disease in South Africa: Towards Ancestry‑Informed Prediction

GP2 ❤️ Open Science 😍

License: MIT

Last Updated: July 2025


Summary

This repository houses the code, data workflows, and results associated with the manuscript titled “Polygenic risk scores and Parkinson’s disease in South Africa: Towards ancestry‑informed prediction.”

The study develops and evaluates a polygenic risk score (PRS) model tailored to a South African cohort. Key objectives include:

  • Selecting optimal p‑value thresholds for score construction.
  • Quantifying variance explained by covariates and by ancestry‑specific components.
  • Identifying variants that contribute most to risk prediction and cross‑validating them with local ancestry information.
  • Assessing predictive performance via AUC, balanced accuracy, sensitivity, and specificity.
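
The performance metrics listed above can be illustrated with a minimal sketch. It is not the repository's code (the actual analysis lives in AUCBoot_codes.R); the function names, the toy labels and scores, and the 0.45 cutoff are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def confusion_metrics(labels: Sequence[int], scores: Sequence[float],
                      cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity and balanced accuracy at a score cutoff.

    labels: 1 = case, 0 = control. A sample is called a case if score >= cutoff.
    """
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutoff)
    tn = sum(1 for y, s in pairs if y == 0 and s < cutoff)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): three cases, three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]
metrics = confusion_metrics(labels, scores, cutoff=0.45)
area = auc(labels, scores)
```

Balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive to case/control imbalance than raw accuracy.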

Data Statement

  • The data were obtained from the Global Parkinson’s Genetics Program (GP2) release 7 (DOI: 10.5281/zenodo.10962119). Access can be requested through the Accelerating Medicines Partnership – Parkinson’s Disease (AMP‑PD) online application (https://www.amp-pd.org/).
  • Requests to access the Nama datasets should be directed to Prof. Marlo Moller (marlom@sun.ac.za).

Citation

If you use this repository or find it helpful for your research, please cite the corresponding manuscript:

Polygenic risk scores and Parkinson’s disease in South Africa: Towards ancestry‑informed prediction [Step K et al., Global Parkinson’s Genetics Program (GP2), Bardien S, 2025] (DOI: pending)


Repository Orientation

  • analyses/ – scripts and notebooks used throughout the project
├── analyses
│   ├── PRS_codes.sh
│   ├── extractHaploProportions.py
│   └── AUCBoot_codes.R
└── LICENSE

Key Analyses

  1. File preparation: QC, liftover, and formatting for PRSice‑2 and ancestry analyses.
  2. PRS analysis for status prediction: threshold tuning, covariate selection, and score construction.
  3. Variant contribution & local ancestry cross‑validation: ranking variant weights and intersecting top hits with local ancestry segments.
  4. Model performance assessment: computation of AUC, balanced accuracy, sensitivity, and specificity (with bootstrapped CIs).
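
To make the threshold tuning in step 2 concrete, here is a simplified sketch of scoring one individual at several p‑value thresholds. It is a plain weighted sum of effect sizes and allele dosages, not PRSice‑2's exact scoring, and it omits clumping and covariates; the variant IDs, effect sizes, p‑values, and the `polygenic_score` helper are hypothetical.

```python
from typing import Dict, Tuple


def polygenic_score(dosages: Dict[str, float],
                    sumstats: Dict[str, Tuple[float, float]],
                    p_threshold: float) -> float:
    """Sum of beta * dosage over variants with p-value <= p_threshold.

    dosages:  variant ID -> effect-allele count (0, 1 or 2) for one individual.
    sumstats: variant ID -> (beta, p_value) from the base GWAS.
    Variants without a genotype for this individual are skipped.
    """
    total = 0.0
    for variant, (beta, p_value) in sumstats.items():
        if p_value <= p_threshold and variant in dosages:
            total += beta * dosages[variant]
    return total


# Hypothetical summary statistics and genotypes, for illustration only.
sumstats = {
    "rs_a": (0.20, 1e-8),
    "rs_b": (-0.10, 1e-4),
    "rs_c": (0.05, 0.3),
}
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 1}

# Score the same individual at increasingly permissive thresholds.
scores_by_threshold = {
    t: polygenic_score(dosages, sumstats, t) for t in (5e-8, 1e-3, 0.5)
}
```

In a threshold search, each threshold's scores would be compared against case/control status, and the best-fitting threshold retained.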

Analysis Notebooks

| Notebook / Script | Description |
| --- | --- |
| extractHaploProportions.py | Processes local ancestry (LAI) outputs into per‑individual ancestry proportion tables |
| PRS_codes.sh | End‑to‑end pipeline: QC → threshold selection → PRSice‑2 execution → variance explained → top‑variant extraction |
| AUCBoot_codes.R | Functions for bootstrapped AUC and other performance metrics |
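
As an illustration of what a bootstrapped AUC involves, the sketch below computes a percentile confidence interval by resampling individuals with replacement. It is written in Python for consistency with the other examples here and is not a port of AUCBoot_codes.R; the function names, the default of 1000 resamples, the seed, and the toy data are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties = 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # AUC is undefined without both cases and controls
        stats.append(auc(y, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]
point = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
```

Resamples that contain only cases or only controls are redrawn rather than counted, which is one of several reasonable conventions; with realistic cohort sizes such resamples are vanishingly rare.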

Software

| Software | Version(s) | Resource URL | RRID | Notes |
| --- | --- | --- | --- | --- |
| Python | 3.7.0 | https://www.python.org/ | RRID:SCR_008394 | File preparation & LAI parsing |
| PLINK | 1.9 / 2.0 | http://www.cog-genomics.org/plink/ | RRID:SCR_001757 | QC & recoding |
| R | 4.2.0 | https://www.r-project.org/ | RRID:SCR_001905 | Plotting & performance stats |
| PRSice‑2 | 2.3.3 | https://choishingwan.github.io/PRSice/ | RRID:SCR_017057 | Core PRS computation |
| ADMIXTURE | 1.3.0 | https://dalexander.github.io/admixture/ | RRID:SCR_001263 | Population substructure |
| AUCBoot | 1.0 | https://cran.r-project.org/package=pROC | - | Bootstrapped ROC / AUC |

For questions, please open an issue or contact the corresponding author.
