
GP2code/Cross_ancestry_PRS_AMR


Cross-ancestry performance of Parkinson’s disease polygenic risk scores in admixed Latin American populations

GP2 ❤️ Open Science 😍

DOI License: MIT

Last Updated: February 2026

Summary

This repository contains all analyses for the manuscript titled "Cross-ancestry performance of Parkinson’s disease polygenic risk scores in admixed Latin American populations" by Flores et al.


Data Statement

  • Data used in the preparation of this article were obtained from the Global Parkinson’s Genetics Program (GP2; https://gp2.org).
    • GP2 data release 9 (controlled-tier access; DOI: 10.5281/zenodo.14510099)

Repository Orientation

analyses/
├── 00.Set_up.ipynb
├── 01.QC_pre_run.ipynb
├── 02.Baseline_PRS_construction.ipynb
├── 03.Ancestry-aware_PRS_construction.ipynb
│ └── 03.1.PRScsx_GP2.22.wdl
├── 04.Global_ancestry_estimation.ipynb
└── 05.Regression_models.ipynb


Notebooks Description

| Directory | Notebook | Description |
| --- | --- | --- |
| analyses/ | 00.Set_up | Define paths, install/update packages, load functions, create conda environments, download target data, clean base data, prepare allele order combinations. |
| analyses/ | 01.QC_pre_run | Prepare 1KGP samples, clean runs, exclude cohorts, check Rsq, IBD analysis, PCA prep, create PLINK files, parse imputation quality, filter SNPs, remove related individuals, update rsIDs, exclude monogenic samples, export clean files, run SmartPCA. |
| analyses/ | 02.Baseline_PRS_construction | Run PRSice (EUR/AMR bases), SBayesRC (EUR/AMR), download annotation sets, reformat sumstats, PLINK scoring with SBayesRC sumstats, shortcuts using collaborator data. |
| analyses/ | 03.Ancestry-aware_PRS_construction | Run PRS-CSx with multiple phi values, concatenate outputs, test best linear combination, validate scores, run BridgePRS (prepare pheno, correct rsIDs, scoring with PLINK). |
| analyses/ | 04.Global_ancestry_estimation | Use homogeneous references, prepare target + 1KGP reference sets, join PLINK sets, estimate ancestry percentages, label continental ancestry, plot composition results. |
| analyses/ | 05.Regression_models | Fit GLMs for single-/multi-ancestry tools, calculate pseudo-R2, bootstrap CIs, odds ratios, assess performance by ancestry percentage, balance cases/controls, run quartile-based GLMs, covariate analysis, plot ROC curves. |
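As an illustration of the bootstrapped confidence intervals and ROC analysis listed for 05.Regression_models, the following is a minimal Python sketch, not the notebook's actual code (the repository lists the R packages boot and pROC for this step). It computes the AUC through the rank-sum identity and a percentile bootstrap interval. The function names `auc` and `bootstrap_ci`, the toy scores, and the choice of a percentile interval are assumptions made for this example.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels must be 0 (control) or 1 (case); ties count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[max(0, int((1 - alpha / 2) * n_boot) - 1)]
    return lower, upper


# Toy PRS values and case/control labels, invented for illustration only.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))  # 0.75
```

With real data, `scores` would be one PRS per individual and `labels` the case/control status; the interval width then depends on sample size and on `n_boot`.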

Data Privacy

All individual-level data have been removed, including sample IDs and identifiers. Sensitive paths have been cleared. Notebooks have been saved as .ipynb files for easy sharing while ensuring compliance with data privacy standards.
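One way to clear stored outputs from a notebook before sharing is sketched below in Python. This README does not describe the procedure actually used, so the sketch is an assumption: it relies only on the standard notebook JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`), and `strip_notebook` is a hypothetical helper name. It does not scrub paths or identifiers typed into cell source, which would still need manual review.

```python
import copy
import json


def strip_notebook(nb):
    """Return a copy of a notebook dict with code-cell outputs removed."""
    clean = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop printed tables, plots, logs
            cell["execution_count"] = None  # reset the In[ ] counter
    return clean


# Minimal in-memory notebook, invented for illustration only.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print('hello')"],
            "execution_count": 3,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
        },
    ],
}

cleaned = strip_notebook(example)
print(json.dumps(cleaned["cells"][1]["outputs"]))  # []
```

To apply it to a file, load the .ipynb with `json.load`, pass the result to `strip_notebook`, and write it back with `json.dump`.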

Software and Tools

| Software/Tool | Version | Resource URL | RRID | Notes |
| --- | --- | --- | --- | --- |
| Python | 3.10 | http://www.python.org/ | RRID:SCR_008394 | pandas, numpy, matplotlib, and forestplot used for data wrangling and plotting |
| Jupyter Notebook | 4.3 | https://jupyter.org | RRID:SCR_018315 | Used to keep a record of the analysis pipeline |
| R | 4.4.2 | https://www.r-project.org/ | RRID:SCR_001905 | Used for QC scripts; glm() function (stats package) used for regression models |
| PLINK | 2.0 | https://www.cog-genomics.org/plink/2.0/ | RRID:SCR_001757 | Used for association analysis, LD pruning, scoring |
| ADMIXTURE | 1.3.0 | https://dalexander.github.io/admixture/ | RRID:SCR_004173 | Used for ancestry estimation |
| MungeSumstats (R package) | 1.12.2 | https://github.com/MRCIEU/MungeSumstats | N/A | Used for harmonizing GWAS summary statistics |
| DescTools (R package) | 0.99.60 | https://cran.r-project.org/package=DescTools | N/A | Used for statistical utilities |
| boot (R package) | 1.3.31 | https://cran.r-project.org/package=boot | N/A | Used for bootstrapping confidence intervals |
| pROC (R package) | 1.19.0.1 | https://cran.r-project.org/package=pROC | N/A | Used for ROC curve analysis |
| Illumina NeuroBooster array | N/A | https://www.illumina.com | N/A | Genotyping array used for target data |
| UMAP | N/A | https://umap-learn.readthedocs.io | RRID:SCR_018996 | Used for dimensionality reduction |
| XGBoost | N/A | https://xgboost.readthedocs.io | RRID:SCR_022964 | Used for machine learning models |
| SmartPCA (EIGENSOFT) | 6.1.4 | https://github.com/DavidLawson/EIGENSOFT | RRID:SCR_004965 | Used for PCA in ancestry estimation |
| PRSice-2 | 2.3.5 | https://www.prsice.info/ | N/A | Used for baseline PRS construction |
| SBayesRC (via GCTB) | 2.03 | https://cnsgenomics.com/software/gctb/ | N/A | Used for Bayesian PRS construction |
| PRS-CSx | 1.0 | https://github.com/getian107/PRScsx | N/A | Used for ancestry-aware PRS construction |
| BridgePRS | 1.0 | https://github.com/CliveR/BridgePRS | N/A | Used for ancestry-aware PRS scoring |
