Cross-ancestry performance of Parkinson’s disease polygenic risk scores in admixed Latin American populations
GP2 ❤️ Open Science 😍
Last Updated: February 2026
This repository contains all analyses for the manuscript titled "Cross-ancestry performance of Parkinson’s disease polygenic risk scores in admixed Latin American populations" by Flores et al.
- Data used in the preparation of this article were obtained from the Global Parkinson’s Genetics Program (GP2; https://gp2.org).
- GP2 data release 9 (controlled-tier access; DOI: 10.5281/zenodo.14510099)
analyses/
├── 00.Set_up.ipynb
├── 01.QC_pre_run.ipynb
├── 02.Baseline_PRS_construction.ipynb
├── 03.Ancestry-aware_PRS_construction.ipynb
│ └── 03.1.PRScsx_GP2.22.wdl
├── 04.Global_ancestry_estimation.ipynb
└── 05.Regression_models.ipynb
| Directory | Notebook | Description |
|---|---|---|
| analyses/ | 00.Set_up | Define paths, install/update packages, load functions, create conda environments, download target data, clean base data, prepare allele order combinations. |
| analyses/ | 01.QC_pre_run | Prepare 1KGP samples, clean runs, exclude cohorts, check Rsq values, run IBD analysis, prepare PCA inputs, create PLINK files, parse imputation quality, filter SNPs, remove related individuals, update rsIDs, exclude monogenic samples, export clean files, run SmartPCA. |
| analyses/ | 02.Baseline_PRS_construction | Run PRSice-2 (EUR/AMR base data) and SBayesRC (EUR/AMR), download annotation sets, reformat summary statistics, run PLINK scoring with SBayesRC summary statistics, apply shortcuts using collaborator data. |
| analyses/ | 03.Ancestry-aware_PRS_construction | Run PRS-CSx with multiple phi values, concatenate outputs, test the best linear combination, validate scores, run BridgePRS (prepare phenotype file, correct rsIDs, score with PLINK). |
| analyses/ | 04.Global_ancestry_estimation | Use homogeneous references, prepare target + 1KGP reference sets, join PLINK sets, estimate ancestry percentages, label continental ancestry, plot composition results. |
| analyses/ | 05.Regression_models | Fit GLMs for single- and multi-ancestry tools, calculate pseudo-R², bootstrap confidence intervals, estimate odds ratios, assess performance by ancestry percentage, balance cases/controls, run quartile-based GLMs, run covariate analysis, plot ROC curves. |
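
The evaluation steps listed for 05.Regression_models (ROC analysis with bootstrapped confidence intervals) can be illustrated with a minimal, dependency-free sketch. This is not the code from the notebooks, which use R (`pROC`, `boot`). The function names `auc` and `bootstrap_auc_ci`, the percentile bootstrap method, and the simulated scores are assumptions made for illustration only.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen case has a
    higher score than a randomly chosen control (ties count as 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (hypothetical helper).
    Individuals are resampled with replacement; resamples that lack
    either cases or controls are discarded and redrawn."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Simulated standardized PRS values: cases are shifted upward by 0.5 SD.
    rng = random.Random(42)
    labels = [1] * 100 + [0] * 100
    scores = [rng.gauss(0.5, 1.0) for _ in range(100)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(100)]
    point = auc(scores, labels)
    lower, upper = bootstrap_auc_ci(scores, labels, n_boot=500)
    print(f"AUC = {point:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Labels are coded 1 for cases and 0 for controls. The pairwise comparison is quadratic in sample size, which is fine for a toy example but would be replaced by a rank-based computation or an established package for cohort-scale data.
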
All individual-level data have been removed, including sample IDs and identifiers. Sensitive paths have been cleared. Notebooks have been saved as .ipynb files for easy sharing while ensuring compliance with data privacy standards.
| Software/Tool | Version | Resource URL | RRID | Notes |
|---|---|---|---|---|
| Python | 3.10 | http://www.python.org/ | RRID:SCR_008394 | pandas; numpy; matplotlib; forestplot used for data wrangling/plotting |
| Jupyter Notebook | 4.3 | https://jupyter.org | RRID:SCR_018315 | Used to keep record of the analysis pipeline |
| R | 4.4.2 | https://www.r-project.org/ | RRID:SCR_001905 | Used for QC scripts; `glm()` function (base `stats` package) used for regression models |
| PLINK | 2.0 | https://www.cog-genomics.org/plink/2.0/ | RRID:SCR_001757 | Used for association analysis, LD pruning, scoring |
| ADMIXTURE | 1.3.0 | https://dalexander.github.io/admixture/ | RRID:SCR_004173 | Used for ancestry estimation |
| MungeSumstats (R package) | 1.12.2 | https://github.com/neurogenomics/MungeSumstats | N/A | Used for harmonizing GWAS summary statistics |
| DescTools (R package) | 0.99.60 | https://cran.r-project.org/package=DescTools | N/A | Used for statistical utilities |
| boot (R package) | 1.3.31 | https://cran.r-project.org/package=boot | N/A | Used for bootstrapping confidence intervals |
| pROC (R package) | 1.19.0.1 | https://cran.r-project.org/package=pROC | N/A | Used for ROC curve analysis |
| Illumina NeuroBooster array | N/A | https://www.illumina.com | N/A | Genotyping array used for target data |
| UMAP | N/A | https://umap-learn.readthedocs.io | RRID:SCR_018996 | Used for dimensionality reduction |
| XGBoost | N/A | https://xgboost.readthedocs.io | RRID:SCR_022964 | Used for machine learning models |
| SmartPCA (EIGENSOFT) | 6.1.4 | https://github.com/DReichLab/EIG | RRID:SCR_004965 | Used for PCA in ancestry estimation |
| PRSice-2 | 2.3.5 | https://www.prsice.info/ | N/A | Used for baseline PRS construction |
| SBayesRC (via GCTB) | 2.03 | https://cnsgenomics.com/software/gctb/ | N/A | Used for Bayesian PRS construction |
| PRS-CSx | 1.0 | https://github.com/getian107/PRScsx | N/A | Used for ancestry-aware PRS construction |
| BridgePRS | 1.0 | https://github.com/clivehoggart/BridgePRS | N/A | Used for ancestry-aware PRS scoring |