Skip to content

CuiweiG/exposomeSC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

82 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

exposomeSC

Single-Cell Exposome-Wide Association Studies with Cross-Omic Network Inference

R-CMD-check License: Artistic-2.0 BioC style


The problem

Single-cell RNA-seq resolves cell-type-specific biology, but environmental exposures are measured at the donor level. Bridging these two scales requires pseudobulk aggregation — and getting it wrong inflates false positives by 10–100× (Squair et al. 2021). Existing tools don't address this:

  • Bulk exposome tools (rexposome, omicRexposome) ignore cell-type specificity entirely
  • Single-cell DE tools (Seurat, scran) lack donor-level exposure covariates and commit pseudoreplication
  • General pseudobulk tools (muscat, limma-voom) can be adapted manually, but provide no built-in support for continuous exposures, mixture decomposition, dose-response modeling, or interaction testing
  • No existing tool decomposes exposure effects into direct transcriptional changes vs cell-composition shifts — a fundamental confound in single-cell exposomics

Even when pseudoreplication is properly handled, existing tools treat each gene as independent. But exposure effects are systemic — they rewire molecular networks. The same PM2.5 exposure might strengthen a fatty acid–inflammatory gene connection in monocytes while leaving it unchanged in B cells. No tool captures this.

The solution

exposomeSC provides a framework from single-gene associations to cell-type-resolved multi-omic network inference:

  1. sc-ExWAS: Cell-type-specific association testing
  2. ERD: Decompose direct transcriptional vs composition-driven effects
  3. Cross-omic networks: Cell-type-resolved conditional independence networks integrating scRNA-seq pseudobulk with donor-level metabolomics via block-calibrated graphical lasso
  4. Differential networks: Statistical comparison of network topology across cell types
  5. Temporal dynamics: Track molecular rewiring across exposure time points

Quick start (copy-paste runs as-is)

library(exposomeSC)

# Load demo data (included in package)
sce <- readRDS(system.file("extdata", "demo_sce.rds",
    package = "exposomeSC"))
exp <- as.matrix(read.csv(system.file("extdata",
    "demo_exposures.csv", package = "exposomeSC"),
    row.names = 1))

# Build integrated container
scee <- build_scee(sce, exp, sample_col = "donor_id")

# Cell-type-specific ExWAS
# method = "dreamlet" uses voom+dream with precision weights
result <- run_sc_exwas(scee, exposure = "PM2.5",
    celltype_col = "cell_type", covariates = c("age", "sex"),
    method = "dreamlet")

# Decompose: direct transcriptional vs composition shift
# composition_transform = "ilr" handles compositional constraint
dec <- run_decomposed_exwas(scee, exposure = "PM2.5",
    celltype = "Monocyte", celltype_col = "cell_type",
    adjustment = "mediator", composition_transform = "ilr")

Figures

All figures below were generated locally in R using library(exposomeSC) on publicly available datasets. Reproduction scripts are in inst/scripts/.


exposomeSC method overview

Figure 1 | exposomeSC framework. (a) Data hierarchy: donor-level exposures paired with cell-level multi-omics. (b) Three analysis levels with increasing data requirements. (c) Pseudoreplication control: naive cell-level analysis inflates Type I error to 93%; exposomeSC pseudobulk maintains 4%. (d) Exposure-Response Decomposition (ERD) DAG separating direct transcriptional from compositional effects. (e) Cell-level coupling: within-donor gene-protein correlation as a function of exposure (meta-regression). (f) Effective degrees of freedom: cell coupling (df=45) vs precision matrix (df=3) for n=47 donors, p=45 features.


sc-ExWAS on Allen IHI

Figure 2 | Aging and the immune transcriptome: 651,003 cells from 108 healthy donors. Allen Institute Human Immune Health Atlas (cellxgene). Exposures: age (11–65), BMI, CMV status. 4 cell types: Monocytes, NK, B, DC. (a) Pseudoreplication: naive cell-level analysis of 318,479 monocytes finds 2,585/3,000 genes at FDR < 0.05 (86% false positive rate); pseudobulk correctly identifies 944 age-associated monocyte genes. (b) Cell-type-resolved sc-ExWAS: age effects across 4 cell types (944 Monocyte, 718 NK, 679 B, 802 DC genes; 31,989 tests, global BH). (c) Exposure-Response Decomposition (ERD): stacked bars show direct (within-cell transcriptional, blue) vs compositional (proportion shift, red) components of the age effect. 3,145/8,000 genes have >50% of their apparent age effect driven by composition changes. Uses mediator adjustment (VanderWeele 2015). (d) Mixture: age (85.3%) dominates the age + BMI mixture on monocyte expression. (e) Interaction: 5/50 top genes show significantly different age effects across cell types (BH FDR < 0.05). (f) Power: n = 108 achieves >80% power for moderate effects. Code: inst/scripts/gen_figures.R.


Cell-level gene-protein coupling

Figure 3 | Cell-level gene-protein coupling and severity-driven molecular rewiring. Liu et al. (2021) Cell CITE-seq data (GSE161918): 407,006 PBMCs, 47 donors, 192 surface proteins. (a) Cell composition by severity. (b) Top 15 surface proteins correlated with severity (Spearman rho). (c) Severity-driven rewiring volcano across 5 cell types: within-donor Spearman partial correlation (controlling library sizes, cell cycle, mitochondrial %) with meta-regression (REML + Knapp-Hartung). 54 pairs at BH q<0.05. (d) Top 6 rewired gene-protein pairs showing dose-response trajectories across severity levels. (e) Significant pairs per cell type (baseline coupling vs severity rewiring). (f) Rewiring heatmap for B_Mem (* = q<0.05). Code: inst/scripts/gen_figure3.R.


Benchmark comparison

Figure 4 | Statistical calibration, power, and replicability. All panels use real data from the Allen Institute Human Immune Health Atlas (108 donors, 651,003 monocyte PBMCs) and OneK1K (981 donors). (a) Type I error under permutation null (100 shuffles of age labels, 5000 genes). voom+dream (4.9%) and DESeq2 (5.3%) both achieve nominal 5% FPR control. (b) p-value calibration: QQ plot under permutation null. Well-calibrated methods follow the diagonal. (c) Discovery recovery as a function of sample size (donor subsampling from n=108, reference: 501 genes at BH < 0.05). ≥80 donors needed for >40% recovery. (d) Split-sample replication on OneK1K (490 + 491 donors): log₂FC Spearman ρ = 0.889, Bonferroni gene overlap Jaccard = 0.556. (e) Runtime scalability on real data (Allen IHI, OneK1K). (f) Runtime per analysis (5000 genes, 108 donors). Code: inst/scripts/benchmark_permutation.R.


Calibration

Type I error was assessed by permuting age labels 100 times on Allen IHI Monocyte pseudobulk (5000 genes, 108 donors). Under the permutation null, no gene has a true age association; well-calibrated methods should reject ~5%.

Method FPR (permutation null) Speed (5000 genes)
voom+dream (exposomeSC default) 4.9% 1.3s
DESeq2 5.3% 13.1s
Naive correlation (no pseudobulk) 3.2%* 0.6s

*Naive correlation has low per-gene FPR but fails to control FDR across genes (mean 47.9 significant genes under null vs 0.1 for voom+dream).

Scalability: 981 donors (OneK1K) in 6.6 seconds.

Datasets

Figure 1 — Method overview: schematic + pseudoreplication demo + ERD concept + cell coupling from Liu 2021 data.

Figure 2 — Allen IHI (108 donors, scRNA-seq):

  • 651,003 PBMCs, continuous age/BMI/CMV exposures
  • sc-ExWAS, ERD, mixture, interaction

Figure 3 — Liu 2021 (47 donors, CITE-seq):

  • 407,006 PBMCs, 192 surface proteins + transcriptome
  • Cell-level coupling meta-regression: 54 rewired pairs

Figure 4 — Benchmark: 100 simulated datasets with known ground truth + real-data scalability and replication.

OneK1K (981 donors, scRNA-seq):

  • 1,248,980 PBMCs (Yazar et al. 2022 Science)
  • 1,450/2,000 genes age-associated (BH<0.05)
  • 16/18 known immune aging genes validated
  • NKG7, GZMB, CCL3 ↑; MS4A1, CD79B ↓ (consistent with Peters 2015, Mogilenko 2021)

Replicability

Split-sample replication (OneK1K, 490 + 491 donors): log2FC Spearman ρ = 0.889; Bonferroni-significant gene overlap Jaccard = 0.556.

Multi-scale convergence (IERS on Liu CITE-seq): SERPINB2 ranked #1 by Integrated Exposure Response Score in classical monocytes — simultaneously the strongest severity-associated gene (sc-ExWAS), most directly regulated (ERD), and most rewired at the protein-coupling level.


Functions

Core analysis

Function Description
build_scee() Build SingleCellExposomeExperiment from SCE + exposures
run_sc_exwas() Cell-type-specific ExWAS via pseudobulk. method="dreamlet" (recommended) or "pseudobulk" (DESeq2)
run_sc_exwas_glmm() GLMM-based ExWAS via lme4/glmmTMB with donor random effects
run_multi_exwas() Batch ExWAS across multiple exposures with global FDR

Exposure-Response Decomposition

Function Description
run_decomposed_exwas() Separate direct vs compositional effects. composition_transform="ilr" for compositional-aware inference. adjustment = "confounder" or "mediator".

Dose-response & nonlinearity

Function Description
run_dose_response() Polynomial dose-response (AIC selection)
run_dose_response_gam() GAM dose-response via mgcv with LOOCV R²
run_interaction_test() Formal exposure × celltype interaction F-test

Mixture & composition

Function Description
run_sc_mixture() Quantile-scored mixture screening
run_mixture_qgcomp() Formal qgcomp with bootstrap CI
run_exposure_composition() CLR-transformed cell composition analysis

Advanced analysis

Function Description
run_causal_mediation() Formal causal mediation (Imai 2010): ACME, ADE, sensitivity ρ
compute_iers() Integrated Exposure Response Score: multi-scale gene ranking
run_state_coupling() Cell-state coupling: exposure × trajectory interaction
run_differential_network() Exposure-stratified differential network: permutation-tested edges

Causal inference, enrichment & meta-analysis

Function Description
run_mediation() Causal mediation (Imai et al. 2010)
run_gsea() Pathway enrichment via fgsea
run_meta_exwas() Cross-cohort meta-analysis (fixed/random effects, I²)
run_spatial_exwas() Region-resolved ExWAS via SpatialExperiment
estimate_power() Power/sample size calculator

Preprocessing & interoperability

Function Description
exposure_impute_lod() Below-LOD imputation (LOD/√2, ROS, multiple)
as_scee() Convert rexposome ExposomeSet to SCEE
seurat_to_exposure() Extract donor-level metadata from Seurat

Cross-omic network inference

Function Description
run_celltype_network() Block-calibrated cross-omic network per cell type via collaborative graphical lasso
run_exposure_network() Network restricted to exposure-associated features (two-stage: selection + network)
run_comparative_network() Statistical comparison of networks across cell types (permutation or Fisher's Z)
run_temporal_network() Temporal network evolution across time points with edge dynamics tracking
run_network_mediation() Network-topology-guided high-dimensional mediation (exposure → metabolite → transcript)

Network visualization

Function Description
plot_celltype_network() Cell-type network with cross-omic edge highlighting and bipartite layout
plot_network_comparison() Side-by-side or overlay comparison of cell-type networks
plot_stability_surface() Lambda-pi calibration heatmap for hyperparameter transparency
plot_temporal_dynamics() Temporal edge dynamics panel (emerging/disappearing/persistent)

Cell-level gene-protein coupling

Function Description
run_cell_coupling() Cell-level Spearman partial correlation + meta-regression for exposure-driven gene-protein rewiring (edge-level p-values, df = n_donors - 2)

Simulation & benchmarking

Function Description
simulate_crossomic_network() Generate multi-omics data with known network structure for benchmarking

Cross-omic network quick start

library(exposomeSC)

# After building SCEE (as before)
scee <- build_scee(sce, exp, sample_col = "donor_id")

# Build cell-type-specific cross-omic network
mono_net <- run_celltype_network(
    scee,
    metabolites = metab_matrix,
    celltype = "Monocyte",
    celltype_col = "cell_type",
    exposure = "PM2.5",
    covariates = c("age", "sex"),
    method = "coglasso",
    stability = TRUE
)

# Compare across cell types
comparison <- run_comparative_network(
    mono_net, bcell_net, nk_net, dc_net,
    method = "permutation",
    n_perm = 1000
)

# Visualize
plot_celltype_network(mono_net, color_by = "omic_layer")
plot_network_comparison(comparison, highlight = "differential")

Architecture

sc-eQTL:   SNP        x cell type -> gene expression
sc-ExWAS:  Exposure   x cell type -> gene expression  (this package)

Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("CuiweiG/exposomeSC")

Statistical methods

Challenge Method Reference
Pseudoreplication Pseudobulk + DESeq2 Squair et al. 2021
Direct vs compositional ERD VanderWeele 2015
Donor-level correlation GLMM (lme4/glmmTMB) Bates et al. 2015
Nonlinear dose-response GAM (mgcv) Wood 2017
Correlated mixtures Quantile g-computation Keil et al. 2020
Cell-type interaction F-test Gelman & Stern 2006
Causal mediation Mediation analysis Imai et al. 2010
Below detection limit ROS / multiple imputation Helsel 2012
Multi-cohort synthesis IV-weighted meta-analysis Borenstein et al. 2009
Causal interpretation Confounder vs mediator Pearl 2014
Cross-omic network Collaborative graphical lasso Albanese et al. 2024
Block calibration Separate within/between penalization Sinisi et al. 2024
Edge stability XStARS (extended StARS) Liu et al. 2010
Differential networks Permutation test on partial correlations
Temporal dynamics Time-point-specific network estimation Sinisi et al. 2024
Network mediation Topology-guided penalized mediation

Key references

  • Squair JW et al. (2021). Confronting false discoveries in single-cell DE. Nat Commun 12:5692. doi:10.1038/s41467-021-25960-2
  • VanderWeele TJ (2015). Explanation in Causal Inference. Oxford University Press.
  • Pearl J (2014). Interpretation and identification of causal mediation. Psychol Methods 19:459-481. doi:10.1037/a0036434
  • Keil AP et al. (2020). Quantile-based g-computation for exposure mixtures. Environ Health Perspect 128:047004. doi:10.1289/EHP5838
  • Bates D et al. (2015). Fitting linear mixed-effects models using lme4. J Stat Softw 67:1-48. doi:10.18637/jss.v067.i01
  • Wood SN (2017). Generalized Additive Models. 2nd ed. Chapman and Hall/CRC.
  • Gelman A, Stern H (2006). The difference between "significant" and "not significant" is not itself statistically significant. Am Stat 60:328-331.
  • Imai K, Keele L, Tingley D (2010). A general approach to causal mediation analysis. Psychol Methods 15:309-334. doi:10.1037/a0020761
  • Helsel DR (2012). Statistics for Censored Environmental Data. 2nd ed. Wiley.
  • Albanese A, Kohlen W, Behrouzi P (2024). Collaborative graphical lasso. arXiv 2403.18602. R package: coglasso.
  • Sinisi SE et al. (2024). Multiomic signatures of traffic-related air pollution. Environ Sci Technol. doi:10.1021/acs.est.3c09148
  • Maitre L et al. (2022). Multi-omics signatures of the human early life exposome. Nat Commun 13:7024. doi:10.1038/s41467-022-34422-2
  • Stratakis N et al. (2025). Multi-omics architecture of childhood obesity. Nat Commun 16:854. doi:10.1038/s41467-025-56013-7
  • Liu C et al. (2021). Time-resolved systems immunology reveals a late juncture linked to fatal COVID-19. Cell 184(7):1836-1857. doi:10.1016/j.cell.2021.02.018
  • Liu H, Roeder K, Wasserman L (2010). Stability approach to regularization selection (StARS). NeurIPS 23.
  • Meinshausen N, Buhlmann P (2010). Stability selection. J R Stat Soc B 72:417-473. doi:10.1111/j.1467-9868.2010.00740.x

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages