exposomeSC

Single-Cell Exposome-Wide Association Studies with Cross-Omic Network Inference

The problem

Single-cell RNA-seq resolves cell-type-specific biology, but environmental exposures are measured at the donor level. Bridging these two scales requires pseudobulk aggregation — and getting it wrong inflates false positives by 10–100× (Squair et al. 2021). Existing tools don't address this:

Bulk exposome tools (rexposome, omicRexposome) ignore cell-type specificity entirely
Single-cell DE tools (Seurat, scran) lack donor-level exposure covariates and commit pseudoreplication
General pseudobulk tools (muscat, limma-voom) can be adapted manually, but provide no built-in support for continuous exposures, mixture decomposition, dose-response modeling, or interaction testing
No existing tool decomposes exposure effects into direct transcriptional changes vs cell-composition shifts — a fundamental confound in single-cell exposomics

Even when pseudoreplication is properly handled, existing tools treat each gene as independent. But exposure effects are systemic — they rewire molecular networks. The same PM2.5 exposure might strengthen a fatty acid–inflammatory gene connection in monocytes while leaving it unchanged in B cells. No tool captures this.

The solution

exposomeSC provides a framework from single-gene associations to cell-type-resolved multi-omic network inference:

sc-ExWAS: Cell-type-specific association testing
ERD: Decompose direct transcriptional vs composition-driven effects
Cross-omic networks: Cell-type-resolved conditional independence networks integrating scRNA-seq pseudobulk with donor-level metabolomics via block-calibrated graphical lasso
Differential networks: Statistical comparison of network topology across cell types
Temporal dynamics: Track molecular rewiring across exposure time points

Quick start (copy-paste runs as-is)

library(exposomeSC)

# Load demo data (included in package)
sce <- readRDS(system.file("extdata", "demo_sce.rds",
    package = "exposomeSC"))
exp <- as.matrix(read.csv(system.file("extdata",
    "demo_exposures.csv", package = "exposomeSC"),
    row.names = 1))

# Build integrated container
scee <- build_scee(sce, exp, sample_col = "donor_id")

# Cell-type-specific ExWAS
# method = "dreamlet" uses voom+dream with precision weights
result <- run_sc_exwas(scee, exposure = "PM2.5",
    celltype_col = "cell_type", covariates = c("age", "sex"),
    method = "dreamlet")

# Decompose: direct transcriptional vs composition shift
# composition_transform = "ilr" handles compositional constraint
dec <- run_decomposed_exwas(scee, exposure = "PM2.5",
    celltype = "Monocyte", celltype_col = "cell_type",
    adjustment = "mediator", composition_transform = "ilr")

Figures

All figures below were generated locally in R using library(exposomeSC) on publicly available datasets. Reproduction scripts are in inst/scripts/.

Figure 1 | exposomeSC framework. (a) Data hierarchy: donor-level exposures paired with cell-level multi-omics. (b) Three analysis levels with increasing data requirements. (c) Pseudoreplication control: naive cell-level analysis inflates Type I error to 93%; exposomeSC pseudobulk maintains 4%. (d) Exposure-Response Decomposition (ERD) DAG separating direct transcriptional from compositional effects. (e) Cell-level coupling: within-donor gene-protein correlation as a function of exposure (meta-regression). (f) Effective degrees of freedom: cell coupling (df=45) vs precision matrix (df=3) for n=47 donors, p=45 features.

Figure 2 | Aging and the immune transcriptome: 651,003 cells from 108 healthy donors. Allen Institute Human Immune Health Atlas (cellxgene). Exposures: age (11–65), BMI, CMV status. 4 cell types: Monocytes, NK, B, DC. (a) Pseudoreplication: naive cell-level analysis of 318,479 monocytes finds 2,585/3,000 genes at FDR < 0.05 (86% false positive rate); pseudobulk correctly identifies 944 age-associated monocyte genes. (b) Cell-type-resolved sc-ExWAS: age effects across 4 cell types (944 Monocyte, 718 NK, 679 B, 802 DC genes; 31,989 tests, global BH). (c) Exposure-Response Decomposition (ERD): stacked bars show direct (within-cell transcriptional, blue) vs compositional (proportion shift, red) components of the age effect. 3,145/8,000 genes have >50% of their apparent age effect driven by composition changes. Uses mediator adjustment (VanderWeele 2015). (d) Mixture: age (85.3%) dominates the age + BMI mixture on monocyte expression. (e) Interaction: 5/50 top genes show significantly different age effects across cell types (BH FDR < 0.05). (f) Power: n = 108 achieves >80% power for moderate effects. Code: inst/scripts/gen_figures.R.

Figure 3 | Cell-level gene-protein coupling and severity-driven molecular rewiring. Liu et al. (2021) Cell CITE-seq data (GSE161918): 407,006 PBMCs, 47 donors, 192 surface proteins. (a) Cell composition by severity. (b) Top 15 surface proteins correlated with severity (Spearman rho). (c) Severity-driven rewiring volcano across 5 cell types: within-donor Spearman partial correlation (controlling library sizes, cell cycle, mitochondrial %) with meta-regression (REML + Knapp-Hartung). 54 pairs at BH q<0.05. (d) Top 6 rewired gene-protein pairs showing dose-response trajectories across severity levels. (e) Significant pairs per cell type (baseline coupling vs severity rewiring). (f) Rewiring heatmap for B_Mem (* = q<0.05). Code: inst/scripts/gen_figure3.R.

Figure 4 | Statistical calibration, power, and replicability. All panels use real data from the Allen Institute Human Immune Health Atlas (108 donors, 651,003 monocyte PBMCs) and OneK1K (981 donors). (a) Type I error under permutation null (100 shuffles of age labels, 5000 genes). voom+dream (4.9%) and DESeq2 (5.3%) both achieve nominal 5% FPR control. (b) p-value calibration: QQ plot under permutation null. Well-calibrated methods follow the diagonal. (c) Discovery recovery as a function of sample size (donor subsampling from n=108, reference: 501 genes at BH < 0.05). ≥80 donors needed for >40% recovery. (d) Split-sample replication on OneK1K (490 + 491 donors): log₂FC Spearman ρ = 0.889, Bonferroni gene overlap Jaccard = 0.556. (e) Runtime scalability on real data (Allen IHI, OneK1K). (f) Runtime per analysis (5000 genes, 108 donors). Code: inst/scripts/benchmark_permutation.R.

Calibration

Type I error was assessed by permuting age labels 100 times on Allen IHI Monocyte pseudobulk (5000 genes, 108 donors). Under the permutation null, no gene has a true age association; well-calibrated methods should reject ~5%.

Method	FPR (permutation null)	Speed (5000 genes)
voom+dream (exposomeSC default)	4.9%	1.3s
DESeq2	5.3%	13.1s
Naive correlation (no pseudobulk)	3.2%*	0.6s

*Naive correlation has low per-gene FPR but fails to control FDR across genes (mean 47.9 significant genes under null vs 0.1 for voom+dream).

Scalability: 981 donors (OneK1K) in 6.6 seconds.

Datasets

Figure 1 — Method overview: schematic + pseudoreplication demo + ERD concept + cell coupling from Liu 2021 data.

Figure 2 — Allen IHI (108 donors, scRNA-seq):

651,003 PBMCs, continuous age/BMI/CMV exposures
sc-ExWAS, ERD, mixture, interaction

Figure 3 — Liu 2021 (47 donors, CITE-seq):

407,006 PBMCs, 192 surface proteins + transcriptome
Cell-level coupling meta-regression: 54 rewired pairs

Figure 4 — Benchmark: 100 simulated datasets with known ground truth + real-data scalability and replication.

OneK1K (981 donors, scRNA-seq):

1,248,980 PBMCs (Yazar et al. 2022 Science)
1,450/2,000 genes age-associated (BH<0.05)
16/18 known immune aging genes validated
NKG7, GZMB, CCL3 ↑; MS4A1, CD79B ↓ (consistent with Peters 2015, Mogilenko 2021)

Replicability

Split-sample replication (OneK1K, 490 + 491 donors): log2FC Spearman ρ = 0.889; Bonferroni-significant gene overlap Jaccard = 0.556.

Multi-scale convergence (IERS on Liu CITE-seq): SERPINB2 ranked #1 by Integrated Exposure Response Score in classical monocytes — simultaneously the strongest severity-associated gene (sc-ExWAS), most directly regulated (ERD), and most rewired at the protein-coupling level.

Functions

Core analysis

Function	Description
`build_scee()`	Build SingleCellExposomeExperiment from SCE + exposures
`run_sc_exwas()`	Cell-type-specific ExWAS via pseudobulk. `method="dreamlet"` (recommended) or `"pseudobulk"` (DESeq2)
`run_sc_exwas_glmm()`	GLMM-based ExWAS via lme4/glmmTMB with donor random effects
`run_multi_exwas()`	Batch ExWAS across multiple exposures with global FDR

Exposure-Response Decomposition

Function	Description
`run_decomposed_exwas()`	Separate direct vs compositional effects. `composition_transform="ilr"` for compositional-aware inference. `adjustment = "confounder"` or `"mediator"`.

Dose-response & nonlinearity

Function	Description
`run_dose_response()`	Polynomial dose-response (AIC selection)
`run_dose_response_gam()`	GAM dose-response via mgcv with LOOCV R²
`run_interaction_test()`	Formal exposure × celltype interaction F-test

Mixture & composition

Function	Description
`run_sc_mixture()`	Quantile-scored mixture screening
`run_mixture_qgcomp()`	Formal qgcomp with bootstrap CI
`run_exposure_composition()`	CLR-transformed cell composition analysis

Advanced analysis

Function	Description
`run_causal_mediation()`	Formal causal mediation (Imai 2010): ACME, ADE, sensitivity ρ
`compute_iers()`	Integrated Exposure Response Score: multi-scale gene ranking
`run_state_coupling()`	Cell-state coupling: exposure × trajectory interaction
`run_differential_network()`	Exposure-stratified differential network: permutation-tested edges

Causal inference, enrichment & meta-analysis

Function	Description
`run_mediation()`	Causal mediation (Imai et al. 2010)
`run_gsea()`	Pathway enrichment via fgsea
`run_meta_exwas()`	Cross-cohort meta-analysis (fixed/random effects, I²)
`run_spatial_exwas()`	Region-resolved ExWAS via SpatialExperiment
`estimate_power()`	Power/sample size calculator

Preprocessing & interoperability

Function	Description
`exposure_impute_lod()`	Below-LOD imputation (LOD/√2, ROS, multiple)
`as_scee()`	Convert rexposome ExposomeSet to SCEE
`seurat_to_exposure()`	Extract donor-level metadata from Seurat

Cross-omic network inference

Function	Description
`run_celltype_network()`	Block-calibrated cross-omic network per cell type via collaborative graphical lasso
`run_exposure_network()`	Network restricted to exposure-associated features (two-stage: selection + network)
`run_comparative_network()`	Statistical comparison of networks across cell types (permutation or Fisher's Z)
`run_temporal_network()`	Temporal network evolution across time points with edge dynamics tracking
`run_network_mediation()`	Network-topology-guided high-dimensional mediation (exposure → metabolite → transcript)

Network visualization

Function	Description
`plot_celltype_network()`	Cell-type network with cross-omic edge highlighting and bipartite layout
`plot_network_comparison()`	Side-by-side or overlay comparison of cell-type networks
`plot_stability_surface()`	Lambda-pi calibration heatmap for hyperparameter transparency
`plot_temporal_dynamics()`	Temporal edge dynamics panel (emerging/disappearing/persistent)

Cell-level gene-protein coupling

Function	Description
`run_cell_coupling()`	Cell-level Spearman partial correlation + meta-regression for exposure-driven gene-protein rewiring (edge-level p-values, df = n_donors - 2)

Simulation & benchmarking

Function	Description
`simulate_crossomic_network()`	Generate multi-omics data with known network structure for benchmarking

Cross-omic network quick start

library(exposomeSC)

# After building SCEE (as before)
scee <- build_scee(sce, exp, sample_col = "donor_id")

# Build cell-type-specific cross-omic network
mono_net <- run_celltype_network(
    scee,
    metabolites = metab_matrix,
    celltype = "Monocyte",
    celltype_col = "cell_type",
    exposure = "PM2.5",
    covariates = c("age", "sex"),
    method = "coglasso",
    stability = TRUE
)

# Compare across cell types
comparison <- run_comparative_network(
    mono_net, bcell_net, nk_net, dc_net,
    method = "permutation",
    n_perm = 1000
)

# Visualize
plot_celltype_network(mono_net, color_by = "omic_layer")
plot_network_comparison(comparison, highlight = "differential")

Architecture

sc-eQTL:   SNP        x cell type -> gene expression
sc-ExWAS:  Exposure   x cell type -> gene expression  (this package)

Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("CuiweiG/exposomeSC")

Statistical methods

Challenge	Method	Reference
Pseudoreplication	Pseudobulk + DESeq2	Squair et al. 2021
Direct vs compositional	ERD	VanderWeele 2015
Donor-level correlation	GLMM (lme4/glmmTMB)	Bates et al. 2015
Nonlinear dose-response	GAM (mgcv)	Wood 2017
Correlated mixtures	Quantile g-computation	Keil et al. 2020
Cell-type interaction	F-test	Gelman & Stern 2006
Causal mediation	Mediation analysis	Imai et al. 2010
Below detection limit	ROS / multiple imputation	Helsel 2012
Multi-cohort synthesis	IV-weighted meta-analysis	Borenstein et al. 2009
Causal interpretation	Confounder vs mediator	Pearl 2014
Cross-omic network	Collaborative graphical lasso	Albanese et al. 2024
Block calibration	Separate within/between penalization	Sinisi et al. 2024
Edge stability	XStARS (extended StARS)	Liu et al. 2010
Differential networks	Permutation test on partial correlations	—
Temporal dynamics	Time-point-specific network estimation	Sinisi et al. 2024
Network mediation	Topology-guided penalized mediation	—

Key references

Squair JW et al. (2021). Confronting false discoveries in single-cell DE. Nat Commun 12:5692. doi:10.1038/s41467-021-25960-2
VanderWeele TJ (2015). Explanation in Causal Inference. Oxford University Press.
Pearl J (2014). Interpretation and identification of causal mediation. Psychol Methods 19:459-481. doi:10.1037/a0036434
Keil AP et al. (2020). Quantile-based g-computation for exposure mixtures. Environ Health Perspect 128:047004. doi:10.1289/EHP5838
Bates D et al. (2015). Fitting linear mixed-effects models using lme4. J Stat Softw 67:1-48. doi:10.18637/jss.v067.i01
Wood SN (2017). Generalized Additive Models. 2nd ed. Chapman and Hall/CRC.
Gelman A, Stern H (2006). The difference between "significant" and "not significant" is not itself statistically significant. Am Stat 60:328-331.
Imai K, Keele L, Tingley D (2010). A general approach to causal mediation analysis. Psychol Methods 15:309-334. doi:10.1037/a0020761
Helsel DR (2012). Statistics for Censored Environmental Data. 2nd ed. Wiley.
Albanese A, Kohlen W, Behrouzi P (2024). Collaborative graphical lasso. arXiv 2403.18602. R package: coglasso.
Sinisi SE et al. (2024). Multiomic signatures of traffic-related air pollution. Environ Sci Technol. doi:10.1021/acs.est.3c09148
Maitre L et al. (2022). Multi-omics signatures of the human early life exposome. Nat Commun 13:7024. doi:10.1038/s41467-022-34422-2
Stratakis N et al. (2025). Multi-omics architecture of childhood obesity. Nat Commun 16:854. doi:10.1038/s41467-025-56013-7
Liu C et al. (2021). Time-resolved systems immunology reveals a late juncture linked to fatal COVID-19. Cell 184(7):1836-1857. doi:10.1016/j.cell.2021.02.018
Liu H, Roeder K, Wasserman L (2010). Stability approach to regularization selection (StARS). NeurIPS 23.
Meinshausen N, Buhlmann P (2010). Stability selection. J R Stat Soc B 72:417-473. doi:10.1111/j.1467-9868.2010.00740.x

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
.github/workflows		.github/workflows
R		R
inst		inst
man		man
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

exposomeSC

The problem

The solution

Quick start (copy-paste runs as-is)

Figures

Calibration

Datasets

Replicability

Functions

Core analysis

Exposure-Response Decomposition

Dose-response & nonlinearity

Mixture & composition

Advanced analysis

Causal inference, enrichment & meta-analysis

Preprocessing & interoperability

Cross-omic network inference

Network visualization

Cell-level gene-protein coupling

Simulation & benchmarking

Cross-omic network quick start

Architecture

Installation

Statistical methods

Key references

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

exposomeSC

The problem

The solution

Quick start (copy-paste runs as-is)

Figures

Calibration

Datasets

Replicability

Functions

Core analysis

Exposure-Response Decomposition

Dose-response & nonlinearity

Mixture & composition

Advanced analysis

Causal inference, enrichment & meta-analysis

Preprocessing & interoperability

Cross-omic network inference

Network visualization

Cell-level gene-protein coupling

Simulation & benchmarking

Cross-omic network quick start

Architecture

Installation

Statistical methods

Key references

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages