Single-cell RNA-seq resolves cell-type-specific biology, but environmental exposures are measured at the donor level. Bridging these two scales requires pseudobulk aggregation — and getting it wrong inflates false positives by 10–100× (Squair et al. 2021). Existing tools don't address this:
- Bulk exposome tools (
rexposome,omicRexposome) ignore cell-type specificity entirely - Single-cell DE tools (
Seurat,scran) lack donor-level exposure covariates and commit pseudoreplication - General pseudobulk tools (
muscat,limma-voom) can be adapted manually, but provide no built-in support for continuous exposures, mixture decomposition, dose-response modeling, or interaction testing - No existing tool decomposes exposure effects into direct transcriptional changes vs cell-composition shifts — a fundamental confound in single-cell exposomics
Even when pseudoreplication is properly handled, existing tools treat each gene as independent. But exposure effects are systemic — they rewire molecular networks. The same PM2.5 exposure might strengthen a fatty acid–inflammatory gene connection in monocytes while leaving it unchanged in B cells. No tool captures this.
exposomeSC provides a framework from single-gene
associations to cell-type-resolved multi-omic network
inference:
- sc-ExWAS: Cell-type-specific association testing
- ERD: Decompose direct transcriptional vs composition-driven effects
- Cross-omic networks: Cell-type-resolved conditional independence networks integrating scRNA-seq pseudobulk with donor-level metabolomics via block-calibrated graphical lasso
- Differential networks: Statistical comparison of network topology across cell types
- Temporal dynamics: Track molecular rewiring across exposure time points
library(exposomeSC)
# Load demo data (included in package)
sce <- readRDS(system.file("extdata", "demo_sce.rds",
package = "exposomeSC"))
exp <- as.matrix(read.csv(system.file("extdata",
"demo_exposures.csv", package = "exposomeSC"),
row.names = 1))
# Build integrated container
scee <- build_scee(sce, exp, sample_col = "donor_id")
# Cell-type-specific ExWAS
# method = "dreamlet" uses voom+dream with precision weights
result <- run_sc_exwas(scee, exposure = "PM2.5",
celltype_col = "cell_type", covariates = c("age", "sex"),
method = "dreamlet")
# Decompose: direct transcriptional vs composition shift
# composition_transform = "ilr" handles compositional constraint
dec <- run_decomposed_exwas(scee, exposure = "PM2.5",
celltype = "Monocyte", celltype_col = "cell_type",
adjustment = "mediator", composition_transform = "ilr")All figures below were generated locally in R using
library(exposomeSC) on publicly available datasets.
Reproduction scripts are in inst/scripts/.
Figure 1 | exposomeSC framework. (a) Data hierarchy: donor-level exposures paired with cell-level multi-omics. (b) Three analysis levels with increasing data requirements. (c) Pseudoreplication control: naive cell-level analysis inflates Type I error to 93%; exposomeSC pseudobulk maintains 4%. (d) Exposure-Response Decomposition (ERD) DAG separating direct transcriptional from compositional effects. (e) Cell-level coupling: within-donor gene-protein correlation as a function of exposure (meta-regression). (f) Effective degrees of freedom: cell coupling (df=45) vs precision matrix (df=3) for n=47 donors, p=45 features.
Figure 2 | Aging and the immune transcriptome: 651,003 cells from 108 healthy donors. Allen Institute Human Immune Health Atlas (cellxgene). Exposures: age (11–65), BMI, CMV status. 4 cell types: Monocytes, NK, B, DC. (a) Pseudoreplication: naive cell-level analysis of 318,479 monocytes finds 2,585/3,000 genes at FDR < 0.05 (86% false positive rate); pseudobulk correctly identifies 944 age-associated monocyte genes. (b) Cell-type-resolved sc-ExWAS: age effects across 4 cell types (944 Monocyte, 718 NK, 679 B, 802 DC genes; 31,989 tests, global BH). (c) Exposure-Response Decomposition (ERD): stacked bars show direct (within-cell transcriptional, blue) vs compositional (proportion shift, red) components of the age effect. 3,145/8,000 genes have >50% of their apparent age effect driven by composition changes. Uses mediator adjustment (VanderWeele 2015). (d) Mixture: age (85.3%) dominates the age + BMI mixture on monocyte expression. (e) Interaction: 5/50 top genes show significantly different age effects across cell types (BH FDR < 0.05). (f) Power: n = 108 achieves >80% power for moderate effects. Code:
inst/scripts/gen_figures.R.
Figure 3 | Cell-level gene-protein coupling and severity-driven molecular rewiring. Liu et al. (2021) Cell CITE-seq data (GSE161918): 407,006 PBMCs, 47 donors, 192 surface proteins. (a) Cell composition by severity. (b) Top 15 surface proteins correlated with severity (Spearman rho). (c) Severity-driven rewiring volcano across 5 cell types: within-donor Spearman partial correlation (controlling library sizes, cell cycle, mitochondrial %) with meta-regression (REML + Knapp-Hartung). 54 pairs at BH q<0.05. (d) Top 6 rewired gene-protein pairs showing dose-response trajectories across severity levels. (e) Significant pairs per cell type (baseline coupling vs severity rewiring). (f) Rewiring heatmap for B_Mem (* = q<0.05). Code:
inst/scripts/gen_figure3.R.
Figure 4 | Statistical calibration, power, and replicability. All panels use real data from the Allen Institute Human Immune Health Atlas (108 donors, 651,003 monocyte PBMCs) and OneK1K (981 donors). (a) Type I error under permutation null (100 shuffles of age labels, 5000 genes). voom+dream (4.9%) and DESeq2 (5.3%) both achieve nominal 5% FPR control. (b) p-value calibration: QQ plot under permutation null. Well-calibrated methods follow the diagonal. (c) Discovery recovery as a function of sample size (donor subsampling from n=108, reference: 501 genes at BH < 0.05). ≥80 donors needed for >40% recovery. (d) Split-sample replication on OneK1K (490 + 491 donors): log₂FC Spearman ρ = 0.889, Bonferroni gene overlap Jaccard = 0.556. (e) Runtime scalability on real data (Allen IHI, OneK1K). (f) Runtime per analysis (5000 genes, 108 donors). Code:
inst/scripts/benchmark_permutation.R.
Type I error was assessed by permuting age labels 100 times on Allen IHI Monocyte pseudobulk (5000 genes, 108 donors). Under the permutation null, no gene has a true age association; well-calibrated methods should reject ~5%.
| Method | FPR (permutation null) | Speed (5000 genes) |
|---|---|---|
| voom+dream (exposomeSC default) | 4.9% | 1.3s |
| DESeq2 | 5.3% | 13.1s |
| Naive correlation (no pseudobulk) | 3.2%* | 0.6s |
*Naive correlation has low per-gene FPR but fails to control FDR across genes (mean 47.9 significant genes under null vs 0.1 for voom+dream).
Scalability: 981 donors (OneK1K) in 6.6 seconds.
Figure 1 — Method overview: schematic + pseudoreplication demo + ERD concept + cell coupling from Liu 2021 data.
Figure 2 — Allen IHI (108 donors, scRNA-seq):
- 651,003 PBMCs, continuous age/BMI/CMV exposures
- sc-ExWAS, ERD, mixture, interaction
Figure 3 — Liu 2021 (47 donors, CITE-seq):
- 407,006 PBMCs, 192 surface proteins + transcriptome
- Cell-level coupling meta-regression: 54 rewired pairs
Figure 4 — Benchmark: 100 simulated datasets with known ground truth + real-data scalability and replication.
OneK1K (981 donors, scRNA-seq):
- 1,248,980 PBMCs (Yazar et al. 2022 Science)
- 1,450/2,000 genes age-associated (BH<0.05)
- 16/18 known immune aging genes validated
- NKG7, GZMB, CCL3 ↑; MS4A1, CD79B ↓ (consistent with Peters 2015, Mogilenko 2021)
Split-sample replication (OneK1K, 490 + 491 donors): log2FC Spearman ρ = 0.889; Bonferroni-significant gene overlap Jaccard = 0.556.
Multi-scale convergence (IERS on Liu CITE-seq): SERPINB2 ranked #1 by Integrated Exposure Response Score in classical monocytes — simultaneously the strongest severity-associated gene (sc-ExWAS), most directly regulated (ERD), and most rewired at the protein-coupling level.
| Function | Description |
|---|---|
build_scee() |
Build SingleCellExposomeExperiment from SCE + exposures |
run_sc_exwas() |
Cell-type-specific ExWAS via pseudobulk. method="dreamlet" (recommended) or "pseudobulk" (DESeq2) |
run_sc_exwas_glmm() |
GLMM-based ExWAS via lme4/glmmTMB with donor random effects |
run_multi_exwas() |
Batch ExWAS across multiple exposures with global FDR |
| Function | Description |
|---|---|
run_decomposed_exwas() |
Separate direct vs compositional effects. composition_transform="ilr" for compositional-aware inference. adjustment = "confounder" or "mediator". |
| Function | Description |
|---|---|
run_dose_response() |
Polynomial dose-response (AIC selection) |
run_dose_response_gam() |
GAM dose-response via mgcv with LOOCV R² |
run_interaction_test() |
Formal exposure × celltype interaction F-test |
| Function | Description |
|---|---|
run_sc_mixture() |
Quantile-scored mixture screening |
run_mixture_qgcomp() |
Formal qgcomp with bootstrap CI |
run_exposure_composition() |
CLR-transformed cell composition analysis |
| Function | Description |
|---|---|
run_causal_mediation() |
Formal causal mediation (Imai 2010): ACME, ADE, sensitivity ρ |
compute_iers() |
Integrated Exposure Response Score: multi-scale gene ranking |
run_state_coupling() |
Cell-state coupling: exposure × trajectory interaction |
run_differential_network() |
Exposure-stratified differential network: permutation-tested edges |
| Function | Description |
|---|---|
run_mediation() |
Causal mediation (Imai et al. 2010) |
run_gsea() |
Pathway enrichment via fgsea |
run_meta_exwas() |
Cross-cohort meta-analysis (fixed/random effects, I²) |
run_spatial_exwas() |
Region-resolved ExWAS via SpatialExperiment |
estimate_power() |
Power/sample size calculator |
| Function | Description |
|---|---|
exposure_impute_lod() |
Below-LOD imputation (LOD/√2, ROS, multiple) |
as_scee() |
Convert rexposome ExposomeSet to SCEE |
seurat_to_exposure() |
Extract donor-level metadata from Seurat |
| Function | Description |
|---|---|
run_celltype_network() |
Block-calibrated cross-omic network per cell type via collaborative graphical lasso |
run_exposure_network() |
Network restricted to exposure-associated features (two-stage: selection + network) |
run_comparative_network() |
Statistical comparison of networks across cell types (permutation or Fisher's Z) |
run_temporal_network() |
Temporal network evolution across time points with edge dynamics tracking |
run_network_mediation() |
Network-topology-guided high-dimensional mediation (exposure → metabolite → transcript) |
| Function | Description |
|---|---|
plot_celltype_network() |
Cell-type network with cross-omic edge highlighting and bipartite layout |
plot_network_comparison() |
Side-by-side or overlay comparison of cell-type networks |
plot_stability_surface() |
Lambda-pi calibration heatmap for hyperparameter transparency |
plot_temporal_dynamics() |
Temporal edge dynamics panel (emerging/disappearing/persistent) |
| Function | Description |
|---|---|
run_cell_coupling() |
Cell-level Spearman partial correlation + meta-regression for exposure-driven gene-protein rewiring (edge-level p-values, df = n_donors - 2) |
| Function | Description |
|---|---|
simulate_crossomic_network() |
Generate multi-omics data with known network structure for benchmarking |
library(exposomeSC)
# After building SCEE (as before)
scee <- build_scee(sce, exp, sample_col = "donor_id")
# Build cell-type-specific cross-omic network
mono_net <- run_celltype_network(
scee,
metabolites = metab_matrix,
celltype = "Monocyte",
celltype_col = "cell_type",
exposure = "PM2.5",
covariates = c("age", "sex"),
method = "coglasso",
stability = TRUE
)
# Compare across cell types
comparison <- run_comparative_network(
mono_net, bcell_net, nk_net, dc_net,
method = "permutation",
n_perm = 1000
)
# Visualize
plot_celltype_network(mono_net, color_by = "omic_layer")
plot_network_comparison(comparison, highlight = "differential")sc-eQTL: SNP x cell type -> gene expression
sc-ExWAS: Exposure x cell type -> gene expression (this package)
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("CuiweiG/exposomeSC")| Challenge | Method | Reference |
|---|---|---|
| Pseudoreplication | Pseudobulk + DESeq2 | Squair et al. 2021 |
| Direct vs compositional | ERD | VanderWeele 2015 |
| Donor-level correlation | GLMM (lme4/glmmTMB) | Bates et al. 2015 |
| Nonlinear dose-response | GAM (mgcv) | Wood 2017 |
| Correlated mixtures | Quantile g-computation | Keil et al. 2020 |
| Cell-type interaction | F-test | Gelman & Stern 2006 |
| Causal mediation | Mediation analysis | Imai et al. 2010 |
| Below detection limit | ROS / multiple imputation | Helsel 2012 |
| Multi-cohort synthesis | IV-weighted meta-analysis | Borenstein et al. 2009 |
| Causal interpretation | Confounder vs mediator | Pearl 2014 |
| Cross-omic network | Collaborative graphical lasso | Albanese et al. 2024 |
| Block calibration | Separate within/between penalization | Sinisi et al. 2024 |
| Edge stability | XStARS (extended StARS) | Liu et al. 2010 |
| Differential networks | Permutation test on partial correlations | — |
| Temporal dynamics | Time-point-specific network estimation | Sinisi et al. 2024 |
| Network mediation | Topology-guided penalized mediation | — |
- Squair JW et al. (2021). Confronting false discoveries in single-cell DE. Nat Commun 12:5692. doi:10.1038/s41467-021-25960-2
- VanderWeele TJ (2015). Explanation in Causal Inference. Oxford University Press.
- Pearl J (2014). Interpretation and identification of causal mediation. Psychol Methods 19:459-481. doi:10.1037/a0036434
- Keil AP et al. (2020). Quantile-based g-computation for exposure mixtures. Environ Health Perspect 128:047004. doi:10.1289/EHP5838
- Bates D et al. (2015). Fitting linear mixed-effects models using lme4. J Stat Softw 67:1-48. doi:10.18637/jss.v067.i01
- Wood SN (2017). Generalized Additive Models. 2nd ed. Chapman and Hall/CRC.
- Gelman A, Stern H (2006). The difference between "significant" and "not significant" is not itself statistically significant. Am Stat 60:328-331.
- Imai K, Keele L, Tingley D (2010). A general approach to causal mediation analysis. Psychol Methods 15:309-334. doi:10.1037/a0020761
- Helsel DR (2012). Statistics for Censored Environmental Data. 2nd ed. Wiley.
- Albanese A, Kohlen W, Behrouzi P (2024). Collaborative graphical lasso. arXiv 2403.18602. R package: coglasso.
- Sinisi SE et al. (2024). Multiomic signatures of traffic-related air pollution. Environ Sci Technol. doi:10.1021/acs.est.3c09148
- Maitre L et al. (2022). Multi-omics signatures of the human early life exposome. Nat Commun 13:7024. doi:10.1038/s41467-022-34422-2
- Stratakis N et al. (2025). Multi-omics architecture of childhood obesity. Nat Commun 16:854. doi:10.1038/s41467-025-56013-7
- Liu C et al. (2021). Time-resolved systems immunology reveals a late juncture linked to fatal COVID-19. Cell 184(7):1836-1857. doi:10.1016/j.cell.2021.02.018
- Liu H, Roeder K, Wasserman L (2010). Stability approach to regularization selection (StARS). NeurIPS 23.
- Meinshausen N, Buhlmann P (2010). Stability selection. J R Stat Soc B 72:417-473. doi:10.1111/j.1467-9868.2010.00740.x



