This repository provides the training and evaluation framework for an Electronic Health Record (EHR) Foundation Model. Developed within Verily Workbench using the All of Us Research Program dataset, this model transforms complex longitudinal medical histories into actionable insights for:
- Disease Forecasting: Predicting future diagnoses based on historical clinical markers.
- Risk Stratification: Identifying high-risk patient cohorts for clinical intervention.
For full details on the approach and results, see our paper: [Integrating Genomics into Multimodal EHR Foundation Models](https://arxiv.org/abs/2510.23639) (arXiv:2510.23639).
## Pipeline

The pipeline covers the full workflow from raw EHR data to evaluation:
- Data export -- Extract structured clinical data from All of Us BigQuery tables
- Tokenization -- Transform records into token sequences suitable for autoregressive modeling
- Pre-training -- Train a GPT-style foundation model on the tokenized sequences (supports single- and multi-GPU setups via HuggingFace Accelerate or NeMo)
- Task evaluation -- Generate labeled evaluation datasets for downstream clinical prediction tasks (e.g., Type 2 Diabetes onset)
- Inference & scoring -- Run predictions and compute evaluation metrics
A mock dataset is included so you can verify the pipeline end-to-end without access to All of Us data.
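To make the tokenization step above concrete, here is a minimal sketch of turning a longitudinal record into a token sequence. The event codes, vocabulary, and ordering logic are illustrative assumptions, not the real implementation in `aou_data_loader.py`:

```python
from datetime import date

# Hypothetical longitudinal record: (date, clinical code) pairs.
# Code names are illustrative, not the actual All of Us vocabulary.
events = [
    (date(2021, 7, 15), "LAB:HBA1C_HIGH"),
    (date(2019, 3, 1), "ICD10:E66.9"),    # obesity
    (date(2022, 1, 10), "ICD10:E11.9"),   # type 2 diabetes
]

# Sort chronologically, then map each code to an integer token ID.
# The result is a sequence a GPT-style model can learn autoregressively.
vocab: dict[str, int] = {}

def token_id(code: str) -> int:
    return vocab.setdefault(code, len(vocab))

sequence = [token_id(code) for _, code in sorted(events)]
print(sequence)  # → [0, 1, 2]
```

Chronological ordering is what lets an autoregressive model treat "predict the next token" as "forecast the next clinical event."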
## Prerequisites

- Python 3.10+
- uv (Python package manager)
- NVIDIA Tesla V100 GPU or newer
## Setup

1. Install `uv` and create a virtual environment:

   ```sh
   curl -LsSf https://astral.sh/uv/install.sh | sh
   uv venv
   ```

   If running inside the AoU Researcher Workbench, you may need to clear pre-installed environments from your path:

   ```sh
   export PATH=$(echo "$PATH" | tr ':' '\n' | grep -v -E 'workbench|conda' | tr '\n' ':' | sed 's/:$//')
   export PYTHONPATH=$(echo "${PYTHONPATH:-}" | tr ':' '\n' | grep -v 'workbench' | tr '\n' ':' | sed 's/:$//')
   ```
2. Install dependencies:

   ```sh
   uv sync
   ```
## Data Access

To run this model on the All of Us dataset, you must be a registered researcher with the All of Us program.

- Register: Sign up at the All of Us Research Hub.
- Environment: Once access is granted, we recommend running the model via the Researcher Workbench on Verily Workbench.
## Quick Start (Mock Data)

If you do not yet have data access, a mock dataset is included in the `verily/forecast/mock_data` directory. This lets you test the pipeline architecture and training scripts immediately:
```sh
# Tokenize the mock dataset
uv run verily/forecast/aou_data_loader.py

# Train a small model
uv run verily/forecast/trainer.py --use-mock-data

# Run inference
uv run verily/forecast/inference.py -m <path-to-saved-model> -mn gpt -d <path-to-eval-dataset> -t T2D
```
## Full Pipeline

1. Export data from the All of Us BigQuery tables. Use `-n` to export a smaller sample for faster iteration:

   ```sh
   uv run verily/forecast/export_data.py -m export -n 10000
   ```

   This step requires access to the AoU CDR BigQuery dataset. Set the `WORKSPACE_CDR` environment variable to point to your CDR, e.g. for the Registered Tier dataset:

   ```sh
   export WORKSPACE_CDR="wb-affable-acorn-7941.R2024Q3R8"
   ```
2. Tokenize the exported data into model-ready sequences. If you sampled with `-n` in the previous step, add `--skip-filtering`:

   ```sh
   uv run verily/forecast/aou_data_loader.py
   ```
3. Train the model. On a multi-GPU machine, use `accelerate`:

   ```sh
   # Single GPU
   uv run verily/forecast/trainer.py

   # Multi-GPU
   uv run accelerate launch verily/forecast/trainer.py

   # With Weights & Biases logging (see "Weights & Biases" section below)
   uv run verily/forecast/trainer.py --enable-wandb
   ```

   Alternatively, train with NeMo by pointing the YAML config at your dataset:

   ```sh
   source .venv/bin/activate
   cd verily/forecast/nemo && python pretrain.py --config aou_gpt_pretrain.yaml

   # Multi-GPU with NeMo
   torchrun --nproc-per-node 8 pretrain.py --config aou_gpt_pretrain.yaml
   ```
4. Generate evaluation data for a downstream task (e.g., Type 2 Diabetes):

   ```sh
   uv run verily/forecast/analysis.py --task T2D
   ```
5. Run inference on the evaluation set:

   ```sh
   uv run verily/forecast/inference.py -m <path-to-saved-model> -mn gpt -d <path-to-eval-dataset> -t T2D
   ```

   For models trained with NeMo, add the `-fg` and `-nc` flags:

   ```sh
   uv run verily/forecast/inference.py -m <path-to-saved-model> -mn gpt -d <path-to-eval-dataset> -t T2D -fg -nc <path-to-nemo-yaml-config>
   ```
6. (Optional) Evaluate a batch of labeled predictions:

   ```sh
   uv run verily/forecast/eval.py --inference-path <path-to-inference>
   ```
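As a rough sketch of what the scoring stage computes, a binary onset task like T2D reduces to standard classification metrics over labeled predictions. The labels, scores, and pure-Python AUROC below are illustrative assumptions; `eval.py` defines the actual metric set:

```python
# Toy labeled predictions: label 1 = developed T2D in the follow-up window.
# All values here are made up for illustration.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]  # model risk scores

def auroc(labels, scores):
    """AUROC = P(a random positive outscores a random negative); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(round(auroc(labels, scores), 3))  # → 0.889
```

This pairwise formulation is equivalent to the area under the ROC curve and needs no thresholding of the risk scores.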
See `single_subject_inference.ipynb` for an end-to-end walkthrough of running the model for a single patient.
## Weights & Biases

Weights & Biases (W&B) integration is available for experiment tracking during training and inference. It is disabled by default and can be enabled with the `--enable-wandb` flag.
1. Create an account at wandb.ai (free for personal and academic use).

2. Log in from the command line:

   ```sh
   uv run wandb login
   ```

   This will prompt you for an API key, which you can find at wandb.ai/authorize. The key is saved to `~/.netrc`, so you only need to do this once per machine.

3. Enable logging by passing `--enable-wandb` to the training or inference script:

   ```sh
   # Training with W&B
   uv run verily/forecast/trainer.py --enable-wandb

   # Inference with W&B
   uv run verily/forecast/inference.py --enable-wandb -m <model-path> -mn gpt -d <dataset-path> -t T2D
   ```
Runs will appear under the `forecast` project (training) or `eval_inference` project (inference) in your W&B dashboard. Logged metrics include training loss, epoch loss, and inference configuration.
## Citation

If you use this code in your research, please cite:
```bibtex
@article{amar2025integratinggenomicsmultimodalehr,
  title={Integrating Genomics into Multimodal EHR Foundation Models},
  author={Jonathan Amar and Edward Liu and Alessandra Breschi and Liangliang Zhang and Pouya Kheradpour and Sylvia Li and Lisa Soleymani Lehmann and Alessandro Giulianelli and Matt Edwards and Yugang Jia and David Nola and Raghav Mani and Pankaj Vats and Jesse Tetreault and T. J. Chen and Cory Y. McLean},
  year={2025},
  eprint={2510.23639},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.23639},
}
```

## License

This project is provided for research purposes. See LICENSE for details.