HALO Implementation into PyHealth + Colab Notebook #812
Draft
jalengg wants to merge 60 commits into sunlabuiuc:master from
- Add HALO (Healthcare generative model using transformers) implementation
- Include example training script with configurable parameters
- Include example generation script for synthetic patient data
- Add canonical SLURM scripts with optimal parameters (80 epochs, batch_size 48, lr 0.0001)
- Register HALO in generators module
- Update HALO_MIMIC3Dataset with latest preprocessing
- Update README with HALO documentation
Remove README.rst changes that only documented CorGAN, not HALO. This PR should focus solely on HALO implementation.
…ls to HALO notebook
Complete Tasks 3-7:
- Configuration panel with demo defaults
- Data upload with validation
- Training logic with checkpoint management
- Generation with CSV conversion
- Results display with quality checks and download
Notebook now has 24 cells with complete end-to-end workflow.
- Replace `!pip install` with subprocess.run() for error checking
- Show clear error message if installation fails
- Raise RuntimeError to stop notebook execution on failure
Fixes #1
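The change described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual cell: the helper names `run_checked` and `pip_install` are hypothetical, and the wording of the error message is an assumption.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run a command and raise RuntimeError if it exits with a non-zero code.

    Hypothetical helper: unlike a bare `!pip install` cell, a failure here
    stops notebook execution instead of being silently ignored.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit code {result.returncode}: "
            f"{' '.join(cmd)}\n{result.stderr}"
        )
    return result.stdout


def pip_install(package_spec):
    """Install a package with the interpreter that is running the notebook."""
    return run_checked([sys.executable, "-m", "pip", "install", package_spec])
```

Calling pip through `sys.executable -m pip` keeps the install tied to the kernel's own interpreter, which is usually what a notebook needs.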
jalengg commented Feb 16, 2026

    import os
    import sys
    sys.path.insert(0, '/u/jalenj4/PyHealth')
- Remove PATIENTS.csv and patient_ids.txt (not used by HALO_MIMIC3Dataset)
- Handle Colab file renaming (ADMISSIONS (1).csv -> ADMISSIONS.csv)
- Allow uploading files one at a time with progress tracking
- Check Google Drive for existing files before requesting upload
- Add FORK variable to installation cell for easier testing
Fixes #4, #5, #6
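The Colab renaming case mentioned above could be handled along these lines. This is a sketch under assumptions: the function name `normalize_colab_filename` is hypothetical, and it assumes the duplicate suffix always takes the form of a space followed by a number in parentheses, placed just before the extension.

```python
import re

# Matches a duplicate-upload suffix such as " (1)" only when it sits
# directly before the final file extension.
_DUPLICATE_SUFFIX = re.compile(r" \(\d+\)(?=\.[^.]+$)")


def normalize_colab_filename(name):
    """Strip a ' (N)' duplicate suffix so the expected filename is restored."""
    return _DUPLICATE_SUFFIX.sub("", name)


# Example: the renamed upload maps back to the name the dataset loader expects.
restored = normalize_colab_filename("ADMISSIONS (1).csv")
```

Names that carry no duplicate suffix pass through unchanged, so the function is safe to apply to every uploaded file.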
7 tasks
Ensures Colab users always get the latest version from GitHub without using cached packages. Critical for picking up recent fixes like the halo_resources __init__.py. Fixes #18
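One way to get this behavior is to pass pip's `--no-cache-dir` and `--force-reinstall` options when installing from the Git URL. The sketch below only builds the command; which flags the notebook actually uses is an assumption, as are the function name `build_install_command`, the meaning of `fork` as a GitHub account name, and the repository name under that fork.

```python
import sys


def build_install_command(fork="sunlabuiuc", branch="master"):
    """Build a pip command that bypasses the cache and reinstalls from GitHub.

    `fork` and `branch` are illustrative parameters, not the notebook's
    actual variables.
    """
    url = f"git+https://github.com/{fork}/PyHealth.git@{branch}"
    return [
        sys.executable, "-m", "pip", "install",
        "--no-cache-dir",      # do not reuse cached wheels or downloads
        "--force-reinstall",   # reinstall even if a version is already present
        url,
    ]


command = build_install_command(fork="jalengg", branch="master")
```

Note that `--force-reinstall` also reinstalls dependencies unless `--no-deps` is added, which can slow down the install cell.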
Use os.path.join() instead of string concatenation to properly handle directory paths with or without trailing slashes. Fixes #19
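A small illustration of why this matters, using a hypothetical directory and the ADMISSIONS.csv filename from the earlier commit: string concatenation depends on whether the caller included a trailing slash, while `os.path.join()` gives the same result either way.

```python
import os

filename = "ADMISSIONS.csv"

# String concatenation silently produces a wrong path without a trailing slash.
concatenated = "data" + filename            # "dataADMISSIONS.csv"

# os.path.join() inserts the separator only when it is missing.
joined_without_slash = os.path.join("data", filename)
joined_with_slash = os.path.join("data/", filename)
```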
Fixes #21

The YAML config files in pyhealth/datasets/configs/ were not being included when the package was installed via pip. This caused FileNotFoundError for multiple datasets including HALO, MIMIC3, MIMIC4, EHRShot, COVID-19 CXR, and Medical Transcriptions.

Added MANIFEST.in to specify which non-Python files should be included in the package distribution.
Fixes #21

MANIFEST.in only affects sdist source distributions. When installing via `pip install git+https://...` (as in Colab), pip relies on package_data in setup.py to include non-Python files.

Added explicit package_data to ensure YAML configs in pyhealth/datasets/configs/ are included in all install paths. Removed MANIFEST.in as it provided no benefit for pip-from-git installs.
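For illustration, a `package_data` entry of the kind described might look like the sketch below. The exact glob pattern used in the PR is not shown in this conversation, so `datasets/configs/*.yaml` is an assumption, and `PACKAGE_DATA` is a hypothetical name.

```python
# Hypothetical excerpt from setup.py. Keys are package names; values are glob
# patterns relative to that package's directory (here, pyhealth/).
PACKAGE_DATA = {
    "pyhealth": ["datasets/configs/*.yaml"],
}

# In setup.py this mapping would be passed to setuptools, for example:
#     setup(..., package_data=PACKAGE_DATA)
```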
…rom test_halo_model
…rd unique_patients in cell 22
No description provided.