Description
Problem
When PyHealth is installed via pip (from git or PyPI), the YAML configuration files in pyhealth/datasets/configs/ are not included in the installed package. This causes FileNotFoundError when trying to use several datasets.
Error Example
from pyhealth.datasets import HALO_MIMIC3Dataset
dataset = HALO_MIMIC3Dataset(mimic3_dir="data")
Results in:
FileNotFoundError: [Errno 2] No such file or directory:
'/usr/local/lib/python3.12/dist-packages/pyhealth/datasets/configs/hcup_ccs_2015_definitions_benchmark.yaml'
Affected Datasets
This issue affects multiple datasets that rely on YAML configuration files:
halo_mimic3.py → hcup_ccs_2015_definitions_benchmark.yaml
mimic3.py → mimic3.yaml
mimic4.py → mimic4_cxr.yaml, mimic4_ehr.yaml, mimic4_note.yaml
ehrshot.py → ehrshot.yaml
covid19_cxr.py → covid19_cxr.yaml
medical_transcriptions.py → medical_transcriptions.yaml
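To confirm which of these files are missing from a given environment, a small check along the following lines may help. This is only a sketch: `find_missing_configs` and `EXPECTED_CONFIGS` are hypothetical names introduced for illustration, and the `datasets/configs` location under the installed package is assumed from the path shown in the traceback above.

```python
from pathlib import Path

# Config file names taken from the affected-datasets list in this issue.
EXPECTED_CONFIGS = [
    "hcup_ccs_2015_definitions_benchmark.yaml",
    "mimic3.yaml",
    "mimic4_cxr.yaml",
    "mimic4_ehr.yaml",
    "mimic4_note.yaml",
    "ehrshot.yaml",
    "covid19_cxr.yaml",
    "medical_transcriptions.yaml",
]


def find_missing_configs(configs_dir, expected=EXPECTED_CONFIGS):
    """Return the expected config file names that are absent from configs_dir."""
    configs_dir = Path(configs_dir)
    return [name for name in expected if not (configs_dir / name).is_file()]


if __name__ == "__main__":
    import importlib.util

    # Locate the installed package without importing it.
    spec = importlib.util.find_spec("pyhealth")
    if spec is None or spec.origin is None:
        print("pyhealth is not installed in this environment")
    else:
        # Assumed layout: <site-packages>/pyhealth/datasets/configs/
        configs_dir = Path(spec.origin).parent / "datasets" / "configs"
        print("missing:", find_missing_configs(configs_dir))
```

An empty list would suggest the configs were packaged correctly; any names printed are files the installed package lacks.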
Root Cause
The setup.py sets include_package_data=True, but the repository has no MANIFEST.in file to specify which non-Python files should be included in the package. Without one, setuptools packages only the .py files by default, so the YAML configs are left out.
Solution
Create a MANIFEST.in file in the repository root:
include README.rst
include requirements.txt
include LICENSE
recursive-include pyhealth/datasets/configs *.yaml *.yml
This tells setuptools to include all YAML files in the configs directory when building the package.
Verification
After the fix, verify with:
python setup.py sdist
tar -tzf dist/pyhealth-*.tar.gz | grep "\.yaml"
Should show:
pyhealth-1.1.4/pyhealth/datasets/configs/covid19_cxr.yaml
pyhealth-1.1.4/pyhealth/datasets/configs/ehrshot.yaml
pyhealth-1.1.4/pyhealth/datasets/configs/hcup_ccs_2015_definitions_benchmark.yaml
pyhealth-1.1.4/pyhealth/datasets/configs/medical_transcriptions.yaml
pyhealth-1.1.4/pyhealth/datasets/configs/mimic3.yaml
pyhealth-1.1.4/pyhealth/datasets/configs/mimic4_cxr.yaml
pyhealth-1.1.4/pyhealth/datasets/configs/mimic4_ehr.yaml
pyhealth-1.1.4/pyhealth/datasets/configs/mimic4_note.yaml
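The same check can be scripted in Python, for example as part of a release checklist. The sketch below uses only the standard-library `tarfile` module; `list_yaml_members` is a hypothetical helper name, and the `dist/pyhealth-*.tar.gz` pattern is taken from the command above.

```python
import glob
import tarfile


def list_yaml_members(sdist_path):
    """Return the sorted names of .yaml/.yml members in a gzip-compressed sdist."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return sorted(
            name
            for name in archive.getnames()
            if name.endswith((".yaml", ".yml"))
        )


if __name__ == "__main__":
    # Inspect every sdist produced by the build step.
    for path in glob.glob("dist/pyhealth-*.tar.gz"):
        print(path)
        for name in list_yaml_members(path):
            print("  ", name)
```

Note that this inspects only the source distribution, as the `tar` command does; it does not by itself confirm what ends up in an installed environment.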
Context
Discovered while fixing the HALO Colab notebook in PR sunlabuiuc#528. The notebook installs PyHealth from git and users encountered this error when trying to load the HALO_MIMIC3Dataset.
This is a project-wide packaging issue that affects any user installing PyHealth via pip rather than running from source.