Fix/eicu #832

Merged
jhnwu3 merged 17 commits into master from fix/eicu
Feb 12, 2026
jhnwu3 commented Feb 9, 2026

This pull request modernizes and standardizes the example scripts for running clinical prediction tasks on the eICU dataset, introduces a YAML configuration for eICU, and updates task imports for easier use of new task classes. The most significant updates are the migration of all eICU example scripts to use the new BaseDataset-based eICUDataset and corresponding BaseTask classes, the addition of a detailed YAML configuration for eICU, and improvements to code clarity and maintainability throughout the examples.

Major improvements to eICU examples:

  • Migrated drug_recommendation_eicu_transformer.py, length_of_stay_eicu_rnn.py, mortality_prediction_eicu_rnn.py, and readmission_eicu_rnn.py to use the new eICUDataset class (with YAML config), and the new BaseTask classes (DrugRecommendationEICU, LengthOfStayPredictioneICU, MortalityPredictionEICU, ReadmissionPredictionEICU). These examples now follow a standardized PyHealth workflow, improve feature selection, and include clearer documentation and configuration. [1] [2] [3] [4]

  • Added pyhealth/datasets/configs/eicu.yaml, providing a comprehensive YAML configuration for the eICU dataset, defining tables, joins, and attributes for streamlined data loading and processing.

Task import and API improvements:

  • Updated pyhealth/tasks/__init__.py to expose the new eICU BaseTask classes (DrugRecommendationEICU, ReadmissionPredictionEICU) for direct import and usage in examples and user code. [1] [2]

Minor improvements and consistency changes:

  • Made minor syntax and formatting updates to MIMIC-III, MIMIC-IV, and OMOP readmission example scripts for consistency (e.g., main guard style, argument formatting). [1] [2] [3] [4] [5] [6]

These changes collectively make the eICU example workflows more robust, maintainable, and aligned with the latest PyHealth APIs.


# Exclude visits without condition, procedure, or drug code
# Exclude stays without condition, procedure, or drug code
if len(conditions) * len(procedures) * len(drugs) == 0:
Collaborator

Splitting this if statement up could short circuit a lot of computation (here and in other tasks). Any sense of whether that difference will be large enough to notice when processing large datasets?

conditions = ...
if len(conditions) == 0:
    continue

procedures = ...
if len(procedures) == 0:
    continue

drugs = ...
if len(drugs) == 0:
    continue
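
To make the question concrete, here is a minimal sketch that counts how many feature lookups each style performs. It is not PyHealth code: `extract`, the `keep_*` functions, and the toy `stays` dictionaries are hypothetical stand-ins for whatever per-stay extraction the task classes actually do. It measures call counts rather than wall-clock time, so it shows only how much work is skipped, not whether the saving is noticeable on real data.

```python
# Hypothetical stand-ins throughout: none of these names are PyHealth APIs.
calls = {"count": 0}


def extract(stay, key):
    """Pretend to be an expensive per-stay feature lookup; counts invocations."""
    calls["count"] += 1
    return stay.get(key, [])


def keep_combined(stay):
    # All three lookups run before the single emptiness check.
    conditions = extract(stay, "conditions")
    procedures = extract(stay, "procedures")
    drugs = extract(stay, "drugs")
    return len(conditions) * len(procedures) * len(drugs) != 0


def keep_staged(stay):
    # Each lookup is followed by its own check, so later lookups are skipped
    # as soon as one feature list turns out to be empty.
    if len(extract(stay, "conditions")) == 0:
        return False
    if len(extract(stay, "procedures")) == 0:
        return False
    return len(extract(stay, "drugs")) != 0


def count_calls(keep, stays):
    """Apply `keep` to every stay and report the decisions and lookup count."""
    calls["count"] = 0
    kept = [keep(stay) for stay in stays]
    return kept, calls["count"]


# Toy data: one complete stay and one stay missing each kind of code.
stays = [
    {"conditions": ["c1"], "procedures": ["p1"], "drugs": ["d1"]},
    {"conditions": [], "procedures": ["p1"], "drugs": ["d1"]},
    {"conditions": ["c1"], "procedures": [], "drugs": ["d1"]},
    {"conditions": ["c1"], "procedures": ["p1"], "drugs": []},
]

kept_combined, n_combined = count_calls(keep_combined, stays)
kept_staged, n_staged = count_calls(keep_staged, stays)
print(kept_combined == kept_staged, n_combined, n_staged)
```

Both versions keep the same stays; the staged version simply performs fewer lookups when an early feature list is empty. How much that matters in practice depends on how costly each lookup is and how many stays are filtered out, which only profiling on the real dataset can answer.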

Collaborator Author

Ahh, that's a good question. I don't know, mostly because Python does a lot of underlying optimizations that we don't necessarily think about.

Collaborator

I filed #843 to follow up on this question later.

@jhnwu3 jhnwu3 merged commit 9cbe36e into master Feb 12, 2026
1 check passed