[medgan-pr-prep] Overview: MedGAN PyHealth 2.0 Integration Tasks #46

@jalengg

Tracking issue for bringing MedGAN in line with PyHealth 2.0 standards. It follows the same 9-task pattern used for previous model integrations.

Background

MedGAN is a GAN-based EHR synthetic data generator using:

  • A two-phase training process: pretrain autoencoder first, then adversarial training
  • Standard BCE loss (NOT WGAN — discriminator uses sigmoid + BCE)
  • Minibatch averaging in the discriminator for training stability
  • multi_hot schema: flat ICD code lists → binary vectors
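To make the last two bullets concrete, here is a minimal sketch in plain Python of the multi_hot encoding (flat code lists to binary vectors) and of minibatch averaging (appending the per-feature batch mean to every sample before it reaches the discriminator). The helper names (`build_vocab`, `multi_hot`, `minibatch_average`) and the sample records are hypothetical and are not taken from `medgan.py`; the actual implementation presumably operates on tensors.

```python
from typing import Dict, List, Sequence


def build_vocab(visits: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map each distinct code to a column index (sorted for determinism)."""
    codes = sorted({code for visit in visits for code in visit})
    return {code: idx for idx, code in enumerate(codes)}


def multi_hot(codes: Sequence[str], vocab: Dict[str, int]) -> List[float]:
    """Turn a flat list of codes into a binary vector of length len(vocab)."""
    vec = [0.0] * len(vocab)
    for code in codes:
        vec[vocab[code]] = 1.0  # a repeated code still yields 1.0
    return vec


def minibatch_average(batch: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append the per-feature batch mean to every sample.

    Each row the discriminator sees then carries a summary of the whole
    batch, which doubles the input width.
    """
    n = len(batch)
    mean = [sum(col) / n for col in zip(*batch)]
    return [list(row) + mean for row in batch]


# Hypothetical records: one flat ICD code list per patient.
records = [["250.00", "401.9"], ["401.9"], ["428.0", "250.00", "250.00"]]
vocab = build_vocab(records)
batch = [multi_hot(r, vocab) for r in records]
augmented = minibatch_average(batch)
```

With three distinct codes, each encoded row has width 3 and each augmented row has width 6, so a discriminator using this scheme would need an input layer twice the vocabulary size.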

Source: pyhealth/models/generators/medgan.py (ported from corgan-medgan-port branch)

Key differences from CorGAN

| | CorGAN | MedGAN |
|---|---|---|
| GAN loss | Wasserstein (WGAN) | BCE (standard) |
| Discriminator | WGAN critic, no sigmoid | Sigmoid + minibatch averaging |
| Training | Single phase | Two-phase: AE pretrain then GAN |
| Autoencoder | No AE | Linear AE for latent space |
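The loss difference can be sketched in a few lines of plain Python. This is an illustration of the two textbook objectives, not code from either model: the function names are hypothetical, the BCE version assumes discriminator outputs that have already passed through a sigmoid, and the Wasserstein version assumes raw, unbounded critic scores.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Squash a raw logit into (0, 1), as a BCE discriminator's last layer does."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Standard GAN discriminator loss on probabilities in (0, 1).

    Real samples are pushed toward 1 and generated samples toward 0.
    """
    eps = 1e-12  # guards against log(0)
    real_term = sum(-math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def wgan_critic_loss(c_real: Sequence[float], c_fake: Sequence[float]) -> float:
    """Wasserstein critic loss on raw scores: mean(fake) minus mean(real)."""
    return sum(c_fake) / len(c_fake) - sum(c_real) / len(c_real)


# Hypothetical scores for one real and one generated sample.
bce_value = bce_discriminator_loss([sigmoid(2.0)], [sigmoid(-2.0)])
wgan_value = wgan_critic_loss([2.0], [-1.0])
```

The BCE loss is bounded below by zero, while the critic loss is unbounded, which is why a WGAN critic omits the sigmoid and relies on a Lipschitz constraint (weight clipping or a gradient penalty) instead.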

Tasks

  • T1 — Merge upstream + copy medgan.py into worktree
  • T2 — Create MedGANGenerationMIMIC3(BaseTask) task function
  • T3 — Refactor MedGAN to remove DummyWrapper, add train_model() / synthesize_dataset()
  • T4 — Update training example
  • T5 — Update generation example
  • T6 — Check/remove bespoke dataset class (likely no-op)
  • T7 — Update docstrings
  • T8 — Integration tests (8 MECE in-memory + 4 MIMIC-III skip-gracefully)
  • T9 — Final verification + push to jalengg/PyHealth:medgan-pr-integration
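One way the "skip-gracefully" behaviour in T8 could look, sketched with the standard library's `unittest`. The `MIMIC3_ROOT` environment variable, the class names, and the test bodies are assumptions made for illustration; the issue does not specify how the real tests locate MIMIC-III.

```python
import os
import unittest

# Hypothetical environment variable; the issue does not name one.
MIMIC3_ROOT = os.environ.get("MIMIC3_ROOT", "")
HAS_MIMIC3 = bool(MIMIC3_ROOT) and os.path.isdir(MIMIC3_ROOT)


class TestMedGANInMemory(unittest.TestCase):
    """Runs everywhere: uses only data built inside the test."""

    def test_multi_hot_rows_are_binary(self):
        rows = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
        for row in rows:
            self.assertTrue(all(v in (0.0, 1.0) for v in row))


@unittest.skipUnless(HAS_MIMIC3, "MIMIC-III not available; set MIMIC3_ROOT")
class TestMedGANMIMIC3(unittest.TestCase):
    """Reported as skipped, not failed, when the dataset is missing."""

    def test_root_is_directory(self):
        self.assertTrue(os.path.isdir(MIMIC3_ROOT))


def run_suite() -> unittest.TestResult:
    """Load both test classes and run them, collecting the outcome."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestMedGANInMemory))
    suite.addTests(loader.loadTestsFromTestCase(TestMedGANMIMIC3))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

On a machine without the dataset, the MIMIC-III class shows up under skipped tests and the run still counts as successful, so CI stays green without credentialed data.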

Branch

medgan-pr-integration (worktree at ~/.config/superpowers/worktrees/PyHealth/medgan-pr-integration)
