A modular PyTorch dataset library for working with MIMIC-CXR-JPG and MIMIC-IV datasets.
This library provides ready-to-use dataset classes and a clean BaseDataset abstraction to help researchers quickly build and experiment with MIMIC-derived datasets.
- 🏥 Built-in support for MIMIC-CXR-JPG and MIMIC-IV
- 🧩 Define your own dataset in seconds using BaseDataset
- ⚡ Fully compatible with PyTorch’s DataLoader
- 🛠️ Helper functions for filtering, processing, and batching
- 📓 Example notebooks for rapid exploration
Requirements:
- Python ≥ 3.10
- PyTorch ≥ 2.5
- Access to MIMIC-CXR-JPG and/or MIMIC-IV via PhysioNet
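A quick way to confirm that an environment meets the version requirements above is sketched below. The helper name `meets_minimum` is hypothetical and not part of this library; the check only reads installed package metadata, so it does not import PyTorch itself.

```python
import re
import sys
from importlib import metadata


def meets_minimum(version: str, minimum: tuple[int, int]) -> bool:
    """Return True if the major.minor prefix of `version` is >= `minimum`."""
    # Only the leading "major.minor" is compared; suffixes such as
    # "+cu121" or ".dev2024" are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


python_ok = sys.version_info >= (3, 10)
try:
    torch_ok = meets_minimum(metadata.version("torch"), (2, 5))
except metadata.PackageNotFoundError:
    torch_ok = False  # PyTorch is not installed in this environment

print(f"Python >= 3.10: {python_ok}")
print(f"PyTorch >= 2.5: {torch_ok}")
```

If either line prints `False`, upgrade the corresponding component before installing.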
Install directly via pip:
pip install git+https://github.com/naughtFound/mimic.git

mimic/
├── datasets/ # Core Dataset classes (BaseDataset, CXR, IV)
├── utils/ # Shared utilities and helpers
notebooks/
├── cxr.ipynb # MIMIC-CXR visualization and exploration
├── iv.ipynb # MIMIC-IV usage demo
requirements.txt # Dependency list
setup.py # Install script
pyproject.toml # Build metadata
To define your own dataset, subclass BaseDataset and implement two things:
- _files(): a list of files to download from PhysioNet
- collate_fn(): how to batch items together for the DataLoader
That's it — no need to redefine PyTorch boilerplate.
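To make the batching step more concrete, here is a minimal sketch of what a collate function might do. The sample layout (dicts with `image` and `label` keys), the file names, and the function name `collate_samples` are assumptions for illustration only, not this library's API. The sketch is kept in plain Python so it runs without PyTorch; a real implementation would typically convert the grouped values to tensors.

```python
from typing import Any


def collate_samples(batch: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Group a list of per-sample dicts into one dict of per-field lists."""
    if not batch:
        return {}
    # Assumption: every sample has the same keys as the first one.
    keys = list(batch[0].keys())
    collated: dict[str, list[Any]] = {key: [] for key in keys}
    for sample in batch:
        for key in keys:
            collated[key].append(sample[key])
    return collated


# Hypothetical samples standing in for items returned by a dataset.
samples = [
    {"image": "a.jpg", "label": 0},
    {"image": "b.jpg", "label": 1},
]
print(collate_samples(samples))
```

A function of this shape can be handed to PyTorch through the standard `collate_fn` argument, for example `DataLoader(dataset, batch_size=2, collate_fn=collate_samples)`. The skeleton for a BaseDataset subclass follows: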
from mimic.datasets import BaseDataset


class MyDataset(BaseDataset):
    def _files(self) -> dict:
        # Prepare your list of files or samples
        return {...}

    def collate_fn(self, batch: list[int]):
        # Define how samples should be combined into a batch
        return ...

This project is licensed under the MIT License. See LICENSE for details.
Have ideas, improvements, or bug fixes? Open an issue or submit a pull request — contributions are welcome!
You must be credentialed and approved via PhysioNet to access:
- MIMIC-CXR-JPG
- MIMIC-IV
See the respective PhysioNet pages for details on requesting access.