Add WFDB Support for PhysioNet Waveform Datasets #19
Merged
tompollard merged 1 commit into main on Nov 5, 2025
- Add WFDBHandler for processing .hea files and extracting metadata
- Support multi-file WFDB records (.hea, .dat, .atr) via related_files pattern
- Create RecordSet with Field entries for each signal (MLII, V5, etc.)
- Extract complete metadata: sampling frequency, units, ADC gain, checksums
- Tested on MIT-BIH Arrhythmia Database (71 records, 213 files)
- All existing CSV tests (MIMIC-IV, eICU) pass
rafiattrach commented on Oct 23, 2025 (Member)
Nice, thanks Rafi!

tompollard approved these changes on Nov 5, 2025
Summary
Adds support for the WFDB (WaveForm DataBase) format to the Croissant metadata generator. The tool can now process PhysioNet physiological signal datasets in addition to tabular CSV files.
Motivation
WFDB is the standard format for PhysioNet waveform databases (ECG, EEG, etc.). This enables Croissant metadata generation for datasets such as the MIT-BIH Arrhythmia Database, MIMIC-III Waveforms, and other PhysioNet signal repositories.
Key Changes
WFDB Record Structure
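WFDB records consist of multiple files per recording, typically sharing the record name:
- .hea: header file (signal names, sampling frequency, number of samples, units, ADC gain, checksums)
- .dat: binary signal samples
- .atr: reference annotations (e.g. beat labels)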
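Because one logical record spans several physical files, a handler has to group them before it can build metadata. The following is a minimal Python sketch of that grouping, assuming a record's .hea, .dat and .atr files sit in the same directory and share the record name. The names group_wfdb_records and WFDB_EXTENSIONS are hypothetical and are not taken from the WFDBHandler code in this PR.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption for this sketch: these extensions make up one WFDB record.
WFDB_EXTENSIONS = {".hea", ".dat", ".atr"}


def group_wfdb_records(paths):
    """Group file paths into logical WFDB records (hypothetical helper).

    Keys are "directory/record_name"; values are the sorted member files.
    """
    records = defaultdict(list)
    for p in map(PurePosixPath, paths):
        if p.suffix in WFDB_EXTENSIONS:
            # "mitdb/1.0.0/100.dat" -> key "mitdb/1.0.0/100"
            records[str(p.with_suffix(""))].append(str(p))
    # Keep only groups that have a header, since the .hea describes the record.
    return {
        key: sorted(files)
        for key, files in records.items()
        if any(f.endswith(".hea") for f in files)
    }


if __name__ == "__main__":
    files = ["mitdb/100.hea", "mitdb/100.dat", "mitdb/100.atr", "mitdb/RECORDS"]
    print(group_wfdb_records(files))
```

Grouping by file stem is a simplification: a WFDB header names its signal file(s) explicitly, so a more faithful handler could read each .hea first and collect the files it references.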
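Maps to Croissant as:
- each file (.hea, .dat, .atr) is listed individually among the dataset's files
- each record becomes one RecordSet, with one Field per signal (e.g. MLII, V5) carrying the metadata extracted from the header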
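The header-to-RecordSet mapping can be sketched in Python as follows. The sketch assumes a single-segment header with one line per signal in the usual WFDB field order (file name, format, ADC gain with optional baseline and units, ADC resolution, ADC zero, initial value, checksum, block size, description). It applies the WFDB defaults of 250 Hz, a gain of 200 and mV when those fields are omitted. parse_wfdb_header, to_record_set and _opt_int are hypothetical helpers, the sample header values are illustrative, and the output dict only mimics the Croissant RecordSet/Field shape. The PR does not show the exact JSON-LD the handler emits.

```python
def _opt_int(tokens, idx):
    """Return tokens[idx] as int, or None if the optional field is absent."""
    return int(tokens[idx]) if len(tokens) > idx else None


def parse_wfdb_header(text):
    """Parse a single-segment WFDB header (.hea) into a plain dict.

    Hypothetical, simplified sketch: multi-segment records are not handled.
    """
    # Drop blank lines and "#" comment lines (e.g. patient info).
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    rec = lines[0].split()
    n_sig = int(rec[1])
    header = {
        "record_name": rec[0],
        # The frequency field may carry "/counter_freq"; keep only the
        # sampling frequency. WFDB assumes 250 Hz when it is omitted.
        "fs": float(rec[2].split("/")[0]) if len(rec) > 2 else 250.0,
        "n_samples": int(rec[3]) if len(rec) > 3 else None,
        "signals": [],
    }

    for i, line in enumerate(lines[1 : 1 + n_sig]):
        tok = line.split()
        # Gain field looks like "200", "200(0)" or "200(0)/mV".
        gain_part, _, units = (tok[2] if len(tok) > 2 else "200").partition("/")
        header["signals"].append({
            "file_name": tok[0],
            "format": tok[1],
            "adc_gain": float(gain_part.split("(")[0]),  # drop "(baseline)"
            "units": units or "mV",  # WFDB default physical unit
            "adc_res": _opt_int(tok, 3),
            "adc_zero": _opt_int(tok, 4),
            "init_value": _opt_int(tok, 5),
            "checksum": _opt_int(tok, 6),
            "description": " ".join(tok[8:]) or f"signal_{i}",
        })
    return header


def to_record_set(header):
    """Map a parsed header to a Croissant-style RecordSet dict (illustrative keys)."""
    return {
        "@type": "cr:RecordSet",
        "name": header["record_name"],
        "field": [
            {
                "@type": "cr:Field",
                "name": sig["description"],
                "description": (
                    f"{sig['units']}, {header['fs']:g} Hz, "
                    f"ADC gain {sig['adc_gain']:g}, checksum {sig['checksum']}"
                ),
            }
            for sig in header["signals"]
        ],
    }


if __name__ == "__main__":
    # Illustrative two-signal header laid out like an MIT-BIH record.
    sample = (
        "100 2 360 650000\n"
        "100.dat 212 200 11 1024 995 -22131 0 MLII\n"
        "100.dat 212 200 11 1024 1011 20052 0 V5\n"
        "# comment lines are skipped\n"
    )
    print(to_record_set(parse_wfdb_header(sample)))
```

Keeping parsing and mapping as separate steps means the same parsed header could feed other outputs (for example, per-field dataType or unit annotations) without re-reading the file.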
Implementation
- WFDBHandler processes .hea files and extracts all metadata
- A related_files pattern handles formats where multiple physical files form one logical record

Testing
Tested on the MIT-BIH Arrhythmia Database (71 records, 213 files):
    $ croissant-maker -i mitdb/1.0.0/ -o mitdb.jsonld \
        --name "MIT-BIH Arrhythmia Database" \
        --url "https://physionet.org/content/mitdb/1.0.0/"
    Files: 213 | Record sets: 71 | Validated successfully

All existing CSV tests (MIMIC-IV, eICU) pass.
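As a quick cross-check of the reported counts, the generated JSON-LD can be inspected with the Python standard library. This sketch assumes the output follows the Croissant layout with top-level distribution and recordSet arrays, and that each file is its own distribution entry. The PR does not say how files are listed (individually or via a FileSet). summarize_croissant is a hypothetical helper and does not replace the tool's own validation.

```python
import json


def summarize_croissant(doc):
    """Return (distribution count, record set count, fields per record set)."""
    record_sets = doc.get("recordSet", [])
    fields = {rs.get("name", "?"): len(rs.get("field", [])) for rs in record_sets}
    return len(doc.get("distribution", [])), len(record_sets), fields


if __name__ == "__main__":
    with open("mitdb.jsonld", encoding="utf-8") as fh:
        n_files, n_sets, fields = summarize_croissant(json.load(fh))
    print(f"Files: {n_files} | Record sets: {n_sets}")
    # Flag record sets that ended up with no signal fields.
    print([name for name, n in fields.items() if n == 0])
```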