
Add WFDB Support for PhysioNet Waveform Datasets #19

Merged
tompollard merged 1 commit into main from feature/wfdb-support on Nov 5, 2025

@rafiattrach
Collaborator

Add WFDB Support for PhysioNet Waveform Datasets

Summary

Adds support for the WFDB (WaveForm DataBase) format to the Croissant metadata generator, so the tool can now process PhysioNet physiological signal datasets in addition to tabular CSV files.

Motivation

WFDB is the standard format for PhysioNet waveform databases (ECG, EEG, etc.). This change enables Croissant metadata generation for datasets such as the MIT-BIH Arrhythmia Database, MIMIC-III Waveforms, and other PhysioNet signal repositories.

Key Changes

WFDB Record Structure

A WFDB record is typically split across several files, for example:

Record "100":
├── 100.hea  (header: metadata, signal names, sampling rate)
├── 100.dat  (data: raw signals)
└── 100.atr  (annotations: beat markers)

This record maps to Croissant as follows:

Croissant Metadata for "100":
├── FileObjects (3 physical files):
│   ├── 100.hea (application/x-wfdb-header)
│   ├── 100.dat (application/x-wfdb-data)
│   └── 100.atr (application/x-wfdb-atr)
└── RecordSet "100" (logical schema):
    ├── Field "MLII" (sc:Float)
    └── Field "V5" (sc:Float)
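As a rough illustration of this mapping, the hypothetical function below builds Croissant-style dictionaries for one record: one `cr:FileObject` per physical file (using the MIME types listed above) and one `cr:RecordSet` with an `sc:Float` field per signal. The function name and exact dictionary layout are assumptions for illustration only. It omits the `@context`, checksums, and the `source`/`extract` links a complete Croissant file would carry, so it is not the JSON-LD the tool actually emits.

```python
import json
import os

# Hypothetical sketch: MIME types copied from the mapping in this PR description.
WFDB_MIME_TYPES = {
    ".hea": "application/x-wfdb-header",
    ".dat": "application/x-wfdb-data",
    ".atr": "application/x-wfdb-atr",
}


def wfdb_record_to_croissant(record_name, files, signal_names):
    """Map one WFDB record to simplified Croissant-style dictionaries.

    files: physical files belonging to the record (e.g. 100.hea, 100.dat, 100.atr).
    signal_names: signal descriptions read from the header (e.g. MLII, V5).
    """
    # One FileObject per physical file, typed by its extension.
    file_objects = [
        {
            "@type": "cr:FileObject",
            "@id": name,
            "name": name,
            "contentUrl": name,
            "encodingFormat": WFDB_MIME_TYPES[os.path.splitext(name)[1]],
        }
        for name in files
    ]
    # One logical RecordSet per record, with one Float field per signal.
    record_set = {
        "@type": "cr:RecordSet",
        "@id": record_name,
        "name": record_name,
        "field": [
            {
                "@type": "cr:Field",
                "@id": f"{record_name}/{signal}",
                "name": signal,
                "dataType": "sc:Float",
            }
            for signal in signal_names
        ],
    }
    return {"distribution": file_objects, "recordSet": [record_set]}


if __name__ == "__main__":
    meta = wfdb_record_to_croissant(
        "100", ["100.hea", "100.dat", "100.atr"], ["MLII", "V5"]
    )
    print(json.dumps(meta, indent=2))
```

Keeping the physical files (`distribution`) separate from the logical schema (`recordSet`) mirrors the two branches of the tree above.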

Implementation

  • New handler: WFDBHandler processes .hea header files and extracts record- and signal-level metadata
  • Multi-file support: Introduced a related_files pattern for formats where multiple physical files form one logical record
  • Signal schema: Creates a RecordSet with one Field entry per signal
  • Complete metadata: Extracts sampling frequency, units, ADC gain, checksums, etc.
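The header parsing described above can be approximated with the standard library alone. This sketch assumes a single-segment header whose record line includes the sampling frequency and whose signal lines include at least the block-size field. It follows the WFDB header conventions that `#` starts a comment line and that units default to mV when omitted from the ADC gain field. The names `parse_wfdb_header`, `HeaderInfo` and `SignalSpec` are hypothetical and are not the PR's API; a real implementation might instead rely on `wfdb.rdheader` from the `wfdb` Python package. The sample header is modeled on MIT-BIH record 100.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# ADC gain field: gain[(baseline)][/units], e.g. "200" or "200(0)/mV".
GAIN_RE = re.compile(r"^([^(/]+)(?:\(([^)]*)\))?(?:/(.+))?$")


@dataclass
class SignalSpec:
    file_name: str
    fmt: str
    adc_gain: float
    units: str
    adc_resolution: int
    adc_zero: int
    initial_value: int
    checksum: int
    description: str


@dataclass
class HeaderInfo:
    record_name: str
    n_signals: int
    fs: float
    n_samples: Optional[int]
    signals: List[SignalSpec] = field(default_factory=list)


def parse_signal_line(line: str) -> SignalSpec:
    # The description is the last field and may contain spaces.
    parts = line.split(maxsplit=8)
    if len(parts) < 8:
        raise ValueError(f"signal line has too few fields: {line!r}")
    m = GAIN_RE.match(parts[2])
    if m is None:
        raise ValueError(f"unparseable ADC gain field: {parts[2]!r}")
    return SignalSpec(
        file_name=parts[0],
        fmt=parts[1],  # kept as a string; may carry suffixes such as "x2"
        adc_gain=float(m.group(1)),
        units=m.group(3) or "mV",  # default when units are omitted
        adc_resolution=int(parts[3]),
        adc_zero=int(parts[4]),
        initial_value=int(parts[5]),
        checksum=int(parts[6]),
        description=parts[8] if len(parts) > 8 else "",
    )


def parse_wfdb_header(text: str) -> HeaderInfo:
    # Ignore blank lines and '#' comment lines.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    rec = lines[0].split()
    n_signals = int(rec[1])
    # Sampling frequency may be written as "fs/counter_freq(base)"; keep fs.
    fs = float(rec[2].split("/")[0])
    n_samples = int(rec[3]) if len(rec) > 3 else None
    signals = [parse_signal_line(ln) for ln in lines[1:1 + n_signals]]
    return HeaderInfo(rec[0], n_signals, fs, n_samples, signals)


HEADER_100 = """100 2 360 650000
100.dat 212 200 11 1024 995 -22131 0 MLII
100.dat 212 200 11 1024 1011 20052 0 V5
# comment lines like this one are ignored
"""

if __name__ == "__main__":
    info = parse_wfdb_header(HEADER_100)
    print(info.fs, [(s.description, s.adc_gain, s.units) for s in info.signals])
```

The signal descriptions (`MLII`, `V5`) are what become the RecordSet field names in the mapping shown earlier.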

Testing

Tested on the MIT-BIH Arrhythmia Database (71 records, 213 files):

$ croissant-maker -i mitdb/1.0.0/ -o mitdb.jsonld \
    --name "MIT-BIH Arrhythmia Database" \
    --url "https://physionet.org/content/mitdb/1.0.0/"
Files: 213 | Record sets: 71 | Validated successfully

All existing CSV tests (MIMIC-IV, eICU) pass.
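The test counts are consistent with three files per record (71 × 3 = 213). Below is a minimal sketch of grouping files into records by shared path stem, in the spirit of the related_files pattern. It is an illustration under stated assumptions: `group_wfdb_records` is a hypothetical name, only `.hea`, `.dat` and `.atr` are recognized, and a fuller implementation would read the data file names from the header's signal lines rather than rely on matching stems.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: only the three file types from the example record are grouped.
WFDB_EXTENSIONS = {".hea", ".dat", ".atr"}


def group_wfdb_records(paths):
    """Group file paths into WFDB records keyed by directory + stem.

    A record is kept only if its .hea header is present, since the header
    is what defines the record. Other files (README, orphans) are dropped.
    """
    groups = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        if path.suffix in WFDB_EXTENSIONS:
            # Keying on the full path minus suffix keeps "100" in different
            # directories distinct.
            groups[str(path.with_suffix(""))].append(str(path))
    return {
        record: sorted(files)
        for record, files in groups.items()
        if any(f.endswith(".hea") for f in files)
    }


if __name__ == "__main__":
    files = ["100.hea", "100.dat", "100.atr", "101.hea", "101.dat", "RECORDS"]
    for record, members in group_wfdb_records(files).items():
        print(record, members)
```

Each resulting group would yield one RecordSet, while every member file still becomes its own FileObject.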

- Add WFDBHandler for processing .hea files and extracting metadata
- Support multi-file WFDB records (.hea, .dat, .atr) via related_files pattern
- Create RecordSet with Field entries for each signal (MLII, V5, etc.)
- Extract complete metadata: sampling frequency, units, ADC gain, checksums
- Tested on MIT-BIH Arrhythmia Database (71 records, 213 files)
- All existing CSV tests (MIMIC-IV, eICU) pass
@tompollard
Member

Nice, thanks Rafi!

tompollard merged commit 69b4b7e into main on Nov 5, 2025
3 checks passed
tompollard deleted the feature/wfdb-support branch on November 5, 2025 at 17:41