Official code for "An Electrocardiogram Multi-Task Benchmark with Comprehensive Evaluations and Insightful Findings". The paper has been accepted at the 20th World Congress on Medical and Health Informatics (MedInfo 2025).
We provide a comprehensive ECG multi-task benchmark that evaluates large language models, general time-series foundation models, and ECG foundation models against time-series deep learning models. The benchmark covers five types of downstream tasks (RR Interval Estimation, Age Estimation, Gender Classification, Potassium Abnormality Prediction, and Arrhythmia Detection) under zero-shot, few-shot, and fine-tuning settings.
We provide a .jsonl subset of MIMIC-IV-ECG together with the labels needed to evaluate each downstream task. Each task maps to one label field: RR Interval Estimation (rr_interval), Age Estimation (age), Gender Classification (gender), Potassium Abnormality Prediction (flag), and Arrhythmia Detection (report_label).
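As a quick sanity check before running an evaluation, the subset can be inspected with a few lines of Python. The sketch below is not part of the released code: it assumes each line of the .jsonl file is one JSON object carrying some or all of the label fields named above, and the helper names `load_jsonl` and `label_coverage` are hypothetical.

```python
import json
from collections import Counter

# Label fields as named in this README, keyed by task.
TASK_FIELDS = {
    "RR Interval Estimation": "rr_interval",
    "Age Estimation": "age",
    "Gender Classification": "gender",
    "Potassium Abnormality Prediction": "flag",
    "Arrhythmia Detection": "report_label",
}


def load_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def label_coverage(records):
    """Count how many records carry a non-null value for each label field."""
    counts = Counter()
    for rec in records:
        for field in TASK_FIELDS.values():
            # Assumption: a missing key or a null value means "no label".
            if rec.get(field) is not None:
                counts[field] += 1
    return counts
```

Calling `label_coverage(load_jsonl(path))` on the provided file (substitute the actual path in your checkout) returns a per-field count, which shows at a glance how many records are usable for each task.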
The required packages can be installed by running pip install -r requirements.txt.
For the ECG-FM environment, please refer to the ECG-FM and fairseq-signals repositories.
The scripts folder contains shell scripts for running the evaluations. To select a downstream task, change the --task_name parameter in the script before running it.