A clean, reproducible benchmark repo for time series forecasting on the ETT (Electricity Transformer Temperature) dataset.
- Create environment: Create a conda environment named `energybench` with Python 3.11 and activate it.
- Install dependencies: Install all required packages from `requirements.txt`.
- Download dataset: Download the ETT dataset using the download tool.
- Train model: Train any model (PatchTST, Autoformer, Informer, XGBoost, Prophet, or ARIMA) on the ETT dataset with default hyperparameters.
- Evaluate and report: Evaluate the trained model and generate a leaderboard CSV file.
- ETT dataset focus – curated loaders + preprocessing for the ETT (Electricity Transformer Temperature) datasets.
- Reproducible splits (time-based: train/val/test with rolling-origin).
- Leakage-safe scaling (fit on train, apply on val/test).
- Standard metrics: MAE, RMSE, MAPE, sMAPE, MASE (with seasonal naive), CRPS (probabilistic models).
- Strong baselines: fully implemented classical models (ARIMA, Prophet, XGBoost) alongside SOTA deep learning models (PatchTST, Autoformer, Informer).
- Config-first (Hydra/OmegaConf) & experiment tracking.
- CI sanity checks on tiny subsets of ETT datasets.
- ETT (Electricity Transformer Temperature) – transformer station load/temperature data:
  - ETTh1, ETTh2 – hourly data
  - ETTm1, ETTm2 – 15-minute data
The repository provides PyTorch dataset loaders for ETT datasets. These loaders handle data scaling, time feature extraction, and proper train/val/test splits.
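The leakage-safe scaling amounts to fitting the scaler on the training slice only and then applying it everywhere. A minimal sketch, assuming the fixed 70/10/20 split described below and a raw CSV with a `date` column (the repo's actual loader API may differ):

```python
# Sketch of leakage-safe scaling: statistics come from the training slice
# only, so no information leaks from val/test into preprocessing.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data/raw/ETTh1.csv", parse_dates=["date"])
feature_cols = [c for c in df.columns if c != "date"]

n = len(df)
train_end, val_end = int(n * 0.7), int(n * 0.8)  # fixed 70/10/20 split

scaler = StandardScaler().fit(df[feature_cols].iloc[:train_end])  # train only
train = scaler.transform(df[feature_cols].iloc[:train_end])
val = scaler.transform(df[feature_cols].iloc[train_end:val_end])
test = scaler.transform(df[feature_cols].iloc[val_end:])
```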
Data Splits:
- ETTh: 12 months train, 4 months val, 4 months test
- ETTm: 12 months train, 4 months val, 4 months test (with 4 samples per hour)
- Fixed split (default):
- Train: first 70% of timeline
- Val: next 10%
- Test: last 20%
- Rolling-origin: multiple forecast origins across test span.
- Multi-horizon: report H ∈ {24, 48, 96, 168} with context L ∈ {96, 168, 336, 720}.
- Seasonality-aware MASE: seasonal period = 24 (hourly), 96 (15-minute).
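For reference, seasonality-aware MASE divides the forecast's MAE by the in-sample MAE of a seasonal-naive forecast on the *training* series. A minimal sketch (the repo's implementation may differ in details):

```python
import numpy as np

def mase(y_true, y_pred, y_train, m=24):
    """MASE with seasonal period m: 24 for hourly ETTh, 96 for 15-min ETTm."""
    # scale = training-set MAE of the seasonal-naive forecast y_t := y_{t-m}
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / scale
```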
The repository supports multiple forecasting models, from classical baselines to state-of-the-art deep learning architectures.
PatchTST – Patch-based Transformer for long-term time series forecasting. This implementation matches the official PatchTST architecture (ICLR 2023), including patching, channel independence, and RevIN.
Documentation: See models/patchtst/README.md for detailed architecture explanation, paper reference, and usage guide.
Paper: A Time Series is Worth 64 Words: Long-term Forecasting with Transformers (ICLR 2023)
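The patching step itself is simple: the context window is sliced into overlapping sub-series, each of which becomes one input token. A sketch using the paper's default patch length 16 and stride 8 (the repo's defaults may differ):

```python
import torch

x = torch.randn(32, 7, 336)        # (batch, channels, context_len)
patch_len, stride = 16, 8          # PatchTST paper defaults
patches = x.unfold(-1, patch_len, stride)  # -> (32, 7, 41, 16)
# 41 patches = (336 - 16) // 8 + 1; with channel independence, each of
# the 7 channels is fed through the Transformer backbone separately.
```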
Autoformer – Decomposition Transformer with Auto-Correlation for long-term forecasting. Uses series decomposition and an auto-correlation mechanism for efficient long-sequence modeling.
Documentation: See models/autoformer/README.md for detailed architecture explanation.
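The series decomposition Autoformer applies inside each block is a moving-average split into trend and seasonal parts. A minimal sketch (kernel size is illustrative):

```python
import torch
import torch.nn.functional as F

def decompose(x, kernel_size=25):
    """x: (batch, length, channels) -> (seasonal, trend)."""
    # replicate the endpoints so the moving average preserves sequence length
    pad = (kernel_size - 1) // 2
    front = x[:, :1, :].repeat(1, pad, 1)
    back = x[:, -1:, :].repeat(1, pad, 1)
    padded = torch.cat([front, x, back], dim=1)
    # moving average over time extracts the trend component
    trend = F.avg_pool1d(padded.transpose(1, 2), kernel_size, stride=1).transpose(1, 2)
    return x - trend, trend
```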
Informer – Efficient Transformer for long-sequence forecasting using the ProbSparse attention mechanism.
Documentation: See models/informer/README.md for detailed architecture explanation.
XGBoost – Gradient boosting framework for time series with automatic feature engineering (lag features, rolling statistics, time features).
Documentation: See models/classical/README.md for detailed usage guide.
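The automatic feature engineering boils down to turning the series into a supervised table. A sketch of the kinds of features involved, with illustrative column names, lags, and windows rather than the repo's exact choices:

```python
import pandas as pd

def make_features(df, target="OT", lags=(1, 24, 168), windows=(24, 168)):
    """df: DataFrame with a parsed 'date' column and the target column."""
    out = pd.DataFrame(index=df.index)
    for lag in lags:                      # lag features
        out[f"lag_{lag}"] = df[target].shift(lag)
    for w in windows:                     # rolling statistics (shifted by 1
        out[f"roll_mean_{w}"] = df[target].shift(1).rolling(w).mean()  # to avoid
        out[f"roll_std_{w}"] = df[target].shift(1).rolling(w).std()    # leakage)
    out["hour"] = df["date"].dt.hour      # calendar/time features
    out["dayofweek"] = df["date"].dt.dayofweek
    return out.dropna()
```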
Prophet – Facebook's additive regression model for time series forecasting with automatic seasonality detection and holiday effects.
Documentation: See models/classical/README.md for comprehensive parameter tuning guide and examples.
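For orientation, plain Prophet usage on an ETT target looks like this (this is the underlying library's API, not the repo's `train.py` wrapper); Prophet expects a DataFrame with `ds` and `y` columns:

```python
import pandas as pd
from prophet import Prophet

df = pd.read_csv("data/raw/ETTh1.csv", parse_dates=["date"])
train = df.rename(columns={"date": "ds", "OT": "y"})[["ds", "y"]]

m = Prophet()                                   # seasonality detected automatically
m.fit(train)
future = m.make_future_dataframe(periods=96, freq="h")
forecast = m.predict(future)                    # yhat, yhat_lower, yhat_upper
```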
ARIMA – AutoRegressive Integrated Moving Average model for statistical time series forecasting with automatic order selection.
Documentation: See models/classical/README.md for detailed usage guide and parameter tuning tips.
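Automatic order selection is commonly done with an AIC-driven stepwise search; a sketch using pmdarima (whether the repo uses this backend is an assumption):

```python
import pandas as pd
import pmdarima as pm

y = pd.read_csv("data/raw/ETTh1.csv")["OT"]
# stepwise search over (p, d, q) by information criterion
model = pm.auto_arima(y[-1000:], seasonal=False, stepwise=True,
                      suppress_warnings=True)
forecast = model.predict(n_periods=96)
```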
All models can be configured via YAML files in configs/models/. Each model has a corresponding configuration file:
- `configs/models/patchtst.yaml` - PatchTST configuration
- `configs/models/autoformer.yaml` - Autoformer configuration
- `configs/models/informer.yaml` - Informer configuration
- `configs/models/xgboost.yaml` - XGBoost configuration
- `configs/models/prophet.yaml` - Prophet configuration
Use the --config flag to load a custom configuration file, or override specific parameters directly via command-line arguments.
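As an illustration only, such a file might look like the following; the field names here simply mirror the CLI flags listed below, and the repo's actual schema may differ:

```yaml
# Hypothetical configs/models/patchtst.yaml (illustrative, not the real file)
model: patchtst
context_len: 336
horizon: 96
epochs: 10
batch_size: 32
lr: 1.0e-3
weight_decay: 1.0e-4
dropout: 0.1
scheduler: cosine
seed: 42
```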
Basic training:

```
python train.py --model patchtst --dataset etth1 --epochs 10 --batch_size 32
```

With config file:

```
python train.py --config configs/models/patchtst.yaml --dataset etth1
```

Resume training:

```
python train.py --resume runs/experiment_name/last_model.pt
```

Neural model training options:

- `--model`: Model name (`patchtst`, `autoformer`, `informer`)
- `--dataset`: Dataset name (`etth1`, `etth2`, `ettm1`, `ettm2`)
- `--target`: Target column name (default: `'OT'`)
- `--context_len`: Input sequence length (default: `336`)
- `--horizon`: Forecast horizon (default: `96`)
- `--epochs`: Number of training epochs (default: `10`)
- `--batch_size`: Batch size (default: `32`)
- `--lr`: Learning rate (default: `1e-3`)
- `--weight_decay`: Weight decay (default: `1e-4`)
- `--dropout`: Dropout rate (default: `0.1`)
- `--patience`: Early stopping patience (default: `10`)
- `--save_best_after`: Only save the best model after N epochs (default: `0` = immediately)
- `--scheduler`: Learning rate scheduler (`cosine`, `step`, `plateau`, `none`)
- `--seed`: Random seed for reproducibility (default: `42`)
- `--device`: Device (`cuda`/`cpu`, auto-detects if not specified)
- `--output_dir`: Output directory for experiments (default: `'runs'`)
- `--exp_name`: Experiment name (auto-generated if not specified)
- `--config`: Path to a YAML config file
- `--resume`: Resume from a checkpoint path
Basic training:

```
python train.py --model xgboost --dataset etth1
python train.py --model prophet --dataset etth1 --horizon 96
python train.py --model arima --dataset etth1 --horizon 96
```

With config file:

```
python train.py --config configs/models/xgboost.yaml --dataset etth1
```

Classical model training options:

- `--model`: Model name (`xgboost`, `prophet`, `arima`)
- `--dataset`: Dataset name (`etth1`, `etth2`, `ettm1`, `ettm2`)
- `--target`: Target column name (default: `'OT'`)
- `--context_len`: Input sequence length for feature extraction (default: `336`)
- `--horizon`: Forecast horizon (default: `96`)
- `--seed`: Random seed for reproducibility (default: `42`)
- `--output_dir`: Output directory for experiments (default: `'runs'`)
- `--exp_name`: Experiment name (auto-generated if not specified)
- `--config`: Path to a YAML config file with model-specific parameters
Use the provided shell scripts in the scripts/ directory for batch training:
Train all models on ETTh datasets:

```
bash scripts/run_all_etth.sh
```

Train all models on ETTm datasets:

```
bash scripts/run_all_ettm.sh
```

These scripts train all six models (`xgboost`, `prophet`, `arima`, `patchtst`, `autoformer`, `informer`) on all dataset variants (etth1/etth2 or ettm1/ettm2).
The evaluation script (eval.py) supports both neural and classical models with comprehensive metrics.
Evaluate a trained model:
```
python eval.py --exp_dir runs/experiment_name --split test
```

Specify a checkpoint:

```
python eval.py --exp_dir runs/experiment_name --checkpoint best_model.pt --split test
```

Custom metrics:

```
python eval.py --exp_dir runs/experiment_name --metrics MAE,RMSE,MAPE,sMAPE,MASE --split test
```

Evaluation options:

- `--exp_dir`: Experiment directory containing the model and config (required)
- `--checkpoint`: Specific checkpoint file (default: `'best_model.pt'` for neural models, `'trained_model.pkl'` for classical models)
- `--split`: Split to evaluate (`train`, `val`, `test`; default: `'test'`)
- `--batch_size`: Batch size for neural models (default: `32`)
- `--device`: Device (`cuda`/`cpu`, auto-detects if not specified)
- `--metrics`: Comma-separated metrics (default: `'MAE,RMSE,MAPE,sMAPE,MASE'`)
- `--seasonal_period`: Seasonal period for MASE (default: `24` for hourly, `96` for 15-min data)
- `--output_file`: Output file for results (default: saves to the experiment directory as `results_{split}.json`)
Metrics computed:
- MAE: Mean Absolute Error
- RMSE: Root Mean Squared Error
- MAPE: Mean Absolute Percentage Error
- sMAPE: Symmetric Mean Absolute Percentage Error
- MASE: Mean Absolute Scaled Error (requires training data and seasonal period)
- CRPS: Continuous Ranked Probability Score (for probabilistic models with prediction intervals)
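For sample-based forecasts, CRPS can be estimated with the standard energy form CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast distribution; a minimal sketch:

```python
import numpy as np

def crps_samples(samples, y):
    """samples: (n_samples,) forecast draws for a single target value y."""
    term1 = np.mean(np.abs(samples - y))                          # E|X - y|
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))  # 0.5 E|X - X'|
    return term1 - term2
```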
Results location: Results are saved to {exp_dir}/results_{split}.json and also printed to the console.
Generate a leaderboard comparing multiple experiments by running tools/make_leaderboard.py.
Compare multiple experiments:
```
python tools/make_leaderboard.py runs/exp1 runs/exp2 runs/exp3
```

Compare all experiments in the runs directory (bash):

```
python tools/make_leaderboard.py runs/*
```

Save to CSV:

```
python tools/make_leaderboard.py runs/exp1 runs/exp2 --output leaderboard.csv
```

Sort by a different metric:

```
python tools/make_leaderboard.py runs/exp1 runs/exp2 --sort-by RMSE
```

Leaderboard options:

- `exp_dirs`: One or more experiment directory paths (required)
- `--output`, `-o`: Output CSV file path (default: don't save)
- `--sort-by`: Metric to sort by (default: `MAE`)
Leaderboard includes:
- Rank (sorted by specified metric)
- Model name
- Dataset name
- Split (train/val/test)
- All metrics found in results files (MAE, RMSE, MSE, MAPE, sMAPE, MASE, etc.)
- Experiment directory and name (for reference)
The script automatically finds results.json (from training) or results_{split}.json (from eval.py) files in each experiment directory and extracts all metrics.
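In outline, the aggregation amounts to the following (an illustrative sketch assuming the results JSONs hold flat metric keys; not the actual script):

```python
import json
from pathlib import Path
import pandas as pd

rows = []
for exp_dir in Path("runs").iterdir():
    if not exp_dir.is_dir():
        continue
    # pick up both results.json (training) and results_{split}.json (eval.py)
    for f in sorted(exp_dir.glob("results*.json")):
        rows.append({"exp": exp_dir.name, **json.loads(f.read_text())})

board = pd.DataFrame(rows).sort_values("MAE")  # assumes an 'MAE' key exists
board.to_csv("leaderboard.csv", index=False)
```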
Basic download:
```
python tools/download_data.py --dataset etth1
```

Download all ETTh datasets:

```
python tools/download_data.py --dataset etth
```

Download all ETTm datasets:

```
python tools/download_data.py --dataset ettm
```

Available datasets:

- `etth`: ETTh1/ETTh2 (hourly transformer data)
- `ettm`: ETTm1/ETTm2 (15-min transformer data)
- `etth1`: ETTh1 dataset (hourly)
- `etth2`: ETTh2 dataset (hourly)
- `ettm1`: ETTm1 dataset (15-min)
- `ettm2`: ETTm2 dataset (15-min)

Download options:

- `--dataset`: Dataset name (required)
- `--subset`: Subset size (`small`/`full`; default: `'full'`)
- `--output`: Output directory (default: `'data/raw'`)
- `--force`: Force re-download even if the file already exists
Install test dependencies:
```
pip install pytest pytest-cov
```

Run all tests:

```
pytest tests/ -v
```

Run specific test categories:

```
# Test imports
pytest tests/test_imports.py -v

# Test models
pytest tests/test_models.py -v

# Test datasets (requires data download)
pytest tests/test_datasets.py -v

# Test metrics
pytest tests/test_metrics.py -v

# Test training pipeline
pytest tests/test_training_pipeline.py -v
```

Quick training test (1 epoch, small model):

```
python train.py --model patchtst --dataset etth1 --epochs 1 --batch_size 2 \
    --context_len 96 --horizon 24 --d_model 32 --n_heads 2 --n_layers 1 \
    --exp_name test_run
```

See TESTING.md for detailed testing instructions, including:
- Running tests with coverage
- Integration testing
- Troubleshooting
- Adding new tests
- Best practices
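As an example of the kind of check these tests perform, a small property test in the style of tests/test_metrics.py might assert sMAPE's symmetry (hypothetical; the actual tests and metric signatures live in the repo):

```python
import numpy as np

def smape(y_true, y_pred):
    # standard sMAPE: mean of 2|e| / (|y| + |y_hat|)
    return np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))

def test_smape_is_symmetric_in_arguments():
    a = np.array([1.0, 2.0, 4.0])
    b = np.array([2.0, 2.0, 3.0])
    assert np.isclose(smape(a, b), smape(b, a))
```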
- One PR per model/dataset.
- Include `configs/` + `scripts/` for batch runs.
- Add a tiny fixture (≤200 KB) for CI.