A machine learning project that benchmarks various Bayesian and traditional models for forecasting weekly precipitation in Boston, with emphasis on uncertainty quantification.
This project compares multiple ML approaches to predict Boston's weekly precipitation patterns while providing uncertainty estimates. We evaluate Bayesian regression models, Gaussian Processes, Bayesian Neural Networks, and Decision Trees using NOAA weather data spanning 2005-2024.
.
├── data/ # Data directory
├── Results/ # Results and output visualizations
├── best_bnn.pt # Saved weights for the best Bayesian NN model
├── data_cleaning.ipynb # Notebook for preprocessing and feature engineering
├── LICENSE # Project license
├── Gaussian_Process_Regression.ipynb # Notebook for Gaussian process regression with grid-search and Bayesian hyperparameter optimization
├── models.ipynb # Notebook for model training and evaluation
└── README.md # This file
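
The core preprocessing step, turning NOAA daily records into a weekly target, can be sketched with pandas. This is a minimal sketch, not the code in `data_cleaning.ipynb`: it assumes a GHCND-style table with `DATE` and `PRCP` columns, and the helper name `daily_to_weekly` and the rule of keeping only weeks with all seven daily readings are hypothetical choices.

```python
import pandas as pd


def daily_to_weekly(daily: pd.DataFrame, date_col: str = "DATE",
                    prcp_col: str = "PRCP", min_days: int = 7) -> pd.Series:
    """Sum daily precipitation into calendar weeks (hypothetical helper).

    Weeks end on Sunday (pandas' "W" frequency). Weeks with fewer than
    `min_days` non-missing daily readings are dropped, so a week with gaps
    is not mistaken for a dry week.
    """
    series = (
        daily.assign(**{date_col: pd.to_datetime(daily[date_col])})
        .set_index(date_col)[prcp_col]
        .sort_index()
    )
    weekly = series.resample("W").agg(["sum", "count"])  # count ignores NaN
    complete = weekly[weekly["count"] >= min_days]
    return complete["sum"].rename(prcp_col)


# Usage: two weeks of 1.0 per day, with one reading missing in week two.
days = pd.date_range("2024-01-01", "2024-01-14", freq="D")
demo = pd.DataFrame({"DATE": days.astype(str), "PRCP": 1.0})
demo.loc[10, "PRCP"] = float("nan")
weekly_demo = daily_to_weekly(demo)
# Only the week ending 2024-01-07 survives (total 7.0); the next week has 6 readings.
```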

Bayesian Linear & Polynomial Regression using PyMC3
- Models with different priors (Gaussian, Gamma)
- Domain-informed priors based on feature correlations
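
The PyMC3 models sample the posterior with MCMC. For intuition, the special case of a Gaussian prior on the weights with a known noise level has a closed-form posterior, sketched below in NumPy. This is a swapped-in conjugate model, not the project's PyMC3 code: it does not place a Gamma prior on the noise precision, and `alpha` (prior precision), `beta` (noise precision), and the helper names are illustrative assumptions.

```python
import numpy as np


def poly_features(x: np.ndarray, degree: int) -> np.ndarray:
    """Design matrix [1, x, x^2, ..., x^degree] for polynomial regression."""
    return np.vander(x, degree + 1, increasing=True)


def fit_bayes_linear(Phi, y, alpha=1e-2, beta=100.0):
    """Posterior N(m, S) over weights, assuming prior N(0, I / alpha)
    and Gaussian noise with known precision beta."""
    d = Phi.shape[1]
    precision = alpha * np.eye(d) + beta * Phi.T @ Phi
    S = np.linalg.inv(precision)
    m = beta * S @ Phi.T @ y
    return m, S


def predict_bayes_linear(Phi_new, m, S, beta=100.0):
    """Predictive mean and std: noise variance plus weight uncertainty.
    `beta` must match the value used in fit_bayes_linear."""
    mean = Phi_new @ m
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi_new, S, Phi_new)
    return mean, np.sqrt(var)


rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)
m, S = fit_bayes_linear(poly_features(x, 2), y)
mean, std = predict_bayes_linear(poly_features(np.array([0.0, 3.0]), 2), m, S)
# Extrapolating to x = 3 gives a wider predictive std than x = 0.
```

Because the weight-uncertainty term grows away from the training inputs, the predictive interval widens under extrapolation even though the noise level is fixed.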

Gaussian Process Regression
- Composite kernels: Constant × Squared Exponential
- Both grid search and Bayesian hyperparameter optimization
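
As a hedged illustration of this setup, the NumPy sketch below implements GP regression with a Constant × Squared Exponential kernel, k(x, x') = c · exp(−‖x − x'‖² / (2ℓ²)), plus a fixed noise variance, and picks (c, ℓ) by grid search on the log marginal likelihood. The grid values, the noise level, and the use of marginal likelihood (rather than cross-validated error) as the grid-search criterion are assumptions. scikit-learn's `GaussianProcessRegressor` with `ConstantKernel() * RBF()` fits the same model and, by default, tunes the hyperparameters with L-BFGS-B on the log marginal likelihood.

```python
import itertools

import numpy as np


def const_se_kernel(A, B, c, ell):
    """Constant x squared-exponential kernel: c * exp(-||a - b||^2 / (2 ell^2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return c * np.exp(-np.maximum(sq, 0.0) / (2 * ell**2))


def gp_fit(X, y, c, ell, noise_var):
    """Cholesky factor, dual weights, and log marginal likelihood."""
    K = const_se_kernel(X, X, c, ell) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    lml = (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
           - 0.5 * len(X) * np.log(2 * np.pi))
    return L, alpha, lml


def gp_predict(X, Xs, L, alpha, c, ell, noise_var):
    """Predictive mean and std for noisy observations at Xs."""
    Ks = const_se_kernel(X, Xs, c, ell)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = c - np.sum(v**2, axis=0) + noise_var  # k(x*, x*) = c for this kernel
    return mean, np.sqrt(np.maximum(var, 0.0))


def grid_search(X, y, cs, ells, noise_var):
    """Return the (c, ell) pair with the highest log marginal likelihood."""
    return max(itertools.product(cs, ells),
               key=lambda p: gp_fit(X, y, p[0], p[1], noise_var)[2])


rng = np.random.default_rng(0)
X = np.linspace(0, 2, 30)[:, None]
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.05, size=30)
noise_var = 0.05**2
c, ell = grid_search(X, y, cs=[0.5, 1.0, 2.0], ells=[0.1, 0.3, 1.0], noise_var=noise_var)
L, alpha, _ = gp_fit(X, y, c, ell, noise_var)
mean, std = gp_predict(X, np.array([[1.0], [5.0]]), L, alpha, c, ell, noise_var)
# Far from the data (x = 5) the std reverts toward sqrt(c + noise_var).
```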

Bayesian Neural Networks
- MLP and RNN with Monte Carlo Dropout
- Vanilla Bayesian MLP with variational inference
- Laplace Bayesian MLP for aleatoric uncertainty modeling
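
Monte Carlo Dropout (Gal & Ghahramani, 2016) keeps dropout active at prediction time and treats the spread of many stochastic forward passes as model uncertainty. The framework-agnostic NumPy sketch below shows the idea on a toy one-hidden-layer MLP; the random weights stand in for trained ones, and the dropout rate `p`, the number of passes `T`, and all names are illustrative assumptions. In PyTorch the same effect comes from leaving the `nn.Dropout` layers in training mode during inference.

```python
import numpy as np


def make_toy_mlp(n_in=3, n_hidden=32, seed=0):
    """Random weights standing in for a trained one-hidden-layer MLP (hypothetical)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(n_in, n_hidden)) / np.sqrt(n_in),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(size=(n_hidden, 1)) / np.sqrt(n_hidden),
        "b2": np.zeros(1),
    }


def forward_with_dropout(params, x, p, rng):
    """One stochastic pass: ReLU hidden layer with inverted dropout kept ON."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])
    keep = rng.random(h.shape) >= p  # keep each unit with probability 1 - p
    h = h * keep / (1.0 - p)         # rescale so the expected activation is unchanged
    return (h @ params["W2"] + params["b2"]).ravel()


def mc_dropout_predict(params, x, p=0.2, T=200, seed=0):
    """Mean and std over T dropout-perturbed forward passes."""
    rng = np.random.default_rng(seed)
    samples = np.stack([forward_with_dropout(params, x, p, rng) for _ in range(T)])
    return samples.mean(axis=0), samples.std(axis=0)


params = make_toy_mlp()
x = np.random.default_rng(1).normal(size=(5, 3))
mean, std = mc_dropout_predict(params, x)
```

With `p=0.0` every pass is identical and the spread collapses to zero, which is a quick sanity check that the uncertainty comes only from the dropout masks.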

Decision Tree Regression
- Enhanced with rolling statistical features
- Optimized via hyperparameter tuning
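
For a time-ordered target, one reasonable way to do this tuning is a grid search with forward-chaining cross-validation. The sketch below assumes scikit-learn's `GridSearchCV` with `TimeSeriesSplit`, so every validation fold lies after its training data; the grid values, the helper name `tune_tree`, and the synthetic data are illustrative, not the project's actual search space.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeRegressor


def tune_tree(X, y, n_splits=5):
    """Grid-search tree depth and leaf size; rows of X must be in time order."""
    param_grid = {"max_depth": [2, 3, 5, 8, None], "min_samples_leaf": [1, 5, 10, 20]}
    search = GridSearchCV(
        DecisionTreeRegressor(random_state=0),
        param_grid,
        cv=TimeSeriesSplit(n_splits=n_splits),
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X, y)
    return search.best_estimator_, search.best_params_, -search.best_score_


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.where(X[:, 0] > 0, 10.0, 0.0) + rng.normal(scale=1.0, size=300)
tree, best_params, cv_rmse = tune_tree(X, y)
```

A shuffled K-fold split would let the tree train on weeks that come after the validation weeks, which tends to overstate accuracy on a forecasting task.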

Key features
- Feature engineering with lagged variables, seasonal indicators, and rolling statistics
- Uncertainty quantification using Bayesian methods
- Comparative analysis of model accuracy and calibration metrics
- Visualization of predictions, uncertainty intervals, and model performance
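
The lagged, rolling, and seasonal features can be sketched in pandas as follows. The column name `PRCP`, the lag and window lengths, the helper name `add_features`, and the sine/cosine encoding of day of year are assumptions for illustration. Rolling statistics are computed on the series shifted by one week, so each row only sees information available before the week being predicted.

```python
import numpy as np
import pandas as pd


def add_features(weekly: pd.DataFrame, target: str = "PRCP",
                 lags=(1, 2, 4), windows=(4, 12)) -> pd.DataFrame:
    """Add lag, rolling, and seasonal features to a weekly DataFrame (DatetimeIndex)."""
    df = weekly.copy()
    for k in lags:
        df[f"{target}_lag{k}"] = df[target].shift(k)
    past = df[target].shift(1)  # exclude the current week to avoid target leakage
    for w in windows:
        df[f"{target}_roll_mean{w}"] = past.rolling(w).mean()
        df[f"{target}_roll_std{w}"] = past.rolling(w).std()
    day = df.index.dayofyear.to_numpy()
    df["season_sin"] = np.sin(2 * np.pi * day / 365.25)
    df["season_cos"] = np.cos(2 * np.pi * day / 365.25)
    return df.dropna()  # drop warm-up rows without a full history


idx = pd.date_range("2020-01-05", periods=30, freq="W")
features = add_features(pd.DataFrame({"PRCP": np.arange(30, dtype=float)}, index=idx))
```

The sine/cosine pair places late December next to early January, which a raw week number would not.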

Results

| Model | RMSE ↓ | MAE ↓ | R² ↑ | NLL ↓ | 95% Coverage (target: 95%) |
|---|---|---|---|---|---|
| Bayesian Regression | 23.63 | 17.97 | -0.07 | 8.998 | 52.17% |
| Gaussian Process | 28.50 | 19.41 | -0.44 | 76.837 | 13.04% |
| Laplace BNN | 26.66 | 17.97 | -0.18 | 4.685 | 92.75% |
| Decision Tree | 14.65 | 7.45 | 0.41 | N/A | N/A |
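The Decision Tree with rolling features achieved the best point accuracy (lowest RMSE and MAE, and the only positive R²), but it does not produce a predictive distribution, so NLL and coverage are not reported for it. Among the probabilistic models, the Laplace Bayesian Neural Network gave the best-calibrated uncertainty estimates: the lowest NLL (4.685) and 95% interval coverage (92.75%) closest to the nominal 95%. All three probabilistic models have negative R², i.e. their mean predictions are less accurate than always predicting the mean of the evaluation targets.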
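
The metrics in the table can be computed from each model's predictive distribution. The sketch below assumes a Gaussian predictive (mean `mu`, standard deviation `scale`) or, optionally, a Laplace predictive (location `mu`, scale `scale`), and reports NLL averaged per test point; the notebooks may aggregate or parameterize these differently, so this hypothetical `evaluate` helper is not guaranteed to reproduce the table's numbers. Coverage is the fraction of observations inside the central 95% predictive interval: μ ± 1.96σ for a Gaussian and μ ± b·ln 20 for a Laplace distribution.

```python
import numpy as np


def evaluate(y, mu, scale, dist="gaussian"):
    """RMSE, MAE, R^2, mean NLL, and central-95% coverage (as a fraction)."""
    y, mu, scale = (np.asarray(a, dtype=float) for a in (y, mu, scale))
    err = y - mu
    rmse = np.sqrt(np.mean(err**2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err**2) / np.sum((y - y.mean()) ** 2)
    if dist == "gaussian":
        nll = np.mean(0.5 * np.log(2 * np.pi * scale**2) + err**2 / (2 * scale**2))
        half_width = 1.959964 * scale      # 97.5th percentile of N(0, 1)
    elif dist == "laplace":
        nll = np.mean(np.log(2 * scale) + np.abs(err) / scale)
        half_width = np.log(20.0) * scale  # solves 1 - exp(-t / b) = 0.95
    else:
        raise ValueError(f"unknown dist: {dist}")
    coverage = np.mean(np.abs(err) <= half_width)
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "NLL": nll, "coverage_95": coverage}


scores = evaluate(y=[1.0, 2.0, 3.0, 4.0], mu=[1.0, 2.0, 3.0, 4.0], scale=[1.0] * 4)
```

A coverage of 0.9275 from this helper corresponds to the 92.75% reported in the table; values well below 0.95 (as for the GP) indicate intervals that are too narrow.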

Requirements
- Python 3.8+
- PyMC3
- PyTorch
- scikit-learn
- pandas, numpy, matplotlib

Usage
- Data preprocessing: `jupyter notebook data_cleaning.ipynb`
- Model training and evaluation: `jupyter notebook models.ipynb`

Contributors
- Muhammad Salman: Neural Network models & feature engineering
- Manivannan Senthil Kumar: Decision Tree models & rolling features
- Nikhil Anil Prakash: Gaussian Process Regression
- Mohit Kakda: Bayesian regression models

References
- Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In *Proceedings of the 33rd International Conference on Machine Learning (ICML)*, pp. 1050–1059.
- NOAA. Global Historical Climatology Network – Daily (GHCND) dataset.
- scikit-learn. `GaussianProcessRegressor` documentation.
