COMP 542 Project
A machine learning-driven analysis of behavioral, physiological, and lifestyle factors to predict sleep efficiency.
- Brandon Ismalej
- Jittapatana (Patrick) Prayoonpruk
- John Lee
Sleep efficiency is a key indicator of sleep quality and overall health. This project uses machine learning models (multiple regression, XGBoost, LightGBM, and Random Forest) to predict sleep efficiency from behavioral, physiological, and lifestyle data. It aims to help individuals and healthcare professionals identify the patterns that influence sleep and promote better sleep hygiene.
- Source: Kaggle - Sleep Efficiency Dataset
- Samples: 452 observations
- Features: 14, including sleep duration, REM %, deep/light sleep %, caffeine/alcohol consumption, smoking, gender, and awakenings
- Target: Sleep Efficiency (continuous)
- Train-Test Split: 80% train / 20% test
- Cross Validation: k-fold with k=5
- Categorical Encoding: Binary encoding for gender and smoking status
- Missing Values: Imputed using mean/median values within groups
- Feature Selection Techniques:
  - Correlation Analysis
  - Mutual Information
  - Recursive Feature Elimination (RFE)
  - Random Forest Importance
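
The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration rather than the project's actual pipeline: the toy DataFrame, the column names (`gender`, `smoking_status`, `caffeine`, `awakenings`, `sleep_efficiency`), and the choice of gender as the imputation group are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, train_test_split

# Toy stand-in for the Kaggle data; column names and values are hypothetical.
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "gender": rng.choice(["Male", "Female"], size=n),
    "smoking_status": rng.choice(["Yes", "No"], size=n),
    "caffeine": rng.choice([0.0, 25.0, 50.0, np.nan], size=n),
    "awakenings": rng.choice([0.0, 1.0, 2.0, 3.0, np.nan], size=n),
    "sleep_efficiency": rng.uniform(0.5, 0.99, size=n),
})

# Binary encoding for the two categorical columns.
df["gender"] = (df["gender"] == "Male").astype(int)
df["smoking_status"] = (df["smoking_status"] == "Yes").astype(int)

# Group-wise median imputation (grouping by gender is an assumption),
# with a global-median fallback in case an entire group is missing.
for col in ["caffeine", "awakenings"]:
    df[col] = df[col].fillna(df.groupby("gender")[col].transform("median"))
    df[col] = df[col].fillna(df[col].median())

X = df.drop(columns="sleep_efficiency")
y = df["sleep_efficiency"]

# 80% train / 20% test split, then 5-fold CV indices on the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = [len(val_idx) for _, val_idx in kfold.split(X_train)]
```

Running cross-validation only on the training portion keeps the 20% test set untouched until the final evaluation.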
| Feature | Correlation | Impact Description |
|---|---|---|
| Light Sleep % | -0.82 | Strong negative correlation with sleep efficiency |
| Deep Sleep % | +0.79 | Strong positive correlation (most restorative) |
| Awakenings | -0.55 | More awakenings reduce sleep quality |
| Smoking Status | -0.29 | Behavioral factor negatively affecting sleep |
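
For readers unfamiliar with how values like those in the table are obtained, here is a small sketch of the Pearson correlation coefficient in plain Python. The numbers below are made up for illustration and are not taken from the dataset; `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only; not taken from the dataset.
light_sleep_pct = [15, 20, 28, 35, 45, 52]
sleep_efficiency = [0.95, 0.92, 0.85, 0.80, 0.66, 0.58]

r = pearson_r(light_sleep_pct, sleep_efficiency)
print(f"r = {r:.2f}")
```

A coefficient near -1 or +1 indicates a strong linear relationship, but it does not by itself establish that one variable causes the other.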
- Multiple Regression
  - Why: Simplicity, interpretability, and effectiveness for linear patterns
  - Metrics: Adjusted R², AIC, BIC
  - RMSE: 0.0617
- XGBoost
  - Untuned RMSE: 0.0563
  - Tuned RMSE: 0.0494 (best-performing model)
- LightGBM
  - All Features RMSE: 0.0511
  - After Feature Selection RMSE: 0.0525
- Random Forest
  - All Features RMSE: 0.0503
  - With Feature Selection RMSE: 0.0557
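
As a rough illustration of how a cross-validated RMSE like the ones above can be computed, the sketch below runs 5-fold cross-validation with scikit-learn on a Random Forest. The data is synthetic and the hyperparameters are arbitrary assumptions, so the resulting number is not comparable to the project's reported scores.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 200 rows, 5 features, target roughly in [0, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 0.8 + 0.05 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(scale=0.02, size=200)

# Hyperparameters here are illustrative, not the project's settings.
model = RandomForestRegressor(n_estimators=100, random_state=0)

# scikit-learn reports negated RMSE so that larger is better; flip the sign.
neg_rmse = cross_val_score(
    model, X, y, cv=5, scoring="neg_root_mean_squared_error"
)
rmse_per_fold = -neg_rmse
print(f"mean CV RMSE: {rmse_per_fold.mean():.4f}")
```

The same call works with any scikit-learn compatible regressor, which makes it convenient for comparing models under an identical fold scheme.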
| Model | RMSE | Notes |
|---|---|---|
| Multiple Regression | 0.0617 | Stepwise selected features |
| XGBoost (untuned) | 0.0563 | Baseline GBM |
| XGBoost (tuned) | 0.0494 | ⭐️ Best performance |
| LightGBM (all features) | 0.0511 | Fast and efficient |
| Random Forest | 0.0503 | Robust ensemble model |
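
To put the table in perspective, the short sketch below computes each model's RMSE reduction relative to the multiple regression baseline. The RMSE values are copied from the table; treating multiple regression as the baseline is a choice made for this example.

```python
# RMSE values copied from the comparison table.
rmse = {
    "Multiple Regression": 0.0617,
    "XGBoost (untuned)": 0.0563,
    "XGBoost (tuned)": 0.0494,
    "LightGBM (all features)": 0.0511,
    "Random Forest": 0.0503,
}

# Percentage reduction in RMSE relative to the multiple regression baseline.
baseline = rmse["Multiple Regression"]
reduction_pct = {
    name: 100 * (baseline - value) / baseline for name, value in rmse.items()
}
best_model = min(rmse, key=rmse.get)

for name, pct in sorted(reduction_pct.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}% lower RMSE than the baseline")
```

By this measure, tuned XGBoost lowers RMSE by roughly 20% compared with multiple regression.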
- Python: scikit-learn, xgboost, lightgbm, pandas, matplotlib
- R: stepAIC, ggplot2, lm(), diagnostic plotting
- Jupyter & RMarkdown for exploratory and final analysis
- Personalized sleep tracking and health monitoring
- Integration into smartwatches, fitness trackers, or health apps
- Use in sleep disorder clinics for patient screening
- Deploy models in a web/mobile interface
- Integrate additional physiological signals (heart rate, HRV)
- Use time-series modeling for longitudinal sleep data