This project was developed during an intensive 4-day sprint at HTW Berlin. The goal was to forecast daily waste volumes (tonnage) for the Berliner Stadtreinigung (BSR) based on a four-year historical dataset.
The core research question was whether integrating external urban data sources could significantly improve the prediction accuracy of machine learning models compared to using historical waste data alone.
- Data Fusion: Integrated the primary BSR dataset with five external sources:
  - Weather data (DWD, the German Weather Service)
  - School holiday calendars
  - Public holidays
  - Election surveys ("Sonntagsfrage")
  - Temporal features (weekdays, months, seasons)
- Feature Engineering: Developed advanced time-series features, including `Tonnage_lag_2` (lagged values) and rolling averages, to capture seasonal and weekly trends.
- Model Benchmarking: Implemented and compared three different modeling approaches:
  - Linear Regression (serving as the baseline)
  - Decision Tree
  - Random Forest & XGBoost (the top-performing models)
- Evaluation: Models were evaluated using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the R² score.
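
The steps listed above can be illustrated with a minimal sketch. It is not the project's actual notebook code: the data is synthetic, the column names `temp_c` and `Tonnage_roll_7` are hypothetical, and the chronological 80/20 split is an assumption. Only `Tonnage_lag_2` is taken from the project. XGBoost is left out so that the example depends only on pandas, NumPy, and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the BSR data: daily tonnage with a weekly pattern.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=400, freq="D")
weekly = 20 * np.sin(2 * np.pi * dates.dayofweek.to_numpy() / 7)
waste = pd.DataFrame({
    "date": dates,
    "Tonnage": 100 + weekly + rng.normal(0, 5, len(dates)),
})

# Synthetic stand-in for an external source (hypothetical temperature column).
yearly = 10 * np.sin(2 * np.pi * dates.dayofyear.to_numpy() / 365)
weather = pd.DataFrame({
    "date": dates,
    "temp_c": 10 + yearly + rng.normal(0, 2, len(dates)),
})

# Data fusion: join the external source onto the primary dataset by date.
df = waste.merge(weather, on="date", how="left")

# Feature engineering: temporal, lagged, and rolling-average features.
df["weekday"] = df["date"].dt.dayofweek
df["Tonnage_lag_2"] = df["Tonnage"].shift(2)
# Shift by one day first so the rolling mean only sees past values.
df["Tonnage_roll_7"] = df["Tonnage"].shift(1).rolling(7).mean()
df = df.dropna().reset_index(drop=True)

# Chronological split (an assumption): train on the first 80%, test on the rest.
features = ["temp_c", "weekday", "Tonnage_lag_2", "Tonnage_roll_7"]
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

models = {
    "LinearRegression": LinearRegression(),
    "DecisionTree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Benchmarking: fit each model and collect MSE, RMSE, and R² on the test set.
results = {}
for name, model in models.items():
    model.fit(train[features], train["Tonnage"])
    pred = model.predict(test[features])
    mse = mean_squared_error(test["Tonnage"], pred)
    results[name] = {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(test["Tonnage"], pred),
    }

print(pd.DataFrame(results).T.round(3))
```

Splitting by time rather than at random keeps future observations out of the training set, which matters once lagged features are involved. The numbers printed here describe the synthetic data only and say nothing about the project's results.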
- Language: Python
- Libraries: Pandas, NumPy, Scikit-Learn, XGBoost, Matplotlib, Seaborn.
- Environment: Jupyter Notebooks.
- External vs. Internal Data: The analysis revealed that internal features, specifically historical lag data and the "Tour ID", had much higher predictive power than external factors like weather or holiday status.
- Model Performance: The XGBoost model outperformed all others, achieving an R² score of approximately 0.64, which suggests that gradient-boosted trees suit this kind of tabular data well.
- Operational Value: The findings suggest that while external data adds context, highly localized grouping (by specific depots and days) is the most effective path toward optimizing daily resource planning.
- The results are summarized in the documentation.
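
The localized grouping mentioned in the findings can be sketched as follows. This is an illustration on made-up rows, not the project's code: the column name `Tour_ID` and all values are hypothetical. It shows a lag feature computed per tour, so that one tour's history never leaks into another's, and a simple mean-tonnage profile per tour and weekday.

```python
import pandas as pd

# Hypothetical long-format data: one row per tour and day.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"] * 2
    ),
    "Tour_ID": ["A"] * 4 + ["B"] * 4,
    "Tonnage": [10.0, 11.0, 12.0, 13.0, 20.0, 21.0, 22.0, 23.0],
})
df = df.sort_values(["Tour_ID", "date"]).reset_index(drop=True)

# Shift within each tour so tour B never inherits values from tour A.
df["Tonnage_lag_2"] = df.groupby("Tour_ID")["Tonnage"].shift(2)

# Mean tonnage per tour and weekday as a simple localized profile.
df["weekday"] = df["date"].dt.dayofweek
profile = df.groupby(["Tour_ID", "weekday"])["Tonnage"].mean()

print(df)
print(profile)
```

A plain `df["Tonnage"].shift(2)` on the stacked table would carry the last rows of tour A into the first rows of tour B, which is why the shift is applied per group.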
- Type: Study Project (Module: AI Analytics)
- Collaboration: Developed by a team of four students.
- Timeline: 4-Day Sprint
- Institution: HTW Berlin (University of Applied Sciences)
- Status: Completed (Proof of Concept)