This project applies predictive analytics to Boston's Bluebikes bike-sharing system to address supply-demand mismatches that cause revenue loss and customer dissatisfaction when stations are empty or full.
Bluebikes handles 4.7 million rides annually but faces persistent bike-availability problems at its stations. Current mitigation relies on the "Bike Angels" user-incentive program, which cannot respond adequately to dynamic demand driven by weather changes, events, or peak hours.
Bluebikes generates rich spatiotemporal datasets capturing cycling patterns, station utilization, and user behavior. By leveraging this data through predictive modeling, we can anticipate demand and proactively optimize bike distribution.
- Reduce revenue loss from unavailable bikes
- Improve user satisfaction by ensuring bike availability
- Enable proactive operations instead of reactive responses
- Support city-wide sustainability and traffic reduction initiatives
Develop predictive models using historical ridership patterns, weather data, seasonal variations, and event-driven demand spikes to forecast when and where bikes will be needed most.
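As a toy illustration of the kind of signal these models exploit (not the project's actual training code; the function and input schema below are hypothetical), a naive per-(weekday, hour) demand baseline can be computed directly from trip start times:

```python
from collections import defaultdict
from datetime import datetime

def hourly_demand_baseline(trip_starts):
    """Average departures per (weekday, hour) bucket.

    `trip_starts` is a list of ISO-8601 ride start times -- a simplified
    stand-in for the start-time column in Bluebikes trip-history data.
    """
    per_day = defaultdict(lambda: defaultdict(int))  # (weekday, hour) -> date -> rides
    for ts in trip_starts:
        dt = datetime.fromisoformat(ts)
        per_day[(dt.weekday(), dt.hour)][dt.date()] += 1
    # Average each bucket over the distinct days it was observed
    return {k: sum(v.values()) / len(v) for k, v in per_day.items()}

demand = hourly_demand_baseline([
    "2025-06-02T08:15:00",  # Monday, 8am
    "2025-06-02T08:40:00",
    "2025-06-09T08:05:00",  # the following Monday, 8am
])
print(demand[(0, 8)])  # 1.5 rides on an average Monday at 8am
```

The trained models (XGBoost, LightGBM, Random Forest) improve on this baseline by folding in weather, seasonality, and event features.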
```
bluebikes-mlops/
├── data_pipeline/ # Data collection & processing
│ ├── dags/ # Airflow DAGs
│ ├── scripts/ # Processing scripts
│ └── data/ # Raw & processed data
│
├── model_pipeline/ # Model training & deployment
│ ├── dags/ # Training DAGs
│ ├── scripts/ # Training scripts
│ ├── models/ # Saved models
│ └── monitoring/ # Drift detection
│
├── model_deployment/ # Cloud Run deployment
│ ├── app.py # Flask API
│ └── Dockerfile # Container config
│
├── bluebikes-ui/ # Web interface
│ ├── frontend/ # React app
│ └── backend/ # Express.js API
│
├── docker-compose.yaml # Service orchestration
├── Dockerfile # Airflow image
├── setup.sh # Setup wizard
└── .env.example # Environment template
```
The following steps let anyone set up and reproduce the Bluebikes pipeline, either locally or using the full Dockerized Airflow environment.
```bash
# 1. Clone and enter directory
git clone https://github.com/YOUR_USERNAME/bluebikes-mlops.git
cd bluebikes-mlops

# 2. Run setup
chmod +x setup.sh
./setup.sh

# 3. Add your API keys
nano .env
# Fill in: NOAA_API_KEY, GITHUB_TOKEN, GITHUB_REPO

# 4. Start services
./start-airflow.sh

# 5. Open Airflow
open http://localhost:8080
# Login: airflow / airflow
```

| Action | Command |
|---|---|
| Start | `./start-airflow.sh` |
| Stop | `./stop-airflow.sh` |
| View logs | `docker compose logs -f` |
| Check status | `docker compose ps` |
| Restart | `docker compose restart` |
| Full reset | `./stop-airflow.sh --clean` |
```bash
# Data collection
docker compose exec airflow-webserver airflow dags trigger data_pipeline_dag

# Model training
docker compose exec airflow-webserver airflow dags trigger bluebikes_integrated_bias_training

# Drift check
docker compose exec airflow-webserver airflow dags trigger drift_monitoring_dag
```

**Services won't start?**

```bash
docker compose down -v
docker compose up airflow-init
docker compose up -d
```

**Out of memory?**
- Increase Docker memory to 8GB+ (Docker Desktop → Settings → Resources)

**Permission denied?**

```bash
echo "AIRFLOW_UID=$(id -u)" >> .env
```

| Service | Get it from | Required? |
|---|---|---|
| NOAA | https://www.ncdc.noaa.gov/cdo-web/token | Yes |
| GitHub | GitHub → Settings → Developer settings | Yes |
| Discord | Your Discord server webhook | Optional |
| GCS | Google Cloud Console | Optional |
- Check logs: `docker compose logs -f`
- Issues: GitHub Issues page
- Reset: `./stop-airflow.sh --clean && ./setup.sh`
| Requirement | Version | Check Command |
|---|---|---|
| Docker | 20.10+ | `docker --version` |
| Docker Compose | 2.0+ | `docker compose version` |
| Git | 2.30+ | `git --version` |
| Python | 3.10+ | `python3 --version` |
```bash
git clone https://github.com/PranavViswanathan/Optimizing-Bluebikes-Operations-with-Machine-Learning-Based-Demand-Prediction.git
cd Optimizing-Bluebikes-Operations-with-Machine-Learning-Based-Demand-Prediction
```

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Data Sources   │────▶│  Data Pipeline  │────▶│  Feature Store  │
│  - BlueBikes    │     │  (Airflow DAG)  │     │   (Processed)   │
│  - NOAA Weather │     │  Daily @ 12AM   │     │                 │
│  - Boston Cols  │     └─────────────────┘     └────────┬────────┘
└─────────────────┘                                      │
                                                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Cloud Run API  │◀────│ Model Registry  │◀────│ Model Training  │
│    /predict     │     │  (GCS Bucket)   │     │  (Airflow DAG)  │
│    /health      │     │                 │     │     Weekly      │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
                  ┌─────────────────┐                    │
                  │  Drift Monitor  │◀───────────────────┘
                  │ (Evidently AI)  │
                  │ Triggers Retrain│
                  └─────────────────┘
```
```bash
# Run the setup wizard
./setup.sh
```

This will:
- Check prerequisites
- Create necessary directories
- Set up environment configuration
- Build Docker images
- Initialize Airflow
```bash
# 1. Create environment file
cp .env.example .env

# 2. Edit with your API keys
nano .env

# 3. Create required directories
mkdir -p keys
mkdir -p data_pipeline/data/{raw,processed}/{bluebikes,NOAA_weather,boston_clg}
mkdir -p model_pipeline/{models,mlruns,artifacts}
mkdir -p model_pipeline/monitoring/{baselines,reports,logs}

# 4. Set Airflow user ID
echo "AIRFLOW_UID=$(id -u)" >> .env

# 5. Build and initialize
docker compose build
docker compose up airflow-init
```

| Variable | Required | Description |
|---|---|---|
| `NOAA_API_KEY` | Yes | NOAA weather API key |
| `DISCORD_WEBHOOK_URL` | No | Discord notifications |
| `GITHUB_REPO` | Yes | Your GitHub repo (`user/repo`) |
| `GITHUB_TOKEN` | Yes | GitHub personal access token |
| `GCS_MODEL_BUCKET` | No* | GCS bucket for models |
| `AIRFLOW_UID` | Yes | Your user ID (run `id -u`) |
\*Required only for cloud deployment features
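For quick reference, a filled-in `.env` might look like this (all values below are placeholders; only the keys come from the table above):

```
NOAA_API_KEY=your-noaa-token-here
GITHUB_REPO=your-user/bluebikes-mlops
GITHUB_TOKEN=ghp_your-token-here
DISCORD_WEBHOOK_URL=
GCS_MODEL_BUCKET=
AIRFLOW_UID=1000
```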
If you want to use Google Cloud features:
- Go to GCP Console
- Create a new project or select existing
- Go to IAM & Admin → Service Accounts
- Create a service account with roles:
  - Storage Admin
  - Storage Object Admin
- Create and download a JSON key
- Save it as `keys/gcs_service_account.json`
```bash
# Start all services in background
./start-airflow.sh
# or
docker compose up -d

# View logs
docker compose logs -f

# Check service health
./airflow-health-check.sh
```

| Service | URL | Credentials |
|---|---|---|
| Airflow UI | http://localhost:8080 | airflow / airflow |
| MLflow UI | http://localhost:5000 | None |
- **Data Pipeline** (runs daily automatically)
  - Collects BlueBikes trip data
  - Fetches weather data from NOAA
  - Processes and creates features
- **Model Training** (runs weekly automatically)
  - Trains XGBoost, LightGBM, and Random Forest models
  - Performs bias detection and mitigation
  - Promotes the best model to production
- **Drift Monitoring** (triggered after the data pipeline)
  - Checks for data drift using Evidently AI
  - Triggers retraining if drift is detected
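Drift detection in the pipeline uses Evidently AI; purely as a sketch of the underlying idea (this is a hand-rolled stand-in, not Evidently's API), a Population Stability Index check on a single feature looks like the following. The 0.2 threshold is a common rule of thumb, not necessarily this project's setting:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a current sample.

    Bin edges come from the baseline's observed range; large PSI means
    the current distribution has shifted away from the baseline.
    """
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        # Floor empty bins so the log below stays finite
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [10, 12, 11, 13, 12, 11, 10, 12]  # e.g. last month's hourly demand
current = [30, 32, 31, 33, 30, 31, 32, 30]   # a clearly shifted distribution
print(psi(baseline, current) > 0.2)  # True -> drift, retraining would trigger
```

Evidently wraps this kind of statistic (plus many others) in report objects; the DAG's retrain trigger reduces to a comparison like the one above.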
```bash
# Trigger data pipeline
docker compose exec airflow-webserver airflow dags trigger data_pipeline_dag

# Trigger model training
docker compose exec airflow-webserver airflow dags trigger bluebikes_integrated_bias_training

# Trigger drift monitoring
docker compose exec airflow-webserver airflow dags trigger drift_monitoring_dag
```

```bash
./stop-airflow.sh
# or
docker compose down
```

Once deployed to Cloud Run:
```bash
# Health check
curl https://your-cloud-run-url/health

# Get prediction
curl -X POST https://your-cloud-run-url/predict \
  -H "Content-Type: application/json" \
  -d '{"hour": 8, "day_of_week": 1, "temp_avg": 20}'

# Reload model
curl -X POST https://your-cloud-run-url/reload
```

**Docker permission denied**

```bash
sudo usermod -aG docker $USER
newgrp docker
```

**Airflow webserver not starting**
```bash
# Check logs
docker compose logs airflow-webserver

# Restart services
docker compose down && docker compose up -d
```

**Database connection issues**

```bash
# Reset everything
docker compose down -v
docker compose up airflow-init
docker compose up -d
```

**Out of memory errors**
- Increase the Docker memory limit (Docker Desktop → Settings → Resources → Memory: 8GB+)

- Check the Issues page
- Review Airflow logs: `docker compose logs -f`
- Check individual task logs in the Airflow UI
| Model | R² Score | MAE | RMSE |
|---|---|---|---|
| XGBoost | 0.87 | 18.5 | 24.3 |
| LightGBM | 0.86 | 19.2 | 25.1 |
| Random Forest | 0.84 | 20.8 | 27.2 |
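The three metrics in this table can be reproduced from any model's predictions with a few lines of Python (the numbers below are illustrative toy values, not the project's test set):

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 for a set of demand predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

# Hypothetical hourly ride counts vs. model output
m = regression_metrics([100, 150, 200, 250], [110, 140, 205, 245])
print(m["MAE"], round(m["R2"], 2))  # 7.5 0.98
```

In the training DAG these scores decide which model is promoted to production, so MAE/RMSE share the target's units (rides per interval) while R² is unitless.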
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
This project is licensed under the MIT License.
- Nikhil Anil Prakash
- Harsh Shah
- Ananya Hegde
- Pranav Viswanathan
- Gyula Planky
Built with ❤️ for Northeastern University MLOps Course (December 2025)

