An advanced machine learning project that predicts major delays (>5 minutes) in the Toronto TTC subway system and provides route optimization with interactive map visualization.
- Delay Prediction: Predicts major delays using machine learning
- Route Optimization: Finds the best routes considering delay probabilities
- Interactive Map: Visualizes stations with delay risk indicators
- Real-time Predictions: Get instant delay probability for any station
- Station Map: Interactive map showing all TTC stations
- Risk Indicators: Color-coded markers (green/orange/red) based on delay probability
- Station Details: Click markers to see delay probability and station info
- Route Visualization: See optimized routes on the map
- Multi-route Planning: Compare different route options
- Delay Risk Assessment: Routes ranked by total delay risk
- Time Preferences: Optimize for rush hour, off-peak, or any time
- Transfer Optimization: Smart transfer point recommendations
- FastAPI Backend: High-performance REST API
- Machine Learning: RandomForestClassifier with 85% accuracy
- Interactive Web UI: Modern, responsive interface
- GitHub Pages Ready: Static deployment support
ttc-predict/
├── main.py # FastAPI application with web interface
├── train_model.py # Model training script
├── requirements.txt # Python dependencies
├── model_training.ipynb # Jupyter notebook with full ML pipeline
├── random_forest_model_new_task.pkl # Trained ML model
├── label_encoders_new_task.pkl # Feature encoders
├── docs/index.html # Web interface
├── .github/workflows/deploy.yml # GitHub Actions deployment
├── README.md # This file
└── LICENSE # MIT License
- Dataset: Toronto Open Data – TTC Subway Delay Data
- Target variable: MajorDelay
  - 0 = No major delay (≤5 minutes)
  - 1 = Major delay (>5 minutes)
- Input features:
  - Line – Subway line (e.g., YU, BD)
  - Station – Station name
  - Code – Delay cause code
  - DayOfWeek – Numeric day of week (0 = Monday, 6 = Sunday)
- Model used: RandomForestClassifier
- Feature importance (sample result):
  - Code – 41.5%
  - Station – 38.6%
  - DayOfWeek – 17.2%
  - Line – 2.5%
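
The target and feature definitions above can be sketched in a few lines. This is only an illustration, not the code in `train_model.py`: the raw record keys (`"Date"`, `"Min Delay"`) and the helper name `to_model_row` are assumptions made for the example.

```python
from datetime import date

MAJOR_DELAY_THRESHOLD_MIN = 5  # from this README: a major delay is > 5 minutes


def to_model_row(record: dict) -> dict:
    """Map one raw delay record to the four input features plus the target.

    The raw keys "Date" and "Min Delay" are assumed names for illustration.
    """
    day = date.fromisoformat(record["Date"])
    return {
        "Line": record["Line"],
        "Station": record["Station"],
        "Code": record["Code"],
        "DayOfWeek": day.weekday(),  # 0 = Monday, 6 = Sunday
        "MajorDelay": int(record["Min Delay"] > MAJOR_DELAY_THRESHOLD_MIN),
    }


row = to_model_row({
    "Date": "2024-01-01",  # a Monday
    "Line": "YU",
    "Station": "UNION STATION",
    "Code": "MUIS",
    "Min Delay": 7,
})
```

Note that a delay of exactly 5 minutes maps to `MajorDelay = 0`, matching the "≤5 minutes" definition.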
- Clone the Repository
    git clone https://github.com/DanielDemoz/ttc-predict.git
    cd ttc-predict
- Install Dependencies
    py -m pip install -r requirements.txt
- Train the Model (if needed)
    py train_model.py
- Run the Application
    py -m uvicorn main:app --reload
- Access the Web Interface
  - Main App: http://127.0.0.1:8000
  - API Docs: http://127.0.0.1:8000/docs
  - Health Check: http://127.0.0.1:8000/health
- Generate Static Files
    py deploy.py
- Deploy to GitHub Pages
  - Push to main branch
  - GitHub Actions will automatically deploy
  - Access at: https://danieldemoz.github.io/ttc-predict
POST /predict
{
  "Line": "YU",
  "Station": "UNION STATION",
  "Code": "MUIS",
  "DayOfWeek": 0
}
Response:
{
  "prediction": 1,
  "probability": 0.75,
  "input": { ... }
}

POST /route/optimize
{
  "start_station": "UNION STATION",
  "end_station": "FINCH",
  "day_of_week": 0,
  "time_preference": "rush_hour"
}
Response:
{
  "routes": [
    {
      "stations": ["UNION STATION", "FINCH"],
      "total_delay_risk": 0.15,
      "estimated_time": 25
    }
  ]
}

GET /stations/predictions
Returns delay probabilities for all stations with coordinates.
GET /health
Returns API status and model loading information.
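
As a rough client-side sketch, the `/predict` request shown above can be built with the Python standard library. The helper names `build_predict_request` and `summarize_prediction` are hypothetical, and the base URL assumes the local server address used in the run instructions.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed: the local development address


def build_predict_request(line: str, station: str, code: str,
                          day_of_week: int) -> urllib.request.Request:
    """Build (but do not send) a POST /predict request with a JSON body."""
    payload = {
        "Line": line,
        "Station": station,
        "Code": code,
        "DayOfWeek": day_of_week,
    }
    return urllib.request.Request(
        f"{BASE_URL}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_prediction(body: dict) -> str:
    """Turn a decoded /predict response into a short human-readable line."""
    label = "major delay" if body["prediction"] == 1 else "no major delay"
    return f"{label} (probability {body['probability']:.0%})"


req = build_predict_request("YU", "UNION STATION", "MUIS", 0)
```

With the server running, passing `req` to `urllib.request.urlopen` and decoding the body with `json.load` should yield a dictionary that `summarize_prediction` can format.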
fastapi
uvicorn[standard]
pandas
scikit-learn
joblib
numpy
requests
matplotlib
seaborn

Install with:
    py -m pip install -r requirements.txt

- Real-time Visualization: See all TTC stations on an interactive map
- Risk Indicators: Color-coded markers show delay probability
  - Green: Low risk (< 10%)
  - Orange: Medium risk (10-30%)
  - Red: High risk (> 30%)
- Station Details: Click any marker for detailed information
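
The marker color thresholds listed above can be written as a small function. This is a sketch of the stated rule, not necessarily how the web interface implements it; the name `risk_color` is hypothetical, and it assumes both 10% and 30% fall in the orange band, as the "10-30%" range suggests.

```python
def risk_color(probability: float) -> str:
    """Map a delay probability in [0, 1] to a marker color."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    if probability < 0.10:
        return "green"   # low risk (< 10%)
    if probability <= 0.30:
        return "orange"  # medium risk (10-30%)
    return "red"         # high risk (> 30%)
```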
- Smart Routing: Considers delay probabilities when planning routes
- Multiple Options: Compare different route alternatives
- Time Preferences: Optimize for rush hour, off-peak, or any time
- Transfer Points: Intelligent transfer station recommendations
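
This README does not say which algorithm the router uses. One common way to make routing delay-aware is Dijkstra's shortest-path search with a cost that adds a penalty proportional to each station's delay probability. The sketch below assumes that approach; the function `best_route`, the `penalty_min` parameter, and the toy network with stations A to D are all hypothetical.

```python
import heapq


def best_route(graph, delay_prob, start, end, penalty_min=10.0):
    """Dijkstra search where entering a station costs
    travel minutes + penalty_min * that station's delay probability.

    graph: {station: {neighbor: travel_minutes}}
    delay_prob: {station: probability of a major delay}
    Returns (total_cost, [stations]) or None if no route exists.
    """
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, station, path = heapq.heappop(queue)
        if station == end:
            return cost, path
        if station in settled:
            continue
        settled.add(station)
        for neighbor, minutes in graph.get(station, {}).items():
            if neighbor not in settled:
                step = minutes + penalty_min * delay_prob.get(neighbor, 0.0)
                heapq.heappush(queue, (cost + step, neighbor, path + [neighbor]))
    return None


# Toy network: the route via B is faster, but B has a high delay probability.
toy_graph = {"A": {"B": 5, "C": 6}, "B": {"D": 5}, "C": {"D": 6}, "D": {}}
toy_risk = {"B": 0.5}

risk_aware = best_route(toy_graph, toy_risk, "A", "D")                    # avoids B
time_only = best_route(toy_graph, toy_risk, "A", "D", penalty_min=0.0)    # goes via B
```

Raising `penalty_min` makes the search more willing to trade travel time for lower delay risk; setting it to zero reduces the search to a plain shortest-time route.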
- Model: RandomForestClassifier with 85% accuracy
- Features: Line, Station, Code, DayOfWeek
- Prediction: Major delay probability (>5 minutes)
- Real-time: Instant predictions for any station/condition
- Data Collection: Toronto Open Data API
- Data Processing: Cleaning, feature engineering, encoding
- Model Training: RandomForestClassifier with cross-validation
- API Development: FastAPI with interactive web interface
- Deployment: GitHub Pages with automated CI/CD
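
The encoding step in the pipeline above turns categorical features such as Line, Station, and Code into integers. The file name `label_encoders_new_task.pkl` suggests label encoding, so the sketch below mimics that idea with the standard library: each distinct label gets an index in sorted order. The helper names are hypothetical, and the real project may encode differently.

```python
def fit_label_encoder(values):
    """Assign each distinct label an integer index in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(values)))}


def encode(mapping, value):
    """Look up a label, failing loudly for labels unseen during fitting."""
    try:
        return mapping[value]
    except KeyError:
        raise ValueError(f"unseen label: {value!r}") from None


line_encoder = fit_label_encoder(["YU", "BD", "YU"])
```

Failing on unseen labels matters at prediction time: a station or delay code that never appeared in the training data has no integer to map to, so the API needs to reject or otherwise handle such input.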
- Accuracy: 85%
- Precision: 89% (No delay), 44% (Major delay)
- Recall: 94% (No delay), 31% (Major delay)
- Feature Importance:
  - Code: 41.6%
  - Station: 38.7%
  - DayOfWeek: 17.2%
  - Line: 2.5%
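
To compare the two classes on a single number, the precision and recall figures above can be combined into an F1 score (the harmonic mean of the two). The calculation below uses only the values reported in this README; the function name is chosen for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_no_delay = f1_score(0.89, 0.94)     # about 0.91
f1_major_delay = f1_score(0.44, 0.31)  # about 0.36
```

The gap between the two scores indicates that the 85% overall accuracy is driven mostly by the "no delay" class, and that major delays are considerably harder for the model to catch.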
Local development:
    py -m uvicorn main:app --reload

GitHub Pages:
    py deploy.py
    git add .
    git commit -m "Deploy to GitHub Pages"
    git push origin main

Docker:
    FROM python:3.9-slim
    COPY . /app
    WORKDIR /app
    RUN pip install -r requirements.txt
    CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

This project is open-source and available under the MIT License for educational and research purposes.