End-to-end machine learning pipeline for trip duration prediction with feature engineering, regression models, and automated evaluation.

harshitaphadtare/GoPredict

GoPredict - Machine Learning Pipeline for Trip Duration Prediction

A comprehensive machine learning pipeline for predicting trip durations using various regression models, feature engineering, and hyperparameter optimization.

Medium post: https://medium.com/@hphadtare02/how-machine-learning-predicts-trip-duration-just-like-uber-zomato-91f7db6e9ce9

📁 Project Structure

GoPredict/
├── main.py                          # Main runner script
├── start_api.py                     # API server startup script
├── test_api.py                      # API testing script
├── config.py                        # Project configuration
├── requirements.txt                 # Python dependencies
├── README.md                        # This file
├── CONTRIBUTING.md                  # Development and integration guide
├── CODE_OF_CONDUCT.md               # Code of conduct and security
│
├── api/                             # FastAPI backend
│   └── main.py                      # FastAPI application
│
├── frontend/                        # React frontend
│   └── src/
│       └── lib/
│           └── api.ts               # API client library
│
├── data/                            # Data directory
│   ├── raw/                         # Raw data files
│   │   ├── train.csv                # Training data
│   │   └── test.csv                 # Test data
│   ├── processed/                   # Processed data files
│   │   ├── feature_engineered_train.csv
│   │   ├── feature_engineered_test.csv
│   │   └── gmapsdata/               # Google Maps data
│   └── external/                    # External data sources
│       └── precipitation.csv        # Weather data
│
├── src/                             # Source code
│   ├── model/                       # Model-related modules
│   │   ├── models.py                # All ML models and pipeline
│   │   ├── evaluation.py            # Model evaluation functions
│   │   └── save_models.py           # Model persistence
│   ├── features/                    # Feature engineering modules
│   │   ├── distance.py              # Distance calculations
│   │   ├── geolocation.py           # Geographic features
│   │   ├── gmaps.py                 # Google Maps integration
│   │   ├── precipitation.py         # Weather features
│   │   ├── time.py                  # Time-based features
│   │   └── weather_api.py           # Weather API integration
│   ├── feature_pipe.py              # Feature engineering pipeline
│   ├── data_preprocessing.py        # Data preprocessing
│   └── complete_pipeline.py         # Complete ML pipeline
│
├── notebooks/                       # Jupyter notebooks
│   ├── 01_EDA.ipynb                 # Exploratory Data Analysis
│   ├── 02_Feature_Engineering.ipynb # Feature engineering
│   ├── 03_Model_Training.ipynb      # Model training
│   ├── figures/                     # Generated plots
│   └── gmaps/                       # Interactive maps
│
├── saved_models/                    # Trained models (auto-created)
├── output/                          # Predictions and submissions (auto-created)
└── logs/                            # Log files (auto-created)

🚀 Quick Start

1. Installation

# Clone the repository
git clone <your-repo-url>
cd GoPredict

# Install dependencies
pip install -r requirements.txt

# Create necessary directories
mkdir -p logs output saved_models

2. API Server

Start the FastAPI server to connect your frontend with ML models:

# Start the API server
python start_api.py

# Test the API
python test_api.py

# View API documentation
# Visit http://localhost:8000/docs

3. Frontend Development

# Install frontend dependencies
cd frontend
npm install

# Start development server
npm run dev

🔌 API Documentation

The GoPredict API provides REST endpoints for machine learning-based trip duration prediction using FastAPI.

Quick API Start

# Start the API server
python start_api.py

# Or with custom options
python start_api.py --host 0.0.0.0 --port 8000 --reload

API Access Points

Core API Endpoints

Weather API

GET /weather - Get weather data for a specific location and time

Parameters:

  • latitude (float): Latitude coordinate
  • longitude (float): Longitude coordinate
  • timestamp (str): ISO format timestamp (e.g., "2016-01-01T17:00:00")

Example:

curl "http://localhost:8000/weather?latitude=40.767937&longitude=-73.982155&timestamp=2016-01-01T17:00:00"

Response:

{
  "success": true,
  "data": {
    "temp": 5.0,
    "humidity": 53.0,
    "pressure": 1013.25
  },
  "location": { "latitude": 40.767937, "longitude": -73.982155 },
  "timestamp": "2016-01-01T17:00:00"
}
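The same request can be made from Python. The sketch below uses only the standard library and assumes the server is reachable at http://localhost:8000 as in the curl example; the helper names `build_weather_url` and `fetch_weather` are illustrative and not part of GoPredict.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_weather_url(base_url: str, latitude: float, longitude: float, timestamp: str) -> str:
    """Build the GET /weather URL for one location and ISO timestamp."""
    query = urlencode({"latitude": latitude, "longitude": longitude, "timestamp": timestamp})
    return f"{base_url.rstrip('/')}/weather?{query}"


def fetch_weather(url: str, timeout: float = 10.0) -> dict:
    """Call the endpoint and decode the JSON body (requires a running server)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no server; colons in the timestamp are percent-encoded.
url = build_weather_url("http://localhost:8000", 40.767937, -73.982155, "2016-01-01T17:00:00")
```

With the API running, `fetch_weather(url)["data"]` should hold the `temp`, `humidity`, and `pressure` fields shown in the response above.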

Distance Calculation API

POST /distance - Calculate Manhattan and/or Euclidean distances

Parameters:

  • start_lat (float): Starting latitude
  • start_lng (float): Starting longitude
  • end_lat (float): Ending latitude
  • end_lng (float): Ending longitude
  • method (str): "manhattan", "euclidean", or "both" (default: "both")

Example:

curl -X POST "http://localhost:8000/distance" \
  -H "Content-Type: application/json" \
  -d '{
    "start_lat": 40.767937,
    "start_lng": -73.982155,
    "end_lat": 40.748817,
    "end_lng": -73.985428,
    "method": "both"
  }'
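For intuition about the two metrics, here is a minimal Python sketch. The README does not state which formula `src/features/distance.py` uses, so this assumes a flat-earth (equirectangular) approximation in kilometres; the function names and the Earth-radius constant are illustrative, not taken from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def planar_offsets_km(start_lat, start_lng, end_lat, end_lng):
    """Approximate east-west (dx) and north-south (dy) offsets in km."""
    mean_lat = math.radians((start_lat + end_lat) / 2)
    dy = EARTH_RADIUS_KM * math.radians(end_lat - start_lat)
    dx = EARTH_RADIUS_KM * math.radians(end_lng - start_lng) * math.cos(mean_lat)
    return dx, dy


def distances_km(start_lat, start_lng, end_lat, end_lng, method="both"):
    """Return the requested distance(s), mirroring the endpoint's `method` values."""
    dx, dy = planar_offsets_km(start_lat, start_lng, end_lat, end_lng)
    result = {}
    if method in ("manhattan", "both"):
        result["manhattan"] = abs(dx) + abs(dy)  # sum of axis-aligned legs
    if method in ("euclidean", "both"):
        result["euclidean"] = math.hypot(dx, dy)  # straight-line distance
    if not result:
        raise ValueError(f"unknown method: {method!r}")
    return result


both = distances_km(40.767937, -73.982155, 40.748817, -73.985428)
```

The Manhattan distance is never smaller than the Euclidean one, which is why it is often used as a rough proxy for driving distance on a street grid.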

Time Features API

POST /time-features - Extract time-based features from datetime

Parameters:

  • datetime_str (str): ISO format datetime string

Example:

curl -X POST "http://localhost:8000/time-features" \
  -H "Content-Type: application/json" \
  -d '{"datetime_str": "2016-01-01T17:00:00"}'
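The README does not list the features this endpoint returns. As a rough illustration of what time-based features typically look like, the sketch below derives a few from the same ISO string; the feature names (`hour`, `weekday`, `month`, `is_weekend`) are assumptions and may differ from what `src/features/time.py` produces.

```python
from datetime import datetime


def extract_time_features(datetime_str: str) -> dict:
    """Derive simple calendar features from an ISO 8601 datetime string."""
    dt = datetime.fromisoformat(datetime_str)
    return {
        "hour": dt.hour,                  # 0-23
        "weekday": dt.weekday(),          # Monday = 0 ... Sunday = 6
        "month": dt.month,                # 1-12
        "is_weekend": dt.weekday() >= 5,  # Saturday or Sunday
    }


features = extract_time_features("2016-01-01T17:00:00")
```

For the example timestamp, a Friday evening, this gives `hour` 17 and `weekday` 4.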

Prediction API

POST /predict - Predict trip duration using ML models

Parameters (JSON Body):

{
  "from": {
    "lat": 40.767937,
    "lon": -73.982155
  },
  "to": {
    "lat": 40.748817,
    "lon": -73.985428
  },
  "startTime": "2016-01-01T17:00:00",
  "city": "new_york",
  "model_name": "XGBoost"
}

Response:

{
  "minutes": 5.2,
  "confidence": 0.75,
  "model_version": "XGBoost",
  "distance_km": 2.1,
  "city": "new_york"
}
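A Python client for this endpoint might look like the following sketch. It builds the JSON body documented above with the standard library; the helper names are hypothetical, and the base URL assumes the default local server.

```python
import json
from urllib.request import Request, urlopen


def build_predict_request(base_url, from_lat, from_lon, to_lat, to_lon,
                          start_time, city="new_york", model_name="XGBoost") -> Request:
    """Assemble a POST /predict request with the documented JSON body."""
    payload = {
        "from": {"lat": from_lat, "lon": from_lon},
        "to": {"lat": to_lat, "lon": to_lon},
        "startTime": start_time,
        "city": city,
        "model_name": model_name,
    }
    return Request(
        f"{base_url.rstrip('/')}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_minutes(request: Request, timeout: float = 10.0) -> float:
    """Send the request and return the `minutes` field (requires a running server)."""
    with urlopen(request, timeout=timeout) as response:
        return float(json.load(response)["minutes"])


request = build_predict_request(
    "http://localhost:8000", 40.767937, -73.982155, 40.748817, -73.985428,
    "2016-01-01T17:00:00",
)
```

Separating request construction from the network call keeps the payload easy to inspect and test without starting the API.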

Model Management API

GET /models - List available trained models
GET /models/{model_name} - Get specific model information
POST /models/train - Train models in background

Example:

# List models
curl "http://localhost:8000/models"

# Train models
curl -X POST "http://localhost:8000/models/train" \
  -H "Content-Type: application/json" \
  -d '{"models_to_run": ["XGBoost", "Random Forest"]}'

Health & Status API

GET /health - Health check endpoint
GET /status - Detailed API status

Frontend Integration

The frontend uses the API client in frontend/src/lib/api.ts:

import { predictTravelTime } from "@/lib/api";

// Example usage
const prediction = await predictTravelTime({
  from: { lat: 40.767937, lon: -73.982155 },
  to: { lat: 40.748817, lon: -73.985428 },
  startTime: "2016-01-01T17:00:00",
  city: "new_york",
});

🎯 ML Pipeline Usage

Simple Pipeline (Default)

python main.py

Runs the complete end-to-end pipeline:

  • Data preprocessing - Loads and cleans raw data
  • Feature engineering - Adds distance, time, cluster, and weather features
  • Model training - Trains all specified models
  • Model evaluation - Compares model performance
  • Prediction generation - Creates submission files
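To illustrate the evaluation step, the sketch below ranks models by root mean squared error on held-out data. The README does not say which metric `src/model/evaluation.py` uses, so RMSE is an assumption here, and the durations are made-up toy values used only to show the mechanics.

```python
import math


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equal-length sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def rank_models(y_true, predictions: dict) -> list:
    """Return (model_name, rmse) pairs sorted from best to worst."""
    scores = {name: rmse(y_true, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Toy trip durations in seconds (illustrative, not project results).
y_true = [300.0, 600.0, 900.0]
ranking = rank_models(y_true, {
    "LINREG": [400.0, 500.0, 1000.0],
    "XGB": [310.0, 590.0, 905.0],
})
best_model = ranking[0][0]
```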

Custom Models

python main.py --models XGB,RF

Trains only the listed models, here XGBoost (XGB) and Random Forest (RF).

With Hyperparameter Tuning

python main.py --tune-xgb

Enable XGBoost hyperparameter tuning.
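For readers extending the command line, flags like `--models` and `--tune-xgb` can be parsed with `argparse` roughly as follows. This is a sketch of one possible approach, not the parser in `main.py`, whose actual implementation the README does not show.

```python
import argparse


def parse_model_list(value: str) -> list:
    """Split a comma-separated list such as 'XGB,RF' into upper-case codes."""
    return [code.strip().upper() for code in value.split(",") if code.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Trip duration pipeline (sketch)")
    parser.add_argument("--models", type=parse_model_list, default=None,
                        help="comma-separated model codes, e.g. XGB,RF")
    parser.add_argument("--tune-xgb", action="store_true",
                        help="enable XGBoost hyperparameter tuning")
    return parser


# argparse exposes --tune-xgb as the attribute `tune_xgb`.
args = build_parser().parse_args(["--models", "XGB,RF", "--tune-xgb"])
```

Leaving `--models` unset yields `None`, which a caller can treat as "run the default set".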

📈 Output Files

Predictions

  • output/[model_name]/test_prediction_YYYYMMDD_HHMMSS.csv
  • Ready-to-submit prediction files with timestamps

Models

  • saved_models/[model_name]_YYYYMMDD_HHMMSS.pkl
  • Trained models with metadata
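The timestamped naming pattern used for predictions and saved models can be reproduced with `strftime`. The helpers below are a sketch for scripts that need to locate or create such files; their names are hypothetical and not part of the project.

```python
from datetime import datetime
from pathlib import Path


def timestamp_suffix(now: datetime) -> str:
    """Format a datetime as YYYYMMDD_HHMMSS."""
    return now.strftime("%Y%m%d_%H%M%S")


def prediction_path(model_name: str, now: datetime, output_dir: str = "output") -> Path:
    """output/[model_name]/test_prediction_YYYYMMDD_HHMMSS.csv"""
    return Path(output_dir) / model_name / f"test_prediction_{timestamp_suffix(now)}.csv"


def model_path(model_name: str, now: datetime, models_dir: str = "saved_models") -> Path:
    """saved_models/[model_name]_YYYYMMDD_HHMMSS.pkl"""
    return Path(models_dir) / f"{model_name}_{timestamp_suffix(now)}.pkl"


run_time = datetime(2016, 1, 1, 17, 0, 0)
pred_file = prediction_path("XGBoost", run_time)
model_file = model_path("XGBoost", run_time)
```

Because the suffix sorts lexicographically in time order, the newest file for a model is simply the last one after sorting the matching names.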

Logs

  • logs/main.log - Complete pipeline execution log
  • Detailed progress tracking and metrics

Visualizations

  • output/prediction_comparison_YYYYMMDD_HHMMSS.png
  • Model comparison plots
  • Feature importance plots

🔧 Configuration

Edit config.py to customize:

  • Model parameters
  • Data paths
  • Output directories
  • Hyperparameter tuning ranges
  • Logging settings

📝 Usage Examples

Basic Usage

from src.model.models import run_complete_pipeline
import pandas as pd

# Load data
train_df = pd.read_csv('data/processed/feature_engineered_train.csv')
test_df = pd.read_csv('data/processed/feature_engineered_test.csv')

# Run complete pipeline
results = run_complete_pipeline(
    train_df=train_df,
    test_df=test_df,
    models_to_run=['LINREG', 'RIDGE', 'XGB'],
    tune_xgb=True,
    create_submission=True
)

Individual Components

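import pandas as pd

from src.model.models import run_regression_models, predict_duration, to_submission

# Load the feature-engineered data (same files as in Basic Usage)
train_df = pd.read_csv('data/processed/feature_engineered_train.csv')
test_df = pd.read_csv('data/processed/feature_engineered_test.csv')

# Train models
models = run_regression_models(train_df, ['XGB', 'RF'])

# Make predictions (the trained model is looked up by its display name)
predictions = predict_duration(models['XGBoost'], test_df)

# Create submission
submission = to_submission(predictions, test_df)
submission.to_csv('my_submission.csv', index=False)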

🧪 Testing

API Testing

# Run comprehensive API tests
python test_api.py

Frontend Testing

cd frontend
npm run test
npm run test:coverage

📊 Available Models

  • LINREG - Linear Regression
  • RIDGE - Ridge Regression
  • LASSO - Lasso Regression
  • SVR - Support Vector Regression
  • XGB - XGBoost
  • RF - Random Forest
  • NN - Neural Network
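When validating user input against this list, a small lookup table is convenient. The mapping below is copied from the list above; the `resolve_models` helper is a hypothetical sketch, not a function exported by `src/model/models.py`.

```python
MODEL_NAMES = {
    "LINREG": "Linear Regression",
    "RIDGE": "Ridge Regression",
    "LASSO": "Lasso Regression",
    "SVR": "Support Vector Regression",
    "XGB": "XGBoost",
    "RF": "Random Forest",
    "NN": "Neural Network",
}


def resolve_models(codes) -> list:
    """Map short codes to display names, rejecting anything not in the list."""
    normalized = [code.strip().upper() for code in codes]
    unknown = [code for code in normalized if code not in MODEL_NAMES]
    if unknown:
        raise ValueError(f"unknown model code(s): {', '.join(unknown)}")
    return [MODEL_NAMES[code] for code in normalized]


selected = resolve_models(["xgb", "RF"])
```

Failing fast on an unknown code avoids discovering a typo only after preprocessing and feature engineering have already run.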

🀝 Contributing

See CONTRIBUTING.md for development guidelines and frontend integration details.

📋 Code of Conduct

See CODE_OF_CONDUCT.md for our community guidelines and security policies.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
