Multi-Modal Vehicle Intelligence Platform

I built this Multi-Modal Vehicle Intelligence Platform to pull together vehicle data from different sources and produce a clean, structured service record.
It combines computer vision for classifying vehicles and flagging damage, an LLM for working out what the customer actually wants, some data engineering to attach vehicle metadata, and FastAPI to serve the whole pipeline from a single real-time endpoint.

Main idea and business requirement: take a vehicle image (for example, from CCTV) and the customer's text request, classify the vehicle, detect any damage, and return structured output like the one below:

{
   "vehicle_type": "SUV",
   "detected_damage": ["scratch","dent"],
   "customer_intent": "insurance claim",
   "service_priority": "high"
}
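
One way to keep this output shape consistent across the codebase is a small typed record. The sketch below is only an illustration under assumptions, not the repository's actual code: the class name `ServiceRecord` and its `to_json` method are hypothetical, while the four field names mirror the sample above.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ServiceRecord:
    """Hypothetical container mirroring the four fields of the sample output."""

    vehicle_type: str
    detected_damage: List[str]
    customer_intent: str
    service_priority: str

    def to_json(self) -> str:
        # asdict() converts the dataclass into a plain dict that json can serialize.
        return json.dumps(asdict(self), indent=3)


record = ServiceRecord(
    vehicle_type="SUV",
    detected_damage=["scratch", "dent"],
    customer_intent="insurance claim",
    service_priority="high",
)
print(record.to_json())
```

Keeping the shape in one place means the fusion step and the API response cannot silently drift apart on field names.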

System Architecture

FastAPI (main.py, routes.py) serves as the central hub connecting all components:

Computer Vision Pipeline:

models/vehicle_model.pth (PyTorch ResNet18 for classification)
detect_damage.py (lightweight damage detection logic using image-based features)
models/vehicle_model.py (vehicle type detection)

LLM Pipeline:

models/service_model.py (Groq API for customer intent)

Data Processing:

data/metadata.py (Pandas + Kaggle CarDekho data)
fusion_service.py (multi-modal logic + priority scoring)

API Endpoint: POST /analyze_vehicle

Data Pipeline

Image Input → vehicle_model.py + detect_damage.py → vehicle_type & damage list
Text Input → service_model.py (Groq LLM) → customer_intent
Metadata → data/metadata.py → enrichment
Fusion → fusion_service.py → final JSON output

What It Does

Vehicle Classification

I used ResNet18 with transfer learning: I trained it on the Kaggle Vehicle Classification dataset (~5600 images) and adapted the final layers to my custom classes.

Damage Detection

I set up a basic pipeline using the Kaggle Car Damage Detection dataset.

Since the dataset didn’t have fully structured labels for direct classification, I implemented a lightweight, feature-based approach:

The image is converted to grayscale and resized
A gradient-based score is calculated to capture pixel intensity changes
Higher variation generally indicates surface irregularities like dents or scratches

Based on a threshold:

Higher score → “possible dent/scratch”
Lower score → “no visible damage”

This acts as a baseline damage detection system, and the pipeline is designed to easily plug in advanced models like YOLO later.
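
The gradient-and-threshold idea above can be sketched in a few lines. This is a dependency-free illustration, not the contents of detect_damage.py: it works on a grayscale image given as nested lists, skips the resize step, and the function names and the threshold value of 12.0 are placeholders chosen for the example.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[float]]  # rows of grayscale pixel values in the range 0-255


def to_grayscale(rgb_image) -> List[List[float]]:
    """Convert rows of (r, g, b) tuples to luminance using ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_image]


def gradient_score(gray: Gray) -> float:
    """Mean absolute intensity difference between adjacent pixels."""
    total, count = 0.0, 0
    height, width = len(gray), len(gray[0])
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # horizontal neighbour
                total += abs(gray[y][x + 1] - gray[y][x])
                count += 1
            if y + 1 < height:  # vertical neighbour
                total += abs(gray[y + 1][x] - gray[y][x])
                count += 1
    return total / count if count else 0.0


def classify_damage(gray: Gray, threshold: float = 12.0) -> str:
    """Map the score to the two labels described above (threshold is a placeholder)."""
    if gradient_score(gray) > threshold:
        return "possible dent/scratch"
    return "no visible damage"


# A uniform panel versus the same panel with one bright line across it.
smooth = [[100] * 8 for _ in range(8)]
scratched = [row[:] for row in smooth]
scratched[3] = [255] * 8
print(classify_damage(smooth), "|", classify_damage(scratched))
```

A score like this also reacts to reflections, panel gaps, and background clutter, which is why it is best treated as a baseline rather than a detector.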

Customer Intent

This part runs on Groq's LLM API. It works out whether the customer is after an insurance claim, a repair, or something more general.
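
A rough sketch of how the intent step could be structured is shown below. It deliberately leaves out the Groq call itself: `complete` stands for any function that sends a prompt to an LLM and returns the reply text. The label set, the prompt wording, and the function names are assumptions made for illustration; only "insurance claim" appears in the sample output.

```python
from typing import Callable

# Assumed label set; only "insurance claim" is confirmed by the sample output.
INTENTS = ("insurance claim", "repair", "general inquiry")


def build_prompt(customer_text: str) -> str:
    """Ask the model to answer with exactly one label from INTENTS."""
    labels = ", ".join(INTENTS)
    return (
        "Classify the customer's request into exactly one of these intents: "
        f"{labels}. Reply with the intent only.\n\nRequest: {customer_text}"
    )


def classify_intent(customer_text: str, complete: Callable[[str], str]) -> str:
    """Run the prompt through `complete` and normalize the reply to a known label."""
    reply = complete(build_prompt(customer_text)).strip().lower()
    for intent in INTENTS:
        if intent in reply:
            return intent
    # Fall back to the most generic label if the model answers off-list.
    return "general inquiry"


# Stand-in for the real LLM call, so the sketch runs without an API key.
def fake_llm(prompt: str) -> str:
    return "Insurance claim."


print(classify_intent("Someone scraped my door in the car park.", fake_llm))
```

Passing the LLM call in as a function keeps the parsing logic testable offline and makes it easy to swap providers later.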

Multi-Modal Fusion

Combines the image analysis, the text intent, and a set of business rules to produce the structured JSON output, including the service priority.
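
To make the fusion step concrete, here is a small sketch of rule-based priority scoring. The rules themselves are invented for illustration and are not the ones in fusion_service.py; the function names `score_priority` and `fuse` are hypothetical as well.

```python
from typing import Dict, List


def score_priority(detected_damage: List[str], customer_intent: str) -> str:
    """Hypothetical rules: visible damage plus an insurance claim is most urgent."""
    if detected_damage and customer_intent == "insurance claim":
        return "high"
    if detected_damage or customer_intent == "repair":
        return "medium"
    return "low"


def fuse(vehicle_type: str, detected_damage: List[str], customer_intent: str) -> Dict[str, object]:
    """Merge the per-modality results into the final output record."""
    return {
        "vehicle_type": vehicle_type,
        "detected_damage": detected_damage,
        "customer_intent": customer_intent,
        "service_priority": score_priority(detected_damage, customer_intent),
    }


print(fuse("SUV", ["scratch", "dent"], "insurance claim"))
```

Keeping the rules in one small function makes them easy to review with non-technical stakeholders and to replace with a learned scorer later.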

FastAPI Setup

Real-time endpoint at POST /analyze_vehicle.
Handles full pipeline inference in a single request.

Datasets I Used

Vehicle Classification: Kaggle - ~5600 images across types
Damage Detection: Kaggle Car Damage - dent/scratch/shatter classes
Customer Intent: Kaggle customer support data for the logic
Vehicle Metadata: Kaggle CarDekho - fuel type, year, ownership, price

Tech Stack

Python
FastAPI
PyTorch (for ResNet18)
Groq LLM API
Pandas for metadata

How to Run It

Clone the repo: git clone <repo_link> && cd <project_folder>
pip install -r requirements.txt
uvicorn main:app --reload
Hit the Swagger UI at http://127.0.0.1:8000/docs
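
Besides the Swagger UI, the endpoint can be called from a script. The sketch below uses only the standard library and assumes the endpoint accepts a multipart form upload. The form field names `image` and `text`, the file name, and the helper names are guesses, so check the schema shown at /docs before relying on them.

```python
import urllib.request
import uuid
from typing import Dict, Tuple


def encode_multipart(
    fields: Dict[str, str], files: Dict[str, Tuple[str, bytes, str]]
) -> Tuple[bytes, str]:
    """Build a multipart/form-data body; returns (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + value.encode("utf-8") + b"\r\n")
    for name, (filename, content, mime) in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'
        )
        parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(
    image_bytes: bytes,
    customer_text: str,
    url: str = "http://127.0.0.1:8000/analyze_vehicle",
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the local server."""
    # Field names "text" and "image" are assumptions, not confirmed by the project.
    body, content_type = encode_multipart(
        {"text": customer_text},
        {"image": ("car.jpg", image_bytes, "image/jpeg")},
    )
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

With the server running, `urllib.request.urlopen(build_request(open("car.jpg", "rb").read(), "I want to file a claim"))` would send the request and return the JSON response.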

Sample Workflow

Upload a vehicle image, type in the customer request, and the pipeline runs these steps:

CV for vehicle type
image-based logic for damage detection
LLM for intent
metadata enrichment
final structured output

Next Steps - Further Enhancements

Upgrade to full YOLO damage detection
Fine-tune models on bigger datasets
Dockerize and deploy to the cloud
Add a dashboard UI

In a nutshell, it includes:

End-to-end pipeline
Multi-modal reasoning
Real-time API
Transfer learning for fast development
Modular design for easy upgrades

Made by: Sukanya
