- I built this Multi-Modal Vehicle Intelligence Platform that pulls together vehicle data from different sources and gives a clean service record.
- It mixes computer vision for spotting vehicles and damage, an LLM to understand what the customer really wants, some data engineering to tie in metadata, and FastAPI to make it all run super fast in real-time.
Main idea and Business requirement- Classify vehicle image (like from CCTV), the customer's text request, detects the damages, then output structured like below:
{
"vehicle_type": "SUV",
"detected_damage": ["scratch","dent"],
"customer_intent": "insurance claim",
"service_priority": "high"
}FastAPI (main.py, routes.py) serves as the central hub connecting all components:
Computer Vision Pipeline:
models/vehicle_model.pth(PyTorch ResNet18 for classification)detect_damage.py(lightweight damage detection logic using image-based features)models/vehicle_model.py(vehicle type detection)
LLM Pipeline:
models/service_model.py(Groq API for customer intent)
Data Processing:
data/metadata.py(Pandas + Kaggle CarDekho data)fusion_service.py(multi-modal logic + priority scoring)
API Endpoint: POST /analyze_vehicle
- Image Input → vehicle_model.py + detect_damage.py → vehicle_type & damage list
- Text Input → service_model.py (Groq LLM) → customer_intent
- Metadata → data/metadata.py → enrichment
- Fusion → fusion_service.py → final JSON output
I used ResNet18 with transfer learning, trained it on the Kaggle Vehicle Classification dataset (~5600 images), and tweaked the last layers for my custom classes.
Got a basic pipeline going with the Kaggle Car Damage Detection dataset.
Since the dataset didn’t have fully structured labels for direct classification, I implemented a lightweight, feature-based approach:
- The image is converted to grayscale and resized
- A gradient-based score is calculated to capture pixel intensity changes
- Higher variation generally indicates surface irregularities like dents or scratches
Based on a threshold:
- Higher score → “possible dent/scratch”
- Lower score → “no visible damage”
This acts as a baseline damage detection system, and the pipeline is designed to easily plug in advanced models like YOLO later.
This part runs on Groq's LLM API—it figures out if the customer's after insurance, repairs, or just general stuff.
Pulls together the image analysis, text intent, and some business rules to make that JSON output.
- Real-time endpoint at
POST /analyze_vehicle. - Handles full pipeline inference in a single request.
- Vehicle Classification: Kaggle - ~5600 images across types
- Damage Detection: Kaggle Car Damage - dent/scratch/shatter classes
- Customer Intent: Kaggle customer support data for the logic
- Vehicle Metadata: Kaggle CarDekho - fuel type, year, ownership, price
- Python
- FastAPI
- PyTorch (for ResNet18)
- Groq LLM API
- Pandas for metadata
- Clone the repo:
git clone <repo_link> && cd <project_folder> pip install -r requirements.txtuvicorn main:app --reload- Hit the Swagger UI at http://127.0.0.1:8000/docs
Upload a vehicle pic, type in the customer request, and it processes:
- CV for vehicle type
- image-based logic for damage detection
- LLM for intent
- metadata enrichment
- final structured output
- Upgrade to full YOLO damage detection
- Fine-tune models on bigger datasets
- Dockerize and deploy to the cloud
- Add a dashboard UI
- End-to-end pipeline
- Multi-modal reasoning
- Real-time API
- Transfer learning for fast development
- Modular design for easy upgrades
Made by: Sukanya