gradientgeeks/amazon-ml-challenge-2025

ML Challenge 2025: Smart Product Pricing Solution

Team Name: Gradient Geeks
Team Members: Suchana Hazra, Siddharth Sen, Uttam Mahata, Anurag Ghosh
Submission Date: 13/10/25


Overview

This repository contains our solution for the Smart Product Pricing challenge.
The approach is multimodal: it combines textual product descriptions, product images, and structured metadata to predict product prices. Text and image embeddings are reduced with PCA, concatenated with the structured features, and passed to gradient boosting regressors.


Approach

Problem Understanding

  • Predict product prices using text, image, and structured features.
  • Text descriptions contain valuable pricing cues but need cleaning.
  • Images provide visual cues related to product quality and category.
  • Redundant or sparse features were removed to improve model performance.

Solution Strategy

  • Text Processing: Clean text → embed using MiniLM → PCA for dimensionality reduction.
  • Image Processing: Preprocess images → embed using pretrained CNN/CLIP → PCA.
  • Feature Fusion: Concatenate text embeddings, image embeddings, and structured features.
  • Regression Models: Fit ensemble models (LightGBM, XGBoost, CatBoost) to predict prices.

Model Architecture

Product Text → Text Cleaning → MiniLM Embedding → PCA → Concatenate
Product Image → Preprocessing → CNN/CLIP Embedding → PCA → Concatenate
Structured Features → Clean/Encode → Concatenate
Concatenate → GBM Regressor → Price Prediction
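
The fusion step in the architecture above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the helper names `pca_reduce` and `fuse_features` are hypothetical, PCA is implemented directly with NumPy's SVD, and random arrays stand in for the real MiniLM and CNN/CLIP embeddings. The number of structured columns (5) is an arbitrary assumption.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def fuse_features(text_emb, image_emb, structured, n_components=128):
    """Reduce each embedding block with PCA, then concatenate column-wise."""
    text_red = pca_reduce(text_emb, n_components)
    image_red = pca_reduce(image_emb, n_components)
    return np.hstack([text_red, image_red, structured])


# Random stand-ins for real embeddings (dimensions follow this README).
rng = np.random.default_rng(0)
n_products = 200
text_emb = rng.normal(size=(n_products, 384))
image_emb = rng.normal(size=(n_products, 2048))
structured = rng.normal(size=(n_products, 5))  # 5 columns: arbitrary assumption

fused = fuse_features(text_emb, image_emb, structured)
print(fused.shape)  # (200, 261)
```

In a real pipeline the PCA projection would be fitted on the training set only and then reused for the test set, so that both splits share the same components.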


Features & Pipelines

Text Pipeline

  • Cleaning: regex, lowercasing, punctuation removal, stopword removal
  • Embedding: MiniLM (384-dimensional)
  • PCA: reduced to 128 dimensions
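
As a rough sketch of the cleaning step, the function below lowercases the text, removes punctuation, and drops stopwords using only the standard library. The stopword list is a tiny illustrative one and the HTML-tag regex is an assumption about what the regex cleaning covers; the actual rules used in this solution are not listed here.

```python
import re
import string

# A tiny illustrative stopword list; a real pipeline would likely use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "with", "in", "on", "to", "is"}

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Lowercase, strip HTML-like tags and punctuation, drop stopwords."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)  # assumed: remove HTML-like tags
    text = text.translate(_PUNCT_TABLE)   # punctuation -> spaces
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


print(clean_text("The <b>Organic</b> Green Tea, Pack of 12!"))
# organic green tea pack 12
```

One design caveat: stripping all punctuation also splits decimal quantities such as "12.5 oz" into "12 5 oz", which may matter for price prediction. The cleaned strings would then be passed to the MiniLM encoder.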

Image Pipeline

  • Preprocessing: resize, normalize, convert to tensor
  • Embedding: Pretrained CNN/CLIP (2048-dimensional)
  • PCA: reduced to 128 dimensions
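
The preprocessing steps (resize, normalize, convert to tensor) could look roughly like the NumPy sketch below. It is an assumption-laden stand-in: a nearest-neighbour resize replaces whatever interpolation the real pipeline uses, the 224-pixel target size and ImageNet channel statistics are common defaults rather than values confirmed by this repository, and the output is a channels-first NumPy array rather than a framework tensor.

```python
import numpy as np

# ImageNet channel statistics, a common default for pretrained CNNs (assumed here).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(image, size):
    """Nearest-neighbour resize of an (H, W, C) array to (size, size, C)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols]


def preprocess(image, size=224):
    """uint8 (H, W, 3) image -> normalized float32 (3, size, size) array."""
    resized = resize_nearest(image, size)
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - MEAN) / STD
    return normalized.transpose(2, 0, 1)  # channels-first layout


# A random array stands in for a decoded product photo.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
tensor = preprocess(fake_image)
print(tensor.shape, tensor.dtype)  # (3, 224, 224) float32
```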

Structured Features

  • Drop redundant features
  • Encode categorical variables (target encoding / label encoding)
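
To illustrate the target-encoding option, here is a minimal pure-Python sketch of smoothed target encoding, where each category is mapped to a blend of its own mean price and the global mean. The function names, the `smoothing` parameter, and the toy brand data are all hypothetical; the repository's actual encoding details are not specified here.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Rare categories are pulled toward the global mean by the smoothing term.
    mapping = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in sums
    }
    return mapping, global_mean


def apply_target_encoding(categories, mapping, default):
    """Encode categories, falling back to `default` for unseen values."""
    return [mapping.get(cat, default) for cat in categories]


# Toy data: hypothetical brands and prices.
brands = ["acme", "acme", "zenith", "zenith", "zenith"]
prices = [10.0, 20.0, 30.0, 40.0, 50.0]

mapping, global_mean = fit_target_encoding(brands, prices, smoothing=2.0)
print(apply_target_encoding(["acme", "zenith", "unknown"], mapping, global_mean))
# [22.5, 36.0, 30.0]
```

Because the encoding is computed from the target, it should be fitted on training folds only and then applied to validation or test rows, otherwise price information leaks into the features.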

Regression Models

  • Gradient Boosting (LightGBM, XGBoost, CatBoost)
  • Hyperparameter tuning via cross-validation
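
The cross-validated tuning loop can be sketched as below. To keep the example dependency-light, a closed-form ridge regression is swapped in for the gradient boosting models, and the data is synthetic; only the k-fold selection logic is meant to carry over. The function names, the `alpha` grid, and the use of mean absolute error as the validation score are assumptions made for illustration.

```python
import numpy as np


def kfold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i in range(n_splits):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]


def ridge_fit_predict(X_train, y_train, X_val, alpha):
    """Closed-form ridge regression; a simple stand-in for a GBM regressor."""
    n_features = X_train.shape[1]
    A = X_train.T @ X_train + alpha * np.eye(n_features)
    w = np.linalg.solve(A, X_train.T @ y_train)
    return X_val @ w


def cv_score(X, y, alpha, n_splits=5):
    """Mean validation MAE across folds for one hyperparameter value."""
    errors = []
    for train_idx, val_idx in kfold_indices(len(y), n_splits):
        pred = ridge_fit_predict(X[train_idx], y[train_idx], X[val_idx], alpha)
        errors.append(np.mean(np.abs(pred - y[val_idx])))
    return float(np.mean(errors))


# Synthetic regression data standing in for the fused feature matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=300)

grid = [0.01, 1.0, 100.0, 10000.0]
scores = {alpha: cv_score(X, y, alpha) for alpha in grid}
best_alpha = min(scores, key=scores.get)
print(best_alpha)
```

With LightGBM, XGBoost, or CatBoost, `ridge_fit_predict` would be replaced by a function that fits the chosen regressor, and the grid would range over parameters such as learning rate or tree depth.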

Performance

Metric | Score
SMAPE | 0.047
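
For reference, SMAPE (symmetric mean absolute percentage error) can be computed as in the sketch below. This version returns a fraction between 0 and 2; some definitions multiply by 100 to report a percentage, and this README does not state which scale the score above uses. Treating a pair of exact zeros as zero error is a convention assumed here.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, as a fraction in [0, 2]."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        # Treat a pair of exact zeros as a perfect prediction.
        total += 0.0 if denom == 0 else abs(p - a) / denom
    return total / len(actual)


print(round(smape([100.0, 200.0], [110.0, 180.0]), 4))  # 0.1003
```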

Usage

  1. Clone the repository: https://github.com/gradientgeeks/amazon-ml-challenge-2025/
