ML Challenge 2025: Smart Product Pricing Solution

Team Name: Gradient Geeks
Team Members: Suchana Hazra, Siddharth Sen, Uttam Mahata, Anurag Ghosh
Submission Date: 13/10/25

Overview

This repository contains the solution for the Smart Product Pricing challenge.
Our approach combines multimodal data (textual product descriptions, images, and structured metadata) to predict product prices. The pipeline uses text embeddings, image embeddings, dimensionality reduction, and gradient boosting models.

Approach

Problem Understanding

Predict product prices using text, image, and structured features.
Text descriptions contain valuable pricing cues but need cleaning.
Images provide visual cues related to product quality and category.
Redundant or sparse features were removed to improve model performance.

Solution Strategy

Text Processing: Clean text → embed using MiniLM → PCA for dimensionality reduction.
Image Processing: Preprocess images → embed using pretrained CNN/CLIP → PCA.
Feature Fusion: Concatenate text embeddings, image embeddings, and structured features.
Regression Models: Fit ensemble models (LightGBM, XGBoost, CatBoost) to predict prices.
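
The reduce-then-concatenate steps above can be sketched as follows. This is a minimal illustration, not the repository's code: the random arrays stand in for real MiniLM and CNN/CLIP embeddings, the `pca_reduce` helper and the five structured columns are hypothetical, and PCA is done with a plain NumPy SVD rather than whichever library the team used.

```python
import numpy as np

def pca_reduce(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project features onto their top principal components (fit and transform in one step)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Placeholder data: random values standing in for real embeddings (assumption).
rng = np.random.default_rng(0)
n_products = 200
text_emb = rng.normal(size=(n_products, 384))    # MiniLM-sized text embeddings
image_emb = rng.normal(size=(n_products, 2048))  # CNN/CLIP-sized image embeddings
structured = rng.normal(size=(n_products, 5))    # hypothetical encoded metadata

# Feature fusion: 128 text dims + 128 image dims + 5 structured columns.
fused = np.hstack([
    pca_reduce(text_emb, 128),
    pca_reduce(image_emb, 128),
    structured,
])
print(fused.shape)  # (200, 261)
```

In a real train/test split, the PCA directions would normally be fitted on the training rows only and then reused to project the test rows.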

Model Architecture

Product Text → Text Cleaning → MiniLM Embedding → PCA → Concatenate
Product Image → Preprocessing → CNN/CLIP Embedding → PCA → Concatenate
Structured Features → Clean/Encode → Concatenate
Concatenate → GBM Regressor → Price Prediction

Features & Pipelines

Text Pipeline

Cleaning: regex, lowercasing, punctuation removal, stopword removal
Embedding: MiniLM (384-dimensional)
PCA: reduced to 128 dimensions
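
The cleaning step listed above might look roughly like this sketch. It uses only the standard library; the `clean_text` function, the HTML-tag regex, and the tiny stopword set are illustrative assumptions, not the rules actually used in the repository.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; a real run would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "for", "with", "in", "on", "to", "is"}

_TAG_RE = re.compile(r"<[^>]+>")  # assumed regex step: strip HTML-like tags
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})

def clean_text(text: str) -> str:
    """Strip tags, lowercase, replace punctuation with spaces, drop stopwords."""
    text = _TAG_RE.sub(" ", text)
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)

print(clean_text("<b>Organic</b> Green Tea, Pack of 12 (16 oz)!"))
# organic green tea pack 12 16 oz
```

One caveat with blanket punctuation removal: decimal quantities such as "16.5 oz" are split into separate tokens, which may matter for pricing cues.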

Image Pipeline

Preprocessing: resize, normalize, convert to tensor
Embedding: Pretrained CNN/CLIP (2048-dimensional)
PCA: reduced to 128 dimensions
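
As a rough sketch of the preprocessing step, the NumPy code below resizes an RGB array, normalizes it, and converts it to a channel-first layout. The 224-pixel target size, the nearest-neighbour resize, and the ImageNet channel statistics are assumptions chosen because they are common with pretrained CNNs; the repository may use different values or a library transform instead.

```python
import numpy as np

# Assumed values: ImageNet channel statistics commonly paired with pretrained CNNs.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_image(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Turn a uint8 H x W x 3 RGB array into a float32 3 x size x size array."""
    height, width, _ = image.shape
    # Nearest-neighbour resize: pick one source row/column per output row/column.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = image[rows[:, None], cols[None, :]]
    # Scale to [0, 1], then normalize each channel.
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Channel-first layout, as expected by most tensor-based vision models.
    return np.transpose(normalized, (2, 0, 1))

dummy = np.full((300, 400, 3), 255, dtype=np.uint8)  # plain white test image
print(preprocess_image(dummy).shape)  # (3, 224, 224)
```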

Structured Features

Drop redundant features
Encode categorical variables (target encoding / label encoding)
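
To illustrate the target-encoding option, here is a small standard-library sketch that replaces each category with a smoothed mean of the target. The function names, the `smoothing` parameter, and the brand/price example are hypothetical; the repository's actual encoder is not shown in this README.

```python
from collections import defaultdict

def fit_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean target, shrunk toward the global mean."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {
        category: (sums[category] + smoothing * global_mean) / (counts[category] + smoothing)
        for category in sums
    }
    return encoding, global_mean

def apply_target_encoding(categories, encoding, global_mean):
    """Encode categories; unseen categories fall back to the global mean."""
    return [encoding.get(category, global_mean) for category in categories]

# Hypothetical example: brand as the categorical feature, price as the target.
brands = ["acme", "acme", "zenith"]
prices = [10.0, 20.0, 60.0]
encoding, global_mean = fit_target_encoding(brands, prices, smoothing=1.0)
print(apply_target_encoding(["acme", "zenith", "unknown"], encoding, global_mean))
# [20.0, 45.0, 30.0]
```

Because the encoding uses the target, it is usually fitted on training folds only and then applied to held-out rows, to avoid leaking prices into the features.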

Regression Models

Gradient Boosting (LightGBM, XGBoost, CatBoost)
Hyperparameter tuning via cross-validation
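
The tuning loop can be sketched generically as k-fold cross-validation over a set of candidate settings. To keep the example dependency-free, a closed-form ridge regression stands in for the gradient boosting models, and mean absolute error stands in for the challenge metric; both substitutions, along with every function name and the synthetic data, are assumptions for illustration only.

```python
import numpy as np

def k_fold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled k-fold cross-validation."""
    shuffled = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(shuffled, n_splits)
    for i in range(n_splits):
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, folds[i]

def cross_val_error(fit_predict, features, targets, n_splits=5):
    """Average validation error (mean absolute error) across folds."""
    fold_errors = []
    for train_idx, valid_idx in k_fold_indices(len(targets), n_splits):
        preds = fit_predict(features[train_idx], targets[train_idx], features[valid_idx])
        fold_errors.append(np.mean(np.abs(preds - targets[valid_idx])))
    return float(np.mean(fold_errors))

def make_ridge(alpha):
    """Stand-in model: closed-form ridge regression with penalty alpha."""
    def fit_predict(train_x, train_y, valid_x):
        gram = train_x.T @ train_x + alpha * np.eye(train_x.shape[1])
        weights = np.linalg.solve(gram, train_x.T @ train_y)
        return valid_x @ weights
    return fit_predict

# Synthetic data (assumption): a linear target with a little noise.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 3))
targets = features @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

# Score each candidate hyperparameter and keep the one with the lowest error.
scores = {alpha: cross_val_error(make_ridge(alpha), features, targets)
          for alpha in (0.01, 1000.0)}
best_alpha = min(scores, key=scores.get)
print(best_alpha)
```

Swapping in a LightGBM, XGBoost, or CatBoost regressor would only require a different `fit_predict` callable; the fold logic stays the same.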

Performance

Metric	Score
SMAPE	0.047
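
For reference, SMAPE (symmetric mean absolute percentage error) is commonly computed as in the sketch below. This uses the widespread definition with the mean of the absolute actual and predicted values in the denominator and returns a fraction rather than a percentage; whether the challenge scored submissions with exactly this scaling is an assumption.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, returned as a fraction."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denominator = (abs(a) + abs(p)) / 2
        # Treat a 0/0 pair as a perfect prediction to avoid division by zero.
        total += abs(p - a) / denominator if denominator else 0.0
    return total / len(actual)

# Illustrative prices only, not challenge data.
print(round(smape([100.0, 200.0], [110.0, 180.0]), 4))  # 0.1003
```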

Usage

Clone the repository: https://github.com/gradientgeeks/amazon-ml-challenge-2025/
