AutoTrader Price Regression

Predicting second‑hand car prices with classic tabular ML.
Data: 402 006 rows · 12 columns (target = price).
Models: Linear Regression · Random Forest · Gradient Boosting · Voting Ensemble.

1 · Project motivation

Buying a used car is a price‑sensitive decision.
The goal is to build transparent, reproducible baselines that predict price given mileage, age, fuel type and a handful of categorical descriptors.
Grades in the coursework are not the focus; clean code and solid discussion are.

2 · Data

Source: AutoTrader extract supplied by Manchester Metropolitan University.
The licence prohibits redistribution, so the CSV is not committed to this repository.
Rows: 402 006 · Columns: 12 (all except price used as predictors).
Cleaning steps:
- Trim outliers in mileage & price via 1.5 × IQR.
- Drop cars registered before 1975.
- Mode‑impute gaps in fuel_type, body_type, standard_colour.
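The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the helper names `iqr_bounds` and `clean_adverts` are hypothetical, and the handling of missing values (rows with a missing mileage, price or registration year are dropped) is an assumption.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_adverts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the three cleaning steps listed above."""
    # 1. Keep rows whose mileage and price both fall inside the 1.5 x IQR
    #    fences. Fences are computed on the untrimmed data, so the result
    #    does not depend on column order. NaN values fail `between` and
    #    are therefore dropped (assumption).
    mask = pd.Series(True, index=df.index)
    for col in ("mileage", "price"):
        low, high = iqr_bounds(df[col])
        mask &= df[col].between(low, high)
    out = df[mask]

    # 2. Drop cars registered before 1975. A missing year compares as
    #    False, so those rows are dropped as well (assumption).
    out = out[out["year_of_registration"] >= 1975].copy()

    # 3. Fill categorical gaps with the most frequent value of each column.
    for col in ("fuel_type", "body_type", "standard_colour"):
        out[col] = out[col].fillna(out[col].mode().iloc[0])

    return out.reset_index(drop=True)
```

Computing both sets of fences before filtering is a design choice; trimming one column first and then recomputing the fences on the remainder would give slightly different cut-offs.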
Engineered features:
- vehicle_age = 2024 – year_of_registration
- mileage_to_age_ratio = mileage / vehicle_age
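A small sketch of the two engineered features, assuming a pandas DataFrame with `year_of_registration` and `mileage` columns. The function name `add_age_features` is hypothetical. One point the formulas leave open is a car registered in 2024, whose age is 0; the sketch leaves the ratio missing in that case, which is an assumption and may differ from what the notebook does.

```python
import pandas as pd

REFERENCE_YEAR = 2024  # the constant used in the vehicle_age formula above


def add_age_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add vehicle_age and mileage_to_age_ratio to a copy of df."""
    out = df.copy()
    out["vehicle_age"] = REFERENCE_YEAR - out["year_of_registration"]

    # Assumption: an age of 0 (or a negative age from bad data) would make
    # the ratio infinite or meaningless, so the divisor is masked to NaN
    # and the ratio is left missing for those rows.
    safe_age = out["vehicle_age"].where(out["vehicle_age"] > 0)
    out["mileage_to_age_ratio"] = out["mileage"] / safe_age
    return out
```

Rows with a missing ratio would then need imputing or dropping before they reach a model that cannot handle NaN.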

See notebooks/01_autotrader_walkthrough.ipynb for the exact code.

3 · Quick start

# clone repo
git clone https://github.com/hamzahassan9320/autotrader-price-regression.git
cd autotrader-price-regression

# place the CSV in the expected location
mkdir -p data
cp /path/to/Adverts.csv data/

# set up environment
conda create -n autotrader-price python=3.10
conda activate autotrader-price
pip install -r requirements.txt

# full pipeline
python -m src.train --csv data/Adverts.csv

# run the Streamlit app locally
streamlit run app.py

Tested with Python 3.10 and scikit‑learn 1.3.2.

4 · Notebook & code guide

| file | purpose |
| --- | --- |
| `notebooks/01_autotrader_walkthrough.ipynb` | data snapshot, EDA, demos |
| `src/data.py` | load + cleanse CSV |
| `src/features.py` | feature engineering & preprocessing |
| `src/models.py` | pipelines · param grids · grid‑search helper |
| `src/train.py` | one‑shot CLI training run; saves models & plots |
| `src/visualise.py` | regenerates figures in `docs/images/` |
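To give a feel for what "pipelines · param grids · grid‑search helper" can look like in scikit‑learn, here is a hedged sketch. It is not the contents of `src/models.py`: the function names `build_pipeline` and `grid_search`, the column lists, the preprocessing choices and the example grid are all assumptions made for illustration.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split; the real project may use different columns.
NUMERIC = ["mileage", "vehicle_age", "mileage_to_age_ratio"]
CATEGORICAL = ["fuel_type", "body_type", "standard_colour"]


def build_pipeline(estimator) -> Pipeline:
    """Wrap an estimator with numeric scaling and one-hot encoding."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("preprocess", preprocess), ("model", estimator)])


def grid_search(pipeline: Pipeline, param_grid: dict, X, y, cv: int = 5) -> GridSearchCV:
    """Fit a cross-validated grid search scored by (negated) MAE."""
    search = GridSearchCV(
        pipeline, param_grid, scoring="neg_mean_absolute_error", cv=cv
    )
    return search.fit(X, y)


# Illustrative grid only; keys are prefixed with the pipeline step name.
RF_PARAM_GRID = {"model__max_depth": [4, 8], "model__n_estimators": [50, 100]}
rf_pipeline = build_pipeline(RandomForestRegressor(random_state=0))
```

Keeping the preprocessing inside the pipeline means each cross-validation fold fits the scaler and encoder on its own training split, which avoids leaking validation data into the transforms.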

5 · Results at a glance

| model | CV MAE ↓ | Test R² |
| --- | --- | --- |
| Linear Regression | 1 642 ± 394 | 0.79 |
| Random Forest | 1 831 ± 51 | 0.90 |
| Gradient Boosting | 2 742 ± 95 | 0.87 |
| Voting Ensemble | 1 894 ± 44 | 0.89 |

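Random Forest has the highest test R² (0.90) and a tight CV MAE spread (± 51), with no visible over‑fit. Linear Regression shows the lowest mean CV MAE in the table (1 642), but with a much wider spread across folds (± 394) and the lowest test R² (0.79).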
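For readers who want to reproduce figures of this shape (a "mean ± spread" CV MAE plus a held-out R²), the following sketch shows one way to compute them with scikit‑learn. The function name `evaluate`, the 80/20 split, the fold count and the use of the standard deviation as the "±" value are assumptions; the project's own protocol may differ.

```python
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split


def evaluate(model, X, y, cv: int = 5, test_size: float = 0.2, seed: int = 0) -> dict:
    """Return CV MAE (mean and std over folds) and R² on a held-out test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # scikit-learn maximises scores, so MAE comes back negated; flip the sign.
    fold_mae = -cross_val_score(
        model, X_train, y_train, scoring="neg_mean_absolute_error", cv=cv
    )

    # Refit on the full training split before scoring the untouched test set.
    model.fit(X_train, y_train)
    return {
        "cv_mae_mean": fold_mae.mean(),
        "cv_mae_std": fold_mae.std(),
        "test_r2": r2_score(y_test, model.predict(X_test)),
    }
```

Reporting the fold-to-fold spread next to the mean matters when comparing models: a lower mean with a wide spread is weaker evidence than a slightly higher mean that is stable across folds.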

6 · Model interpretation

SHAP beeswarm → global drivers (top features: vehicle_age, mileage).
SHAP waterfall → why a single advert (row 39) is priced ± £9 k.
Partial dependence → price drops near‑linearly with age; flattening after ~15 yrs hints at a market floor.
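The partial-dependence curve mentioned above can be computed with scikit‑learn's `sklearn.inspection.partial_dependence`. The sketch below is an illustration under assumptions, not the code in `src/visualise.py`: the helper name `dependence_curve` is hypothetical, and it covers only the partial-dependence item, not the SHAP plots.

```python
from sklearn.inspection import partial_dependence


def dependence_curve(model, X, feature_index: int):
    """Return (grid, averaged prediction) for one feature of a fitted model."""
    result = partial_dependence(
        model, X, features=[feature_index], kind="average"
    )
    # Recent scikit-learn releases expose the grid as "grid_values";
    # older ones used "values". Support both.
    key = "grid_values" if "grid_values" in result else "values"
    grid = result[key][0]
    averaged = result["average"][0]  # first (only) output of a regressor
    return grid, averaged
```

Plotting `averaged` against `grid` for the age feature would show whether the predicted price falls steadily and where it starts to level off.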

All figures live in docs/images/, regenerated by src/visualise.py.

7 · Directory layout

.
├── data/                # <empty> – you add Adverts.csv locally
├── notebooks/           # single exploratory notebook
├── src/                 # reusable code
├── configs/             # YAML config(s)
├── docs/images/         # plots for README
└── requirements.txt
