This project analyzes customer behavior using an e-commerce or retail dataset. The primary objectives are:
- Exploratory Data Analysis (EDA): Understand customer profiles, transaction patterns, and product details.
- Customer Segmentation: Group customers into distinct clusters based on spending, transaction frequency, recency, and signup year.
- Lookalike Modeling: Identify customers similar to high-value customers based on behavioral features.
The project uses Python with libraries like Pandas, NumPy, scikit-learn, Matplotlib, and Seaborn, executed in Jupyter Notebooks.
The dataset consists of three CSV files:
- `Customers.csv`: Contains customer profile information (e.g., `CustomerID`, `SignupDate`).
- `Transactions.csv`: Includes transaction details (e.g., `TransactionID`, `CustomerID`, `ProductID`, `TotalValue`, `TransactionDate`).
- `Products.csv`: Provides product information (e.g., `ProductID`).
These files are used across EDA, clustering, and lookalike modeling tasks.
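As a rough illustration of how the three files relate, the sketch below loads them with pandas and joins them on their shared keys. The rows are made-up stand-ins so the snippet runs on its own; only the column names come from this README, and `load_tables` is a hypothetical helper, not a function from the notebooks.

```python
import io

import pandas as pd

# Toy stand-ins for the real CSV files. The rows are invented for illustration;
# only the column names are taken from this README.
customers_csv = io.StringIO(
    "CustomerID,SignupDate\n"
    "C0001,2022-07-10\n"
    "C0002,2023-02-13\n"
)
transactions_csv = io.StringIO(
    "TransactionID,CustomerID,ProductID,TotalValue,TransactionDate\n"
    "T00001,C0001,P001,120.50,2024-01-05\n"
    "T00002,C0001,P002,80.00,2024-03-17\n"
    "T00003,C0002,P001,45.25,2024-02-02\n"
)
products_csv = io.StringIO(
    "ProductID\n"
    "P001\n"
    "P002\n"
)


def load_tables(customers_src, transactions_src, products_src):
    """Read the three tables and parse their date columns (hypothetical helper)."""
    customers = pd.read_csv(customers_src, parse_dates=["SignupDate"])
    transactions = pd.read_csv(transactions_src, parse_dates=["TransactionDate"])
    products = pd.read_csv(products_src)
    return customers, transactions, products


customers, transactions, products = load_tables(
    customers_csv, transactions_csv, products_csv
)

# Left joins keep every transaction, even if a customer or product row is missing.
merged = (
    transactions
    .merge(customers, on="CustomerID", how="left")
    .merge(products, on="ProductID", how="left")
)
print(merged)
```

With the real data, the `io.StringIO` objects would be replaced by the paths `"Customers.csv"`, `"Transactions.csv"`, and `"Products.csv"`.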
The repository contains the following files:
- 1.Abutalha_Shaikh_EDA.pdf: Summary of exploratory data analysis findings.
- 2.Abutalha_Shaikh_EDA.ipynb: Jupyter Notebook with EDA code and visualizations.
- 3.Abutalha_Shaikh_Lookalike.csv: Output file with lookalike customer mappings for the first 20 customers.
- 4.Abutalha_Shaikh_Lookalike.ipynb: Notebook implementing the lookalike model using cosine similarity.
- 5.Abutalha_Shaikh_Clustering.pdf: Report summarizing customer segmentation results.
- 6.Abutalha_Shaikh_Clustering.ipynb: Notebook implementing KMeans clustering for customer segmentation.
- README.md: This file, providing an overview and instructions.
Exploratory Data Analysis (EDA):
- Objective: Uncover insights into customer demographics, transaction behaviors, and product trends.
- Key Findings:
  - High-value customers drive a significant share of revenue.
  - Certain products are purchased frequently, indicating popularity.
  - Transaction frequency and recency vary across customers, suggesting diverse engagement levels.
- Notebook: `2.Abutalha_Shaikh_EDA.ipynb`
- Report: `1.Abutalha_Shaikh_EDA.pdf`
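The kinds of aggregates behind findings like these can be sketched with a few pandas group-bys. This is a minimal illustration on invented rows, not the code from the EDA notebook; the variable names are assumptions.

```python
import pandas as pd

# Invented transactions for illustration; column names follow Transactions.csv.
transactions = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3", "T4", "T5"],
    "CustomerID": ["C0001", "C0001", "C0002", "C0003", "C0003"],
    "ProductID": ["P001", "P002", "P001", "P001", "P003"],
    "TotalValue": [100.0, 50.0, 20.0, 300.0, 30.0],
})

# Revenue per customer, highest first: shows who the high-value customers are.
revenue_by_customer = (
    transactions.groupby("CustomerID")["TotalValue"].sum().sort_values(ascending=False)
)

# Share of total revenue contributed by the single top customer.
top_share = revenue_by_customer.iloc[0] / revenue_by_customer.sum()

# Number of purchases per product: a simple popularity measure.
product_counts = transactions["ProductID"].value_counts()

# Number of distinct transactions per customer: a simple engagement measure.
frequency = transactions.groupby("CustomerID")["TransactionID"].nunique()

print(revenue_by_customer, product_counts, frequency, sep="\n\n")
```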
Customer Segmentation:
- Objective: Segment customers into meaningful groups based on behavioral and demographic features.
- Features:
  - `TotalSpending`: Total amount spent by each customer.
  - `Frequency`: Number of transactions.
  - `Recency`: Days since the last transaction.
  - `SignupYear`: Year of customer signup.
- Method: KMeans clustering with 4 clusters, selected based on the Davies-Bouldin Index (DBI = 1.2528; lower values indicate better-separated clusters).
- Results (cluster means):
  - Cluster 0: Higher spending ($4596.93), moderate frequency (6.61), signed up in 2022.
  - Cluster 1: Lowest spending ($1742.44), low frequency (2.80), less recent purchases (recency 141.30 days), average signup year ~2022.58.
  - Cluster 2: Highest spending ($5875.09), highest frequency (7.55), recent purchases (recency 45.97 days), average signup year ~2023.45.
  - Cluster 3: Moderate spending ($3043.83), moderate frequency (4.66), average signup year ~2023.71.
- Notebook: `6.Abutalha_Shaikh_Clustering.ipynb`
- Report: `5.Abutalha_Shaikh_Clustering.pdf`
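The pipeline described above (four per-customer features, KMeans with 4 clusters, DBI as the quality score) might look roughly like the sketch below. It runs on synthetic data, so its DBI will not match the reported 1.2528. Several details are assumptions rather than facts about the notebook: the `build_features` helper, standardizing features with `StandardScaler`, measuring recency from the latest transaction date, and the `random_state` and `n_init` values.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score
from sklearn.preprocessing import StandardScaler


def build_features(customers, transactions, reference_date):
    """Aggregate transactions into one feature row per customer (hypothetical helper)."""
    agg = transactions.groupby("CustomerID").agg(
        TotalSpending=("TotalValue", "sum"),
        Frequency=("TransactionID", "nunique"),
        LastPurchase=("TransactionDate", "max"),
    )
    # Recency: days between the reference date and the customer's last purchase.
    agg["Recency"] = (reference_date - agg["LastPurchase"]).dt.days
    signup_year = (
        customers.set_index("CustomerID")["SignupDate"].dt.year.rename("SignupYear")
    )
    features = agg.drop(columns="LastPurchase").join(signup_year, how="inner")
    return features[["TotalSpending", "Frequency", "Recency", "SignupYear"]]


# Synthetic data for illustration only; the real inputs are the CSV files.
rng = np.random.default_rng(0)
customer_ids = [f"C{i:04d}" for i in range(1, 41)]
customers = pd.DataFrame({
    "CustomerID": customer_ids,
    "SignupDate": pd.to_datetime("2022-01-01")
    + pd.to_timedelta(rng.integers(0, 730, len(customer_ids)), unit="D"),
})
n_tx = 200
transactions = pd.DataFrame({
    "TransactionID": [f"T{i:05d}" for i in range(n_tx)],
    "CustomerID": rng.choice(customer_ids, n_tx),
    "TotalValue": rng.uniform(10, 1000, n_tx).round(2),
    "TransactionDate": pd.to_datetime("2024-01-01")
    + pd.to_timedelta(rng.integers(0, 365, n_tx), unit="D"),
})

# Assumption: recency is measured against the most recent transaction date.
features = build_features(customers, transactions, transactions["TransactionDate"].max())

# Assumption: features are standardized so no single scale dominates the distances.
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

dbi = davies_bouldin_score(X, labels)  # lower is better
summary = features.assign(Cluster=labels).groupby("Cluster").mean()
print(f"DBI: {dbi:.4f}")
print(summary)
```

The per-cluster means in `summary` correspond to the kind of figures listed under Results.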
Lookalike Modeling:
- Objective: Identify customers similar to high-value customers for targeted marketing.
- Features:
  - `total_spending`: Sum of transaction values.
  - `transaction_count`: Number of unique transactions.
  - `avg_spending`: Average transaction value.
- Method: Cosine similarity to compute customer similarity scores.
- Output: Top 3 lookalike customers for the first 20 customers, with similarity scores (e.g., `C0001` has lookalikes `C0137`, `C0152`, `C0121` with scores ~0.999).
- Notebook: `4.Abutalha_Shaikh_Lookalike.ipynb`
- Output File: `3.Abutalha_Shaikh_Lookalike.csv`
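A top-3 lookalike lookup based on cosine similarity can be sketched as follows. The feature values are invented, `top_lookalikes` is a hypothetical helper, and standardizing the features before computing similarity is an assumption; the notebook may work on differently prepared features, which would change the scores.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler


def top_lookalikes(features, top_n=3):
    """Return the top_n most similar customers for every customer (hypothetical helper)."""
    # Assumption: standardize so that total_spending does not dominate the angle.
    scaled = StandardScaler().fit_transform(features)
    sim = cosine_similarity(scaled)
    np.fill_diagonal(sim, -np.inf)  # a customer must not be their own lookalike
    ids = features.index.to_numpy()
    rows = []
    for i, customer_id in enumerate(ids):
        best = np.argsort(sim[i])[::-1][:top_n]  # indices of the highest scores
        for j in best:
            rows.append(
                {"CustomerID": customer_id, "Lookalike": ids[j], "Score": float(sim[i, j])}
            )
    return pd.DataFrame(rows)


# Invented per-customer features: four high spenders, then four low spenders.
features = pd.DataFrame(
    {
        "total_spending": [3000.0, 3100.0, 2950.0, 3200.0, 500.0, 520.0, 480.0, 450.0],
        "transaction_count": [6, 6, 6, 7, 2, 2, 2, 2],
    },
    index=pd.Index([f"C{i:04d}" for i in range(1, 9)], name="CustomerID"),
)
features["avg_spending"] = features["total_spending"] / features["transaction_count"]

lookalikes = top_lookalikes(features, top_n=3)
print(lookalikes[lookalikes["CustomerID"] == "C0001"])
```

Restricting the result to the first 20 customer IDs and writing it with `DataFrame.to_csv` would give a file in the spirit of the output described above.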
To run the notebooks, ensure you have Python 3.8+ installed. Follow these steps:
- Clone the repository:

      git clone https://github.com/38832/CustomerSegmentation.git
      cd CustomerSegmentation

- Install dependencies. Create a `requirements.txt` file with:

      pandas
      numpy
      scikit-learn
      matplotlib
      seaborn
      jupyter

  Then install from it:

      pip install -r requirements.txt

- Ensure the dataset files (`Customers.csv`, `Transactions.csv`, `Products.csv`) are in the same directory as the notebooks.
- Launch Jupyter Notebook:

      jupyter notebook

Once Jupyter is running, work through the notebooks as follows:
- EDA:
  - Open `2.Abutalha_Shaikh_EDA.ipynb` to explore the data.
  - Run all cells to generate visualizations and statistics.
  - Refer to `1.Abutalha_Shaikh_EDA.pdf` for a summary of insights.
- Customer Segmentation:
  - Open `6.Abutalha_Shaikh_Clustering.ipynb`.
  - Run the notebook to perform KMeans clustering and visualize clusters.
  - Check the cluster summary and DBI (1.2528) in the output.
  - See `5.Abutalha_Shaikh_Clustering.pdf` for a detailed report.
- Lookalike Modeling:
  - Open `4.Abutalha_Shaikh_Lookalike.ipynb`.
  - Run the notebook to compute cosine similarity and generate lookalike mappings.
  - The output is saved as `Lookalike.csv` (see `3.Abutalha_Shaikh_Lookalike.csv` for results).

Key results:
- EDA: Identified high-value customers and popular products, guiding the segmentation and lookalike tasks.
- Segmentation: Formed 4 clusters with distinct spending and engagement patterns, enabling targeted marketing strategies.
- Lookalike Modeling: Produced high similarity scores (often >0.99), indicating strong behavioral matches for the first 20 customers.

Possible future improvements:
- EDA: Incorporate additional features like product categories or customer demographics.
- Segmentation: Test other clustering algorithms (e.g., DBSCAN, hierarchical clustering) or optimize the number of clusters using silhouette scores.
- Lookalike Modeling: Include more features (e.g., product preferences) or expand to all customers.
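
The silhouette-based selection of the cluster count mentioned above could be sketched like this. The data comes from `make_blobs` purely for illustration; with the real project, `X` would be the scaled customer feature matrix. The candidate range of 2 to 8 clusters and the `random_state` values are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled customer features (illustration only).
X_raw, _ = make_blobs(n_samples=200, centers=4, n_features=4, random_state=0)
X = StandardScaler().fit_transform(X_raw)

# Fit KMeans for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # in [-1, 1]; higher is better

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print(f"Best k by silhouette: {best_k}")
```

Comparing this choice with the one suggested by the Davies-Bouldin Index is a cheap way to check whether 4 clusters is a stable answer.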
For questions or contributions, please contact [Your Name/Email] or open an issue on the repository.