This project aims to identify bank customers who are likely to "churn" (leave the bank) by analyzing their demographic and financial profiles.
The notebook Churn_pipeline.ipynb follows a comprehensive data science workflow, from raw data ingestion to model performance evaluation. It utilizes the Churn_Modelling.csv dataset, which contains records of 10,000 customers.
The dataset includes several features that influence a customer's decision to stay or leave:
1- Customer Profile: Age, Gender, Geography (France, Germany, Spain).
2- Financial Metrics: Credit Score, Bank Balance, Estimated Salary.
3- Bank Relationship: Tenure (years with the bank), Number of Products, Active Membership status, and Credit Card ownership.
4- Target Variable: Exited (1 if the customer left, 0 if they stayed).
The pipeline includes rigorous cleaning steps to ensure data quality:
1- Duplicate Removal: identified and removed 2 duplicate entries.
2- Missing Values: dropped rows with null values in critical columns such as Geography and Age.
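The two cleaning steps can be sketched with pandas as follows (a tiny in-memory sample stands in for the real Churn_Modelling.csv, which the notebook loads from disk):

```python
import pandas as pd

# Hypothetical sample frame; the notebook works on the full 10,000-row dataset.
df = pd.DataFrame({
    "Geography": ["France", "France", "Germany", None, "Spain"],
    "Age": [42, 42, 35, 29, None],
    "Exited": [1, 1, 0, 0, 1],
})

df = df.drop_duplicates()                    # 1) remove exact duplicate rows
df = df.dropna(subset=["Geography", "Age"])  # 2) drop nulls in critical columns
print(len(df), "rows remain after cleaning")
```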
3- Feature Engineering:
- Categorical variables (Gender, Geography) were transformed using One-Hot Encoding.
- Numerical features (CreditScore, Age, Balance, EstimatedSalary) were standardized using StandardScaler to improve model convergence.
4- Data Split: the processed data was split into training and testing sets (a 70/30 split).
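Steps 3 and 4 can be sketched with scikit-learn's ColumnTransformer (a minimal, illustrative version: the small synthetic frame below is an assumption standing in for the cleaned dataset, and the notebook's exact column handling may differ):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the cleaned churn DataFrame.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "CreditScore": rng.integers(350, 850, n),
    "Geography": rng.choice(["France", "Germany", "Spain"], n),
    "Gender": rng.choice(["Male", "Female"], n),
    "Age": rng.integers(18, 80, n),
    "Balance": rng.uniform(0, 200_000, n),
    "EstimatedSalary": rng.uniform(10_000, 150_000, n),
    "Exited": rng.integers(0, 2, n),
})
X, y = df.drop(columns="Exited"), df["Exited"]

# One-hot encode categoricals, standardize numericals.
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(drop="first"), ["Gender", "Geography"]),
    ("scale", StandardScaler(), ["CreditScore", "Age", "Balance", "EstimatedSalary"]),
])

# 70/30 split, stratified on the imbalanced target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit the transformers on the training set only, to avoid data leakage.
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)
print(X_train_t.shape, X_test_t.shape)
```

Fitting the encoder and scaler on the training split alone (then applying them to the test split) keeps test-set statistics out of the training process.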
Several classification algorithms were trained and compared. The notebook highlights a significant class imbalance in the target variable, which depresses metrics that are sensitive to the minority class, such as the F1-score.
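The imbalance is easy to check from the target column; the counts below are hypothetical, approximating the roughly 80/20 stay/churn split the notebook reports:

```python
from collections import Counter

# Hypothetical labels illustrating an ~80/20 class imbalance.
y = [0] * 8000 + [1] * 2000
counts = Counter(y)
churn_rate = counts[1] / len(y)
print(counts, f"churn rate = {churn_rate:.0%}")
```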
| Model                  | Accuracy | ROC-AUC Score |
|------------------------|----------|---------------|
| Random Forest          | 0.89     | 0.96          |
| K-Nearest Neighbors    | 0.85     | 0.92          |
| Decision Tree          | 0.75     | 0.83          |
| Gaussian Naive Bayes   | 0.75     | 0.82          |
| Logistic Regression    | 0.74     | 0.81          |
| Support Vector Machine | 0.74     | 0.81          |
The Random Forest Classifier is the best-performing model in this pipeline, achieving the highest accuracy and the best ability to distinguish between churners and non-churners (indicated by the 0.96 ROC-AUC). While KNN performs well after tuning, simpler linear models like Logistic Regression struggle to capture the non-linear complexities in this specific dataset.
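A table like the one above can be produced by fitting each model and scoring it on the held-out test set. A minimal sketch with two of the models, using synthetic imbalanced data in place of the preprocessed churn features (the notebook's exact hyperparameters are not shown here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: an ~80/20 imbalanced binary classification problem.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

for name, model in [
    ("Random Forest", RandomForestClassifier(random_state=42)),
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
]:
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    # ROC-AUC is computed from predicted probabilities, not hard labels.
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: accuracy={acc:.2f}, ROC-AUC={auc:.2f}")
```

Note that ROC-AUC uses `predict_proba` scores rather than thresholded predictions, which is why it can separate models that have similar accuracy.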