The goal of this project is to build a machine learning model that can predict whether a customer is likely to churn (leave a service) based on their historical data. This helps businesses proactively identify at-risk customers.
This project includes an interactive web application built with Streamlit. You can upload your own dataset (in the same format as the Telco dataset) and the app will automatically train the models and display the performance results.
- Make sure you have Python and the required libraries installed:
pip install streamlit pandas scikit-learn imbalanced-learn matplotlib seaborn
- Clone or download this repository to your local machine.
- Navigate to the project folder in your terminal and run the following command:
python -m streamlit run app.py
I followed these steps to build the prediction models:
- Data Loading & Cleaning: Loaded the Telco Customer Churn dataset and cleaned it by converting data types and handling missing values.
- Exploratory Data Analysis (EDA): Analyzed the data to understand the features and confirmed the class imbalance in the churn variable.
- Data Preprocessing: Encoded categorical features into numerical format and scaled the data using StandardScaler.
- Handling Imbalance: Used the SMOTE (Synthetic Minority Over-sampling Technique) on the training data to create a balanced dataset for the models to learn from.
- Modeling: Trained and evaluated three different classification models:
  - Logistic Regression
  - Decision Tree
  - Simple Neural Network (MLPClassifier)
- Evaluation: Assessed the models based on precision, recall, and F1-score, focusing on the recall for the "Churn" class as the key performance metric.
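To make the imbalance-handling step more concrete, here is a minimal, simplified sketch of the idea behind SMOTE: each synthetic minority row is a random point on the line segment between a real minority row and one of its nearest minority neighbors. It is an illustration only, not the `imblearn` implementation used in this project. The function name `smote_sketch`, the toy rows, and the defaults for `k` and `seed` are all assumptions made for the example.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbors.

    Simplified illustration of the SMOTE idea (hypothetical helper).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a real minority row to start from.
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Find its k nearest minority neighbors (excluding the row itself).
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Place the new row at a random position between base and neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical, already-scaled TRAINING rows with two features each.
majority_train = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5), (1.5, 1.5), (2.0, 0.0), (2.5, 1.0)]
minority_train = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]

# Generate just enough synthetic churn rows to match the majority class.
n_missing = len(majority_train) - len(minority_train)
synthetic_rows = smote_sketch(minority_train, n_missing)
balanced_minority = minority_train + synthetic_rows

print(len(majority_train), len(balanced_minority))  # both classes now have 6 rows
```

Note that only the training rows are oversampled here, matching the step above. The test set is left untouched so that the reported metrics reflect the real class balance.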
The models were evaluated on a held-out test set. Logistic Regression performed best, with a recall of 0.63 for the churn class: it correctly flagged 63% of the customers who actually churned, more than either of the other two models.
| Model | Recall (for Churn) |
|---|---|
| Logistic Regression | 0.63 |
| Neural Network | 0.53 |
| Decision Tree | 0.53 |
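As a reminder of what the recall column measures, the sketch below computes recall and precision for the churn class by hand. The labels are made-up toy values, not this project's predictions, and the helper name `recall_and_precision` is hypothetical. Recall is the share of actual churners the model caught, while precision is the share of flagged customers who really churned.

```python
def recall_and_precision(y_true, y_pred, positive=1):
    """Return (recall, precision) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # churners caught
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # churners missed
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Toy labels (1 = churn, 0 = stay): 4 real churners, 5 customers flagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

churn_recall, churn_precision = recall_and_precision(y_true, y_pred)
print(f"recall={churn_recall:.2f} precision={churn_precision:.2f}")
# recall=0.75 precision=0.60
```

A missed churner (false negative) is a customer the business never gets a chance to retain, which is why recall for the churn class is used as the key metric here.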
- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib & Seaborn
- imbalanced-learn (`imblearn`, for SMOTE)
- Jupyter Notebook