💳 Credit Risk Prediction

A Comparative Study of Classification Models and WOE-Based Feature Transformation

This project predicts whether a client will experience serious delinquency within the next two years using classification models. To ensure data quality, preprocessing included imputation, outlier treatment, and Weight of Evidence (WOE) transformation, which improves interpretability and model effectiveness, especially for logistic regression.
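
The imputation and outlier-treatment steps can be sketched as below. This is only an illustration: the README does not say which imputation strategy or outlier rule was used, so median imputation and clipping to the 1st/99th percentiles are assumptions, and the column names are placeholders rather than the project's schema.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, lower_q: float = 0.01, upper_q: float = 0.99) -> pd.DataFrame:
    """Median-impute missing values, then clip each numeric column to a quantile range."""
    out = df.copy()  # leave the caller's frame untouched
    for col in out.select_dtypes(include="number").columns:
        # Assumed strategy: fill gaps with the column median.
        out[col] = out[col].fillna(out[col].median())
        # Assumed outlier rule: clip values outside [lower_q, upper_q].
        lo, hi = out[col].quantile([lower_q, upper_q])
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out


# Hypothetical toy data with one missing value and one extreme value.
toy = pd.DataFrame({
    "monthly_income": [3000.0, np.nan, 4500.0, 5200.0, 1_000_000.0],
    "age": [25, 40, 35, 52, 61],
})
clean = preprocess(toy)
```

Clipping keeps every row, which matters when the positive class is rare and dropping rows could discard delinquent clients.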

Three machine learning models were evaluated to identify the best-performing approach in terms of accuracy, recall, and business relevance.

🧠 Models Compared

- Logistic Regression (with and without WOE)
- Random Forest Classifier
- XGBoost Classifier
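
A minimal sketch of how such a comparison could be set up with scikit-learn. The data here is synthetic (`make_classification` with an arbitrary class imbalance of roughly 7%), and the hyperparameters, including `class_weight="balanced"`, are illustrative assumptions and not the notebook's actual settings. XGBoost is added only if the package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the credit data (not the project's dataset).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.93, 0.07], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hyperparameters below are placeholders chosen for illustration.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0),
}

try:
    from xgboost import XGBClassifier
    models["xgboost"] = XGBClassifier(
        n_estimators=200, eval_metric="logloss", random_state=0)
except ImportError:
    pass  # xgboost not installed; compare the two scikit-learn models only

# Fit each model and score it on the held-out split.
auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

Stratifying the split keeps the share of positives the same in train and test, which makes recall and precision comparable across models.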

🧩 Feature Strategy

- Raw Features – Cleaned but untransformed dataset
- WOE-Transformed Features – Variables transformed using Weight of Evidence to support interpretability and improve logistic regression performance
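
For reference, WOE replaces each bin of a variable with the log ratio of the bin's share of non-delinquent ("good") clients to its share of delinquent ("bad") clients. The sketch below assumes that sign convention (some tools use the inverse), quantile binning, and a 0.5 smoothing term; `woe_table` and the toy data are hypothetical and not taken from the project's notebook.

```python
import numpy as np
import pandas as pd


def woe_table(feature: pd.Series, target: pd.Series, bins: int = 5) -> pd.DataFrame:
    """Per-bin good/bad counts and WOE = ln(share of goods / share of bads)."""
    binned = pd.qcut(feature, q=bins, duplicates="drop")
    grouped = pd.DataFrame({"bin": binned, "bad": target}).groupby("bin", observed=True)["bad"]
    bad = grouped.sum()
    good = grouped.count() - bad
    # Add 0.5 to every cell so an empty cell cannot produce log(0).
    share_good = (good + 0.5) / (good + 0.5).sum()
    share_bad = (bad + 0.5) / (bad + 0.5).sum()
    return pd.DataFrame({"good": good, "bad": bad, "woe": np.log(share_good / share_bad)})


# Toy data: 100 clients, delinquency rate rising with utilization (1, 2, 4, 6, 10 per bin).
utilization = pd.Series([i / 100 for i in range(100)])
delinquent = pd.Series([int(i % 20 < (1, 2, 4, 6, 10)[i // 20]) for i in range(100)])
table = woe_table(utilization, delinquent)
```

Under this convention a positive WOE marks a bin with relatively more good clients, so WOE falls as utilization rises in the toy data. The model is then trained on the WOE value of each client's bin instead of the raw feature.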

🔍 Key Findings

- Random Forest delivered the best balance of performance metrics:
  - ROC AUC: 0.8569
  - Recall: 0.72
  - Precision: 0.22
- Logistic Regression performed well with WOE-transformed features:
  - Recall: 0.70
  - Precision: 0.19
  - Enabled creation of an interpretable scorecard
- XGBoost had the highest precision for non-delinquents but low recall (0.16), making it less suitable when minimizing false negatives (missed delinquencies) is the priority.
- Key predictors across models included:
  - Revolving Utilization of Unsecured Lines
  - Past Due Counts (30–59, 60–89, 90+)
  - Age
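
High recall alongside low precision is typical when the positive class is rare. The arithmetic below uses hypothetical confusion-matrix counts, chosen only to land near the figures above; they are not the project's actual confusion matrix.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical: 70 delinquent clients, 50 of them flagged, plus 180 false alarms.
precision, recall = precision_recall(tp=50, fp=180, fn=20)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.22, recall=0.71
```

In this toy case the model catches most delinquent clients, but because non-delinquent clients vastly outnumber them, even a modest false-positive rate produces more false alarms than true hits.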

🛠️ Tools & Libraries Used

- Python, Jupyter Notebook
- pandas, numpy, seaborn, matplotlib
- scikit-learn, XGBoost
- WOE Binning tools, scikit-plot

📁 Repository Structure

credit-risk-prediction/
  data/       # Raw dataset
  notebooks/  # Model training and evaluation (LR, RF, XGBoost)
  results/    # Evaluation metrics, ROC curves, confusion matrices, scorecard
  README.md   # Project overview

📌 Conclusion

This project demonstrates that:

- Random Forest was the most effective of the models compared for credit delinquency prediction, with the best trade-off between recall (0.72), precision (0.22), and ROC AUC (0.8569).
- Logistic Regression with WOE remains highly interpretable and practical for deployment via scorecards.
- Combining Random Forest's accuracy with Logistic Regression's explainability provides a strong, business-ready solution for credit scoring systems.
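
As an illustration of how a WOE-based logistic regression turns into a scorecard, the sketch below applies the common "points to double the odds" scaling. It assumes the model predicts delinquency (target = 1), so higher scores mean lower risk. The function name and the scaling constants (`pdo=20`, `base_score=600`, `base_odds=50`) are hypothetical and not taken from the project's results.

```python
import math


def scorecard_points(woe: float, coef: float, intercept: float, n_features: int,
                     pdo: float = 20.0, base_score: float = 600.0,
                     base_odds: float = 50.0) -> float:
    """Points contributed by one feature's bin.

    factor and offset are chosen so that good:bad odds of `base_odds` map to
    `base_score`, and every doubling of the odds adds `pdo` points.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    # The model's log-odds of delinquency are intercept + sum(coef * woe);
    # the minus sign turns them into log-odds of being a good client.
    return -(coef * woe + intercept / n_features) * factor + offset / n_features


# Hypothetical 3-feature model: with all WOE values at 0 and an intercept of
# -ln(50), the good:bad odds are 50:1, so the total should equal base_score.
total = sum(scorecard_points(0.0, -0.8, -math.log(50), 3) for _ in range(3))
```

Because each feature's points depend only on its own bin, the full scorecard can be published as a lookup table and summed by hand.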

Leyan0109/Python_Classification-Model-Credit-Risk
