Credit Risk Analysis (Default Prediction)

This project looks at consumer credit data to understand what factors are associated with loan default and how different modeling choices affect risk decisions.

The goal is not to build the most complex model possible, but to show a realistic analytics workflow: exploring the data, building interpretable models, and evaluating tradeoffs that matter in a credit setting.

Problem

Lenders need to decide who to approve for credit while managing default risk.
Approving too many risky loans leads to losses. Being too conservative reduces revenue.

This project focuses on:

- identifying drivers of default
- comparing an interpretable baseline model with a stronger benchmark
- understanding how decision thresholds trade off false rejections (creditworthy applicants declined) against missed defaults (risky loans approved)

Data

The dataset comes from OpenML (German Credit, credit-g).
It contains borrower and loan attributes such as credit amount, duration, employment status, and savings.

The target variable is a binary default indicator (1 = default, 0 = non-default).
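The raw credit-g target on OpenML is a string label with the values "good" and "bad", so some encoding step has to produce the 1/0 indicator. A minimal sketch of that step, assuming "bad" is the label treated as default (the function name `encode_target` is hypothetical and not taken from the project):

```python
def encode_target(raw_labels):
    """Map credit-g class labels to a binary default indicator.

    Assumption: "bad" means default (1) and "good" means non-default (0).
    """
    mapping = {"bad": 1, "good": 0}
    return [mapping[label] for label in raw_labels]


encoded = encode_target(["good", "bad", "good"])
```

An unexpected label raises a `KeyError` here, which surfaces data problems early instead of silently miscoding them.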

What the code does

- Splits the data into train, validation, and test sets using stratification
- Handles missing values and encodes categorical variables
- Trains two models:
  - Logistic regression (baseline, interpretable)
  - Random forest (benchmark, non-linear)
- Tunes the classification threshold using validation data
- Evaluates performance on a held-out test set
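To illustrate the first step, here is a standard-library sketch of a stratified three-way split. It is not the project's code, which may well use a library routine instead. The function name, the 60/20/20 fractions, and the seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return (train, val, test) index lists that preserve class proportions.

    The 60/20/20 fractions and the seed are illustrative assumptions,
    not values taken from the project.
    """
    rng = random.Random(seed)

    # Group row indices by class so each class is split separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to test
    return train, val, test


# Toy labels with a 30% default rate (1 = default, 0 = non-default).
labels = [1] * 30 + [0] * 70
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps the default rate roughly equal across the three sets, which matters when defaults are the minority class.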

Evaluation

Model performance is evaluated using metrics that reflect credit risk tradeoffs:

- ROC-AUC
- Precision–recall AUC
- Confusion matrices at tuned thresholds
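As a reminder of what ROC-AUC measures, the sketch below computes it from its pairwise definition: the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as one half. This is a standard-library illustration on made-up scores, not the project's implementation.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up labels and scores: three of the four (defaulter, non-defaulter)
# pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of rows, so it is useful for understanding the metric but not for large datasets.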

Lowering the threshold for flagging a default raises recall for defaults but produces more false positives (creditworthy applicants declined), which mirrors the tradeoff in real approval decisions.
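The effect of the threshold can be seen on a small made-up example. The sketch below counts confusion-matrix cells at two thresholds. The labels, scores, and the helper name `confusion_at_threshold` are invented for illustration and do not come from the project.

```python
def confusion_at_threshold(y_true, y_score, threshold):
    """Count (TP, FP, FN, TN) when scores >= threshold are flagged as default."""
    tp = fp = fn = tn = 0
    for truth, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Made-up validation labels (1 = default) and predicted default probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_score = [0.9, 0.6, 0.35, 0.4, 0.2, 0.1, 0.55, 0.45, 0.3, 0.05]

for threshold in (0.5, 0.3):
    tp, fp, fn, tn = confusion_at_threshold(y_true, y_score, threshold)
    recall = tp / (tp + fn)
    print(f"threshold={threshold}: recall={recall:.2f}, false positives={fp}")
```

On this toy data, dropping the threshold from 0.5 to 0.3 raises recall for defaults from 0.50 to 1.00 while false positives grow from 1 to 3.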

Plots and metrics are saved automatically for review.

Outputs

After running the pipeline, the following are generated:

- Evaluation plots (ROC, precision–recall, confusion matrices)
- A JSON file with model metrics and thresholds
- Saved model artifacts
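For orientation, a metrics file of this kind could be written as sketched below. The file name `metrics.json`, the `outputs` folder, the key names, and the helper `save_metrics` are all assumptions, and the numbers are placeholders, not results from the project.

```python
import json
import tempfile
from pathlib import Path


def save_metrics(metrics, path):
    """Write a metrics dictionary to a JSON file, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))
    return path


# Placeholder values only; the real pipeline fills these in after evaluation.
metrics = {
    "logistic_regression": {"roc_auc": 0.0, "pr_auc": 0.0, "threshold": 0.5},
    "random_forest": {"roc_auc": 0.0, "pr_auc": 0.0, "threshold": 0.5},
}

# A temporary directory keeps this sketch free of side effects.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = save_metrics(metrics, Path(tmp_dir) / "outputs" / "metrics.json")
    loaded = json.loads(out_path.read_text())
```

Storing the tuned threshold next to the metrics makes it possible to reproduce the same approve/decline decisions later.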

Running the project

python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
python runPipeline.py
