simpledotorg/poc_scoring_partient_return

Setting up local environment

Installing Python

brew install python

Creating a Python virtual environment

Creating a virtual environment is a good way to ensure our work does not get polluted by other Python libraries installed on the same machine.

This only needs to be done once:

rm -rf ./pythonvenv
python3 -m venv ./pythonvenv
source pythonvenv/bin/activate
pip install --upgrade pip
pip install pandas scikit-learn m2cgen pypmml sklearn2pmml joblib

In every new shell, you'll then need to run:

source pythonvenv/bin/activate
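
As an optional sanity check (not part of the repository scripts), you can confirm that the core libraries installed above import cleanly inside the virtual environment:

# check_env.py -- hypothetical helper, not one of the numbered scripts.
# Verifies that the key dependencies are importable in the active venv.
import pandas
import sklearn
import joblib

print("pandas       :", pandas.__version__)
print("scikit-learn :", sklearn.__version__)
print("joblib       :", joblib.__version__)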

Getting Data to work on

The following Metabase (MB) reports can be used to download data to train and evaluate our models:

Purpose | Reference Date | Link | Expected Location
------- | -------------- | ---- | -----------------
Full Extract (BD) | 2025-01-01 | Metabase | data/full_extract.csv
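
To confirm the download landed in the expected location, a minimal check with pandas (this snippet is only an illustration; the README does not specify the columns of the extract):

# peek_data.py -- hypothetical snippet; only the path data/full_extract.csv comes from this README.
import pandas as pd

df = pd.read_csv("data/full_extract.csv")
print(df.shape)   # number of rows and columns in the extract
print(df.head())  # first few rows, to eyeball the available columns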

Running the scoring

The following steps should be run:

bash   ./01.split_data_called_notcalled.sh
python ./02.split_data.py ./data/generated.called.patients.csv
python ./02.split_data.py ./data/generated.not_called.patients.csv
python ./13.notcalled.train_models.py
python ./14.notcalled.validate_models.py
python ./15.notcalled.evaluate_model_quality.py
python ./23.called.train_models.py
python ./24.called.validate_models.py
python ./25.called.evaluate_model_quality.py

Or just run:

bash 99.clean_all.sh; bash 90.run_all.sh
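
For orientation, the train steps follow the usual scikit-learn pattern: fit a classifier on a training split, then persist it so the validate and evaluate steps can reload it. The sketch below is illustrative only; the file names, column names, and choice of classifier are assumptions, not the contents of the numbered scripts:

# illustrative_train.py -- NOT one of the repository scripts; a generic sketch only.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed file and column names -- adjust to the actual generated splits.
train = pd.read_csv("data/generated.not_called.patients.train.csv")
X = train.drop(columns=["visited"])   # "visited" is an assumed label column
y = train["visited"]

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Persist the fitted model so a later validation step can reload it.
joblib.dump(model, "not_called.model.joblib")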

Model Quality Metrics Explained

This document explains the meaning and interpretation of the numerical metrics used to evaluate the performance of your predictive models (Classifier and Combined Metric) on the validation dataset.


1. Classification Metrics

These metrics assess how accurately your model predicts the binary outcome ($\text{Visit}=1$ or $\text{No Visit}=0$) based on the raw `chances_to_visit` score. A worked sketch with scikit-learn follows the table below.

Metric | Meaning | Interpretation | Goal
------ | ------- | -------------- | ----
AUC-ROC | Area Under the Receiver Operating Characteristic Curve. | Measures the model's ability to distinguish between the "Visit" class and the "No Visit" class across all possible probability thresholds. This is the best single measure of model separation quality. | Closer to 1.0 (typically >0.75 is good)
Precision | Quality of positive predictions. | Of all the patients the model predicted would visit, how many actually did? Crucial for addressing overconfidence (false positives). | Closer to 1.0
Recall (Sensitivity) | Completeness of positive predictions. | Of all the patients who actually visited, what percentage did the model correctly identify? | Closer to 1.0
F1 Score | Balance. | The harmonic mean of Precision and Recall. Useful when you need a balanced measure of performance, particularly if minimizing both false positives and false negatives is important. | Closer to 1.0
Accuracy | Overall correctness. | The total percentage of all predictions (both correct visits and correct no-shows) that were correct. (Note: can be misleading if classes are highly imbalanced.) | Closer to 1.0
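
As a minimal sketch of how these numbers can be computed with scikit-learn (variable names and the 0.5 threshold are illustrative assumptions, not the contents of the evaluate scripts):

# classification_metrics_sketch.py -- illustrative only.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# y_true: 1 = patient visited, 0 = patient did not visit.
# scores: raw chances_to_visit values produced by the classifier.
y_true = np.array([1, 0, 1, 1, 0, 1])
scores = np.array([0.9, 0.4, 0.7, 0.8, 0.6, 0.3])

# AUC-ROC is threshold-free; the other metrics need a hard prediction,
# here obtained with an assumed 0.5 cut-off.
y_pred = (scores >= 0.5).astype(int)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, scores))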

2. Combined Likelihood Metric Efficiency

These metrics specifically evaluate the quality of our goal prediction: the likelihood that a patient will visit AND do so with less than 30 days of delay (`chances_to_visit_under_30d`). A sketch of how they can be computed follows the table below.

Metric | Meaning | Interpretation | Goal
------ | ------- | -------------- | ----
Combined Metric AUC-ROC | Goal ranking quality. | Measures how well your final combined score ranks the patients who actually met the $\text{Visit} + <30 \text{ Days}$ goal. | Closer to 1.0
Combined Metric Precision | Goal confidence. | If you select a cohort based on a high combined score, this is the percentage that truly met the $\text{Visit} + <30 \text{ Days}$ success condition. | Closer to 1.0
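
A minimal sketch, assuming the success label is "visited AND delay under 30 days" and the score is `chances_to_visit_under_30d` (the column names and the 0.6 cohort threshold below are illustrative assumptions):

# combined_metric_sketch.py -- illustrative only.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, precision_score

df = pd.DataFrame({
    "visited":                    [1, 1, 0, 1, 0, 1],
    "days_to_visit":              [12, 45, np.nan, 20, np.nan, 28],
    "chances_to_visit_under_30d": [0.85, 0.40, 0.20, 0.70, 0.55, 0.65],
})

# Success = the patient visited AND did so with less than 30 days of delay.
goal_met = ((df["visited"] == 1) & (df["days_to_visit"] < 30)).astype(int)

# Ranking quality of the combined score against the goal.
print("Combined Metric AUC-ROC  :", roc_auc_score(goal_met, df["chances_to_visit_under_30d"]))

# Confidence of a cohort selected by thresholding the combined score.
selected = (df["chances_to_visit_under_30d"] >= 0.6).astype(int)
print("Combined Metric Precision:", precision_score(goal_met, selected))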

3. Example Results

==================================================
           MODEL QUALITY EVALUATION
==================================================
Accuracy                      : 0.9782
Precision                     : 0.9832
Recall (Sensitivity)          : 0.9949
F1 Score                      : 0.9890
AUC-ROC                       : 0.6370
Combined Metric AUC-ROC       : 0.6983
Combined Metric Precision     : 0.8866
==================================================
==================================================
           MODEL QUALITY EVALUATION
==================================================
Accuracy                      : 0.8693
Precision                     : 0.8131
Recall (Sensitivity)          : 0.9471
F1 Score                      : 0.8750
AUC-ROC                       : 0.9211
Combined Metric AUC-ROC       : 0.4811
Combined Metric Precision     : 1.0000
==================================================
