```bash
brew install python
```
Creating a virtual environment is a good way to ensure our work is not polluted by other Python libraries installed on the same machine. This setup only needs to be done once:
```bash
rm -rf ./pythonvenv
python3 -m venv ./pythonvenv
source pythonvenv/bin/activate
pip install --upgrade pip
pip install pandas scikit-learn m2cgen pypmml sklearn2pmml joblib
```
In every new shell, you will then need to run:
```bash
source pythonvenv/bin/activate
```
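If you want to verify that the virtual environment is actually active, here is a quick check from Python (a small sketch, not part of the pipeline):

```python
import sys

# Inside an active venv, sys.prefix points into the venv directory,
# while sys.base_prefix still points to the system Python installation.
if sys.prefix == sys.base_prefix:
    print("WARNING: no virtual environment is active")
else:
    print(f"Virtual environment active at: {sys.prefix}")
```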
The following Metabase reports can be used to download the data needed to train and evaluate our models:
| Purpose | Reference Date | Link | Expected Location |
|---|---|---|---|
| Full Extract (BD) | 2025-01-01 | Metabase | data/full_extract.csv |
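Once the extract is in place, a quick sanity check that the file is readable can look like the sketch below (it makes no assumptions about the actual columns):

```python
import pandas as pd

# Load the full extract downloaded from Metabase.
df = pd.read_csv("data/full_extract.csv")

# Basic sanity checks: shape, column types, and the most incomplete columns.
print(f"{len(df)} rows, {len(df.columns)} columns")
print(df.dtypes)
print(df.isna().sum().sort_values(ascending=False).head(10))
```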
The following steps should be run:
```bash
bash ./01.split_data_called_notcalled.sh
python ./02.split_data.py ./data/generated.called.patients.csv
python ./02.split_data.py ./data/generated.not_called.patients.csv
python ./13.notcalled.train_models.py
python ./14.notcalled.validate_models.py
python ./15.notcalled.evaluate_model_quality.py
python ./23.called.train_models.py
python ./24.called.validate_models.py
python ./25.called.evaluate_model_quality.py
```
Or, to run the whole pipeline in one go:
```bash
bash 99.clean_all.sh; bash 90.run_all.sh
```
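The numbered training scripts follow the usual scikit-learn train-and-persist pattern. Here is a minimal, hedged sketch of that pattern; the target column, model choice, and output path below are placeholders for illustration, not what the actual scripts use:

```python
import os

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder target column and feature set: the real ones are defined
# inside the numbered scripts, not here.
df = pd.read_csv("data/generated.not_called.patients.csv")
X = df.drop(columns=["visited"])
y = df["visited"]

# Hold out part of the data so the validation step sees unseen patients.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Placeholder model: the scripts may train and compare several model types.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Persist the fitted model so the validation/evaluation steps can reload it.
os.makedirs("models", exist_ok=True)
joblib.dump(model, "models/example_model.joblib")
```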
This section explains the meaning and interpretation of the numerical metrics used to evaluate the performance of your predictive models (Classifier and Combined Metric) on the validation dataset.
These metrics assess how accurately your model predicts the binary outcome (Visit = 1 or No Visit = 0) based on the raw `chances_to_visit` score.
| Metric | Meaning | Interpretation | Goal |
|---|---|---|---|
| AUC-ROC | Area Under the Receiver Operating Characteristic Curve. | Measures the model's ability to distinguish between the "Visit" class and the "No Visit" class across all possible probability thresholds. This is the best single measure of model separation quality. | Closer to 1.0 (Typically >0.75 is good) |
| Precision | Quality of Positive Predictions. | Of all the patients the model predicted would visit, how many actually did? Crucial for addressing overconfidence (false positives). | Closer to 1.0 |
| Recall (Sensitivity) | Completeness of Positive Predictions. | Of all the patients who actually visited, what percentage did the model correctly identify? | Closer to 1.0 |
| F1 Score | Balance. | The harmonic mean of Precision and Recall. Useful when you need a balanced measure of performance, particularly if minimizing both false positives and false negatives is important. | Closer to 1.0 |
| Accuracy | Overall Correctness. | The total percentage of all predictions (both correct visits and correct no-shows) that were correct. (Note: Can be misleading if classes are highly imbalanced). | Closer to 1.0 |
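For reference, here is a minimal sketch of how these metrics can be computed with scikit-learn; `y_true` and `y_score` below are toy placeholders, not real validation data:

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# y_true: actual outcomes (1 = Visit, 0 = No Visit)
# y_score: raw chances_to_visit probabilities produced by the model
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.92, 0.40, 0.75, 0.65, 0.30, 0.85, 0.55, 0.70]

# Threshold the score at 0.5 to get hard predictions for accuracy,
# precision, recall and F1; AUC-ROC uses the raw score directly.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

print(f"Accuracy  : {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision : {precision_score(y_true, y_pred):.4f}")
print(f"Recall    : {recall_score(y_true, y_pred):.4f}")
print(f"F1 Score  : {f1_score(y_true, y_pred):.4f}")
print(f"AUC-ROC   : {roc_auc_score(y_true, y_score):.4f}")
```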
These metrics specifically evaluate the quality of our goal prediction: the likelihood that a patient will visit AND do so with less than 30 days of delay (`chances_to_visit_under_30d`).
| Metric | Meaning | Interpretation | Goal |
|---|---|---|---|
| Combined Metric AUC-ROC | Goal Ranking Quality. | Measures how well your final combined score ranks the patients who actually met the under-30-day goal against those who did not. | Closer to 1.0 |
| Combined Metric Precision | Goal Confidence. | If you select a cohort based on a high combined score, this is the percentage that truly met the under-30-day goal. | Closer to 1.0 |
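Similarly, a hedged sketch of how these two numbers can be obtained; `met_goal`, `combined_score`, and the 0.5 cut-off are assumptions for illustration, not necessarily what the evaluation scripts use:

```python
from sklearn.metrics import precision_score, roc_auc_score

# met_goal: 1 if the patient visited with less than 30 days of delay
# combined_score: the final combined score assigned to each patient
met_goal = [1, 0, 1, 0, 1, 1, 0, 1]
combined_score = [0.80, 0.35, 0.60, 0.55, 0.90, 0.45, 0.20, 0.70]

# Ranking quality of the combined score against the under-30-day goal.
auc = roc_auc_score(met_goal, combined_score)

# Precision of the cohort selected above an (assumed) 0.5 cut-off.
selected = [1 if s >= 0.5 else 0 for s in combined_score]
precision = precision_score(met_goal, selected)

print(f"Combined Metric AUC-ROC   : {auc:.4f}")
print(f"Combined Metric Precision : {precision:.4f}")
```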
Sample output from the model quality evaluation steps:

```
==================================================
MODEL QUALITY EVALUATION
==================================================
Accuracy : 0.9782
Precision : 0.9832
Recall (Sensitivity) : 0.9949
F1 Score : 0.9890
AUC-ROC : 0.6370
Combined Metric AUC-ROC : 0.6983
Combined Metric Precision : 0.8866
==================================================
```
```
==================================================
MODEL QUALITY EVALUATION
==================================================
Accuracy : 0.8693
Precision : 0.8131
Recall (Sensitivity) : 0.9471
F1 Score : 0.8750
AUC-ROC : 0.9211
Combined Metric AUC-ROC : 0.4811
Combined Metric Precision : 1.0000
==================================================
```