Add Permutation Importance #202

mayer79 · 2025-05-10T10:04:15Z

Implements #201

mayer79 · 2025-05-16T11:49:02Z

This is the current basic call:

import numpy as np
import polars as pl
from sklearn.linear_model import LinearRegression

from model_diagnostics.xai import plot_permutation_importance

rng = np.random.default_rng(1)
n = 1000

X = pl.DataFrame(
    {
        "area": rng.uniform(30, 120, n),
        "rooms": rng.choice([2.5, 3.5, 4.5], n),
        "age": rng.uniform(0, 100, n),
    }
)

y = X["area"] + 20 * X["rooms"] + rng.normal(0, 1, n)

model = LinearRegression()
model.fit(X, y)

_ = plot_permutation_importance(
    predict_function=model.predict,
    X=X,
    y=y,
)

The extended features API allows permuting groups of features together, like this:

_ = plot_permutation_importance(
    predict_function=model.predict,
    features={"size": ["area", "rooms"], "age": "age"},
    X=X,
    y=y,
)
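For readers unfamiliar with grouped permutation, the idea can be sketched in plain NumPy. This is a minimal illustration, not the implementation in this PR: the function name `permutation_importance_sketch`, its arguments, the use of mean squared error, and the stand-in `predict` function are all assumptions made for the example.

```python
import numpy as np


def permutation_importance_sketch(predict, X, y, groups, n_repeats=5, seed=0):
    """Mean increase in MSE after jointly shuffling each group of columns.

    Hypothetical helper: `groups` maps a display name to a list of column
    positions in the 2D array X.
    """
    rng = np.random.default_rng(seed)
    base_score = np.mean((y - predict(X)) ** 2)
    result = {}
    for name, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # One shared row permutation keeps the columns of a group aligned
            # with each other while breaking their link to y.
            idx = rng.permutation(X.shape[0])
            X_perm[:, cols] = X[idx][:, cols]
            perm_score = np.mean((y - predict(X_perm)) ** 2)
            increases.append(perm_score - base_score)
        result[name] = float(np.mean(increases))
    return result


rng = np.random.default_rng(1)
n = 1000
X = np.column_stack(
    [
        rng.uniform(30, 120, n),  # area
        rng.choice([2.5, 3.5, 4.5], n),  # rooms
        rng.uniform(0, 100, n),  # age
    ]
)
y = X[:, 0] + 20 * X[:, 1] + rng.normal(0, 1, n)


def predict(X):
    # Stand-in for a fitted model: the true linear signal, which ignores age.
    return X[:, 0] + 20 * X[:, 1]


importance = permutation_importance_sketch(
    predict, X, y, groups={"size": [0, 1], "age": [2]}
)
```

Because the stand-in model ignores the age column, shuffling it leaves the score unchanged, while shuffling the "size" group increases the error substantially.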


lorentzenchr · 2025-05-19T18:50:02Z

src/model_diagnostics/xai/permutation_importance.py

+from model_diagnostics.scoring import SquaredError
+
+
+def safe_copy(X):


Might be good to put safe_copy and safe_column_names into _utils.array and add tests for them. I think they cause the current CI failure.

My local tests are failing for the Python 3.9 environment only (pandas and pyarrow). I will move the functions to _utils.array, draft some unit tests, and rename safe_column_names() to get_column_names().

lorentzenchr · 2025-05-19T18:50:32Z

This will be a great addition! Thanks @mayer79

lorentzenchr · 2025-05-23T20:41:26Z

The failing test is in the python 3.9 env with
numpy 1.22.0
polars 1.0.0
scipy 1.10.0
pandas 1.5.3
pyarrow 11.0.0

Could you check if increasing one of the versions fixes the problem, e.g. polars version?

mayer79 · 2025-05-24T13:08:18Z

> The failing test is in the python 3.9 env with numpy 1.22.0 polars 1.0.0 scipy 1.10.0 pandas 1.5.3 pyarrow 11.0.0
>
> Could you check if increasing one of the versions fixes the problem, e.g. polars version?

The following changes in the 3.9 env would be necessary. I don't know how much it would hurt to abandon pandas 1.

pyarrow 11 -> 13
pandas 1.5 -> 2.0

I have added some additional unit tests and moved safe_copy() and get_column_names() to array.py.
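To illustrate what a container-agnostic column-name helper might look like, here is a small duck-typing sketch. It is not the `get_column_names()` from this PR; the name `column_names_sketch`, the attribute checks, and the fallback to integer positions are assumptions for illustration only.

```python
import numpy as np


def column_names_sketch(x):
    """Hypothetical helper: return column names of a 2D data container."""
    # pyarrow tables expose names via `column_names`; check this first,
    # because their `columns` attribute holds the data, not the names.
    if hasattr(x, "column_names"):
        return list(x.column_names)
    # pandas and polars DataFrames expose names via `columns`.
    if hasattr(x, "columns"):
        return list(x.columns)
    # Plain arrays have no names: fall back to integer positions.
    return list(range(np.asarray(x).shape[1]))


class _Frame:
    """Tiny stand-in for a DataFrame-like object with a `columns` attribute."""

    columns = ["area", "rooms", "age"]


array_names = column_names_sketch(np.zeros((4, 3)))
frame_names = column_names_sketch(_Frame())
```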

lorentzenchr · 2025-06-25T21:08:23Z

fyi, CI will fail due to new versions of polars and numpy. I am working on a fix.

lorentzenchr · 2025-06-26T20:34:09Z

Fixed in #203. You need to sync (e.g. merge) with the main branch, and maybe run hatch env prune on your local machine.

lorentzenchr

Review of array utils.

lorentzenchr · 2025-07-01T11:56:48Z

src/model_diagnostics/_utils/array.py

+                    # if not x.index.is_unique:
+                    # Pandas might error with:
+                    #   cannot reindex on an axis with duplicate labels
+                    # Try reindexing ourselves.
+                    x = x.reset_index(drop=True)
+                    # if not pd_values.index.is_unique:
+                    pd_values = pd_values.reset_index(drop=True)


Can you find a condition (an if statement) such that this is executed only when strictly required?

Could you add a test case (to an existing test) or a new test that fails without this change?

Not sure. I think this test would fail with pandas 1.5, because the index of the assigned values does not match the index of the DataFrame:

def test_safe_assign_column_works_for_pandas_with_inconsistent_index():
    """Test that safe_assign_column works for pandas dfs regarding indices."""
    df = pd_DataFrame({"a": [0, 1, 2]}, index=[0, 1, 2])
    if isinstance(df, SkipContainer):
        pytest.skip("Module for data container not imported.")
    df = safe_assign_column(
        df, values=pd_Series([10, 20, 30], index=[1, 1, 0]), column_index=0
    )
    expected = pd_DataFrame({"a": [10, 20, 30]})
    assert_array_equal(df, expected)

But I don't know how to test this, because such a test is skipped.

Now, once we remove pandas <2 support, safe_assign_column() does not need to be as strict as in the committed version.
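The underlying pandas behaviour can be shown in isolation. This sketch only illustrates label alignment on column assignment and how resetting the index avoids it; it uses a unique index to keep the example simple and does not reproduce the duplicate-label case or the library's `safe_assign_column()`.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2]}, index=[0, 1, 2])
values = pd.Series([10, 20, 30], index=[2, 1, 0])

# Assigning a Series aligns on index labels, so the order is changed.
by_label = df.copy()
by_label["a"] = values

# Dropping the index of the values first makes the assignment positional,
# because the fresh 0..n-1 index matches the index of df.
by_position = df.copy()
by_position["a"] = values.reset_index(drop=True)

label_result = by_label["a"].tolist()
position_result = by_position["a"].tolist()
```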

src/model_diagnostics/_utils/array.py

src/model_diagnostics/_utils/tests/test_array.py

lorentzenchr · 2025-07-17T14:40:26Z

fyi: I am preparing to bump the minimum versions of python to 3.11 and numpy to 2. This implies polars 1.1.0, pandas >= 2.2.2 and pyarrow >= 16, see #206.

mayer79 · 2025-07-26T18:29:57Z

I have modified these aspects in the main functionality:

compute_permutation_importance() now returns both score differences and score ratios
Instead of standard deviations, the function returns standard errors
The plot function has received an argument which="difference" to select if score differences or ratios are to be plotted.
By default, the plot function shows approximate 95% CIs. The API is the same as in your other functions, i.e., when confidence_level=0, no error bars are plotted.
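The statistics described above can be sketched as follows. This is an assumption-laden illustration, not the code of `compute_permutation_importance()`: the function name `summarize_scores`, its output layout, and the formula "sample standard deviation divided by the square root of the number of repetitions" are choices made for this example. The t quantile comes from `scipy.special.stdtrit`, in line with the commit "use scipy special to calculate t quantile".

```python
import numpy as np
from scipy import special


def summarize_scores(base_score, permuted_scores, confidence_level=0.95):
    """Hypothetical helper: mean, standard error, and CI half-width of
    score differences and score ratios over repeated permutations."""
    permuted_scores = np.asarray(permuted_scores, dtype=float)
    n = permuted_scores.size
    candidates = {
        "difference": permuted_scores - base_score,
        "ratio": permuted_scores / base_score,
    }
    out = {}
    for name, values in candidates.items():
        # Standard error of the mean over the n repetitions.
        stderr = values.std(ddof=1) / np.sqrt(n)
        half_width = 0.0  # confidence_level=0 means: no error bars
        if confidence_level > 0:
            # Two-sided t quantile with n - 1 degrees of freedom.
            t = special.stdtrit(n - 1, 0.5 + confidence_level / 2)
            half_width = float(t * stderr)
        out[name] = {
            "mean": float(values.mean()),
            "stderr": float(stderr),
            "half_width": half_width,
        }
    return out


summary = summarize_scores(base_score=2.0, permuted_scores=[3.0, 3.5, 2.5, 3.0])
no_bars = summarize_scores(2.0, [3.0, 3.5, 2.5, 3.0], confidence_level=0)
```

Returning standard errors rather than standard deviations makes the error bars shrink as the number of repetitions grows, which is what a confidence interval for the mean importance requires.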

Add compute_permutation_importance()

1c594f8

mayer79 self-assigned this May 10, 2025

mayer79 added the enhancement New feature or request label May 10, 2025

mayer79 marked this pull request as draft May 10, 2025 10:04

Replace ipynb by py

775a150

mayer79 changed the title ~~Add compute_permutation_importance()~~ Add Permutation Importance May 10, 2025

mayer79 added 4 commits May 10, 2025 12:24

Catch None values of n_repeats

0bf083e

doctest failure

f8485d0

add plot_permutation_importance()

0c7e6f6

Improve docstring

98e611a

mayer79 added 19 commits May 16, 2025 13:55

Linter

3f44810

remove base_score and n_repeats from output

7d3a4c7

docstring on features argument

7cdc7b7

calculate base score before stacking

63f7825

use scipy special to calculate t quantile

293ca1d

remove reset_index()

c312f1f

Fix doctest

0132dc5

Allow max_display=None

2877208

Add unit tests for plot

42705b9

Add error message for max_display

262540b

Remove wrong Optional typing

e5dcc6f

Replace boolean function argument

5215e53

Linter

ff21a64

Expand docstring of plot()

e5c65dd

simpler safe_select_column()

7f696eb

Replace safe_get_column() by get_second_dimension()

617308e

drop safe_index_rows_1d()

32b13d2

Clarify that np.split() works on all relevant prediction containers except parrow

8b5d36b

First unit tests on calculate_permutation_importance()

62535c0

remove empty line

59a6d6a

lorentzenchr reviewed May 19, 2025


mayer79 added 5 commits May 24, 2025 13:43

Move and rename helper functions

05f3143

Use small x instead of capital X

bd7bbb9

Add unit test for get_column_names()

196b2c9

Add unit test for safe_copy()

eba4c8a

Add unit test to check if calculations have side effects

6126b53

mayer79 added 2 commits May 24, 2025 15:09

test typing failures

c9397d0

Add unit test

8fc6117

mayer79 marked this pull request as ready for review May 29, 2025 08:37

mayer79 added 3 commits June 25, 2025 21:34

stop reset_index() to add column "index"

c298514

Bump minimal pyarrow version to comply with scikit-learn

45a3042

not only non-unique indices are problematic with pandas 1

319f453

mayer79 added 4 commits June 27, 2025 11:57

Merge branch 'main' into enh-permutation-importance

def1779

Add tests to catch error messages

c9dce8c

Improve unit tests

e937365

Lint

b5b2b3f

lorentzenchr reviewed Jul 1, 2025


mayer79 added 5 commits July 26, 2025 16:43

Return both diff and ratio, and stderr each

7a26912

Fix failing tests

bccd443

Apply some of the suggestions on array review

8b39774

add a hypothetical test on safe_copy

ae168af

fix docstring example output

9466db5
