Fix `drop_sel` for a MultiIndex #10863

owena11 · 2025-10-17T14:10:14Z

This is a proposed fix for #10862 to allow consistent usage between sel and drop_sel for MultiIndexes. The current implementation seems to convert the labels to an array to handle the edge case of labels of type xr.DataArray. This change makes that edge case more explicit and delegates the responsibility for converting to an array, if needed, down to the indexer.

Existing indexes already do this, so the change shouldn't have any performance implications:

pandas.Index.drop - here
pandas.MultiIndex.drop - here
Closes Differing behaviour between sel and drop_sel for MultiIndexes #10862
Tests added

Most labels passed into `drop_sel` can be handled by the underlying libraries, which will convert them to an array as the current implementation does. xr.DataArray is a special case: it is supported as a set of labels but doesn't interact well with pandas' conversion to an array.
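As a rough illustration of why eager conversion can get in the way, here is a minimal sketch using only numpy and pandas. It assumes the problem in #10862 involves tuple labels for a MultiIndex; the index and the label values are made up for the example and are not taken from the issue.

```python
import numpy as np
import pandas as pd

# A small two-level MultiIndex (illustrative values only).
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["one", "two"])

# One label, expressed as a tuple with one entry per level.
labels = [("a", 1)]

# Eager conversion turns the list of tuples into a 2-D array,
# so the "one tuple per label" structure is lost.
as_array = np.asarray(labels)
print(as_array.shape)

# Passing the labels through untouched lets pandas interpret the tuples itself.
remaining = index.drop(labels)
print(list(remaining))
```

Leaving the conversion to `pandas.MultiIndex.drop` keeps the tuple semantics intact, which is the delegation this PR describes.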

welcome · 2025-10-17T14:10:17Z

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

owena11 · 2025-10-17T14:14:56Z

xarray/core/dataset.py

        Data variables:
            A        (x, y) int64 32B 0 2 3 5
        """
+        from xarray.core.dataarray import DataArray


I've imported DataArray here as that seems to fit with the style in other parts of the codebase (here), however I'm not entirely sure why this is done.

I think that dataarray.py imports dataset.py already (historically Dataset predates DataArray slightly), so this avoids a recursive import.

Great, thanks for the explanation👍

max-sixty · 2025-10-19T19:22:21Z

I don't know this area that well, though it does seem to solve the immediate problem.

would anyone have alternative approaches that get at the root of the issue?

otherwise I would suggest merging this, maybe with a comment explaining, and then nothing prevents refactoring later...

(adding @benbovy , the indexing Czar, though others will also know more)

owena11 · 2025-10-20T13:10:24Z

I don't know this area that well,

I have to admit the same! To try and provide more context I've dived down the git blame rabbit hole to explain the original reason for the cast to an array type within the code.

It was introduced in PR #3177. That PR introduces a whole bunch of type hints for the old drop method (before the split into drop_vars / drop_sel), which referenced indexes: 'OrderedDict[Any, pd.Index]'. At that time Index.drop in pandas didn't have a type signature, only a docstring, unless the types are stored elsewhere. So the best theory I have is that it was related to matching the types from somewhere in the chain.

shoyer

Thanks @owena11 !

shoyer · 2025-10-28T23:57:41Z

xarray/core/dataset.py

        Data variables:
            A        (x, y) int64 32B 0 2 3 5
        """
+        from xarray.core.dataarray import DataArray


I think that dataarray.py imports dataset.py already (historically Dataset predates DataArray slightly), so this avoids a recursive import.

shoyer · 2025-10-29T00:00:59Z

xarray/core/dataset.py

+            # Most conversion to arrays is better handled in the indexer, however
+            # DataArrays are a special case where the underlying libraries don't provide
+            # a good conversition.
+            if isinstance(labels_for_dim, DataArray):
+                labels_for_dim = np.asarray(labels_for_dim)


If you wanted to make this a little safer, could add:

Suggested change

-            # Most conversion to arrays is better handled in the indexer, however
-            # DataArrays are a special case where the underlying libraries don't provide
-            # a good conversition.
-            if isinstance(labels_for_dim, DataArray):
-                labels_for_dim = np.asarray(labels_for_dim)
+            # Most conversion to arrays is better handled in the indexer, however
+            # DataArrays are a special case where the underlying libraries don't provide
+            # a good conversition.
+            if isinstance(labels_for_dim, DataArray):
+                if labels_for_dim.dims not in ((), (dim,)):
+                    raise ValueError(
+                        "cannot use drop_sel() with DataArray values with "
+                        "along dimensions other than the dimensions being "
+                        f"indexed along: {labels_for_dim}"
+                    )
+                labels_for_dim = np.asarray(labels_for_dim)

But this LGTM to me! Definitely an incremental improvement.

Committed and then reverted: the tests highlighted that it might break people's current usage where a DataArray gets assigned the default dim names (i.e. dim_0 etc.).

Also, I had thought this would nudge sel and drop_sel to be more consistent, but after checking, neither enforces alignment between dims when selecting with a DataArray:

    >>> data = xr.Dataset({"x": ["a", "b"]})
    >>> data.sel(x=xr.DataArray(["a",], dims=("y",)))
    <xarray.Dataset> Size: 4B
    Dimensions:  (y: 1)
    Coordinates:
        x        (y) <U1 4B 'a'
    Dimensions without coordinates: y
    Data variables:
        *empty*

So would propose leaving this change for now.

Right, sel() imposes the dimensions and coordinates of the indexer rather than checking for alignment. It is not obvious to me what the inverse of that would be!
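The trade-off discussed in this thread can be sketched roughly as follows. `labels_to_array` and its `strict` flag are hypothetical names invented for this illustration and are not part of xarray; the sketch only mirrors the DataArray special case and the optional dims check from the suggested change.

```python
import numpy as np
import xarray as xr


def labels_to_array(labels, dim, strict=False):
    """Hypothetical helper: convert only DataArray labels to a NumPy array."""
    if isinstance(labels, xr.DataArray):
        # Optional check from the suggestion: the DataArray must be
        # 0-dimensional or lie along the dimension being indexed.
        if strict and labels.dims not in ((), (dim,)):
            raise ValueError(
                f"DataArray labels have dims {labels.dims}, expected () or ({dim!r},)"
            )
        return np.asarray(labels)
    # Everything else is left for the index to convert.
    return labels


# A DataArray built without explicit dims gets the default name "dim_0".
unnamed = xr.DataArray(["a"])
print(unnamed.dims)

# Without the check, the labels are simply converted.
print(labels_to_array(unnamed, "x"))

# With the check, the same call is rejected, which is the breakage
# the tests pointed out.
try:
    labels_to_array(unnamed, "x", strict=True)
except ValueError as err:
    print(err)
```

Under these assumptions the strict variant rejects any DataArray created without explicit `dims`, which is consistent with the decision to leave the validation out for now.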

Add additional validation for `drop_sel` such that when selecting with a DataArray the dimensions must be named consistently between the DataArray and the dimension you're dropping the selection from. This matches the behaviour with `sel`. Co-authored-by: Stephan Hoyer <shoyer@google.com>

welcome · 2025-10-29T16:23:12Z

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again!

owena11 · 2025-10-29T16:41:10Z

Thanks for the time reviewing @shoyer & @max-sixty!

owena11 added 2 commits October 17, 2025 14:59

Add test for dropping multiindex labels

d288643

owena11 changed the title ~~Fix drop_sel MultiIndex~~ Fix drop_sel for a MultiIndex Oct 17, 2025

owena11 and others added 3 commits October 21, 2025 18:28

Merge branch 'main' into 10862_fix_dropsel_multiindex

7b09905

Merge branch 'main' into 10862_fix_dropsel_multiindex

f1089d0

Merge branch 'main' into 10862_fix_dropsel_multiindex

22aadff

max-sixty added the plan to merge Final call for comments label Oct 28, 2025

shoyer approved these changes Oct 29, 2025

owena11 and others added 4 commits October 29, 2025 09:56

[pre-commit.ci] auto fixes from pre-commit.com hooks

ca372a1

for more information, see https://pre-commit.ci

Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

7d23fed

This reverts commit ca372a1.

Revert "Validation for DataArray indexing "

8950155

This reverts commit 293fa1f.

shoyer merged commit 7f57f01 into pydata:main Oct 29, 2025
37 checks passed

owena11 deleted the 10862_fix_dropsel_multiindex branch October 29, 2025 16:41
