@shoyer
Member

@shoyer shoyer commented Oct 15, 2025

This PR adds support for indexing with multiple items as a list of paths in DataTree.__getitem__, e.g., tree[['first', 'second']].

It also includes internal improvements to NodePath (now renamed to TreePath):

  • Rename NodePath to TreePath to make its name slightly more obvious
  • Automatically normalize paths in the TreePath constructor
  • Use joinpath() and normalized tree paths to simplify implementations of _get_item and _set_item.

None of these changes are user facing.
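The PR description does not spell out what path normalization covers, so the following is only a rough sketch of the idea, assuming POSIX-style tree paths and purely lexical normalization (collapsing `.`, `..`, and repeated or trailing slashes). The helper names `normalize_tree_path` and `join_normalized` are hypothetical and are not part of xarray.

```python
import posixpath
from pathlib import PurePosixPath


def normalize_tree_path(path: str) -> PurePosixPath:
    """Hypothetical helper: normalize a tree path lexically."""
    # posixpath.normpath only rewrites the string; it never touches a filesystem.
    return PurePosixPath(posixpath.normpath(path))


def join_normalized(base: str, *parts: str) -> PurePosixPath:
    """Hypothetical helper: join path segments, then normalize the result once."""
    joined = PurePosixPath(base).joinpath(*parts)
    return normalize_tree_path(str(joined))


print(normalize_tree_path("a/./b/../c"))    # a/c
print(join_normalized("/a", "b", "../c"))   # /a/c
```

With paths normalized up front, lookups such as those in `_get_item` and `_set_item` would not need to special-case `.` or `..` segments themselves, which is presumably where the simplification comes from.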
@shoyer shoyer requested a review from TomNicholas October 15, 2025 00:04
@github-actions github-actions bot added the topic-backends, topic-zarr, topic-DataTree, and io labels Oct 15, 2025
@shoyer shoyer changed the title from "Internal improvements to NodePath (renamed to TreePath)" to "Support multiple items in DataTree.__getitem__ and improve NodePath (renamed to TreePath)" Oct 15, 2025
@keewis
Collaborator

keewis commented Oct 22, 2025

looks like there was a similar attempt in #10400, in case it helps

According to our policy, we can drop python=3.11 from 2026-04-04 onwards. You can simulate this by passing --today to minimum_versions.py:

python minimum_versions.py --policy ci/policy.yaml --today 2026-04-04 ci/requirements/min-all-deps.yml

@shoyer
Member Author

shoyer commented Oct 29, 2025

This is ready for review.

The main thing this could use is clear documentation, to explain that in the case of indexing multiple keys, the resulting DataTree is always defined relative to the node being indexed. This is rather different from the API proposed in #10400, which tries to index the selected variables at each node.
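To make the "relative to the node being indexed" behaviour concrete, here is a minimal sketch on a flat `{path: value}` mapping rather than a real DataTree. The function `subset_relative` is a hypothetical stand-in for `node[[key, ...]]`, not the PR's implementation, and it ignores inherited coordinates entirely.

```python
from pathlib import PurePosixPath


def subset_relative(
    nodes: dict[str, object], at: str, keys: list[str]
) -> dict[str, object]:
    """Hypothetical stand-in for node[[key, ...]] on a flat {path: value} tree."""
    base = PurePosixPath(at)
    result: dict[str, object] = {}
    for key in keys:
        target = base / key  # keys are interpreted relative to the indexed node
        for path, value in nodes.items():
            p = PurePosixPath(path)
            # Keep the selected node and everything below it.
            if p == target or target in p.parents:
                # Re-root the kept path so it is expressed relative to `at`.
                result["/" + str(p.relative_to(base))] = value
    return result


tree = {"/a/b/c": 0, "/a/d": 1, "/e": 2}
print(subset_relative(tree, "/", ["a/b", "e"]))  # {'/a/b/c': 0, '/e': 2}
print(subset_relative(tree, "/a", ["b"]))        # {'/b/c': 0}
```

Selecting from the root keeps the intermediate "a" level in the result paths, while selecting from the "a" node produces paths that start at "b".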

Ideally we could supply this functionality in a dedicated method (which would also make it easier to document), e.g., DataTree.subset() as we discussed last week at the Xarray meeting. This could be similar to the existing discussion about adding a public API for Dataset._copy_listed(): #3894

cc @eni-awowale


def test_getitem_on_child(self) -> None:
    data = DataTree.from_dict({"a/b/c": 0, "a/d": 1, "e": 2})
    child = data.children["a"]
Collaborator

Would this be the distinction between the future .subset method and [['a']] selection?

For example with a datatree like:

dt1 = xr.DataTree.from_dict({'/': xr.Dataset(coords={'x': [1, 2, 3]}), '/a': xr.Dataset({'n': 1}), '/a/b': xr.Dataset({'foo': 1})})
<xarray.DataTree>
Group: /
│   Dimensions:  (x: 3)
│   Coordinates:
│     * x        (x) int64 24B 1 2 3
└── Group: /a
    │   Dimensions:  ()
    │   Data variables:
    │       n        int64 8B 1
    └── Group: /a/b
            Dimensions:  ()
            Data variables:
                foo      int64 8B 1

When we select with dt1[['a/b']], we get a datatree that has an empty "a" group, with the coordinates from the root group:

<xarray.DataTree>
Group: /
└── Group: /a
    └── Group: /a/b
            Dimensions:  (x: 3)
            Coordinates:
              * x        (x) int64 24B 1 2 3
            Data variables:
                foo      int64 8B 1

So .subset would do something like dt1.children['a'][['/a/b']], which returns a datatree containing only the "b" group, with the "x" coordinates from the root group:

<xarray.DataTree 'a'>
Group: /
└── Group: /b
        Dimensions:  (x: 3)
        Coordinates:
          * x        (x) int64 24B 1 2 3
        Data variables:
            foo      int64 8B 1
