Support multiple items in DataTree.__getitem__ and improve NodePath (renamed to TreePath) #10854
- Rename `NodePath` to `TreePath` to make its name slightly more obvious
- Automatically normalize paths in the `TreePath` constructor
- Use `joinpath()` and normalized tree paths to simplify implementations of `_get_item` and `_set_item`.

None of these changes are user facing.
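As a rough illustration of what "normalize in the constructor, then build on `joinpath()`" can look like, here is a minimal standalone sketch. `SketchPath` is a hypothetical class written for this example and is not xarray's `TreePath`; it also assumes that "normalize" means collapsing `.`, `..` and trailing slashes via `posixpath.normpath`, which the PR may or may not do.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchPath:
    """Hypothetical stand-in for a tree path that is normalized on construction."""

    raw: str

    def __post_init__(self) -> None:
        # Assumption: normalization collapses "." and ".." segments and
        # drops trailing slashes, as posixpath.normpath does.
        object.__setattr__(self, "raw", posixpath.normpath(self.raw))

    def joinpath(self, other: str) -> "SketchPath":
        # Joining goes back through the constructor, so the result is
        # normalized as well and callers never see a raw "a/b/../c".
        return SketchPath(posixpath.join(self.raw, other))


print(SketchPath("a/./b/../c").raw)           # a/c
print(SketchPath("/a/b").joinpath("../d").raw)  # /a/d
print(SketchPath("a/b/") == SketchPath("a/b"))  # True
```

Because every instance is already normalized, lookup code in this sketch could compare or split paths directly instead of re-normalizing at each call site.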
Looks like there was a similar attempt in #10400, in case it helps.

According to our policy, we can drop python minimum_versions.py --policy ci/policy.yaml --today 2026-04-04 ci/requirements/min-all-deps.yml
This is ready for review. The main thing this could use is clear documentation, to explain that in the case of indexing multiple keys, the resulting DataTree is always defined relative to the node being indexed. This is rather different from the API proposed in #10400, which tries to index the selected variables at each node. Ideally we could supply this functionality in a dedicated method (which would also make it easier to document), e.g.,

cc @eni-awowale
    def test_getitem_on_child(self) -> None:
        data = DataTree.from_dict({"a/b/c": 0, "a/d": 1, "e": 2})
        child = data.children["a"]
Would this be the distinction between the future .subset method and [['a']] selection?
For example with a datatree like:
dt1 = xr.DataTree.from_dict({'/': xr.Dataset(coords={'x': [1, 2, 3]}), '/a': xr.Dataset({'n': 1}), '/a/b': xr.Dataset({'foo': 1})})
<xarray.DataTree>
Group: /
│   Dimensions:  (x: 3)
│   Coordinates:
│     * x        (x) int64 24B 1 2 3
└── Group: /a
    │   Dimensions:  ()
    │   Data variables:
    │       n        int64 8B 1
    └── Group: /a/b
            Dimensions:  ()
            Data variables:
                foo      int64 8B 1

When we select with dt1[['a/b']] we get a datatree that has an empty "a" group, with coordinates from the root group, so:
<xarray.DataTree>
Group: /
└── Group: /a
    └── Group: /a/b
            Dimensions:  (x: 3)
            Coordinates:
              * x        (x) int64 24B 1 2 3
            Data variables:
                foo      int64 8B 1

So .subset would do something like dt1.children['a'][['/a/b']]: we get a datatree that contains only the "b" group, with the "x" coordinates from the root group, so:
<xarray.DataTree 'a'>
Group: /
└── Group: /b
        Dimensions:  (x: 3)
        Coordinates:
          * x        (x) int64 24B 1 2 3
        Data variables:
            foo      int64 8B 1
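The difference between indexing from the root and indexing from a child can be sketched without xarray at all. The snippet below models a tree as a flat dict of path to value, mirroring the `from_dict` input used in the test above. `select_relative` is a hypothetical helper written for this illustration, not part of xarray or this PR, and it ignores coordinate inheritance and empty intermediate groups entirely; it only shows that the selected paths come back relative to the node being indexed.

```python
from pathlib import PurePosixPath


def select_relative(flat_tree: dict, node: str, keys: list) -> dict:
    """Hypothetical helper: keep entries at or under node/key for each key,
    and report their paths relative to `node`."""
    node_path = PurePosixPath("/") / node
    selected = {}
    for path, value in flat_tree.items():
        full = PurePosixPath("/") / path
        for key in keys:
            target = node_path / key
            # Keep the entry if it is the target itself or lives below it.
            if full == target or target in full.parents:
                selected[str(full.relative_to(node_path))] = value
                break
    return selected


flat = {"a/b/c": 0, "a/d": 1, "e": 2}

# Indexing from the root keeps the intermediate "a" level in the result.
print(select_relative(flat, "", ["a/b"]))  # {'a/b/c': 0}

# Indexing from the child "a" yields paths relative to "a".
print(select_relative(flat, "a", ["b"]))   # {'b/c': 0}
```

In both calls the same leaf is selected; only the root of the returned structure differs, which is the distinction the two reprs above are drawing.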
This PR adds support for indexing with multiple items as a list of paths in `DataTree.__getitem__`, e.g., `tree[['first', 'second']]`.

It also includes internal improvements to `NodePath` (now renamed to `TreePath`):

- Rename `NodePath` to `TreePath` to make its name slightly more obvious
- Automatically normalize paths in the `TreePath` constructor
- Use `joinpath()` and normalized tree paths to simplify implementations of `_get_item` and `_set_item`.