Skip to content

Add support for validating Zarr folders with format V3 using custom walk-through and tensorstore#1614

Merged
yarikoptic merged 15 commits intodandi:masterfrom
candleindark:zarr-format-v3
Apr 23, 2025
Merged

Add support for validating Zarr folders with format V3 using custom walk-through and tensorstore#1614
yarikoptic merged 15 commits intodandi:masterfrom
candleindark:zarr-format-v3

Conversation

@candleindark
Copy link
Copy Markdown
Member

@candleindark candleindark commented Apr 16, 2025

This PR brings the following changes

  1. It provides a temporary workaround to support validation of Zarr format V3 objects, Zarr arrays or groups, before zarr-python 3.x upgrade is done. It closes zarr 3 support #1609.
  2. In achieving point 1, this PR provides a new definition of get_zarr_format_version() provided and uses this definition. It closes Make use of get_zarr_format_version to provide version of zarr #1610.
  3. Additionally, this PR corrects the use of zarr.open() in validation. zarr.open() has a default mode of a, read and write. In this mode, zarr.open() would attempt to modify a Zarr store if it deems the store as an invalid one, corrupting the store. This PR changes the use of zarr.open() from a mode to r mode, read-only.

Note:
Zarr format 3 objects/stores are added to the git repo for testing since with zarr-python 2.x, we can't generate such stores dynamically.

Remaining TODOs:

  • Add/update tests

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 16, 2025

Codecov Report

Attention: Patch coverage is 85.13514% with 11 lines in your changes missing coverage. Please review.

Project coverage is 88.69%. Comparing base (88c3275) to head (8ba6a8f).
Report is 96 commits behind head on master.

Files with missing lines Patch % Lines
dandi/files/zarr.py 82.25% 11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1614      +/-   ##
==========================================
+ Coverage   88.58%   88.69%   +0.11%     
==========================================
  Files          78       78              
  Lines       10937    10991      +54     
==========================================
+ Hits         9688     9749      +61     
+ Misses       1249     1242       -7     
Flag Coverage Δ
unittests 88.69% <85.13%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@yarikoptic yarikoptic changed the title Add support of Zarr format V3 Add support for validating Zarr format V3 using custom walk-through and tensorstore Apr 16, 2025
@yarikoptic yarikoptic changed the title Add support for validating Zarr format V3 using custom walk-through and tensorstore Add support for validating Zarr folders with format V3 using custom walk-through and tensorstore Apr 16, 2025
A top level func to validate a Zarr format
V3 LocalStore
This is no longer needed. Individual arrays
are located within `_ts_validate_zarr3()`
@candleindark candleindark force-pushed the zarr-format-v3 branch 2 times, most recently from e3a0735 to e4233da Compare April 17, 2025 17:14
A function to validate a Zarr format V3
array in a LocalStore with the tensorstore
To avoid import of dandi when there is already
`from dandi import ...`
Zarr format V3 handling is done though tensorstore
It is no longer needed for all Zarr validation errors
are now return as a `ValidationResult`
This is to achieve consistency with other validation funcs
in the package
This is to achieve consistency with other validation funcs
in the package
@candleindark
Copy link
Copy Markdown
Member Author

@yarikoptic Is a Zarr store, the *.zarr directory in the filesystem the represent a Zarr object, array or group, considered as a file in dandi-cli? I see code such as https://github.com/dandi/dandi-cli/blob/6518334b60c541890d83e293d000f350280dfbad/dandi/files/zarr.py#L265C21-L265C37 treat a directory as file by using Scope.FILE to mark a validation result.

Currently, I used Scope.FOLDER to mark validation error regarding Zarr objects. Should I be consistent and use Scope.FILE intead?

@yarikoptic
Copy link
Copy Markdown
Member

File scope was used to reflect the fact that it applies to that single "Zarr dataset", for which we have a single asset in DANDI, and not a directory which might contain multiple files/assets... Let's go with File for now, although we might want to improve or change semantic here later on

So to be consistent with other usage of the mark in
this project
`zarr.open()` have a default mode of `a`, read and write.
In this mode, `zarr.open()` would attempt to modify a Zarr
store if it deems the store as an invalid one, corrupting
the store
@candleindark candleindark added bug Something isn't working zarr enhancement New feature or request labels Apr 17, 2025
@yarikoptic yarikoptic added the minor Increment the minor version when merged label Apr 17, 2025
@candleindark
Copy link
Copy Markdown
Member Author

@yarikoptic This PR is ready for review. The remaining lines to be tested are either too simple or too tricky to worth investing time for testing this temporary workaround.

@candleindark candleindark marked this pull request as ready for review April 17, 2025 23:01
@candleindark candleindark requested a review from yarikoptic April 17, 2025 23:01
@yarikoptic yarikoptic added the release Create a release when this pr is merged label Apr 23, 2025
@yarikoptic yarikoptic merged commit 1524188 into dandi:master Apr 23, 2025
43 of 48 checks passed
@github-actions
Copy link
Copy Markdown

🚀 PR was released in 0.68.0 🚀

@yarikoptic yarikoptic mentioned this pull request Apr 23, 2025
@candleindark candleindark deleted the zarr-format-v3 branch May 22, 2025 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request minor Increment the minor version when merged release Create a release when this pr is merged released zarr

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make use of get_zarr_format_version to provide version of zarr zarr 3 support

3 participants