Add versioned Zarr design doc#2702

Draft
waxlamp wants to merge 8 commits into master from
versioned-zarr-design-doc

Conversation

@waxlamp
Member

@waxlamp waxlamp commented Feb 6, 2026

No description provided.

@waxlamp waxlamp added the design-doc Involves creating or discussing a design document label Feb 6, 2026
@waxlamp waxlamp marked this pull request as draft February 6, 2026 17:57
Co-authored-by: Kabilar Gunalan <kabi@mit.edu>
Member

@satra satra left a comment


overall looks reasonable. is the copy-on-write really future work, or actually easier for most smaller Zarrs that are similar in size to NWB files? (since NWB Zarr is as much part of this equation as OME-Zarr.) perhaps these two relevant Zarr data types should be added to the executive summary so we don't forget.

Member

@yarikoptic yarikoptic left a comment


initial incomplete pass comments


DANDI considers Zarrs to be a special type of asset: one that is not associated with an “asset blob” (i.e., a single file) but rather with a specialized Zarr record that knows how to refer to an S3 prefix containing all of that Zarr’s chunks. Because Zarrs are large and complex, making a copy of a Zarr when it is updated (as is done for blob assets) is not feasible. Such copying is essential to publishing a Dandiset, since a published version must contain an immutable set of assets (which may go on to be “edited” in copy-on-write fashion in future versions); as a result, DANDI currently does not allow publishing of Zarr-bearing Dandisets.

This design offers a way of handling Zarrs that enables making lightweight snapshots of a Zarr Archive that are suitable for publishing.
Member


FWIW, I think it might be worth noting that so far there is nothing Zarr-specific in what this design doc needs from Zarrs. It is rather about supporting a "folder container" (a hierarchy of multiple files) as a single asset, as opposed to a single "file blob".

I think framing it this way could help avoid "overfitting for Zarr", and potentially allow later for other, non-Zarr use cases demanding similar "directory" support.

Member Author


That's a good point. Do you have examples of other directory-based storage formats that behave as a single entity?

Still, I think it's better to keep the Zarr-specific framing for this doc. We have just the Zarr type to worry about now, and it's a heavy enough lift to get it right that I think generalizing to "folder assets" and then respecializing to Zarr will get us into trouble.

But I can leave a note here (with examples of data formats) that the technology we're developing for Zarrs might generalize to "folder assets".

@waxlamp
Member Author

waxlamp commented Mar 2, 2026

is the copy-on-write really future work, or actually easier for most smaller Zarrs that are similar in size to NWB files? (since NWB Zarr is as much part of this equation as OME-Zarr.)

This comes up naturally in the addenda I've put into the document in the last few days. I've added a commit (f177d6b) that mentions Zarr NWB explicitly.


Because versioned Zarrs are not usually available directly via S3 (many of their chunks will be “buried” under more recent versions of the same keys), maintaining a manifest file for each version of a Zarr appearing in a published Dandiset records the structure of these Zarrs directly in S3 (similarly to the manifest files currently recording which assets belong to a published Dandiset). This preserves the integrity of the bucket itself as a standalone data store (independent of the DANDI API and web application), and also provides an optimization for clients.

Specifically, clients such as the DANDI CLI currently interact with Zarrs by asking the API for each chunk. For an operation such as “download a Zarr”, this can result in hundreds of requests per second to the API. However, if the CLI first retrieved the manifest file and then issued requests directly to S3 for the chunks, the entire redirection burden would be relieved (for both the API and the client).
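The manifest-driven access pattern described here could be sketched roughly as follows. This is a minimal illustration, not the actual CLI implementation: the bucket endpoint, the `zarr/{zarr_id}/{path}` key layout, and the manifest shape (nested `entries` with `[versionId, lastModified, size, ETag]` leaves, per the example manifest shown in this thread) are all assumptions.

```python
# Sketch: resolve direct, versioned S3 URLs for every chunk of a Zarr from
# a version manifest, so a client can fetch chunks without per-chunk API
# redirects. Bucket endpoint and key layout are assumptions.

BUCKET_URL = "https://dandiarchive.s3.amazonaws.com"  # assumed endpoint

def flatten_entries(entries, prefix=""):
    """Walk the nested 'entries' mapping, yielding (path, versionId) pairs.
    Leaf entries are lists of [versionId, lastModified, size, ETag]."""
    for name, value in entries.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):  # subdirectory: recurse
            yield from flatten_entries(value, path)
        else:                        # leaf: versionId is the first field
            yield path, value[0]

def chunk_urls(zarr_id, manifest):
    """Build a {chunk path: versioned S3 URL} mapping from a manifest."""
    return {
        path: f"{BUCKET_URL}/zarr/{zarr_id}/{path}?versionId={version_id}"
        for path, version_id in flatten_entries(manifest["entries"])
    }
```

A client would download the manifest once, then issue plain GETs against these URLs, turning hundreds of API round-trips into a single manifest fetch plus direct S3 traffic.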
Member


I think it might be best to immediately identify the properties, or even a schema, of the manifest file(s) and what metadata is to be contained therein, so that we could inform the design of the client in the scope of

There we reviewed the aspect of storing a hierarchical checksum (so not only ETags on keys but also ETags on folders), which apparently used to be dumped to S3, but that facility was removed in

since it was not yet used. But in the scope of RFing dandi-cli and overall integrity checking, I think it would be great to reintroduce it. Note that our currently implemented prototype of manifests (used for WebDAV) includes an ETag per key but not per "folder":

an example dump of a manifest head
```
❯ curl --silent https://datasets.datalad.org/dandi/zarr-manifests/zarr-manifests-v2-sorted/001/e3b/001e3b6d-26fb-463f-af28-520a25680ab4/326273bcc8730474323a66ea4e3daa49-113328--97037755426.json | jq . | head -n 40
{
  "schemaVersion": 2,
  "fields": [
    "versionId",
    "lastModified",
    "size",
    "ETag"
  ],
  "statistics": {
    "entries": 113328,
    "depth": 5,
    "totalSize": 97037755426,
    "lastModified": "2022-04-23T23:08:58+00:00",
    "zarrChecksum": "326273bcc8730474323a66ea4e3daa49-113328--97037755426"
  },
  "entries": {
    ".zattrs": [
      "JKOglMKYg0dIr1ngNR_dguFZUioWE8MZ",
      "2022-04-23T23:08:58+00:00",
      7901,
      "b8421735714f196291810afd48aa012d"
    ],
    ".zgroup": [
      "joh9Hhu.uOtmHTewXyAXbI4mzi8AKjcb",
      "2022-03-16T02:39:41+00:00",
      24,
      "e20297935e73dd0154104d4ea53040ab"
    ],
    ".zmetadata": [
      "lFCKAw3nHHT66S14.hEaGXm9sBL9eYvW",
      "2022-04-23T23:08:58+00:00",
      14970,
      "358d9e012ae0154f66d5e6d73ced977c"
    ],
    "0": {
      ".zarray": [
        "I5ucOkyHjZcqQJhurzfi307VmxSSkm1M",
        "2022-03-16T02:39:43+00:00",
        449,
        "a45bc329195c9d6fea8cdd9a46562771"
```

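To make the "ETags on folders" idea concrete: per-folder checksums could be derived bottom-up from a manifest like the one above. The scheme below (md5 over sorted child name:digest pairs) is purely illustrative and is not DANDI's actual zarr-checksum algorithm; the `[versionId, lastModified, size, ETag]` leaf layout follows the example manifest.

```python
import hashlib

def folder_digest(entries):
    """Illustrative only (not DANDI's actual checksum algorithm): derive a
    (digest, entry count, total size) triple for a folder from manifest
    entries, where leaves are [versionId, lastModified, size, ETag] lists
    and subfolders are nested dicts."""
    parts, count, size = [], 0, 0
    for name in sorted(entries):
        value = entries[name]
        if isinstance(value, dict):            # subfolder: recurse
            digest, n, s = folder_digest(value)
        else:                                  # leaf: use its ETag and size
            digest, n, s = value[3], 1, value[2]
        parts.append(f"{name}:{digest}")
        count += n
        size += s
    # Combine child digests deterministically into a folder-level digest.
    combined = hashlib.md5("|".join(parts).encode()).hexdigest()
    return combined, count, size
```

With such folder digests stored in the manifest, an integrity check (or a finalization pass) could compare digests top-down and descend only into subtrees that actually differ.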

### Zarr chunk storage

Zarr chunks will continue to be stored in the DANDI S3 bucket as normal (in particular, enabling S3 to continue serving as a Zarr backend for third-party applications). Whenever changes are made to a Zarr and it is then finalized, an additional step will occur to record the current makeup of the Zarr chunks as an immutable set of database rows mapping each chunk’s path to its versioned S3 object. As the Zarr continues to mutate, new such immutable snapshots will be produced; the latest such snapshot will become a permanent part of any published Dandiset. Garbage collection routines will work to clean up unreachable versions of Zarrs. As an example, consider a Zarr with the following chunks stored in S3 (listed with a notional version ID in parentheses):
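The immutable snapshot described in this paragraph could be modeled as rows mapping each chunk path to a versioned S3 object. A minimal sketch follows; the row fields and names are hypothetical, not DANDI's actual database schema.

```python
# Hypothetical sketch of the immutable snapshot rows described above:
# one row per chunk, binding a path within the Zarr to a specific
# versioned S3 object. Field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: rows are immutable once the snapshot is taken
class ZarrChunkRow:
    snapshot_id: str    # identifies one immutable snapshot of the Zarr
    path: str           # chunk path within the Zarr, e.g. "0/.zarray"
    s3_version_id: str  # S3 object version holding this chunk's bytes

def snapshot(zarr_chunks, snapshot_id):
    """Freeze the current {path: versionId} mapping as snapshot rows."""
    return [ZarrChunkRow(snapshot_id, path, vid)
            for path, vid in sorted(zarr_chunks.items())]
```

Each new finalization would produce a fresh set of rows under a new snapshot ID, leaving earlier snapshots (and the S3 object versions they reference) untouched until garbage collection decides they are unreachable.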
Member


so is GC part of this design or not -- should be made explicit


Member

@yarikoptic yarikoptic Mar 3, 2026


... it is then finalized...

  • is that done via the POST /zarr/{id}/finalize/ call the client has to make? what "then finalized" means here exactly needs to be clarified

    • note that this is where having checksums for folders within Zarr manifests could speed up full finalization after small patch fixes tremendously
  • what happens if it is never called? (e.g. could be Finalize "Pending" zarr if all upload URLs must have timed out by now #2051)
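The point about folder checksums speeding up finalization can be illustrated: given per-folder digests from the old and new manifests, finalization could walk only the subtrees whose digests differ, instead of re-verifying every chunk. A sketch, under the assumption that each manifest tree maps names to either a leaf digest or a nested dict:

```python
# Illustrative sketch: diff two digest trees (leaves are digest strings,
# subfolders are nested dicts) and yield only the paths that changed.
# The digest-tree representation is an assumption, not DANDI's format.
def changed_paths(old, new, prefix=""):
    """Yield paths whose digest differs between two manifest digest trees."""
    for name in set(old) | set(new):
        path = f"{prefix}/{name}" if prefix else name
        a, b = old.get(name), new.get(name)
        if isinstance(a, dict) and isinstance(b, dict):
            # Both sides are folders: descend only if needed.
            yield from changed_paths(a, b, path)
        elif a != b:
            yield path  # added, removed, or modified entry
```

For a small patch to a huge Zarr, this touches only the handful of folders along the changed paths, which is the speedup alluded to above.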

2. **What are the performance characteristics of the DANDI API Zarr backend (compared to the S3 Zarr backend) and the manifest-driven access scheme?** If clients do not see a significant performance loss from using the DANDI Zarr backend (or if specialized clients can use manifest-driven access logic), then maintaining an S3 backend for every Zarr (at the costs described above) becomes less attractive for the overall system.
3. **What are the costs and benefits of the S3 materialization design?** The S3 materialization design attempts to bridge between the current proposal and this alternative proposal by still providing an avenue for hosting Zarrs in an S3 backend-compatible way.

### Data Integrity Issues with S3
Member


we already have s3://zarr-checksums which, according to Claude, covers ~72% of the Zarrs in the archive (I didn't check). but the point is that it should be noted here whether we keep them, refactor them, remove them, or what?

@kabilar kabilar self-requested a review March 26, 2026 22:58
