Add versioned Zarr design doc#2702

Draft
waxlamp wants to merge 8 commits into master from
versioned-zarr-design-doc

Conversation

@waxlamp
Member

@waxlamp waxlamp commented Feb 6, 2026

No description provided.

@waxlamp waxlamp added the design-doc Involves creating or discussing a design document label Feb 6, 2026
@waxlamp waxlamp marked this pull request as draft February 6, 2026 17:57
Co-authored-by: Kabilar Gunalan <kabi@mit.edu>
Member

@satra satra left a comment


overall looks reasonable. is the copy-on-write really future work, or actually easier for most smaller Zarrs that are similar in size to NWB files? (since NWB Zarr is as much part of this equation as OME-Zarr.) perhaps these two relevant Zarr data types should be added to the executive summary so we don't forget.

Member

@yarikoptic yarikoptic left a comment


initial incomplete pass comments


DANDI considers Zarrs to be a special type of asset: one that is not associated with an “asset blob” (i.e., a single file) but rather with a specialized Zarr record that knows how to refer to an S3 prefix containing all of that Zarr’s chunks. Because Zarrs are large and complex, making a copy of a Zarr when it is updated (as is done for blob assets) is not feasible. Such copying is essential to publishing a Dandiset, since a published version must contain an immutable set of assets (which may go on to be “edited” in copy-on-write fashion in future versions); as a result, DANDI currently does not allow publishing of Zarr-bearing Dandisets.

This design offers a way of handling Zarrs that enables making lightweight snapshots of a Zarr Archive that are suitable for publishing.
Member


FWIW, I think it might be worth noting that so far there is nothing Zarr-specific in what this design doc needs from Zarrs. It is rather about supporting a "folder container" (a hierarchy of multiple files) as a single asset, as opposed to a single "file blob".

I think framing it this way could help avoid "overfitting for Zarr", and potentially allow later for other, non-Zarr use cases demanding similar "directory" support.

Member Author


That's a good point. Do you have examples of other directory-based storage formats that behave as a single entity?

Still, I think it's better to keep the Zarr-specific framing for this doc. We have just the Zarr type to worry about now, and it's a heavy enough lift to get it right that I think generalizing to "folder assets" and then respecializing to Zarr will get us into trouble.

But I can leave a note here (with examples of data formats) that the technology we're developing for Zarrs might generalize to "folder assets".

@waxlamp
Member Author

waxlamp commented Mar 2, 2026

is the copy-on-write really future work, or actually easier for most smaller Zarrs that are similar in size to NWB files? (since NWB Zarr is as much part of this equation as OME-Zarr.)

This comes up naturally in the addenda I've put into the document in the last few days. I've added a commit (f177d6b) that mentions Zarr NWB explicitly.


Because versioned Zarrs are not usually available directly via S3 (many of their chunks will be “buried” under more recent versions of the same keys), maintaining a manifest file for each version of a Zarr appearing in a published Dandiset records the structure of these Zarrs directly in S3 (similarly to the manifest files currently recording which assets belong to a published Dandiset). This preserves the integrity of the bucket itself as a standalone data store (independent of the DANDI API and web application), and also provides an optimization for clients.

Specifically, clients such as the DANDI CLI currently interact with Zarrs by asking the API for each chunk. For an operation such as “download a Zarr”, this can result in hundreds of requests per second to the API. However, if the CLI first retrieved the manifest file and then issued requests directly to S3 for the chunks, the entire redirection burden would be relieved (for both the API and the client).
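The manifest-driven access pattern described here could be sketched roughly as follows. This is a minimal illustration, not the actual CLI implementation: the bucket endpoint, the `zarr/{zarr_id}/{path}` key layout, and the manifest shape (nested `entries` with `[versionId, lastModified, size, ETag]` leaves, per the example manifest shown in this thread) are all assumptions.

```python
# Sketch: resolve direct, versioned S3 URLs for every chunk of a Zarr from
# a version manifest, so a client can fetch chunks without per-chunk API
# redirects. Bucket endpoint and key layout are assumptions.

BUCKET_URL = "https://dandiarchive.s3.amazonaws.com"  # assumed endpoint

def flatten_entries(entries, prefix=""):
    """Walk the nested 'entries' mapping, yielding (path, versionId) pairs.
    Leaf entries are lists of [versionId, lastModified, size, ETag]."""
    for name, value in entries.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):  # subdirectory: recurse
            yield from flatten_entries(value, path)
        else:                        # leaf: versionId is the first field
            yield path, value[0]

def chunk_urls(zarr_id, manifest):
    """Build a {chunk path: versioned S3 URL} mapping from a manifest."""
    return {
        path: f"{BUCKET_URL}/zarr/{zarr_id}/{path}?versionId={version_id}"
        for path, version_id in flatten_entries(manifest["entries"])
    }
```

A client would download the manifest once, then issue plain GETs against these URLs, turning hundreds of API round-trips into a single manifest fetch plus direct S3 traffic.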
Member


I think it might be best to immediately identify the properties, or even a schema, of the manifest file(s) and what metadata is to be contained therein, so that we could inform the design of the client in the scope of

There we reviewed the aspect of storing a hierarchical checksum (so not only ETags on keys but also ETags on folders), which apparently used to be dumped to S3, but that facility was removed in

since it was not yet used. But in the scope of RFing dandi-cli and overall integrity checking, I think it would be great to reintroduce it. Note that our currently implemented prototype of manifests (used for WebDAV) includes an ETag per key but not per "folder":

an example dump of a manifest head
```
❯ curl --silent https://datasets.datalad.org/dandi/zarr-manifests/zarr-manifests-v2-sorted/001/e3b/001e3b6d-26fb-463f-af28-520a25680ab4/326273bcc8730474323a66ea4e3daa49-113328--97037755426.json | jq . | head -n 40
{
  "schemaVersion": 2,
  "fields": [
    "versionId",
    "lastModified",
    "size",
    "ETag"
  ],
  "statistics": {
    "entries": 113328,
    "depth": 5,
    "totalSize": 97037755426,
    "lastModified": "2022-04-23T23:08:58+00:00",
    "zarrChecksum": "326273bcc8730474323a66ea4e3daa49-113328--97037755426"
  },
  "entries": {
    ".zattrs": [
      "JKOglMKYg0dIr1ngNR_dguFZUioWE8MZ",
      "2022-04-23T23:08:58+00:00",
      7901,
      "b8421735714f196291810afd48aa012d"
    ],
    ".zgroup": [
      "joh9Hhu.uOtmHTewXyAXbI4mzi8AKjcb",
      "2022-03-16T02:39:41+00:00",
      24,
      "e20297935e73dd0154104d4ea53040ab"
    ],
    ".zmetadata": [
      "lFCKAw3nHHT66S14.hEaGXm9sBL9eYvW",
      "2022-04-23T23:08:58+00:00",
      14970,
      "358d9e012ae0154f66d5e6d73ced977c"
    ],
    "0": {
      ".zarray": [
        "I5ucOkyHjZcqQJhurzfi307VmxSSkm1M",
        "2022-03-16T02:39:43+00:00",
        449,
        "a45bc329195c9d6fea8cdd9a46562771"
```

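To make the "ETags on folders" idea concrete: per-folder checksums could be derived bottom-up from a manifest like the one above. The scheme below (md5 over sorted child name:digest pairs) is purely illustrative and is not DANDI's actual zarr-checksum algorithm; the `[versionId, lastModified, size, ETag]` leaf layout follows the example manifest.

```python
import hashlib

def folder_digest(entries):
    """Illustrative only (not DANDI's actual checksum algorithm): derive a
    (digest, entry count, total size) triple for a folder from manifest
    entries, where leaves are [versionId, lastModified, size, ETag] lists
    and subfolders are nested dicts."""
    parts, count, size = [], 0, 0
    for name in sorted(entries):
        value = entries[name]
        if isinstance(value, dict):            # subfolder: recurse
            digest, n, s = folder_digest(value)
        else:                                  # leaf: use its ETag and size
            digest, n, s = value[3], 1, value[2]
        parts.append(f"{name}:{digest}")
        count += n
        size += s
    # Combine child digests deterministically into a folder-level digest.
    combined = hashlib.md5("|".join(parts).encode()).hexdigest()
    return combined, count, size
```

With such folder digests stored in the manifest, an integrity check (or a finalization pass) could compare digests top-down and descend only into subtrees that actually differ.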

### Zarr chunk storage

Zarr chunks will continue to be stored in the DANDI S3 bucket as normal (in particular, enabling S3 to continue serving as a Zarr backend for third-party applications). Whenever changes are made to a Zarr and it is then finalized, an additional step will occur to record the current makeup of the Zarr chunks as an immutable set of database rows mapping each chunk’s path to its versioned S3 object. As the Zarr continues to mutate, new such immutable snapshots will be produced; the latest such snapshot will become a permanent part of any published Dandiset. Garbage collection routines will work to clean up unreachable versions of Zarrs. As an example, consider a Zarr with the following chunks stored in S3 (listed with a notional version ID in parentheses):
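The immutable snapshot described in this paragraph could be modeled as rows mapping each chunk path to a versioned S3 object. A minimal sketch follows; the row fields and names are hypothetical, not DANDI's actual database schema.

```python
# Hypothetical sketch of the immutable snapshot rows described above:
# one row per chunk, binding a path within the Zarr to a specific
# versioned S3 object. Field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: rows are immutable once the snapshot is taken
class ZarrChunkRow:
    snapshot_id: str    # identifies one immutable snapshot of the Zarr
    path: str           # chunk path within the Zarr, e.g. "0/.zarray"
    s3_version_id: str  # S3 object version holding this chunk's bytes

def snapshot(zarr_chunks, snapshot_id):
    """Freeze the current {path: versionId} mapping as snapshot rows."""
    return [ZarrChunkRow(snapshot_id, path, vid)
            for path, vid in sorted(zarr_chunks.items())]
```

Each new finalization would produce a fresh set of rows under a new snapshot ID, leaving earlier snapshots (and the S3 object versions they reference) untouched until garbage collection decides they are unreachable.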
Member


so is GC part of this design or not -- should be made explicit


Member

@yarikoptic yarikoptic Mar 3, 2026


... it is then finalized...

  • is that done via the POST /zarr/{id}/finalize/ call the client has to make? what "then finalized" means here exactly needs to be clarified

    • note that this is where having checksums for folders within Zarr manifests could speed up full finalization after small patch fixes tremendously
  • what happens if it is never called? (e.g. could be Finalize "Pending" zarr if all upload URLs must have timed out by now #2051)
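The point about folder checksums speeding up finalization can be illustrated: given per-folder digests from the old and new manifests, finalization could walk only the subtrees whose digests differ, instead of re-verifying every chunk. A sketch, under the assumption that each manifest tree maps names to either a leaf digest or a nested dict:

```python
# Illustrative sketch: diff two digest trees (leaves are digest strings,
# subfolders are nested dicts) and yield only the paths that changed.
# The digest-tree representation is an assumption, not DANDI's format.
def changed_paths(old, new, prefix=""):
    """Yield paths whose digest differs between two manifest digest trees."""
    for name in set(old) | set(new):
        path = f"{prefix}/{name}" if prefix else name
        a, b = old.get(name), new.get(name)
        if isinstance(a, dict) and isinstance(b, dict):
            # Both sides are folders: descend only if needed.
            yield from changed_paths(a, b, path)
        elif a != b:
            yield path  # added, removed, or modified entry
```

For a small patch to a huge Zarr, this touches only the handful of folders along the changed paths, which is the speedup alluded to above.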

2. **What are the performance characteristics of the DANDI API Zarr backend (compared to the S3 Zarr backend) and the manifest-driven access scheme?** If clients do not see a significant performance loss from using the DANDI Zarr backend (or if specialized clients can use manifest-driven access logic), then maintaining an S3 backend for every Zarr (at the costs described above) becomes less attractive for the overall system.
3. **What are the costs and benefits of the S3 materialization design?** The S3 materialization design attempts to bridge between the current proposal and this alternative proposal by still providing an avenue for hosting Zarrs in an S3 backend-compatible way.

### Data Integrity Issues with S3
Member


we already have s3://zarr-checksums which, according to Claude, covers ~72% of the Zarrs in the archive (I didn't check). but the point is that it should be noted here whether we keep them, refactor them, remove them, or what?

@kabilar kabilar self-requested a review March 26, 2026 22:58
