Support multi-part uploads for Zarrs #2758

@yarikoptic

Prompted by

At the moment we use single-part "put_object" uploads for Zarrs:

    class ZarrViewSet(ReadOnlyModelViewSet):
        def create_files(self, request, zarr_id):
            with transaction.atomic():
                urls = [
                    zarr_archive.storage.generate_presigned_put_object_url(

The immediate use case of the LINC project (attn @kabilar @satra) apparently has multi-GB files within Zarrs!

The maximum single-part upload size is 5 GB. There, dandi-cli succeeded with a 2.6 GB upload, and other files were potentially even larger (apparently we do not report the error properly if that was the case).

In any case -- with Zarr v3 sharding it is logical to expect large files within Zarr "filesets". Hence we need to implement support for multi-part uploads, likely as consistent with regular asset uploads as possible/feasible.
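
To make the size constraints concrete, here is a minimal sketch of how a large Zarr file could be split into parts. The limits are the general S3 multipart ones (5 MiB minimum per part except the last, 5 GiB maximum per part, at most 10,000 parts). The function name `plan_parts` and the 64 MiB default are hypothetical and not taken from dandi-archive or dandi-cli.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_PART_SIZE = 5 * MIB       # S3 minimum for every part except the last
MAX_PART_SIZE = 5 * GIB       # S3 maximum for a single part
MAX_PARTS = 10_000            # S3 maximum number of parts per upload
DEFAULT_PART_SIZE = 64 * MIB  # hypothetical default, not from dandi-archive


def plan_parts(size, part_size=DEFAULT_PART_SIZE):
    """Return (part_number, offset, length) tuples covering `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Grow the part size if the requested one would need more than MAX_PARTS.
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(size / MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object too large for a single multipart upload")
    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < size:
        length = min(part_size, size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Under these assumptions a 3 GiB shard would be sent as 48 parts of 64 MiB, each needing its own presigned URL, whereas today a single presigned "put_object" URL covers the whole file.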

Components affected and requiring analysis/changes:

  • dandi-archive --
  • dandi-cli --
  • backups2datalad -- likely used for minting keys and/or checks
  • dandidav -- "not sure" -- it might be affected, in particular for manifests and checksums/ETags there
  • Add versioned Zarr design doc #2702 -- same as per dandidav
  • ...?
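
The checksum/ETag concern above can be illustrated with a short sketch. It assumes the commonly observed S3 behavior (not a documented guarantee, and not applicable to all encryption modes) that a single-part upload gets the plain MD5 of the content as its ETag, while a multipart upload gets the MD5 of the concatenated per-part MD5 digests followed by `-<number of parts>`. The function names are hypothetical.

```python
import hashlib


def single_part_etag(data: bytes) -> str:
    """ETag as typically seen for a single-part upload: plain MD5 hex digest."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    """ETag as typically seen for a multipart upload with the given part size."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # Split into parts; empty input is treated as one empty part.
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)] or [b""]
    # MD5 of each part, as raw bytes.
    digests = [hashlib.md5(chunk).digest() for chunk in chunks]
    # MD5 over the concatenated part digests, suffixed with the part count.
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

If this assumption holds, any component that currently compares a Zarr file's ETag to its plain MD5 would need to know the part size used at upload time in order to reproduce the value.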

Also worth checking aiming forward:

  • what file sizes occur within the nwb-zarr files AIND works with (@alejoe91?)

Labels: zarr (Issues with Zarr hosting/processing/etc.)
