
Enable sharding for Zarr files for external aerodynamics pipeline #41

Merged
saikrishnanc-nv merged 4 commits into NVIDIA:main from saikrishnanc-nv:saikrishnanc/sharding
Dec 3, 2025

@saikrishnanc-nv
Contributor

This PR enables sharding for Zarr files produced by the external aerodynamics pipeline.
Following the Zarr docs, it creates shards of roughly 1 GB, each containing about 1000 chunks of roughly 1 MB each.
The goal is to reduce the number of files written for large files (volume files, for example) while keeping random access fast, which chunking makes possible.

Tests are also being added.
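
The sizing described above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the code in this PR: the helper name `shard_layout`, its parameters, the choice to split along the first axis only, and the decimal definitions of 1 MB and 1 GB are all assumptions made for the example.

```python
import math


def shard_layout(shape, itemsize, chunk_bytes=1_000_000, chunks_per_shard=1000):
    """Hypothetical helper: pick chunk and shard shapes for an array.

    Assumes the array is split along axis 0 only, with ~chunk_bytes per chunk
    and at most chunks_per_shard chunks stored in each shard.
    """
    # Bytes occupied by one "row" (one index along axis 0).
    row_bytes = itemsize * math.prod(shape[1:])
    # Rows that fit in one chunk: at least 1, at most the whole array.
    rows_per_chunk = max(1, min(shape[0], chunk_bytes // row_bytes))
    # Never ask for more chunks per shard than the array actually has.
    total_chunks = math.ceil(shape[0] / rows_per_chunk)
    n = min(chunks_per_shard, total_chunks)
    # The shard shape is an exact multiple of the chunk shape.
    chunk_shape = (rows_per_chunk, *shape[1:])
    shard_shape = (rows_per_chunk * n, *shape[1:])
    return chunk_shape, shard_shape


# Example: 100 million points with 3 float32 components each (made-up size).
shape = (100_000_000, 3)
chunk_shape, shard_shape = shard_layout(shape, itemsize=4)
files_unsharded = math.ceil(shape[0] / chunk_shape[0])  # one file per chunk
files_sharded = math.ceil(shape[0] / shard_shape[0])    # one file per shard
print(chunk_shape, shard_shape, files_unsharded, files_sharded)
```

With these assumed defaults the example array would need about 1200 chunk files without sharding and only 2 shard files with it, while each read can still target a single ~1 MB chunk. In Zarr-Python 3, shapes like these would typically be passed as the `chunks` and `shards` arguments when creating an array.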

@saikrishnanc-nv
Contributor Author

/blossom-ci

Collaborator

@coreyjadams coreyjadams left a comment

Looks good to me! I have an IO benchmarking script for DrivAerML; we could run some performance tests on the sharding configuration if you want. It's impossible to tune it perfectly for all cases, though, and since chunk size and chunks per shard are configurable, these are good defaults.

@saikrishnanc-nv saikrishnanc-nv merged commit 8465c46 into NVIDIA:main Dec 3, 2025
1 check passed
@saikrishnanc-nv saikrishnanc-nv deleted the saikrishnanc/sharding branch December 3, 2025 21:53