Making explicit chunk cache handling policies and enabling a custom autochunker with a fallback to h5py's built-in autochunker #741

Merged: mkuehbach merged 19 commits into master from modifiable_hfive_chunking on Mar 17, 2026

@mkuehbach mkuehbach commented Feb 12, 2026

Motivation:

  • Customized chunking for compressed storage at the dataset level, so that the chunk layout can be tailored when it is known a priori that slicing favors a specific direction, as in EM, APM, and other high-volume techniques.
  • Making explicit that chunking behavior also depends on internal chunk cache (buffer) settings, whose effect differs between hardware such as sequential and truly parallel file systems. These settings are currently configured with the default values that were previously hidden in the internals of the h5py library.
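The two points above can be illustrated with a minimal sketch. It is not the code from this pull request: `biased_chunks`, its parameters, and `CHUNK_CACHE_DEFAULTS` are hypothetical names chosen for illustration. The sketch assumes that some axes should stay whole inside each chunk (for example the two image axes of an EM image stack), shrinks the other axes first until a chunk fits a byte budget, and returns `True` when no bias is given so that h5py's built-in autochunker decides. The cache values are the ones h5py documents as its defaults for the `rdcc_nbytes`, `rdcc_w0`, and `rdcc_nslots` arguments of `h5py.File`; check them against the h5py version in use.

```python
from math import prod

# Chunk cache settings spelled out instead of left implicit. These are the
# defaults documented by h5py (assumption: verify for your h5py version).
CHUNK_CACHE_DEFAULTS = {
    "rdcc_nbytes": 1024**2,  # cache size per dataset in bytes (1 MiB)
    "rdcc_w0": 0.75,         # preemption policy, between 0 and 1
    "rdcc_nslots": 521,      # number of hash-table slots in the cache
}


def biased_chunks(shape, itemsize, keep_whole=(), target_bytes=1024**2):
    """Hypothetical custom autochunker.

    Returns a chunk shape that keeps the axes in `keep_whole` as large as
    possible, or True to defer to h5py's built-in autochunker.
    """
    ndim = len(shape)
    if not keep_whole or ndim == 0 or min(shape) < 1:
        return True  # no usable bias: let h5py guess the chunk shape

    keep = set()
    for axis in keep_whole:
        if not -ndim <= axis < ndim:
            raise ValueError(f"axis {axis} is out of range for shape {shape}")
        keep.add(axis % ndim)  # normalize negative axes

    chunks = list(shape)
    # Shrink the non-favored axes first; touch the favored ones only if a
    # single slab along them is still larger than the byte budget.
    order = [a for a in range(ndim) if a not in keep] + sorted(keep)
    for axis in order:
        while chunks[axis] > 1 and prod(chunks) * itemsize > target_bytes:
            chunks[axis] = (chunks[axis] + 1) // 2  # halve, rounding up
    return tuple(chunks)


# Stack of 1000 float32 images read one image at a time:
print(biased_chunks((1000, 512, 512), 4, keep_whole=(1, 2)))  # (1, 512, 512)
# No bias given, so the decision is left to h5py:
print(biased_chunks((1000, 512, 512), 4))  # True
```

With h5py installed, the result can be passed as the `chunks` argument of `create_dataset`, and the cache settings as keyword arguments when opening the file, for example `h5py.File(path, "w", **CHUNK_CACHE_DEFAULTS)`.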

…nly those modifications pertaining to customized chunking, and adding the custom autochunker code snippet from pynxtools-em and pynxtools-apm here. Next step: copy over the documentation from the refactoring_compression feature branch and extend that existing documentation with details on the additional chunking customization that this feature branch provides.
@mkuehbach mkuehbach changed the title from "Making explicit chunk cache handling policies and enabling a custom autochunker with a fall back to h5py original autochunker" to "Making explicit chunk cache handling policies and enabling a custom autochunker with a fall back to h5py's build-in autochunker" Feb 12, 2026

@RubelMozumder RubelMozumder left a comment


There are no tests for this. Several tests would help catch errors before they cause unwanted breakage in production.

@mkuehbach (Collaborator, Author) replied:

> There are no tests for this. Several tests would help catch errors before they cause unwanted breakage in production.

I could add a test_chunk.py that probes the return values of the custom autochunker for a few example inputs.

@mkuehbach (Collaborator, Author) replied:

> I could add a test_chunk.py that probes the return values of the custom autochunker for a few example inputs.

Tests added.
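As an illustration of what probing the return values of an autochunker can look like, here is a small sketch of invariant checks. It does not reproduce the tests added in this pull request: `check_chunks` and `stand_in_chunker` are hypothetical names, and the stand-in would be replaced by the real custom autochunker. The checks assume a fixed-size dataset, where each chunk dimension must lie between 1 and the corresponding dataset dimension.

```python
from math import prod


def check_chunks(chunks, shape, itemsize, max_bytes):
    """Assert invariants that a usable chunk shape should satisfy."""
    if chunks is True:
        return  # fallback to h5py's built-in autochunker, nothing to check
    assert len(chunks) == len(shape), "chunk rank must match dataset rank"
    assert all(1 <= c <= n for c, n in zip(chunks, shape)), "chunk out of bounds"
    assert prod(chunks) * itemsize <= max_bytes, "chunk exceeds byte budget"


def stand_in_chunker(shape, itemsize):
    """Hypothetical placeholder: one chunk per index along the first axis."""
    return (1,) + tuple(shape[1:])


def test_chunk_invariants():
    for shape in [(100, 64, 64), (7, 3), (1, 1, 1)]:
        check_chunks(stand_in_chunker(shape, 4), shape, 4, max_bytes=1024**2)


def test_fallback_is_accepted():
    check_chunks(True, (10, 10), 8, max_bytes=1024)
```

Functions named `test_*` in a file such as test_chunk.py are collected automatically by pytest.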


@lukaspie lukaspie left a comment


Mostly good, just one additional comment. Thanks!

@mkuehbach mkuehbach merged commit 3fffd20 into master Mar 17, 2026
10 of 17 checks passed