Adding explicit support for blosc but keeping deflate the default#747
Adding explicit support for blosc but keeping deflate the default#747
Conversation
| if ( | ||
| PYNX_ENABLE_BLOSC | ||
| and importlib.util.find_spec("hdf5plugin") is not None | ||
| and importlib.util.find_spec("blosc2") is not None | ||
| ): |
There was a problem hiding this comment.
Either we add these two dependencies to optional in the pyproject.toml then we have a check for them here. Or we keep the pyproject with these as mandatory dependencies and skip the check.
| NTHREADS_BLOSC = blosc2.set_nthreads(max(int(os.cpu_count() / 2), 1)) | ||
| # do not oversubscribe to use hyperthreading cores |
There was a problem hiding this comment.
Why is this necessary? Aren't there any sane defaults in the hdf5lib already?
| if compression_filter == "gzip": | ||
| compression_config = dict( | ||
| compression=compression_filter, | ||
| compression_opts=compression_strength, | ||
| ) | ||
| else: # by virtue of construction blosc | ||
| compression_config = hdf5plugin.Blosc2(cname="zstd", clevel=9) | ||
| grp.create_dataset( | ||
| entry_name, | ||
| data=data["compress"], | ||
| compression=compression_filter, | ||
| chunks=chunking_strategy(data), | ||
| compression_opts=compression_strength, | ||
| **compression_config, |
There was a problem hiding this comment.
We can have one function that deals with our compression needs instead of if statements here.
| # use only when it is acceptable to work with blosc2-compressed content downstreams | ||
| # mind that doing so in C/C++, Matlab, and Fortran application requires specific | ||
| # linking of these apps with a customized HDF5 library that links to the blosc library | ||
| # consider that using blosc sets explicit a certain number of cores eligible for | ||
| # doing compression and decompression work that may drain resources when pynxtools | ||
| # is used in conjunction with other apps and services like NOMAD | ||
| # check the set_nthreads in writer.py to modify accordingly for your best practice |
There was a problem hiding this comment.
Too much of an in code comment. Let's simplify and use some defaults or add this to a more accessible README.md section/link to another .md and/or to the docs.
|
Can you also add a small text in this PR even for now of what changes this introduces to the way the user interacts with this? I believe the writer.py now expects a "filter" key in the Template object. It will be nice to know the "user interface" changes in the PR. Thanks for introducing this. I hope it makes it easier for the large datasets we run into. |
Uh oh!
There was an error while loading. Please reload this page.