Adding explicit support for blosc but keeping deflate the default by mkuehbach · Pull Request #747 · FAIRmat-NFDI/pynxtools

mkuehbach · 2026-03-17T13:02:34Z

Adds what is required to use blosc2, but leaves it deactivated by default.
It would be good to add a note to the compression .md in the learning section with instructions on how to activate it.

sherjeelshabih · 2026-03-18T09:37:39Z

src/pynxtools/dataconverter/writer.py

+if (
+    PYNX_ENABLE_BLOSC
+    and importlib.util.find_spec("hdf5plugin") is not None
+    and importlib.util.find_spec("blosc2") is not None
+):


Either we add these two dependencies as optional in pyproject.toml and keep the check for them here, or we keep them as mandatory dependencies in pyproject.toml and skip the check.
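For the optional-dependency route, the check from the diff could be pulled into a small helper. This is only a sketch: `has_modules` and `blosc_enabled` are hypothetical names, not part of pynxtools, and `flag` stands in for `PYNX_ENABLE_BLOSC`.

```python
import importlib.util


def has_modules(names):
    """Return True only if every top-level module in `names` can be located.

    find_spec locates a module without importing it, so this stays cheap.
    """
    return all(importlib.util.find_spec(name) is not None for name in names)


def blosc_enabled(flag, required=("hdf5plugin", "blosc2")):
    """Hypothetical gate: the opt-in flag is set AND all optional packages exist."""
    return bool(flag) and has_modules(required)
```

If the two packages become mandatory instead, the `has_modules` part is redundant and only the flag remains.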

sherjeelshabih · 2026-03-18T09:38:51Z

src/pynxtools/dataconverter/writer.py

+    NTHREADS_BLOSC = blosc2.set_nthreads(max(int(os.cpu_count() / 2), 1))
+    # do not oversubscribe to use hyperthreading cores


Why is this necessary? Aren't there sane defaults in the HDF5 library already?
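If the explicit thread count stays, the heuristic from the diff might be isolated as below. This is a sketch under assumptions: `default_blosc_threads` is a hypothetical name. One thing it guards against is that `os.cpu_count()` can return `None`, in which case `os.cpu_count() / 2` would raise a `TypeError`.

```python
import os


def default_blosc_threads(cpu_count=None):
    """Use half of the logical CPUs, but never fewer than one thread.

    Halving is the diff's heuristic to avoid counting hyperthreading siblings.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count()  # may itself be None on some platforms
    if not cpu_count:
        return 1  # fall back to a single thread when the count is unknown
    return max(cpu_count // 2, 1)
```

The result would then be passed to `blosc2.set_nthreads(...)`. As far as I can tell from the blosc2 docs, that call returns the previous thread count rather than the new one, so it may be worth checking what `NTHREADS_BLOSC` actually holds after the assignment in the diff.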

sherjeelshabih · 2026-03-18T09:39:53Z

src/pynxtools/dataconverter/writer.py

+                    if compression_filter == "gzip":
+                        compression_config = dict(
+                            compression=compression_filter,
+                            compression_opts=compression_strength,
+                        )
+                    else:  # by virtue of construction blosc
+                        compression_config = hdf5plugin.Blosc2(cname="zstd", clevel=9)
                    grp.create_dataset(
                        entry_name,
                        data=data["compress"],
                        compression=compression_filter,
                        chunks=chunking_strategy(data),
-                        compression_opts=compression_strength,
+                        **compression_config,


We can have one function that deals with our compression needs instead of if statements here.
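A minimal sketch of what such a function could look like. The name `compression_kwargs` and the filter label `"blosc2"` are assumptions for illustration (the diff only has an `else` branch), and `hdf5plugin` is imported lazily so the gzip path works without it.

```python
def compression_kwargs(filter_name, strength):
    """Return the keyword arguments for h5py's create_dataset.

    `filter_name` and `strength` mirror compression_filter and
    compression_strength from the diff; "blosc2" is a hypothetical label.
    """
    if filter_name == "gzip":
        return {"compression": "gzip", "compression_opts": strength}
    if filter_name == "blosc2":
        import hdf5plugin  # optional dependency, only needed on this path

        # hdf5plugin filter objects are mappings that carry the
        # compression / compression_opts pair, so dict() unpacks them.
        return dict(hdf5plugin.Blosc2(cname="zstd", clevel=9))
    raise ValueError(f"unsupported compression filter: {filter_name!r}")
```

The call site would then reduce to `grp.create_dataset(entry_name, data=data["compress"], chunks=chunking_strategy(data), **compression_kwargs(compression_filter, compression_strength))`. Note that in the hunk as quoted, `compression=compression_filter` is still passed next to `**compression_config`; since the dict also carries a `compression` key, that looks like it would raise a duplicate-keyword `TypeError`, so the explicit argument should probably go.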

sherjeelshabih · 2026-03-18T09:43:45Z

src/pynxtools/dataconverter/chunk.py

+# use only when it is acceptable to work with blosc2-compressed content downstreams
+# mind that doing so in C/C++, Matlab, and Fortran application requires specific
+# linking of these apps with a customized HDF5 library that links to the blosc library
+# consider that using blosc sets explicit a certain number of cores eligible for
+# doing compression and decompression work that may drain resources when pynxtools
+# is used in conjunction with other apps and services like NOMAD
+# check the set_nthreads in writer.py to modify accordingly for your best practice


This is too much for an in-code comment. Let's simplify and use some defaults, or move this to a more accessible README.md section, a linked .md file, and/or the docs.

sherjeelshabih · 2026-03-18T09:46:40Z

Can you also add a short text to this PR, even if only for now, describing what this changes in the way the user interacts with the writer? I believe writer.py now expects a "filter" key in the Template object. It would be nice to have the "user interface" changes documented in the PR.

Thanks for introducing this. I hope it makes the large datasets we run into easier to handle.

atomprobe-tc added 2 commits March 17, 2026 14:01

carried over from NXapm run-through

d4daf41

linting

32439fa

mkuehbach changed the title ~~carried over from NXapm run-through~~ Adding explicit support for blosc but keeping deflate the default Mar 17, 2026

atomprobe-tc added 2 commits March 17, 2026 14:31

invert logic

9d14e6c

linting

4b4d629

mkuehbach requested review from lukaspie and sherjeelshabih March 17, 2026 13:40

lint testing

b43a060

sherjeelshabih requested changes Mar 18, 2026

mkuehbach wants to merge 5 commits into master from add_blosc_but_keep_deflate_the_default


3 participants
