Reduction workflow for ess nmx files #146
Conversation
@aaronfinke this is the whole workflow for the real NMX NeXus files.
You can use a command line similar to the McStas workflow after installing this branch with pip:
pip install -e .
essnmx-reduce \
--input_file docs/user-guide/977695_00052796.hdf \
--output_file scipp_out.h5 \
--verbose \
--detector_ids 0 \
--chunk_size 1_000
I didn't use Sciline or our generic workflow in this case, since it's a very simple workflow.
Getting the following error when trying to process one of Doro's files:
Traceback (most recent call last):
File "/Users/aaronfinke/miniforge3/envs/scipp/bin/essnmx-reduce", line 8, in <module>
sys.exit(main())
~~~~^^
File "/Users/aaronfinke/essnmx/src/ess/nmx/executables.py", line 387, in main
reduction(
~~~~~~~~~^
input_file=input_file,
^^^^^^^^^^^^^^^^^^^^^^
...<8 lines>...
logger=logger,
^^^^^^^^^^^^^^
)
^
File "/Users/aaronfinke/essnmx/src/ess/nmx/executables.py", line 287, in reduction
toa_bin_edges = toa_bin_edges.to(unit=da.bins.coords['event_time_offset'].unit)
File "/Users/aaronfinke/miniforge3/envs/scipp/lib/python3.13/site-packages/scipp/core/operations.py", line 293, in to
raise ValueError("Must provide dtype or unit or both")
ValueError: Must provide dtype or unit or both
@aaronfinke
Hmm... strange... I tried all three files and they were okay.
I did find some bugs and updated this branch yesterday, though. Did you pull the changes?
Also, if you need the 2D plot quickly, here is a code snippet:
%matplotlib widget
import plopp as pp
import scippnexus as snx
with snx.File('/Users/sunyoungyoo/ESS/essnmx/testfile-20250625-192510.hdf') as f:
dg = f['/entry/instrument/detector_panel_0/'][()]
display(dg)
display(dg['data'])
da_2d = dg['data'].hist()  # histogram events over the pixel dimensions -> 2D
da_3d = dg['data'].hist(event_time_offset=50)  # also bin event_time_offset into 50 bins -> 3D
display(pp.plot(da_2d, norm='log', title="2D Histogram of Detector Data"))
pp.slicer(
da_3d,
title="3D Histogram of Detector Data",
norm='log',
keep=['x_pixel_offset', 'y_pixel_offset'],
)
FYI, the error about toa_bin_edges.to(unit=da.bins.coords['event_time_offset'].unit) sounds like da.bins.coords['event_time_offset'].unit is simply returning None, which is why the .to(unit=...) call is failing with that error.
My guess is that you have a slightly outdated version of scipp, where for binned data you may have to do da.bins.coords['event_time_offset'].bins.unit instead (note the extra .bins before the final .unit).
This was changed in scipp 25.03.0.
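For reference, a minimal reproduction of the behaviour described above on current scipp; the data here is made up:

import scipp as sc

events = sc.DataArray(
    sc.ones(dims=['event'], shape=[4]),
    coords={
        'event_time_offset': sc.array(
            dims=['event'], values=[1.0, 5.0, 12.0, 20.0], unit='ms'
        )
    },
)
da = events.bin(event_time_offset=2)
toa_bin_edges = sc.linspace('event_time_offset', 0.0, 1 / 14, num=251, unit='s')
# scipp >= 25.03.0 (ADR 0019): the event coordinate's unit is exposed directly.
unit = da.bins.coords['event_time_offset'].unit  # 'ms'
# On older scipp this returned None and the extra .bins was required:
# unit = da.bins.coords['event_time_offset'].bins.unit
toa_bin_edges = toa_bin_edges.to(unit=unit)  # now succeeds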
Yeah, sorry, I should have mentioned this in this thread.
I already added the lower pin: #146 (comment)
def _fallback_compute_positions(dg: sc.DataGroup) -> sc.DataGroup:
This is the hack we need to handle the empty NXlogs.
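Purely for illustration, a minimal sketch of the kind of fallback the comment describes; the NXlog detection and the flat, top-level-only traversal are assumptions, and the actual helper in this PR may differ:

import scipp as sc
import scippnexus as snx

def _looks_like_empty_nxlog(obj) -> bool:
    # An NXlog typically loads as a DataArray with a 'time' dimension;
    # an empty log has no entries along it.
    return isinstance(obj, sc.DataArray) and 'time' in obj.dims and obj.sizes['time'] == 0

def _fallback_compute_positions(dg: sc.DataGroup) -> sc.DataGroup:
    try:
        return snx.compute_positions(dg)
    except Exception:
        # Strip empty logs (naively, only at the top level here) and retry.
        # As noted in this thread, if an empty log sits next to a time-dependent
        # transformation in the same chain, the chain stays time-dependent and
        # the rotation matrix cannot be determined, so this cannot always work.
        cleaned = sc.DataGroup(
            {k: v for k, v in dg.items() if not _looks_like_empty_nxlog(v)}
        )
        return snx.compute_positions(cleaned)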
def calculate_number_of_chunks(detector_gr: snx.Group, *, chunk_size: int = 0) -> int:
We need to check how the filewriter is writing to HDF5. If it is already writing the data in chunks, we should just use those chunks; it's more efficient.
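For what it's worth, the writer-side chunking is easy to inspect with h5py; the dataset path below is a guess based on the file used earlier in this thread:

import h5py

# If the filewriter already chunks the event datasets, reading along those
# boundaries is the cheapest option. The dataset path is an assumption.
with h5py.File("docs/user-guide/977695_00052796.hdf", "r") as f:
    ds = f["entry/instrument/detector_panel_0/data/event_time_offset"]
    print(ds.shape, ds.chunks)  # .chunks is None for contiguous (unchunked) datasets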
Yes, we talked with our team about implementing iter_chunk in scippnexus.
So I'll apply it here once that's done.
Either way, we'll allow both, since even a single chunk might not fit in memory for some reason...
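For reference, a hypothetical reading of the signature above; the event field name and the chunk_size semantics are assumptions, not what this PR necessarily implements:

import math

import scippnexus as snx

def calculate_number_of_chunks(detector_gr: snx.Group, *, chunk_size: int = 0) -> int:
    # chunk_size <= 0 is taken to mean "process everything in one go";
    # otherwise split the event list evenly.
    n_events = detector_gr['event_time_offset'].shape[0]  # field name is an assumption
    if chunk_size <= 0:
        return 1
    return math.ceil(n_events / chunk_size)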
Can the handling of chunks be approached via the StreamProcessor?
The McStas executable does use StreamProcessor, indeed, but it's unlikely that we'll need to process the real measurement data chunk by chunk, so there was very little motivation to wrap this into a StreamProcessor.
Also, the workflow itself is much simpler than any other workflow we have. There is more loss than benefit in using Sciline for NMX, which makes it even less motivating to use StreamProcessor...
@aaronfinke I have tried this with the data that Doro gave me (/dmsc/scipp/nmx/cern_202506/testfile-20250625-192510.hdf). This one only has one detector, so I didn't plot them in 3D, but it would indeed be nice to be able to plot them in 3D without DIALS... so I'll prioritize #141.

The error was because of this: https://scipp.github.io/development/adr/0019-binned-data-unit-dtype-python-properties.html. I added the lower pin on scipp...
def main() -> None:
The approach of using executables, as well as not using Sciline, is quite different from the other ess* projects.
I'm not against having executables to make the reduction available in an additional manner, but this sounds like it could be applicable to other workflows as well. Should we try to generalize it in essreduce?
NMX users will not be expected to use or know Python, and Jupyter notebooks are unnecessary. Indeed, the use case is quite different from other instruments at ESS. Executables are far simpler to use with the NMX pipeline.
Should we try to generalize in essreduce?
Or maybe a copier template...?
def reduction(
    *,
    input_file: pathlib.Path,
    output_file: pathlib.Path,
    chunk_size: int = 1_000,
    detector_ids: list[int | str],
    compression: bool = False,
    logger: logging.Logger | None = None,
    min_toa: sc.Variable | int = 0,
    max_toa: sc.Variable | int = int((1 / 14) * 1_000),  # Default for ESS NMX
    toa_bin_edges: sc.Variable | int = 250,
    fast_axis: Literal['x', 'y'] | None = None,  # 'x', 'y', or None to auto-detect
    display: Callable | None = None,  # For Jupyter notebook display
) -> sc.DataGroup:
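For context, a call from Python mirroring the CLI invocation at the top of this thread might look like this, assuming reduction is imported from ess.nmx.executables (the module shown in the traceback above):

import pathlib

from ess.nmx.executables import reduction  # module path taken from the traceback

result = reduction(
    input_file=pathlib.Path("docs/user-guide/977695_00052796.hdf"),
    output_file=pathlib.Path("scipp_out.h5"),
    chunk_size=1_000,
    detector_ids=[0],
)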
Definitely not a request to change anything here now, just a thought/note for the future:
In scipp/esslivedata#404 I moved to using Pydantic for defining workflow parameters. As this allows for including bounds/validation, titles, and descriptions, it may be a strategy we should explore more widely in the future, including when making scripts like this. It might make it simpler to document the parameters and to auto-generate an argparse parser, for example.
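To illustrate the idea (not part of this PR): a hypothetical Pydantic model mirroring a few of the reduction() parameters above, showing where bounds, titles, and descriptions would live:

from pydantic import BaseModel, Field

class ReductionParams(BaseModel):
    """Hypothetical mirror of a few reduction() parameters."""

    chunk_size: int = Field(
        default=1_000, gt=0, title="Chunk size",
        description="Number of events processed per chunk.",
    )
    min_toa: float = Field(default=0.0, ge=0.0, title="Minimum time of arrival [ms]")
    max_toa: float = Field(default=1_000 / 14, gt=0.0, title="Maximum time of arrival [ms]")
    toa_bins: int = Field(default=250, gt=0, title="Number of TOA bins")

# Bounds, titles, and descriptions live in ReductionParams.model_fields,
# from which an argparse parser (or a UI form) could be generated.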
input_arg_group = parser.add_argument_group("Input Options")
input_arg_group.add_argument(
    "--input_file", type=str, help="Path to the input file", required=True
Will you ever want to process several files in one go? Say, using wildcards like essnmx-reduce --input_file docs/user-guide/977695_*.hdf?
Maybe we can just leave that for a later iteration.
Yeah, good point...
We will want to automate the reduction for NMX, so I had only thought of a single-file input, but we should definitely support multiple files as input. I opened an issue: #150
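For a later iteration, one low-effort way to accept shell-expanded wildcards would be argparse's nargs; a sketch, not what this PR implements:

import argparse

parser = argparse.ArgumentParser()
input_arg_group = parser.add_argument_group("Input Options")
input_arg_group.add_argument(
    "--input_file", type=str, nargs="+", required=True,
    help="Path(s) to the input file(s)",
)
# The shell expands the wildcard, so
#   essnmx-reduce --input_file docs/user-guide/977695_*.hdf
# arrives here as a list of paths:
args = parser.parse_args(["--input_file", "a.hdf", "b.hdf"])
assert args.input_file == ["a.hdf", "b.hdf"]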
src/ess/nmx/_executable_helper.py (outdated)
input_arg_group.add_argument(
    "--nbins",
    type=int,
    default=51,
51 seems like an odd number. Is it 51 bin edges for 50 bins?
Yes... it looks like the McStas executable was interpreting it as the number of bin edges instead of the number of bins.
I updated it to be the number of bins and changed the default to 50.
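In other words, n bins need n + 1 edges. A small illustration with scipp (the values here are made up):

import scipp as sc

nbins = 50
# 50 bins need 51 edges; deriving the edges from the number of *bins*
# avoids the off-by-one ambiguity discussed above.
toa_edges = sc.linspace('event_time_offset', 0.0, 1_000 / 14, num=nbins + 1, unit='ms')
assert toa_edges.sizes['event_time_offset'] == nbins + 1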
src/ess/nmx/_executable_helper.py
Outdated
| "--compression", | ||
| type=int, | ||
| default=1, | ||
| choices=[0, 1], |
Can we have True or False instead of ints? Or just a boolean flag: if --compression is not specified, then no compression is applied?
No... because we want to allow one more compression option that uses a different type of compression.
Then should it be a string? Something more descriptive than an integer?
And the fix had to be more invasive than I expected, so I opened a separate issue for fixing the problem... #151
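For illustration, a string-valued version of the option as suggested above; the choice names here are placeholders, not the ones settled on in #151:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--compression",
    type=str,
    default="none",
    choices=["none", "gzip", "bitshuffle"],  # placeholder names, not the final set
    help="Type of compression to apply to the output datasets.",
)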
def build_logger(args: argparse.Namespace) -> logging.Logger:
What is build referring to here?
- instantiate the logger (or retrieve the existing logger)
- configure the logger according to the command-line arguments
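For example, a minimal sketch of those two steps; the logger name and the verbose flag are assumptions:

import argparse
import logging
import sys

def build_logger(args: argparse.Namespace) -> logging.Logger:
    # 1. Retrieve (or create) the logger.
    logger = logging.getLogger("essnmx")
    # 2. Configure it according to the command-line arguments.
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler(sys.stdout))
    logger.setLevel(logging.DEBUG if getattr(args, "verbose", False) else logging.INFO)
    return logger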
however, then the whole range of `event_time_offset` should have been histogrammed
so that the unwrapping can be applied.
i.e. `min_toa` should be 0 and `max_toa` should be 1/14 seconds
for 14 Hz pulse frequency.
I'm not sure I get why we don't want to use the GenericTofWorkflow. The flight path is 158 m long for NMX. The wavelength range is 2-10 Å. This means that the range of arrival times at 158 m is about 80-400 ms, which spans several pulse periods and is not contained within 0-71 ms. Depending on which wavelength range is selected for a given run, one may have to add one or more pulse periods to the event_time_offset.
So unwrapping will be indispensable?
I think we should do what we did in Odin: make a NeXus file out of McStas data (apparently Aaron has done some work in getting good sampling for just this task?) and make a workflow that works on that. This way, we don't have to maintain both a McStas and a NeXus reduction workflow.
Unless I am missing something?
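A quick check of the arithmetic in that comment, using t = L / v with v ≈ 3956 / λ (v in m/s, λ in Å) for thermal neutrons:

# t = L / v, with v ≈ 3956 / λ  (v in m/s, λ in Å)
L = 158.0  # m, NMX flight path quoted above
for wavelength in (2.0, 10.0):  # Å
    v = 3956.0 / wavelength  # m/s
    print(f"{wavelength:4.1f} Å: t = {L / v * 1e3:5.0f} ms")
# -> ~80 ms and ~399 ms, i.e. well beyond one 71 ms pulse period (1/14 s at 14 Hz)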
We can use GenericTofWorkflow, but it was just not easy to debug when something went wrong, compared to a simple flat script. GenericTofWorkflow has extra things that we don't need for NMX, i.e. we need to save the event-time-offset coordinates, not time-of-flight or wavelength.
And as far as I understood, NMX will chop the neutron beam so that it all fits within a single frame of one pulse period, effectively making it similar to a short-pulse source.
But maybe we do need wavelength... in that case we should just use the GenericTofWorkflow.
A minimum viable product was needed for the ILL test, and one of the requirements was that Aaron should be able to fix it himself quickly.
I think we should do what we did in Odin: make a Nexus file out of mcstas data (apparently Aaron has done some work in getting good sampling for just this task?) and make a workflow that works on that. This way, we don't have to maintain a mcstas and a nexus reduction workflow.
Yeeees, we should...! That's what Aaron is working on :D ... The McStas workflow was written a loooong time ago... when we were still hoping that McStas would just write a NeXus file...
I think we should do what we did in Odin: make a Nexus file out of mcstas data (apparently Aaron has done some work in getting good sampling for just this task?) and make a workflow that works on that. This way, we don't have to maintain a mcstas and a nexus reduction workflow.
That would be ideal, and I have a nice workflow for doing the sampling of McStas NMX event data. It is almost complete.
We also do not apply frame unwrapping or pulse skipping here,
as it is not expected from NMX experiments.
Frame unwrapping may be applied later on the result of this function if needed
If you want to defer the time-of-flight calculation to a later PR, we should at least amend this comment to say that we still need to implement tof computation.
If not, this will just get forgotten.
I added the comment here and opened an issue about this: #152
def main() -> None:
Can you add a few tests that run the executable on some files and check a couple of things (e.g., that arguments are correctly passed on to the reduction)?
@nvaytet I added a unit test that runs the command line and then checks whether the output reflects the input arguments. I only added very minimal checks, as we are expecting some updates to the interface later.
This is a workflow for actual NeXus files collected with our filewriter.
It also has a helper to ignore empty logs, but I can't guarantee that it always works:
for example, if any transformation in a chain is empty while another transformation in the chain is time-dependent, then the whole chain is time-dependent, so we can't really determine the rotation matrix.
However, that is unlikely to happen.