Conversation

@YooSunYoung
Member

This is a workflow for actual NeXus files collected with our filewriter.

It also has a helper to ignore empty logs, but I can't guarantee that it always works:
for example, if any of the transformations is empty and another transformation is time-dependent, then the whole transformation chain is time-dependent, so we can't really determine the rotation matrix.

However, that is unlikely to happen.

@YooSunYoung YooSunYoung requested a review from aaronfinke June 19, 2025 11:12
Member Author

@aaronfinke this is the whole workflow for the real NMX NeXus files.

You can use a command line similar to the McStas workflow after installing this branch with pip:

```sh
pip install -e .
essnmx-reduce \
  --input_file docs/user-guide/977695_00052796.hdf \
  --output_file scipp_out.h5 \
  --verbose \
  --detector_ids 0 \
  --chunk_size 1_000
```

Member Author

I didn't use Sciline or our generic workflow in this case, since it's a very simple workflow.

Collaborator

Getting the following error when trying to process one of Doro's files:

```
Traceback (most recent call last):
  File "/Users/aaronfinke/miniforge3/envs/scipp/bin/essnmx-reduce", line 8, in <module>
    sys.exit(main())
             ~~~~^^
  File "/Users/aaronfinke/essnmx/src/ess/nmx/executables.py", line 387, in main
    reduction(
    ~~~~~~~~~^
        input_file=input_file,
        ^^^^^^^^^^^^^^^^^^^^^^
    ...<8 lines>...
        logger=logger,
        ^^^^^^^^^^^^^^
    )
    ^
  File "/Users/aaronfinke/essnmx/src/ess/nmx/executables.py", line 287, in reduction
    toa_bin_edges = toa_bin_edges.to(unit=da.bins.coords['event_time_offset'].unit)
  File "/Users/aaronfinke/miniforge3/envs/scipp/lib/python3.13/site-packages/scipp/core/operations.py", line 293, in to
    raise ValueError("Must provide dtype or unit or both")
ValueError: Must provide dtype or unit or both
```

Member Author
@YooSunYoung YooSunYoung Jun 26, 2025

@aaronfinke
Hmm... strange... I tried all three files and they were okay.

I did find some bugs, so I updated this branch yesterday, though. Did you pull the changes?

Also, if you need a quick 2D plot, here is a code snippet:

```python
%matplotlib widget

import plopp as pp
import scippnexus as snx

with snx.File('/Users/sunyoungyoo/ESS/essnmx/testfile-20250625-192510.hdf') as f:
    dg = f['/entry/instrument/detector_panel_0/'][()]
    display(dg)
    display(dg['data'])
    # Histogram the binned events over the pixel dimensions.
    da_2d = dg['data'].hist()
    # Add a 50-bin event_time_offset dimension for slicing.
    da_3d = dg['data'].hist(event_time_offset=50)

display(pp.plot(da_2d, norm='log', title="2D Histogram of Detector Data"))
pp.slicer(
    da_3d,
    title="3D Histogram of Detector Data",
    norm='log',
    keep=['x_pixel_offset', 'y_pixel_offset'],
)
```

Member
@nvaytet nvaytet Jul 11, 2025

FYI, the error about `toa_bin_edges.to(unit=da.bins.coords['event_time_offset'].unit)` sounds like `da.bins.coords['event_time_offset'].unit` is just returning None, which is why the `.to(unit=...)` call fails with that error.
My guess is that you have a slightly outdated version of Scipp, where for binned data you may have to use `da.bins.coords['event_time_offset'].bins.unit` instead (note the extra `.bins` before the last `.unit`).

This was changed in 25.03.0.
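
For reference, a version-robust variant of the failing line could look like this (a sketch, assuming the names from the traceback above):

```python
# Sketch: scipp < 25.03.0 exposes the events' unit only via .bins.unit,
# while newer releases return it directly from .unit on the binned coord.
eto = da.bins.coords['event_time_offset']
unit = eto.unit if eto.unit is not None else eto.bins.unit
toa_bin_edges = toa_bin_edges.to(unit=unit)
```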

Member Author

Yeah, sorry, I should have mentioned this in this thread.

I already added the lower pin: #146 (comment)

)


def _fallback_compute_positions(dg: sc.DataGroup) -> sc.DataGroup:
Member Author

This is the hack we need to handle the empty NXlogs.
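
Not the actual implementation, just a minimal sketch of the idea, assuming empty NXlogs load as zero-length DataArrays along a `time` dimension (the helper name `_drop_empty_logs` is hypothetical):

```python
import scipp as sc

def _drop_empty_logs(dg: sc.DataGroup) -> sc.DataGroup:
    # Hypothetical helper: recursively drop zero-length log entries so that
    # position computation does not trip over empty, time-dependent logs.
    out = {}
    for key, value in dg.items():
        if isinstance(value, sc.DataGroup):
            out[key] = _drop_empty_logs(value)
        elif (isinstance(value, sc.DataArray)
              and 'time' in value.dims and value.sizes['time'] == 0):
            continue  # empty NXlog: skip it
        else:
            out[key] = value
    return sc.DataGroup(out)
```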

)


def calculate_number_of_chunks(detector_gr: snx.Group, *, chunk_size: int = 0) -> int:
Collaborator

We need to check how the filewriter is writing to HDF5. If it is already writing data in chunks, we should just use that; it's more efficient.

Member Author

Yes, we talked with our team about implementing iter_chunk in scippnexus, so I'll apply it here once it's done.

Either way, we'll allow both, since even a single chunk might not fit in memory for some reason...
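
For context, the chunk count in question boils down to a ceiling division over the event list. A sketch only (the dataset path `event_data/event_time_offset` is an assumption, not necessarily what the filewriter uses):

```python
import math
import scippnexus as snx

def calculate_number_of_chunks(detector_gr: snx.Group, *, chunk_size: int = 0) -> int:
    # Sketch: chunk_size == 0 means "read everything in one go"; otherwise
    # cover the full event list with ceil(n_events / chunk_size) slices.
    if chunk_size <= 0:
        return 1
    n_events = detector_gr['event_data/event_time_offset'].shape[0]
    return math.ceil(n_events / chunk_size)
```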

Member

Can the handling of chunks be approached via the StreamProcessor?

Member Author

The McStas executable does use StreamProcessor, but it's unlikely that we will need to process the real measurement data chunk by chunk, so there was very little motivation to wrap it into a StreamProcessor.

Also, the workflow itself is much simpler than any other workflow we have. There is more loss than benefit in using Sciline for NMX, which makes it even less motivating to use StreamProcessor...

@YooSunYoung
Member Author

@aaronfinke I have tried this with the data that Doro gave me (/dmsc/scipp/nmx/cern_202506/testfile-20250625-192510.hdf) and then found some bugs, so I fixed them...!

This one only has one detector, so I didn't plot it in 3D, but it'll indeed be nice to be able to plot them in 3D without DIALS... So I'll prioritize #141.

@YooSunYoung
Member Author

The error was because of this:

https://scipp.github.io/development/adr/0019-binned-data-unit-dtype-python-properties.html

I added a lower pin for scipp...

)


def main() -> None:
Member

The approach of using executables, as well as not using Sciline, is quite different from the other ess* projects.

I'm not against having executables to make reduction available in an additional way, but this sounds like it could be applicable to other workflows as well? Should we try to generalize in essreduce?

Collaborator

NMX users will not be expected to use or know Python, and Jupyter notebooks are unnecessary. Indeed, the use case is quite different from other instruments at ESS. Executables are far simpler to use with the NMX pipeline.

Member Author

> Should we try to generalize in essreduce?

Or maybe a copier template...?

Comment on lines 141 to 154
```python
def reduction(
    *,
    input_file: pathlib.Path,
    output_file: pathlib.Path,
    chunk_size: int = 1_000,
    detector_ids: list[int | str],
    compression: bool = False,
    logger: logging.Logger | None = None,
    min_toa: sc.Variable | int = 0,
    max_toa: sc.Variable | int = int((1 / 14) * 1_000),  # Default for ESS NMX
    toa_bin_edges: sc.Variable | int = 250,
    fast_axis: Literal['x', 'y'] | None = None,  # 'x', 'y', or None to auto-detect
    display: Callable | None = None,  # For Jupyter notebook display
) -> sc.DataGroup:
```
Member
@SimonHeybrock SimonHeybrock Jul 11, 2025

Definitely not a request to change anything here now, just a thought/note for the future:

In scipp/esslivedata#404 I moved to using Pydantic for defining workflow parameters. As this allows for including bounds/validation, titles, and descriptions, it may be a strategy we should explore more widely in the future, including when making scripts like this. It might make it simpler to document, and to auto-set up an argparser, for example.
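
As a rough illustration of that strategy (hypothetical field names, not this PR's actual parameters):

```python
from pydantic import BaseModel, Field

class ReductionParams(BaseModel):
    # Hypothetical example: bounds, titles and descriptions live next to
    # the parameter definition and could drive an auto-generated argparser.
    chunk_size: int = Field(default=1_000, gt=0, title="Chunk size",
                            description="Number of events processed per chunk")
    nbins: int = Field(default=50, gt=0, description="Number of time-of-arrival bins")
    compression: bool = Field(default=False, description="Compress the output file")
```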

)
input_arg_group = parser.add_argument_group("Input Options")
input_arg_group.add_argument(
"--input_file", type=str, help="Path to the input file", required=True
Member

Will you ever want to process several files in one go? Say, using wildcards like `essnmx-reduce --input_file docs/user-guide/977695_*.hdf`?

Maybe we can just leave that for a later iteration.

Member Author

Yeah, good point...
We will want to automate the reduction for NMX, so I only thought of a single-file input, but we should definitely support multiple files as input. I opened an issue: #150
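
On the argparse side, multi-file support could look roughly like this (a sketch, not the change tracked in #150):

```python
import argparse

parser = argparse.ArgumentParser()
input_arg_group = parser.add_argument_group("Input Options")
# Sketch: accept one or more input files; the shell expands wildcards
# like 977695_*.hdf into separate arguments.
input_arg_group.add_argument(
    "--input_file", type=str, nargs="+", required=True,
    help="Path(s) to the input file(s)",
)
args = parser.parse_args(["--input_file", "a.hdf", "b.hdf"])
print(args.input_file)  # ['a.hdf', 'b.hdf']
```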

input_arg_group.add_argument(
    "--nbins",
    type=int,
    default=51,
Member

51 seems like an odd number. Is it 51 bin edges for 50 bins?

Member Author

Yes... it looks like the McStas executable was interpreting it as a number of bin edges instead of bins.
I updated it to be the number of bins and changed the default to 50.
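
In other words (a minimal illustration with scipp-style bin edges):

```python
import scipp as sc

nbins = 50
# n bins require n + 1 edges, so the old default of 51 was an edge count.
edges = sc.linspace('event_time_offset', 0.0, 1 / 14, num=nbins + 1, unit='s')
assert len(edges) == nbins + 1
```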

"--compression",
type=int,
default=1,
choices=[0, 1],
Member

Can we have True or False instead of ints? Or just a boolean flag: if --compression is not specified, then no compression is applied?

Member Author

No... because we want to allow one more compression option that uses a different type of compression.

Member

Then should it be a string? Something more descriptive than an integer?

Member Author

And the fix had to be more invasive than I expected, so I opened a separate issue for fixing the problem... #151
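
For reference, the string-valued option suggested above could look something like this (hypothetical choice names; the actual design is deferred to #151):

```python
import argparse

parser = argparse.ArgumentParser()
# Sketch: a descriptive string instead of 0/1 leaves room for more than
# one compression algorithm later on.
parser.add_argument(
    "--compression",
    type=str,
    default="none",
    choices=["none", "gzip", "bitshuffle"],
    help="Compression applied to the output file",
)
print(parser.parse_args(["--compression", "gzip"]).compression)  # gzip
```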

return parser


def build_logger(args: argparse.Namespace) -> logging.Logger:
Member

What is build referring to here?

Member Author

  • instantiate the Logger (or retrieve the existing logger)
  • configure the logger according to the command line arguments.
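
i.e. roughly the following (a sketch of the intent; the logger name and the use of the `--verbose` flag are assumptions):

```python
import argparse
import logging

def build_logger(args: argparse.Namespace) -> logging.Logger:
    # Retrieve (or create) the named logger, then configure it according
    # to the parsed command line arguments.
    logger = logging.getLogger("essnmx")
    logger.setLevel(logging.DEBUG if getattr(args, 'verbose', False) else logging.INFO)
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler())
    return logger
```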

however, then the whole range of `event_time_offset` should have been histogrammed
so that the unwrapping can be applied.
i.e. `min_toa` should be 0 and `max_toa` should be 1/14 seconds
for 14 Hz pulse frequency.
Member
@nvaytet nvaytet Sep 23, 2025

I'm not sure I get why we don't want to use the GenericTofWorkflow. The flight path is 158 m long for NMX, and the wavelength range is 2-10 Å. This means that the range of arrival times at 158 m is about 80-400 ms, which spans several pulse periods and is not contained within 0-71 ms. Depending on which wavelength range is selected for a given run, one may have to add one or more pulse periods to the event_time_offset.
So unwrapping will be indispensable?

I think we should do what we did in Odin: make a NeXus file out of McStas data (apparently Aaron has done some work on getting good sampling for just this task?) and make a workflow that works on that. This way, we don't have to maintain both a McStas and a NeXus reduction workflow.

Unless I am missing something?
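
The arrival-time estimate above is easy to verify (a throwaway check using scipp's physical constants):

```python
import scipp as sc
from scipp.constants import h, m_n

# Time of flight t = L / v, with neutron speed v = h / (m_n * wavelength).
L = sc.scalar(158.0, unit='m')
for wav in (2.0, 10.0):
    v = (h / (m_n * sc.scalar(wav, unit='angstrom'))).to(unit='m/s')
    print(wav, (L / v).to(unit='ms'))  # ~80 ms at 2 A, ~400 ms at 10 A
```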

Member Author

We can use GenericTofWorkflow, but it was just not easy to debug when something went wrong, compared to a simple flat script. GenericTofWorkflow has extra things that we don't need for NMX; i.e. we need to save event-time-offset coordinates, not the time-of-flight or wavelength.
And as far as I understood, NMX will chop the neutron beam so that it all fits within one pulse period, in a single frame, effectively making it similar to a short-pulse source.

But maybe we do need wavelength... so in that case we should just use the GenericTofWorkflow.
A minimum viable product was needed for the ILL test, and one of the requirements was that Aaron should be able to fix it himself quickly.

> I think we should do what we did in Odin: make a NeXus file out of McStas data (apparently Aaron has done some work on getting good sampling for just this task?) and make a workflow that works on that. This way, we don't have to maintain both a McStas and a NeXus reduction workflow.

Yeeees, we should...! That's what Aaron is working on : D ... The McStas workflow was written a loooong time ago... when we were still hoping that McStas would just write a NeXus file...

Collaborator

> I think we should do what we did in Odin: make a NeXus file out of McStas data (apparently Aaron has done some work on getting good sampling for just this task?) and make a workflow that works on that. This way, we don't have to maintain both a McStas and a NeXus reduction workflow.

That would be ideal, and I have a nice workflow for doing the sampling of McStas NMX event data. It is almost complete.

@YooSunYoung YooSunYoung requested a review from nvaytet October 20, 2025 13:59
We also do not apply frame unwrapping or pulse skipping here,
as it is not expected from NMX experiments.
Frame unwrapping may be applied later on the result of this function if needed
Member

If you want to defer the time-of-flight calculation to a later PR, we should at least amend this comment to say that we still need to implement tof computation.
If not, this will just get forgotten.

Member Author
@YooSunYoung YooSunYoung Oct 21, 2025

I added the comment here and opened an issue about this: #152

)


def main() -> None:
Member

Can you add a few tests that run the executable on some files and try a couple of things (like arguments being correctly passed on to the reduction, etc.)?

Member Author
@YooSunYoung YooSunYoung Oct 28, 2025

@nvaytet I added a unit test that runs the command line and then checks whether the output reflects the input arguments. I only added very minimal checking, as we are expecting some updates to the interface later.
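
Such a test might look roughly like this (a sketch, not necessarily the test added here; the input path is a placeholder):

```python
import subprocess

def test_cli_reflects_arguments(tmp_path):
    # Run the executable end-to-end on a small file and check that the
    # requested output file is actually produced.
    output_file = tmp_path / "scipp_out.h5"
    subprocess.run(
        [
            "essnmx-reduce",
            "--input_file", "docs/user-guide/977695_00052796.hdf",
            "--output_file", str(output_file),
            "--detector_ids", "0",
            "--chunk_size", "1000",
        ],
        check=True,
    )
    assert output_file.exists()
```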
