Reduction workflow for ess nmx files #146
Conversation
@aaronfinke this is the whole workflow for the real NMX NeXus files.
You can use a command line similar to the McStas workflow after installing this branch with pip:
pip install -e .
essnmx-reduce \
--input_file docs/user-guide/977695_00052796.hdf \
--output_file scipp_out.h5 \
--verbose \
--detector_ids 0 \
--chunk_size 1_000
I didn't use Sciline or our generic workflow in this case, since it's a very simple workflow.
Getting the following error when trying to process one of Doro's files:
Traceback (most recent call last):
File "/Users/aaronfinke/miniforge3/envs/scipp/bin/essnmx-reduce", line 8, in <module>
sys.exit(main())
~~~~^^
File "/Users/aaronfinke/essnmx/src/ess/nmx/executables.py", line 387, in main
reduction(
~~~~~~~~~^
input_file=input_file,
^^^^^^^^^^^^^^^^^^^^^^
...<8 lines>...
logger=logger,
^^^^^^^^^^^^^^
)
^
File "/Users/aaronfinke/essnmx/src/ess/nmx/executables.py", line 287, in reduction
toa_bin_edges = toa_bin_edges.to(unit=da.bins.coords['event_time_offset'].unit)
File "/Users/aaronfinke/miniforge3/envs/scipp/lib/python3.13/site-packages/scipp/core/operations.py", line 293, in to
raise ValueError("Must provide dtype or unit or both")
ValueError: Must provide dtype or unit or both
@aaronfinke
Hmm... strange... I tried all three files and they were okay.
I did find some bugs and updated this branch yesterday, though. Did you pull the changes?
Also, if you need the 2D plot quickly, here is a code snippet:
%matplotlib widget
import plopp as pp
import scippnexus as snx
with snx.File('/Users/sunyoungyoo/ESS/essnmx/testfile-20250625-192510.hdf') as f:
dg = f['/entry/instrument/detector_panel_0/'][()]
display(dg)
display(dg['data'])
da_2d = dg['data'].hist()  # histogram events over the pixel dimensions -> 2D
da_3d = dg['data'].hist(event_time_offset=50)  # also bin event_time_offset into 50 bins -> 3D
display(pp.plot(da_2d, norm='log', title="2D Histogram of Detector Data"))
pp.slicer(
da_3d,
title="3D Histogram of Detector Data",
norm='log',
keep=['x_pixel_offset', 'y_pixel_offset'],
)
FYI, the error about toa_bin_edges.to(unit=da.bins.coords['event_time_offset'].unit) sounds like da.bins.coords['event_time_offset'].unit is simply returning None, which is why the .to(unit=...) call is failing with that error.
My guess is that you have a slightly outdated version of scipp, where for binned data you may have to do da.bins.coords['event_time_offset'].bins.unit instead (note the extra .bins before the final .unit).
This was changed in scipp 25.03.0.
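For reference, a minimal reproduction of the behaviour described above on current scipp; the data here is made up:

import scipp as sc

events = sc.DataArray(
    sc.ones(dims=['event'], shape=[4]),
    coords={
        'event_time_offset': sc.array(
            dims=['event'], values=[1.0, 5.0, 12.0, 20.0], unit='ms'
        )
    },
)
da = events.bin(event_time_offset=2)
toa_bin_edges = sc.linspace('event_time_offset', 0.0, 1 / 14, num=251, unit='s')
# scipp >= 25.03.0 (ADR 0019): the event coordinate's unit is exposed directly.
unit = da.bins.coords['event_time_offset'].unit  # 'ms'
# On older scipp this returned None and the extra .bins was required:
# unit = da.bins.coords['event_time_offset'].bins.unit
toa_bin_edges = toa_bin_edges.to(unit=unit)  # now succeeds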
Yeah, sorry, I should have mentioned this in this thread.
I already added the lower pin: #146 (comment)
def _fallback_compute_positions(dg: sc.DataGroup) -> sc.DataGroup:
This is the hack we need to handle the empty NXlogs.
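Purely for illustration, a minimal sketch of the kind of fallback the comment describes; the NXlog detection and the flat, top-level-only traversal are assumptions, and the actual helper in this PR may differ:

import scipp as sc
import scippnexus as snx

def _looks_like_empty_nxlog(obj) -> bool:
    # An NXlog typically loads as a DataArray with a 'time' dimension;
    # an empty log has no entries along it.
    return isinstance(obj, sc.DataArray) and 'time' in obj.dims and obj.sizes['time'] == 0

def _fallback_compute_positions(dg: sc.DataGroup) -> sc.DataGroup:
    try:
        return snx.compute_positions(dg)
    except Exception:
        # Strip empty logs (naively, only at the top level here) and retry.
        # As noted in this thread, if an empty log sits next to a time-dependent
        # transformation in the same chain, the chain stays time-dependent and
        # the rotation matrix cannot be determined, so this cannot always work.
        cleaned = sc.DataGroup(
            {k: v for k, v in dg.items() if not _looks_like_empty_nxlog(v)}
        )
        return snx.compute_positions(cleaned)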
def calculate_number_of_chunks(detector_gr: snx.Group, *, chunk_size: int = 0) -> int:
We need to check how the filewriter is writing to HDF5. If it is already writing the data in chunks, we should just use those chunks; it's more efficient.
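For what it's worth, the writer-side chunking is easy to inspect with h5py; the dataset path below is a guess based on the file used earlier in this thread:

import h5py

# If the filewriter already chunks the event datasets, reading along those
# boundaries is the cheapest option. The dataset path is an assumption.
with h5py.File("docs/user-guide/977695_00052796.hdf", "r") as f:
    ds = f["entry/instrument/detector_panel_0/data/event_time_offset"]
    print(ds.shape, ds.chunks)  # .chunks is None for contiguous (unchunked) datasets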
Yes, we talked with our team about implementing iter_chunk in scippnexus.
So I'll apply it here once that's done.
Either way, we'll allow both, since even a single chunk might not fit in memory for some reason...
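For reference, a hypothetical reading of the signature above; the event field name and the chunk_size semantics are assumptions, not what this PR necessarily implements:

import math

import scippnexus as snx

def calculate_number_of_chunks(detector_gr: snx.Group, *, chunk_size: int = 0) -> int:
    # chunk_size <= 0 is taken to mean "process everything in one go";
    # otherwise split the event list evenly.
    n_events = detector_gr['event_time_offset'].shape[0]  # field name is an assumption
    if chunk_size <= 0:
        return 1
    return math.ceil(n_events / chunk_size)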
Can the handling of chunks be approached via the StreamProcessor?
The McStas executable does use StreamProcessor, indeed, but it's unlikely that we'll need to process the real measurement data chunk by chunk, so there was very little motivation to wrap this into a StreamProcessor.
Also, the workflow itself is much simpler than any other workflow we have. There is more loss than benefit in using Sciline for NMX, which makes it even less motivating to use StreamProcessor...
@aaronfinke I have tried this with the data that Doro gave me (/dmsc/scipp/nmx/cern_202506/testfile-20250625-192510.hdf). This one only has one detector, so I didn't plot them in 3D, but it would indeed be nice to be able to plot them in 3D without DIALS... so I'll prioritize #141.

The error was because of this: https://scipp.github.io/development/adr/0019-binned-data-unit-dtype-python-properties.html. I added the lower pin on scipp...
def main() -> None:
The approach of using executables, as well as not using Sciline, is quite different from the other ess* projects.
I'm not against having executables to make the reduction available in an additional manner, but this sounds like it could be applicable to other workflows as well. Should we try to generalize it in essreduce?
NMX users will not be expected to use or know Python, and Jupyter notebooks are unnecessary. Indeed, the use case is quite different from other instruments at ESS. Executables are far simpler to use with the NMX pipeline.
Should we try to generalize in essreduce?
Or maybe a copier template...?
def reduction(
    *,
    input_file: pathlib.Path,
    output_file: pathlib.Path,
    chunk_size: int = 1_000,
    detector_ids: list[int | str],
    compression: bool = False,
    logger: logging.Logger | None = None,
    min_toa: sc.Variable | int = 0,
    max_toa: sc.Variable | int = int((1 / 14) * 1_000),  # Default for ESS NMX
    toa_bin_edges: sc.Variable | int = 250,
    fast_axis: Literal['x', 'y'] | None = None,  # 'x', 'y', or None to auto-detect
    display: Callable | None = None,  # For Jupyter notebook display
) -> sc.DataGroup:
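For context, a call from Python mirroring the CLI invocation at the top of this thread might look like this, assuming reduction is imported from ess.nmx.executables (the module shown in the traceback above):

import pathlib

from ess.nmx.executables import reduction  # module path taken from the traceback

result = reduction(
    input_file=pathlib.Path("docs/user-guide/977695_00052796.hdf"),
    output_file=pathlib.Path("scipp_out.h5"),
    chunk_size=1_000,
    detector_ids=[0],
)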
Definitely not a request to change anything here now, just a thought/note for the future:
In scipp/esslivedata#404 I moved to using Pydantic for defining workflow parameters. As this allows for including bounds/validation, titles, and descriptions, it may be a strategy we should explore more widely in the future, including when making scripts like this. It might make it simpler to document the parameters and to auto-generate an argparse parser, for example.
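To illustrate the idea (not part of this PR): a hypothetical Pydantic model mirroring a few of the reduction() parameters above, showing where bounds, titles, and descriptions would live:

from pydantic import BaseModel, Field

class ReductionParams(BaseModel):
    """Hypothetical mirror of a few reduction() parameters."""

    chunk_size: int = Field(
        default=1_000, gt=0, title="Chunk size",
        description="Number of events processed per chunk.",
    )
    min_toa: float = Field(default=0.0, ge=0.0, title="Minimum time of arrival [ms]")
    max_toa: float = Field(default=1_000 / 14, gt=0.0, title="Maximum time of arrival [ms]")
    toa_bins: int = Field(default=250, gt=0, title="Number of TOA bins")

# Bounds, titles, and descriptions live in ReductionParams.model_fields,
# from which an argparse parser (or a UI form) could be generated.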
input_arg_group = parser.add_argument_group("Input Options")
input_arg_group.add_argument(
    "--input_file", type=str, help="Path to the input file", required=True
Will you ever want to process several files in one go? Say, using wildcards like essnmx-reduce --input_file docs/user-guide/977695_*.hdf?
Maybe we can just leave that for a later iteration.
Yeah, good point...
We will want to automate the reduction for NMX, so I had only thought of a single-file input, but we should definitely support multiple files as input. I opened an issue: #150
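For a later iteration, one low-effort way to accept shell-expanded wildcards would be argparse's nargs; a sketch, not what this PR implements:

import argparse

parser = argparse.ArgumentParser()
input_arg_group = parser.add_argument_group("Input Options")
input_arg_group.add_argument(
    "--input_file", type=str, nargs="+", required=True,
    help="Path(s) to the input file(s)",
)
# The shell expands the wildcard, so
#   essnmx-reduce --input_file docs/user-guide/977695_*.hdf
# arrives here as a list of paths:
args = parser.parse_args(["--input_file", "a.hdf", "b.hdf"])
assert args.input_file == ["a.hdf", "b.hdf"]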
src/ess/nmx/_executable_helper.py (outdated)
input_arg_group.add_argument(
    "--nbins",
    type=int,
    default=51,
51 seems like an odd number. Is it 51 bin edges for 50 bins?
Yes... it looks like the McStas executable was interpreting it as the number of bin edges instead of the number of bins.
I updated it to be the number of bins and changed the default to 50.
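In other words, n bins need n + 1 edges. A small illustration with scipp (the values here are made up):

import scipp as sc

nbins = 50
# 50 bins need 51 edges; deriving the edges from the number of *bins*
# avoids the off-by-one ambiguity discussed above.
toa_edges = sc.linspace('event_time_offset', 0.0, 1_000 / 14, num=nbins + 1, unit='ms')
assert toa_edges.sizes['event_time_offset'] == nbins + 1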
src/ess/nmx/_executable_helper.py
Outdated
| "--compression", | ||
| type=int, | ||
| default=1, | ||
| choices=[0, 1], |
Can we have True or False instead of ints? Or just a boolean flag: if --compression is not specified, then no compression is applied?
No... because we want to allow one more compression option that uses a different type of compression.
Then should it be a string? Something more descriptive than an integer?
And the fix had to be more invasive than I expected, so I opened a separate issue for fixing the problem... #151
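For illustration, a string-valued version of the option as suggested above; the choice names here are placeholders, not the ones settled on in #151:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--compression",
    type=str,
    default="none",
    choices=["none", "gzip", "bitshuffle"],  # placeholder names, not the final set
    help="Type of compression to apply to the output datasets.",
)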
def build_logger(args: argparse.Namespace) -> logging.Logger:
What is build referring to here?
- instantiate the logger (or retrieve the existing logger)
- configure the logger according to the command-line arguments
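For example, a minimal sketch of those two steps; the logger name and the verbose flag are assumptions:

import argparse
import logging
import sys

def build_logger(args: argparse.Namespace) -> logging.Logger:
    # 1. Retrieve (or create) the logger.
    logger = logging.getLogger("essnmx")
    # 2. Configure it according to the command-line arguments.
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler(sys.stdout))
    logger.setLevel(logging.DEBUG if getattr(args, "verbose", False) else logging.INFO)
    return logger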
however, then the whole range of `event_time_offset` should have been histogrammed
so that the unwrapping can be applied.
i.e. `min_toa` should be 0 and `max_toa` should be 1/14 seconds
for 14 Hz pulse frequency.
I'm not sure I get why we don't want to use the GenericTofWorkflow. The flight path is 158 m long for NMX. The wavelength range is 2-10 Å. This means that the range of arrival times at 158 m is about 80-400 ms, which spans several pulse periods and is not contained within 0-71 ms. Depending on which wavelength range is selected for a given run, one may have to add one or more pulse periods to the event_time_offset.
So unwrapping will be indispensable?
I think we should do what we did in Odin: make a NeXus file out of McStas data (apparently Aaron has done some work in getting good sampling for just this task?) and make a workflow that works on that. This way, we don't have to maintain both a McStas and a NeXus reduction workflow.
Unless I am missing something?
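A quick check of the arithmetic in that comment, using t = L / v with v ≈ 3956 / λ (v in m/s, λ in Å) for thermal neutrons:

# t = L / v, with v ≈ 3956 / λ  (v in m/s, λ in Å)
L = 158.0  # m, NMX flight path quoted above
for wavelength in (2.0, 10.0):  # Å
    v = 3956.0 / wavelength  # m/s
    print(f"{wavelength:4.1f} Å: t = {L / v * 1e3:5.0f} ms")
# -> ~80 ms and ~399 ms, i.e. well beyond one 71 ms pulse period (1/14 s at 14 Hz)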
We can use GenericTofWorkflow, but it was just not easy to debug when something went wrong, compared to a simple flat script. GenericTofWorkflow has extra things that we don't need for NMX, i.e. we need to save the event-time-offset coordinates, not time-of-flight or wavelength.
And as far as I understood, NMX will chop the neutron beam so that it all fits within a single frame of one pulse period, effectively making it similar to a short-pulse source.
But maybe we do need wavelength... in that case we should just use the GenericTofWorkflow.
A minimum viable product was needed for the ILL test, and one of the requirements was that Aaron should be able to fix it himself quickly.
I think we should do what we did in Odin: make a Nexus file out of mcstas data (apparently Aaron has done some work in getting good sampling for just this task?) and make a workflow that works on that. This way, we don't have to maintain a mcstas and a nexus reduction workflow.
Yeeees, we should...! That's what Aaron is working on :D ... The McStas workflow was written a loooong time ago... when we were still hoping that McStas would just write a NeXus file...
I think we should do what we did in Odin: make a Nexus file out of mcstas data (apparently Aaron has done some work in getting good sampling for just this task?) and make a workflow that works on that. This way, we don't have to maintain a mcstas and a nexus reduction workflow.
That would be ideal, and I have a nice workflow for doing the sampling of McStas NMX event data. It is almost complete.
We also do not apply frame unwrapping or pulse skipping here,
as it is not expected from NMX experiments.
Frame unwrapping may be applied later on the result of this function if needed
If you want to defer the time-of-flight calculation to a later PR, we should at least amend this comment to say that we still need to implement tof computation.
If not, this will just get forgotten.
I added the comment here and opened an issue about this: #152
def main() -> None:
Can you add a few tests that run the executable on some files and check a couple of things (e.g., that arguments are correctly passed on to the reduction)?
@nvaytet I added a unit test that runs the command line and then checks whether the output reflects the input arguments. I only added very minimal checks, as we are expecting some updates to the interface later.
This is a workflow for actual NeXus files collected with our filewriter.
It also has a helper to ignore empty logs, but I can't guarantee that it always works:
for example, if any transformation in a chain is empty while another transformation in the chain is time-dependent, then the whole chain is time-dependent, so we can't really determine the rotation matrix.
However, that is unlikely to happen.