Migrate default readers to multi format reader#743
Merged
Conversation
sherjeelshabih
approved these changes
Mar 13, 2026
Collaborator
sherjeelshabih
left a comment
There was a problem hiding this comment.
The PR looks fine to me. I also tested conversion locally and it works. I'll just leave some comments and you can merge as you see fit.
Thank you so much for this! It's nice to have the same config format all around.
All default readers (json_yml, example, json_map) now inherit from MultiFormatReader instead of BaseReader, eliminating duplicated file-dispatch, template creation, and logging logic. Changes: - MultiFormatReader: expose incoming NXDL template as self.nxdl_template so subclasses can access documented-path structure in setup_template() - YamlJsonReader: reduced to a near-empty MultiFormatReader subclass; print()-based warnings replaced by logger; "default"/"objects" pseudo-extensions removed (use setup_template()/handle_objects()) - ExampleReader: uses handle_json_file() + setup_template(); iterates self.nxdl_template instead of the read() template argument - JsonMapReader: extension handlers populate self.data/self.mapping; setup_template() builds result Template from nxdl_template using the existing fill_documented/fill_undocumented helpers (unchanged, still importable by pynxtools_xrd); HDF5 magic-byte detection replaced by explicit .hdf5/.h5/.nxs extension handlers All 281 dataconverter tests pass.
Pipeline changes:
- handle_objects() now runs BEFORE file dispatch so in-memory data is
available to extension handlers (matches original JsonMapReader semantics)
- post_process() return type changed to dict | None; returned data is
added to the template, making it a useful production hook rather than
a pure side-effect method
- file_paths=None is handled gracefully (sorted() on None no longer crashes)
- local variable renamed: template → result to avoid shadowing the argument
Built-in handlers (register in self.extensions to activate):
- handle_eln_file(path): parses YAML/JSON ELN via parse_yml; respects
CONVERT_DICT and REPLACE_NESTED class attributes — eliminates per-plugin
boilerplate across every MultiFormatReader-based plugin
- set_config_file(path): registers JSON config with a logged warning on
replacement — another universal pattern now available for free
Built-in callbacks:
- get_eln_data() default now reads from self.eln_data (populated by
handle_eln_file) instead of returning {} (was a bug: {} is truthy and
suppressed the "not found" path in resolve_special_keys)
- self.eln_data: dict[str, Any] added to __init__
Documentation:
- Class docstring describes full pipeline with numbered steps and built-in
hook contract
- All hook methods have docstrings explaining when they are called, what
key/path mean, and what to return
- Add conftest.py to ensure local src takes precedence in test runs - Add tests/dataconverter/test_multi_reader.py: 18 tests covering the pipeline contract (hook execution order, return value aggregation, extension dispatch, ELN handling, config file management, and instance dict isolation) - Move `extensions` and `processing_order` from mutable class attributes to instance attributes initialised in `__init__`, preventing accidental cross-instance mutation - Fix `handle_eln_file` to derive the NXDL entry name from `get_entry_names()` instead of hardcoding "entry"
…onfig setup_template() contract: - Must return a static dict independent of template structure and reader data - Updated docstring makes this explicit - Removed self.nxdl_template (and Optional import) — no template reference available to hooks by design ExampleReader: - setup_template() now returns only hardcoded NXtest entries (links, virtual datasets, compression, program_name) - Generic field fill loop moved to read() override where template is naturally available; excluded prefixes extracted to module-level constants JsonMapReader: - Rewritten to use fill_from_config + @DaTa tokens, matching the pattern used by pynxtools-mpes/xps/raman - New mapping_to_config() converts legacy /data/path values to @DaTa:path - get_data() resolves @DaTa: tokens via get_val_nested_keystring_from_dict - _handle_json_file() for .mapping. files populates self.config_dict directly - Removed fill_documented, fill_undocumented, fill_attributes, is_path (pynxtools-xrd migrated separately to not depend on these) Tests: - Replaced test_nxdl_template_available_in_setup_template with test_setup_template_returns_empty_dict_by_default
- ExampleReader reverts from MultiFormatReader to BaseReader: single read() method, no MultiFormatReader pipeline overhead needed - JsonMapReader emits DeprecationWarning when a .mapping.json file is used; users should migrate to a config file via the -c flag - Update module docstring and docs to reflect deprecation - Add test asserting DeprecationWarning is emitted for .mapping.json
…mapping.json cases - Add tests/data/dataconverter/readers/json_map/data.config.json with @DaTa: tokens - test_has_correct_read_func[JsonMapReader] now uses set_config_file() + data.config.json - test_json_map_reader_with_config_file: explicit test for the -c flag path (no warning) - test_json_map_reader_mapping_json_emits_deprecation_warning: explicit test for deprecated .mapping.json format
…ure; log deprecation warning
Co-authored-by: Sherjeel Shabih <shabihsherjeel@gmail.com>
78bf6eb to
d635b51
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This replaces the existing logic for the
JsonMapandJsonYamlreaders with the syntax provided by theMultiFormatReader. This is to align with pynxtools plugins (xps, mpes, raman, soon xrd) who already use this structure. The syntax was already similar to the one from theMultiFormatReader, so the changes should be relatively simple. As a compromise, I also kept the existing syntax for mapping from the raw data file as well as amapping.jsonfile. These are deprecated now and would later be removed.Also updates the docs for the
MultiFormatReaderas well as the example implementation inexamples.Closes #477