Migrate default readers to multi format reader #743

Merged
lukaspie merged 18 commits into master from migrate-default-readers-to-multi-format-reader
Mar 13, 2026

Collaborator

@lukaspie lukaspie commented Mar 11, 2026

This replaces the existing logic for the JsonMap and JsonYaml readers with the syntax provided by the MultiFormatReader. This aligns them with the pynxtools plugins (xps, mpes, raman, soon xrd), which already use this structure. The syntax was already similar to that of the MultiFormatReader, so the changes should be relatively simple. As a compromise, I also kept the existing syntax for mapping from the raw data file as well as from a mapping.json file. These are now deprecated and will be removed later.

Also updates the docs for the MultiFormatReader as well as the example implementation in examples.

Closes #477

Collaborator

@sherjeelshabih sherjeelshabih left a comment

The PR looks fine to me. I also tested conversion locally and it works. I'll just leave some comments and you can merge as you see fit.

Thank you so much for this! It's nice to have the same config format all around.

Comment thread docs/reference/built-in-readers.md Outdated
Comment thread docs/reference/built-in-readers.md
Comment thread src/pynxtools/dataconverter/convert.py Outdated
lukaspie and others added 18 commits March 13, 2026 14:15
All default readers (json_yml, example, json_map) now inherit from
MultiFormatReader instead of BaseReader, eliminating duplicated
file-dispatch, template creation, and logging logic.

Changes:
- MultiFormatReader: expose incoming NXDL template as self.nxdl_template
  so subclasses can access documented-path structure in setup_template()
- YamlJsonReader: reduced to a near-empty MultiFormatReader subclass;
  print()-based warnings replaced by logger; "default"/"objects"
  pseudo-extensions removed (use setup_template()/handle_objects())
- ExampleReader: uses handle_json_file() + setup_template(); iterates
  self.nxdl_template instead of the read() template argument
- JsonMapReader: extension handlers populate self.data/self.mapping;
  setup_template() builds result Template from nxdl_template using the
  existing fill_documented/fill_undocumented helpers (unchanged, still
  importable by pynxtools_xrd); HDF5 magic-byte detection replaced by
  explicit .hdf5/.h5/.nxs extension handlers

All 281 dataconverter tests pass.

Pipeline changes:
- handle_objects() now runs BEFORE file dispatch so in-memory data is
  available to extension handlers (matches original JsonMapReader semantics)
- post_process() return type changed to dict | None; returned data is
  added to the template, making it a useful production hook rather than
  a pure side-effect method
- file_paths=None is handled gracefully (sorted() on None no longer crashes)
- local variable renamed: template → result to avoid shadowing the argument
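The ordering described above can be sketched with a toy pipeline. This is not the pynxtools `MultiFormatReader`; `MiniReader` and its `calls` log are hypothetical names used only to illustrate the described contract: objects are handled before file dispatch, `file_paths=None` is tolerated, and whatever `post_process()` returns is merged into the result.

```python
from __future__ import annotations

import os
from typing import Any, Callable


class MiniReader:
    """Toy stand-in for a MultiFormatReader-style pipeline (hypothetical)."""

    def __init__(self) -> None:
        self.calls: list[str] = []  # records hook execution order
        self.objects: Any = None
        # Maps a file extension to the handler that processes such files.
        self.extensions: dict[str, Callable[[str], dict[str, Any]]] = {
            ".json": self.handle_json_file,
        }

    def handle_objects(self, objects: Any) -> dict[str, Any]:
        self.calls.append("handle_objects")
        self.objects = objects
        return {}

    def handle_json_file(self, path: str) -> dict[str, Any]:
        self.calls.append(f"handle_json_file:{path}")
        # In-memory objects are already available because they ran first.
        return {"source": path, "had_objects": self.objects is not None}

    def post_process(self) -> dict[str, Any] | None:
        self.calls.append("post_process")
        return {"processed": True}

    def read(self, file_paths=None, objects=None) -> dict[str, Any]:
        result: dict[str, Any] = {}
        if objects is not None:
            result.update(self.handle_objects(objects))
        # "or ()" keeps sorted() from being called on None.
        for path in sorted(file_paths or ()):
            handler = self.extensions.get(os.path.splitext(path)[1].lower())
            if handler is not None:
                result.update(handler(path))
        extra = self.post_process()
        if extra:  # None or an empty dict adds nothing
            result.update(extra)
        return result
```

Calling `MiniReader().read(None)` runs only `post_process()`, and files with unregistered extensions are skipped silently in this sketch; the real reader may log or raise instead.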

Built-in handlers (register in self.extensions to activate):
- handle_eln_file(path): parses YAML/JSON ELN via parse_yml; respects
  CONVERT_DICT and REPLACE_NESTED class attributes — eliminates per-plugin
  boilerplate across every MultiFormatReader-based plugin
- set_config_file(path): registers JSON config with a logged warning on
  replacement — another universal pattern now available for free

Built-in callbacks:
- get_eln_data() default now reads from self.eln_data (populated by
  handle_eln_file) instead of returning {} (was a bug: the returned {}
  counted as a found value and suppressed the "not found" path in
  resolve_special_keys)
- self.eln_data: dict[str, Any] added to __init__
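A minimal sketch of why a default of `{}` can hide a missing value, assuming the resolver treats only `None` as "not found" (the actual check inside `resolve_special_keys` is not shown in this PR text). All names below are hypothetical.

```python
from typing import Any, Optional


def resolve(value: Optional[Any]) -> str:
    """Toy resolver: only None counts as 'not found' (assumed check)."""
    return "not found" if value is None else "found"


def get_eln_data_old(key: str) -> dict:
    """Old-style default: always hands back an empty dict."""
    return {}


class ElnBackedReader:
    """New-style default: look the key up in data collected earlier."""

    def __init__(self) -> None:
        self.eln_data: dict = {}

    def get_eln_data(self, key: str) -> Any:
        # dict.get returns None for a missing key, so absence is reported.
        return self.eln_data.get(key)
```

With these definitions, `resolve(get_eln_data_old("x"))` reports a found value even though nothing was read, while `resolve(ElnBackedReader().get_eln_data("x"))` takes the "not found" branch.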

Documentation:
- Class docstring describes full pipeline with numbered steps and built-in
  hook contract
- All hook methods have docstrings explaining when they are called, what
  key/path mean, and what to return

- Add conftest.py to ensure local src takes precedence in test runs
- Add tests/dataconverter/test_multi_reader.py: 18 tests covering the
  pipeline contract (hook execution order, return value aggregation,
  extension dispatch, ELN handling, config file management, and
  instance dict isolation)
- Move `extensions` and `processing_order` from mutable class attributes
  to instance attributes initialised in `__init__`, preventing accidental
  cross-instance mutation
- Fix `handle_eln_file` to derive the NXDL entry name from
  `get_entry_names()` instead of hardcoding "entry"
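The cross-instance mutation that motivates moving `extensions` into `__init__` is ordinary Python behaviour for mutable class attributes. The two classes below are hypothetical and only illustrate the difference.

```python
class SharedReader:
    # Class-level dict: one object shared by every instance.
    extensions: dict = {}


class IsolatedReader:
    def __init__(self) -> None:
        # Instance-level dict: a fresh object per instance.
        self.extensions: dict = {}


first, second = SharedReader(), SharedReader()
first.extensions[".json"] = "json handler"
# second.extensions now also contains ".json", because both names
# refer to the same class-level dict.

third, fourth = IsolatedReader(), IsolatedReader()
third.extensions[".json"] = "json handler"
# fourth.extensions is still empty.
```

Registering a handler on one reader instance therefore no longer leaks into readers created elsewhere in the same process, such as in another test.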
…onfig

setup_template() contract:
- Must return a static dict independent of template structure and reader data
- Updated docstring makes this explicit
- Removed self.nxdl_template (and Optional import) — no template reference
  available to hooks by design

ExampleReader:
- setup_template() now returns only hardcoded NXtest entries (links,
  virtual datasets, compression, program_name)
- Generic field fill loop moved to read() override where template is
  naturally available; excluded prefixes extracted to module-level constants

JsonMapReader:
- Rewritten to use fill_from_config + @data tokens, matching the pattern
  used by pynxtools-mpes/xps/raman
- New mapping_to_config() converts legacy /data/path values to @data:path
- get_data() resolves @data: tokens via get_val_nested_keystring_from_dict
- _handle_json_file() for .mapping. files populates self.config_dict directly
- Removed fill_documented, fill_undocumented, fill_attributes, is_path
  (pynxtools-xrd migrated separately to not depend on these)
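The conversion and lookup described above can be sketched as follows. This is an illustration under assumptions, not the code from this PR: `legacy_mapping_to_config` and `resolve_token` are hypothetical helpers, the token prefix is taken to be the lowercase string `@data:`, and a legacy value is assumed to be any string starting with `/`.

```python
from typing import Any

DATA_TOKEN = "@data:"  # assumed token prefix


def legacy_mapping_to_config(mapping: dict) -> dict:
    """Rewrite legacy '/a/b' values as '@data:a/b'; leave other values alone."""
    config: dict = {}
    for key, value in mapping.items():
        if isinstance(value, dict):
            config[key] = legacy_mapping_to_config(value)
        elif isinstance(value, str) and value.startswith("/"):
            config[key] = DATA_TOKEN + value.lstrip("/")
        else:
            config[key] = value
    return config


def resolve_token(value: Any, data: dict) -> Any:
    """Look up an '@data:a/b' token in nested dicts; None if the path is missing."""
    if not (isinstance(value, str) and value.startswith(DATA_TOKEN)):
        return value  # literal values pass through unchanged
    node: Any = data
    for part in value[len(DATA_TOKEN):].split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


legacy = {"/ENTRY[entry]/voltage": "/scan/voltage", "/ENTRY[entry]/title": "demo"}
config = legacy_mapping_to_config(legacy)
raw = {"scan": {"voltage": [1.0, 2.0]}}
filled = {key: resolve_token(value, raw) for key, value in config.items()}
```

Keeping the conversion separate from the lookup means a legacy mapping only has to be translated once, after which it follows the same path as a config file written with tokens from the start.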

Tests:
- Replaced test_nxdl_template_available_in_setup_template with
  test_setup_template_returns_empty_dict_by_default

- ExampleReader reverts from MultiFormatReader to BaseReader: single
  read() method, no MultiFormatReader pipeline overhead needed
- JsonMapReader emits DeprecationWarning when a .mapping.json file is
  used; users should migrate to a config file via the -c flag
- Update module docstring and docs to reflect deprecation
- Add test asserting DeprecationWarning is emitted for .mapping.json
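A deprecation path of this kind, and a test for it, might look roughly like the sketch below. `classify_input` is a hypothetical function, not part of pynxtools; only the standard-library `warnings` module is used.

```python
import warnings


def classify_input(path: str) -> str:
    """Return how a file would be treated; warn on the legacy mapping format."""
    if path.endswith(".mapping.json"):
        warnings.warn(
            "'.mapping.json' files are deprecated; pass a config file "
            "via the -c flag instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return "legacy-mapping"
    return "config"


def collect_warnings(path: str) -> list:
    """Run classify_input and return the warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        classify_input(path)
    return list(caught)
```

In a pytest suite the same assertion is usually written with `pytest.warns(DeprecationWarning)`; the plain `warnings.catch_warnings` form is used here to keep the sketch dependency-free.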
…mapping.json cases

- Add tests/data/dataconverter/readers/json_map/data.config.json with @data: tokens
- test_has_correct_read_func[JsonMapReader] now uses set_config_file() + data.config.json
- test_json_map_reader_with_config_file: explicit test for the -c flag path (no warning)
- test_json_map_reader_mapping_json_emits_deprecation_warning: explicit test for deprecated .mapping.json format

Co-authored-by: Sherjeel Shabih <shabihsherjeel@gmail.com>
@lukaspie lukaspie force-pushed the migrate-default-readers-to-multi-format-reader branch from 78bf6eb to d635b51 March 13, 2026 13:15
@lukaspie lukaspie merged commit 326df08 into master Mar 13, 2026
17 checks passed
@lukaspie lukaspie deleted the migrate-default-readers-to-multi-format-reader branch March 13, 2026 13:49

Development

Successfully merging this pull request may close these issues.

Add tests for multi format reader

2 participants