Storage helper#221

Open
liamhuber wants to merge 25 commits into main from live-storage

Conversation

@liamhuber
Member

@liamhuber liamhuber commented Apr 2, 2026

This PR builds on the "live" dataful objects (first introduced in #185) to make them convenient to use with bagofholding. If you save a flowrep.api.schemas.LiveWorkflow object to a bagofholding.H5Bag, you can use flowrep.api.tools.LexicalBagBrowser to browse the bag and load individual nodes and ports by their "lexical path".

A "lexical path" is a new concept in flowrep, so I'm open to pushback if we'd rather do this another way, but it's old hat in pyiron_workflow. It's just the "."-joined sequence of node names, "inputs"/"outputs", and port name used to reach any node or port.
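As a minimal sketch of that definition (the helper name `lexical_path` is hypothetical and not part of flowrep's API), building such a path amounts to a string join over the node names, followed by an optional panel and port:

```python
def lexical_path(node_names, panel=None, port=None):
    """Join node names, an optional IO panel, and an optional port name with '.'.

    Hypothetical helper for illustration only; flowrep may build paths differently.
    """
    if panel is not None and panel not in ("inputs", "outputs"):
        raise ValueError(f"panel must be 'inputs' or 'outputs', got {panel!r}")
    if (panel is None) != (port is None):
        raise ValueError("panel and port must be given together")
    parts = list(node_names)
    if panel is not None:
        parts += [panel, port]
    return ".".join(parts)


# A path that stops at a node name refers to the node itself
node_path = lexical_path(["rescale_all_0", "for_0"])
# A path that continues through a panel reaches a single port
port_path = lexical_path(["rescale_all_0", "for_0"], "inputs", "items")
# With no node names, the path refers to a port on the top-level workflow
top_level_port = lexical_path([], "outputs", "scaled")
```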

Let's take a simple but non-trivial example: a for-loop. We can create it from some Python code, and use the built-in toy WfMS to get a data-populated LiveWorkflow object from it:

import flowrep as fr
import flowrep.api.schemas as frs
import flowrep.api.tools as frt

@fr.atomic
def rescale(value: int | float, factor: float = 1.0) -> float:
    result = value * factor
    return result


@fr.workflow
def rescale_all(items: list[int | float], factor: float = 1.0) -> list[float]:
    results = []
    for item in items:
        scaled = rescale(item, factor)
        results.append(scaled)
    return results


@fr.atomic
def int_list(n: int) -> list[int]:
    return [i for i in range(n)]


@fr.workflow
def scaled_range(n: int, scale: float = 1.0) -> list[float]:
    to_scale = int_list(n)
    scaled = rescale_all(to_scale, scale)
    return scaled

executed = frt.run_recipe(scaled_range.flowrep_recipe, n=5)
executed.output_ports
# {'scaled': OutputPort(value=[0.0, 1.0, 2.0, 3.0, 4.0], annotation=list[float])}

The assumption is that a user then saves this result to an H5Bag:

import bagofholding as boh

boh.H5Bag.save(executed, "executed.h5", overwrite_existing=True)

Then, when they come back later, they can browse it lexically to see its topology:

browser = frt.LexicalBagBrowser("executed.h5")
browser.list_paths()
This prints all the available lexical paths:
['inputs.n',
 'inputs.scale',
 'outputs.scaled',
 'int_list_0',
 'int_list_0.inputs.n',
 'int_list_0.outputs.output_0',
 'rescale_all_0',
 'rescale_all_0.inputs.factor',
 'rescale_all_0.inputs.items',
 'rescale_all_0.outputs.results',
 'rescale_all_0.for_0',
 'rescale_all_0.for_0.inputs.factor',
 'rescale_all_0.for_0.inputs.items',
 'rescale_all_0.for_0.outputs.results',
 'rescale_all_0.for_0.body_0',
 'rescale_all_0.for_0.body_0.inputs.factor',
 'rescale_all_0.for_0.body_0.inputs.item',
 'rescale_all_0.for_0.body_0.outputs.scaled',
 'rescale_all_0.for_0.body_0.rescale_0',
 'rescale_all_0.for_0.body_0.rescale_0.inputs.factor',
 'rescale_all_0.for_0.body_0.rescale_0.inputs.value',
 'rescale_all_0.for_0.body_0.rescale_0.outputs.result',
 'rescale_all_0.for_0.body_1',
 'rescale_all_0.for_0.body_1.inputs.factor',
 'rescale_all_0.for_0.body_1.inputs.item',
 'rescale_all_0.for_0.body_1.outputs.scaled',
 'rescale_all_0.for_0.body_1.rescale_0',
 'rescale_all_0.for_0.body_1.rescale_0.inputs.factor',
 'rescale_all_0.for_0.body_1.rescale_0.inputs.value',
 'rescale_all_0.for_0.body_1.rescale_0.outputs.result',
 'rescale_all_0.for_0.body_2',
 'rescale_all_0.for_0.body_2.inputs.factor',
 'rescale_all_0.for_0.body_2.inputs.item',
 'rescale_all_0.for_0.body_2.outputs.scaled',
 'rescale_all_0.for_0.body_2.rescale_0',
 'rescale_all_0.for_0.body_2.rescale_0.inputs.factor',
 'rescale_all_0.for_0.body_2.rescale_0.inputs.value',
 'rescale_all_0.for_0.body_2.rescale_0.outputs.result',
 'rescale_all_0.for_0.body_3',
 'rescale_all_0.for_0.body_3.inputs.factor',
 'rescale_all_0.for_0.body_3.inputs.item',
 'rescale_all_0.for_0.body_3.outputs.scaled',
 'rescale_all_0.for_0.body_3.rescale_0',
 'rescale_all_0.for_0.body_3.rescale_0.inputs.factor',
 'rescale_all_0.for_0.body_3.rescale_0.inputs.value',
 'rescale_all_0.for_0.body_3.rescale_0.outputs.result',
 'rescale_all_0.for_0.body_4',
 'rescale_all_0.for_0.body_4.inputs.factor',
 'rescale_all_0.for_0.body_4.inputs.item',
 'rescale_all_0.for_0.body_4.outputs.scaled',
 'rescale_all_0.for_0.body_4.rescale_0',
 'rescale_all_0.for_0.body_4.rescale_0.inputs.factor',
 'rescale_all_0.for_0.body_4.rescale_0.inputs.value',
 'rescale_all_0.for_0.body_4.rescale_0.outputs.result']

A port path consists of any number of node names, each descending one level into a subgraph, then "inputs" or "outputs" to select an IO panel, and finally a port name. A path that stops at a node name refers to the node itself.
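To make the ordering of that listing concrete, here is a rough sketch that walks a toy nested dictionary and emits paths in the same shape. The `"inputs"`/`"outputs"`/`"nodes"` layout of the toy data and the function name `list_lexical_paths` are assumptions for illustration; the real browser reads from the H5 bag and may be implemented quite differently.

```python
def list_lexical_paths(node, prefix=""):
    """Recursively collect lexical paths from a toy nested-dict workflow."""
    paths = []
    # Ports on this node's IO panels come first (sorted, as in the listing above)
    for panel in ("inputs", "outputs"):
        for port in sorted(node.get(panel, {})):
            paths.append(f"{prefix}{panel}.{port}")
    # Then each child node, immediately followed by everything beneath it
    for name, child in node.get("nodes", {}).items():
        paths.append(f"{prefix}{name}")
        paths.extend(list_lexical_paths(child, prefix=f"{prefix}{name}."))
    return paths


# Toy stand-in for the top of the executed `scaled_range` workflow
toy_workflow = {
    "inputs": {"n": 5, "scale": 1.0},
    "outputs": {"scaled": [0.0, 1.0, 2.0, 3.0, 4.0]},
    "nodes": {
        "int_list_0": {
            "inputs": {"n": 5},
            "outputs": {"output_0": [0, 1, 2, 3, 4]},
        },
    },
}

toy_paths = list_lexical_paths(toy_workflow)
```

For this toy input, `toy_paths` reproduces the first six entries of the listing above.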

We can then freely reload things by their lexical path, e.g. nodes

reloaded_node = browser.load("rescale_all_0")
isinstance(reloaded_node, frs.LiveWorkflow)
# True

or port data

browser.load("rescale_all_0.for_0.body_1.inputs.item")
# InputPort(value=1, annotation=None, default=NOT_DATA)

For simplicity, trying to load an IO panel raises a clean error, so a successful load always returns a live port or node

try:
    browser.load("rescale_all_0.inputs")
except ValueError as e:
    print(e)
# Path terminated in 'inputs'. Please select an individual port to load from among ('factor', 'items')
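The rule can be sketched against the same kind of toy nested dictionary. This is only an illustration under assumed names (`load_lexical`, the `"nodes"` key); the actual `LexicalBagBrowser.load` works on an H5 bag, and only the error message format is taken from the output above.

```python
PANELS = ("inputs", "outputs")


def load_lexical(root, path):
    """Resolve a lexical path in a toy nested dict to a node or a port value."""
    parts = path.split(".")
    current = root
    for i, part in enumerate(parts):
        if part in PANELS:
            ports = current[part]
            remaining = parts[i + 1:]
            if not remaining:
                # The path stopped at the panel itself: refuse, and list the ports
                raise ValueError(
                    f"Path terminated in {part!r}. Please select an individual "
                    f"port to load from among {tuple(sorted(ports))}"
                )
            if len(remaining) > 1:
                raise ValueError(f"Nothing may follow a port name, got {remaining[1:]}")
            return ports[remaining[0]]
        # Otherwise the segment is a node name: descend into the subgraph
        current = current["nodes"][part]
    return current


toy_root = {
    "inputs": {},
    "outputs": {},
    "nodes": {
        "rescale_all_0": {
            "inputs": {"items": [0, 1, 2, 3, 4], "factor": 1.0},
            "outputs": {"results": [0.0, 1.0, 2.0, 3.0, 4.0]},
            "nodes": {},
        },
    },
}

loaded_port = load_lexical(toy_root, "rescale_all_0.inputs.factor")
loaded_node = load_lexical(toy_root, "rescale_all_0")

panel_error = None
try:
    load_lexical(toy_root, "rescale_all_0.inputs")
except ValueError as e:
    panel_error = str(e)
```

One consequence of this design is that a node cannot be named "inputs" or "outputs" in the sketch; whether flowrep enforces the same restriction is not stated here.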

Users in a jupyter environment get a graphical browser analogous to the one available in bagofholding:

[Image: Screenshot 2026-04-01 at 8 31 33 PM]

Out of scope here is adding this to the user guide notebook, which will wait until #215 is finished. I expect it will broadly follow the content here in this comment.

I am also considering refactoring all the "live" stuff (live, wfms, storage, widget) down into its own submodule to keep the main level cleaner. Since users access stuff via the API and aren't impacted by such a restructure, I'd say that's out of scope here too.

On the technical front, I swapped the LiveNode fields from pyiron_snippets.dotdict.DotDict objects to plain dict. This didn't require any further changes, since we have never actually used the dot-access in tests or documentation. The upside is that bagofholding treats string-keyed dictionaries very nicely, which made writing the path adapters much easier.
Since bagofholding is still v0, I pin hard to explicit versions here when validating that the bag we're browsing is actually browsable. This can be relaxed later, but for now, since we control both packages and boh isn't releasing at a fast cadence, I don't think it hurts much.
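A hard pin of this kind could look roughly like the sketch below. Everything here is an assumption for illustration: the function name, the set of supported versions (0.1.5 is simply the version mentioned elsewhere in this thread), and the way a dev build is recognized from its version string. The actual check in this PR may differ.

```python
# Hypothetical explicit allow-list; not taken from the PR's source code
SUPPORTED_BAG_VERSIONS = frozenset({"0.1.5"})


def is_browsable_version(bag_version):
    """Return True if a bag written by this bagofholding version should be browsed."""
    # Assumption: dev builds carry a '.dev' segment or a '+local' suffix.
    # These are let through optimistically rather than rejected.
    if ".dev" in bag_version or "+" in bag_version:
        return True
    # Released versions must match the explicit pin exactly
    return bag_version in SUPPORTED_BAG_VERSIONS


pinned_ok = is_browsable_version("0.1.5")
dev_ok = is_browsable_version("0.1.6.dev2+g1a2b3c4")
other_release_ok = is_browsable_version("0.1.4")
```

Exact matching is stricter than a range check, but it is easy to relax later by replacing the membership test with a version comparison.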

In the live dataclass fields. This makes the H5Bag structure much, much clearer since now all the fields are simple `StrKeyDict`, which get extra nice treatment in the H5 paths

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
To be more generic and make room for other optional dependencies

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
With a widget helper

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
@github-actions

github-actions bot commented Apr 2, 2026

Binder 👈 Launch a binder notebook on branch pyiron/flowrep/live-storage

@codecov

codecov bot commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.92%. Comparing base (881ec07) to head (2b1740a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #221      +/-   ##
==========================================
+ Coverage   98.81%   98.92%   +0.10%     
==========================================
  Files          30       32       +2     
  Lines        2022     2225     +203     
==========================================
+ Hits         1998     2201     +203     
  Misses         24       24              


@liamhuber
Member Author

@samwaseda, it's still not clear to me how knowledge graphs will link to serialization files that hold arbitrary python data. It's possible that we will be able to leverage this sort of browsing directly, but even if we can't I hope it provides a nice interim proof-of-concept for quick-loading from a rich serialization format (i.e. one capable of both metadata and nearly-arbitrary python objects).

@niklassiemer and @mbruns91, I would like this on your radar too, since you guys have been handling lots of data storage tasks.

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
@liamhuber
Member Author


      The following packages are incompatible
      ├─ bagofholding =0.1.5 * is installable and it requires
      │  └─ numpy >=1.26.4,<2.4.0 *, which can be installed;
      └─ numpy =2.4.2 * is not installable because it conflicts with any installable versions previously reported.

Yeah, fair. I will wait until I've got green check marks before pinging for review, but I won't be able to manage that tonight so I'm going to call it here.

@liamhuber liamhuber marked this pull request as draft April 2, 2026 03:48
@samwaseda samwaseda requested a review from Copilot April 2, 2026 09:46
Contributor

Copilot AI left a comment

Pull request overview

Adds a “lexical path” storage helper for browsing/loading LiveWorkflow content from bagofholding H5 bags, plus an optional Jupyter widget for interactive exploration.

Changes:

  • Introduces flowrep.storage with LexicalBagBrowser, bag validation, lexical path listing, and lexical-path-based loading.
  • Adds flowrep.widget.LexicalBagTree (ipytree-based) for graphical browsing of bag contents.
  • Updates CI/env/config and tests to cover storage + widget behavior and adds a simple workflow recipe maker for tests.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/flowrep/storage.py | Implements lexical browsing/loading and bag validation for persisted LiveWorkflow objects. |
| src/flowrep/widget.py | Adds ipytree widget for lazy, hierarchical exploration of lexical paths. |
| src/flowrep/live.py | Switches LiveNode port/node mappings from DotDict to plain dict for better bag compatibility. |
| src/flowrep/api/tools.py | Exposes LexicalBagBrowser via public API tools. |
| tests/unit/test_storage.py | Adds unit tests for validation, lexical path listing, and loading behavior. |
| tests/unit/test_widget.py | Adds unit tests for widget construction, selection, and lazy expansion. |
| tests/flowrep_static/makers.py | Adds a simple workflow recipe builder used by the new tests. |
| pyproject.toml | Adds optional dependency groups for storage and storage-widget. |
| docs/environment.yml | Adds bag/widget deps for docs builds. |
| .binder/environment.yml | Adds bag/widget deps for Binder environment. |
| .ci_support/environment-optional.yml | Adds optional deps to a dedicated conda env file used in CI. |
| .ci_support/lower-bounds.yml | Adds lower-bound pins for the new optional deps. |
| .github/workflows/push-pull.yml / .github/workflows/daily.yml | Switch CI to use environment-optional.yml instead of the prior env file set. |


liamhuber added 17 commits April 2, 2026 07:15
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
I had just assumed 0.2.0 would exist, but it doesn't

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
If the bag was created using a dev version, just cross your fingers and go for it.

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
And test against all exception branches

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
It conflicts too easily with other stuff (here the browser method widget())

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
It guarded against the "input_ports" or "output_ports" being missing, but these default to empty dictionaries so the H5 file will always have the group even when it's empty. This is validated with the new accompanying test.

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
For the same reasons we removed them in the main storage tool. Basic construction is also validated against a workflow with an empty IO panel

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>
@liamhuber liamhuber marked this pull request as ready for review April 2, 2026 17:32
3 participants