Conversation

@edan-bainglass
Member

@edan-bainglass edan-bainglass commented Aug 24, 2025

Some parts of the ORM module do not pass a serialization round trip. This PR addresses this, but further discussion is needed regarding what should actually be (de)serialized. If interested, good to read aiidateam/AEP#40 and discussion on #6255. Some discussion regarding mismatch between an object's constructor args and its Model fields can be found here.

Good to also discuss if the use of pydantic should be extended to ORM instance creation in general, not only by way of from_serialized. This is not implemented in this PR (and is out of scope) but can be addressed in a follow-up PR if deemed "correct".

Open questions

  • There is still the matter of repository_content. The current implementation uses base64 encoding/decoding, but this is clearly not ideal for large files. Some discussion was had w.r.t switching to links to some online storage of files. TBD
  • File serialization is TBD in general, for arrays also
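To make the size concern concrete, here is a minimal standard-library sketch of a base64 round trip through JSON. It is not the PR's implementation: the helper names and the single-blob payload shape are assumptions made for illustration, and the real `repository_content` field may be structured differently.

```python
import base64
import json


def serialize_content(content: bytes) -> str:
    """Encode raw bytes as base64 text so they fit in a JSON document (hypothetical helper)."""
    encoded = base64.b64encode(content).decode('ascii')
    return json.dumps({'repository_content': encoded})


def deserialize_content(payload: str) -> bytes:
    """Invert `serialize_content` (hypothetical helper)."""
    return base64.b64decode(json.loads(payload)['repository_content'])


raw = bytes(range(256)) * 12  # 3072 bytes of arbitrary binary data
payload = serialize_content(raw)
encoded_size = len(json.loads(payload)['repository_content'])

# Base64 maps every 3 input bytes to 4 output characters,
# so the encoded text is about one third larger than the raw content.
print(len(raw), encoded_size)
```

The round trip is lossless, but the roughly 33% growth, plus having the whole file in memory as a string, is what makes this approach unattractive for large files.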

Updates

The PR now also introduces the following:

  • Escape hatch for Data plugins via attributes
  • A guard against unhandled attributes in Data plugins (not allowed in the backend node)
  • A dedicated InputModel - a derived view of the defined entity Model suitable for Entity creation
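The idea of an input model as a derived view can be sketched with standard-library dataclasses. This is only an analogy: the PR works with pydantic models, and the `read_only` metadata flag, the `NodeModel` class and this `as_input_model` function are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass, field, fields, make_dataclass
from typing import Optional


@dataclass
class NodeModel:
    """Hypothetical full model: everything that can be serialized."""

    label: str = ''
    description: str = ''
    # Fields assigned by the storage backend; a caller should not pass them on creation.
    pk: Optional[int] = field(default=None, metadata={'read_only': True})
    uuid: Optional[str] = field(default=None, metadata={'read_only': True})


def as_input_model(model: type) -> type:
    """Derive a creation-only view that keeps just the writable fields (assumed behaviour)."""
    writable = [
        (f.name, f.type, field(default=f.default))
        for f in fields(model)
        if not f.metadata.get('read_only')
    ]
    return make_dataclass(f'{model.__name__}Input', writable)


InputModel = as_input_model(NodeModel)
new_node_input = InputModel(label='relaxation', description='test run')
```

Passing `pk=1` to `InputModel` raises `TypeError`, which is the point of the derived view: fields that only make sense on a stored entity cannot be supplied at creation time.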

Note for PR reviewers

There are a few changes that are likely out of scope. These will move to dedicated PRs prior to merge of this PR.

@edan-bainglass edan-bainglass marked this pull request as draft August 24, 2025 08:03
@codecov

codecov bot commented Aug 24, 2025

Codecov Report

❌ Patch coverage is 84.95822% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.52%. Comparing base (d53330c) to head (ffdb1cb).
⚠️ Report is 3 commits behind head on main.

Files with missing lines | Patch % | Lines
src/aiida/orm/nodes/data/array/kpoints.py | 48.72% | 20 Missing ⚠️
src/aiida/orm/nodes/node.py | 79.17% | 10 Missing ⚠️
src/aiida/orm/nodes/data/array/array.py | 74.08% | 7 Missing ⚠️
src/aiida/orm/nodes/data/array/trajectory.py | 77.28% | 5 Missing ⚠️
...rc/aiida/storage/psql_dos/orm/querybuilder/main.py | 16.67% | 5 Missing ⚠️
src/aiida/orm/nodes/data/code/installed.py | 63.64% | 4 Missing ⚠️
src/aiida/orm/entities.py | 96.00% | 2 Missing ⚠️
src/aiida/orm/fields.py | 50.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6990      +/-   ##
==========================================
- Coverage   79.60%   79.52%   -0.07%     
==========================================
  Files         566      566              
  Lines       43538    43700     +162     
==========================================
+ Hits        34655    34749      +94     
- Misses       8883     8951      +68     

@edan-bainglass edan-bainglass force-pushed the fix-node-serialization branch 4 times, most recently from 9b7cfbd to d83ced6 Compare August 24, 2025 11:33
@edan-bainglass
Member Author

@khsrali regarding one of the failed tests...

tests/orm/nodes/data/test_remote_stash.py::test_constructor_invalid_folder[stash_mode-copy] - ValueError: `RemoteStashFolderData` can only be used with `stash_mode == StashMode.COPY`

In this PR, I lift some validation to the top of the constructor (of stash data classes - see 91e4ad5), to avoid any operations if the object is bound to fail. However, this seems to introduce failures in testing them. It suggests that perhaps there is some order to the operations, though I don't see it. Can you please comment?

@edan-bainglass
Member Author

> @khsrali regarding one of the failed tests...
>
> tests/orm/nodes/data/test_remote_stash.py::test_constructor_invalid_folder[stash_mode-copy] - ValueError: `RemoteStashFolderData` can only be used with `stash_mode == StashMode.COPY`
>
> In this PR, I lift some validation to the top of the constructor (of stash data classes - see 91e4ad5), to avoid any operations if the object is bound to fail. However, this seems to introduce failures in testing them. It suggests that perhaps there is some order to the operations, though I don't see it. Can you please comment?

The test was checking for TypeError, which is what gets raised when the stash_mode check is delayed and a non-Enum value is provided for it. There is no need for that, as the stash_mode check now covers this case as well. I updated the test to also parametrize over the expected error, checking for ValueError in the case of the invalid stash_mode.
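The effect of moving the check to the top can be reproduced with a small stand-alone sketch. The class below is a hypothetical stand-in, not the real `RemoteStashFolderData`, and the enum members other than `COPY` are assumptions; only the ordering of the validation matters here.

```python
from enum import Enum


class StashMode(Enum):
    """Simplified stand-in for the real enum; only COPY is taken from the discussion."""

    COPY = 'copy'
    COMPRESS_TAR = 'tar'  # assumed extra member, for illustration only


class FolderStashSketch:
    """Hypothetical class that validates `stash_mode` before doing anything else."""

    def __init__(self, stash_mode, target_basepath, source_list):
        # Checked first: a plain string such as 'copy' is not StashMode.COPY,
        # so it fails here with ValueError instead of a later TypeError.
        if stash_mode != StashMode.COPY:
            raise ValueError('`FolderStashSketch` can only be used with `stash_mode == StashMode.COPY`')
        self.stash_mode = stash_mode
        self.target_basepath = target_basepath
        self.source_list = list(source_list)


def raised_error(**kwargs):
    """Return the exception type raised by the constructor, or None if it succeeds."""
    try:
        FolderStashSketch(**kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc)
    return None
```

A test parametrized over `(kwargs, expected_error)` pairs can then expect `ValueError` for `stash_mode='copy'`, which corresponds to the `[stash_mode-copy]` case in the failing test ID.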

@edan-bainglass
Member Author

@GeigerJ2 okay, this is ready for others to inspect. I did my best to isolate the commits and provided comments on each. Happy to discuss further.

Pinging also @agoscinski @superstar54 if interested.

Pinging @sphuber for input/feedback, if he has time.

@edan-bainglass
Member Author

It may be possible to rely on the post models of aiida-restapi as a reference for defining ORM constructor parameters, as the post models are intended to represent serialized objects passed to the REST API for object construction. Looking into this.

@edan-bainglass
Member Author

edan-bainglass commented Oct 8, 2025

@danielhollas what is this about?

Critical: potential `verdi` speed problem: `aiida.orm.fields` module is imported which is not in: ('aiida.brokers', 'aiida.cmdline', 'aiida.common', 'aiida.manage', 'aiida.plugins', 'aiida.restapi')

Nevermind. I see that importing orm in the cmdline raises alarm bells. Removed...
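For readers unfamiliar with this kind of guard, the sketch below shows one way to detect that importing a package pulls in modules it should not. It is not the check used in the AiiDA test suite; the function name and the probing approach are assumptions, and it is demonstrated on standard-library modules so it runs anywhere.

```python
import subprocess
import sys

# Small program run in a fresh interpreter, so modules already imported
# by the current process do not hide or fake a dependency.
_PROBE = (
    'import importlib, sys\n'
    'name, prefix = sys.argv[1:3]\n'
    'importlib.import_module(name)\n'
    'hits = sorted(m for m in sys.modules if m == prefix or m.startswith(prefix + "."))\n'
    'print("\\n".join(hits))\n'
)


def modules_pulled_in(import_name: str, prefix: str) -> list:
    """List modules under `prefix` that are loaded as a side effect of importing `import_name`."""
    result = subprocess.run(
        [sys.executable, '-c', _PROBE, import_name, prefix],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


# With AiiDA installed, a guard in this spirit would expect an empty list from
# modules_pulled_in('aiida.cmdline', 'aiida.orm').
```

When such a guard fires, the usual fix is to move the offending import from module level into the function that needs it, so the cost is only paid when that code path runs.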

@edan-bainglass edan-bainglass force-pushed the fix-node-serialization branch from 516a6df to 2c3e5df Compare October 8, 2025 16:54
@danielhollas
Collaborator

> Nevermind. I see that importing orm in the cmdline raises alarm bells. Removed...

Nice, the system works. :-) Feel free to improve the error message here to make it more obvious that this is about the aiida.cmdline module.

@edan-bainglass edan-bainglass force-pushed the fix-node-serialization branch from d6e44ff to 0b37ef6 Compare October 8, 2025 18:17
@edan-bainglass
Member Author

> > Nevermind. I see that importing orm in the cmdline raises alarm bells. Removed...
>
> Nice, the system works. :-) Feel free to improve the error message here to make it more obvious that this is about the aiida.cmdline module.

#7059

@edan-bainglass
Member Author

@danielhollas what do you think about ignoring Ruff N806 - "~variable should be lowercase"? See case below. Model is not an instance, but a class. Naming it model would be misleading, hence I went with Model.

Model = cls.Model.as_input_model()
^^^^^ N806

@danielhollas
Collaborator

> @danielhollas what do you think about ignoring Ruff N806 - "~variable should be lowercase"?

Do you mean ignoring locally (fine) or globally?

@edan-bainglass
Member Author

edan-bainglass commented Oct 8, 2025

> > @danielhollas what do you think about ignoring Ruff N806 - "~variable should be lowercase"?
>
> Do you mean ignoring locally (fine) or globally?

Global. There are many cases when the variable is a class, not an instance. I've pushed this in my last commit just to verify that it works. Okay with removing it in favor of local handling, but would like to hear the reason against a global N806 rule.

Update

Nice. Ignoring N806 globally raised a whole lot of RUF100 due to the codebase being littered with local N806 rules. I'd say that supports my case 🙂

@edan-bainglass
Member Author

@danielhollas done for tonight. Will revisit this in the morning 😴

@danielhollas
Collaborator

> Nice. Ignoring N806 globally raised a whole lot of RUF100 due to the codebase being littered with local N806 rules. I'd say that supports my case 🙂

Yeah, running git grep N806 indeed does get a lot of hits (although most of them are in tests/).

Seems fine to remove it, or alternatively, use the https://docs.astral.sh/ruff/settings/#lint_pep8-naming_extend-ignore-names configuration to automatically ignore some common patterns (e.g. the one you have here, and common class names produced by factory functions, such as SinglefileData = DataFactory('core.singlefile')).

e.g. something like this in pyproject.toml

[tool.ruff.lint.pep8-naming]
extend-ignore-names = ["[A-Z]*Data", "Model"]

In any case, please open a separate PR for that so we don't pollute this one with bikeshedding discussion and unrelated changes.

@edan-bainglass
Member Author

> In any case, please open a separate PR for that so we don't pollute this one with bikeshedding discussion and unrelated changes.

Thanks @danielhollas. Then for this one, since there are only a few cases in my PR, I will locally ignore them. Will open a PR for the pattern handling shortly after.

@edan-bainglass edan-bainglass force-pushed the fix-node-serialization branch 4 times, most recently from 5faafb0 to a4a5b3e Compare October 9, 2025 06:04
Pydantic provides the `ser_json_bytes` and `val_json_bytes` options via its model configuration. Here we set both to 'base64', globally stating that `bytes` are to be (de)serialized as 'base64'. This covers `SinglefileData.contents`, `ArrayData.arrays`, and `Node.repository_content`.
Comment on lines +55 to +56
'computer': 'dbcomputer_id',
'user': 'aiidauser_id',
Member Author

@mikibonacci this is what I meant by adding mappings in the backend.

Comment on lines +84 to +89
ALIAS_MAP = {
    'id': 'pk',
    'dbcomputer_id': 'computer',
    'user_id': 'user',
    'dbnode_id': 'node',
}
Member Author

Used later in generating the QB warning for unrecognized keys.

Comment on lines 1020 to 1027
 keys = []
 for key in alias._sa_class_manager.mapper.c.keys():
     if colalias := ALIAS_MAP.get(key):
         keys.append(f'{key} (alias: {colalias})')
     else:
         keys.append(key)
 raise ValueError(
-    '{} is not a column of {}\nValid columns are:\n{}'.format(
-        colname, alias, '\n'.join(alias._sa_class_manager.mapper.c.keys())
-    )
+    '{} is not a column of {}\nValid columns are:\n{}'.format(colname, alias, '\n'.join(keys))
Member Author

@mikibonacci happy to modify this. This preserves the old keys, but places the schema keys upfront. For example,

orm.QueryBuilder().append(
    orm.Node,
    project=['unrecognized'],
).first()

yields

ValueError: unrecognized is not a column of aliased(DbNode)
Valid columns are:
pk (alias for id)
uuid
node_type
process_type
label
description
ctime
mtime
attributes
extras
repository_metadata
computer (alias for dbcomputer_id)
user (alias for user_id)
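The message construction can be tried in isolation with the sketch below. It reuses the `ALIAS_MAP` shown in the review comment above and follows the `key (alias: name)` wording from the diff, which differs from the wording in the sample output; the two function names and the column list are made up for the example, and no SQLAlchemy objects are involved.

```python
ALIAS_MAP = {
    'id': 'pk',
    'dbcomputer_id': 'computer',
    'user_id': 'user',
    'dbnode_id': 'node',
}


def describe_columns(column_keys):
    """Annotate each backend column name with its schema alias, if it has one."""
    described = []
    for key in column_keys:
        if colalias := ALIAS_MAP.get(key):
            described.append(f'{key} (alias: {colalias})')
        else:
            described.append(key)
    return described


def unknown_column_message(colname, alias, column_keys):
    """Build the error text listing the valid columns (hypothetical helper)."""
    return '{} is not a column of {}\nValid columns are:\n{}'.format(
        colname, alias, '\n'.join(describe_columns(column_keys))
    )


message = unknown_column_message('unrecognized', 'aliased(DbNode)', ['id', 'uuid', 'user_id'])
print(message)
```

Keeping the backend key first and the alias in parentheses means existing users still recognize the column names, while the alias tells them what to type when projecting through the schema-level names.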

5 participants