Unify model naming convention #123
base: main
Conversation
Good start. But I don't think we should use model aliases in such a manner. It is very confusing that MACE-mp-0-small maps to "small". It is completely unintuitive. An "alias" is a shortened name for a commonly used model (and you should not have too many of these since commonly used by definition means a few). This is what I propose:
I would even argue that all names need to be processed in a case-insensitive manner. @Andrew-S-Rosen Welcome your views too.
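The case-insensitive matching suggested here could look like the following minimal sketch. The alias entries and helper name are illustrative assumptions, not the actual matcalc tables:

```python
# A minimal sketch of case-insensitive alias resolution; the alias
# entries below are illustrative, not the actual matcalc tables.
ALIAS_TO_ID = {
    "MACE": "MACE-MP-PBE-0-M",
    "TensorNet-PBE": "TensorNet-MatPES-PBE-v2025.1-PES",
}
# Build a lowercase index once so user input can match in any case.
_LOWER_INDEX = {k.lower(): v for k, v in ALIAS_TO_ID.items()}

def resolve_alias(name: str) -> str:
    """Resolve a user-supplied name to a canonical ID, ignoring case."""
    try:
        return _LOWER_INDEX[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown model name: {name!r}") from None
```

This keeps the canonical capitalization in the table itself while accepting any capitalization from the user.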
Thanks for the ping and for tackling this, @rul048. While conceptually "simple", I think #120 is incredibly important to address and am glad you are working on this. I agree with your comments, @shyuep.
I do not have much of an opinion so long as it is internally consistent and captures all the necessary nuance.
Presumably, if a TensorNet-MatPES-PBE-v2026.1 were to come out next month, then TensorNet-PBE would be remapped to this newer version in a future release of matgl. On one hand, it makes sense to return the "best" version for the user. Of course, the downside of this is that when the alias is remapped, there will be breaking changes if the user upgrades without reading a CHANGELOG. I don't see a way around that. You get what you ask for with the alias.
Yup.
Change names from list comprehension to set comprehension. Signed-off-by: Runze Liu <146490083+rul048@users.noreply.github.com>
Compared to the previous version, the latest commit splits model resolution into two explicit stages. It introduces two separate mappings: ID_TO_ALIAS (user-facing aliases -> canonical IDs) and ID_TO_NAME (canonical IDs -> backend-specific model names). This makes the canonical identifier the shared "source of truth" inside MatCalc, while aliases become purely a user-compatibility layer and backend names become purely an implementation detail. Along with this change, the PR description now formalizes a simpler canonical identifier format at the MatCalc level (Architecture-Dataset-Functional-Version-Size) rather than treating the canonical name as a variable-length string.
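The two-stage lookup described above can be sketched as follows. The table contents are example entries drawn from this PR, and the helper name is an assumption:

```python
# Sketch of the two-stage resolution: alias -> canonical ID -> backend name.
# Table entries are examples from this PR; the function name is assumed.
ID_TO_ALIAS = {  # stage 1: user-facing alias -> canonical ID (many-to-one)
    "tensornet-pbe": "TensorNet-MatPES-PBE-v2025.1-PES",
}
ID_TO_NAME = {  # stage 2: canonical ID -> backend-specific model name
    "TensorNet-MatPES-PBE-v2025.1-PES": "TensorNet-MatPES-PBE-v2025.1-PES",
    "MACE-MP-PBE-0-S": "small",
}

def resolve_backend_name(user_input: str) -> str:
    """Resolve a user name through both stages; unknown aliases pass through
    as canonical IDs, which must then exist in ID_TO_NAME."""
    canonical = ID_TO_ALIAS.get(user_input.lower(), user_input)
    return ID_TO_NAME[canonical]
```

With this split, changing which model an alias points to touches only stage 1, and renaming a backend model touches only stage 2.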
# Keys must be lowercase and represent canonical identifiers.
# Values are the actual model names passed to the backend libraries.
ID_TO_NAME = {
I wouldn't bother ensuring IDs are lowercase here. It is better to use the proper name capitalization. When checking, we can make it a case-insensitive check.
Agreed. I've kept the canonical capitalization for the IDs (e.g., TensorNet-MatPES-r2SCAN-v2025.1-S), and the resolution is now case-insensitive when matching user input.
src/matcalc/utils.py
Outdated
}

# Common aliases and abbreviations will load the most advanced or widely used model.
ID_TO_ALIAS = {
This is non-intuitive. It should be ALIAS_TO_ID. And it is a many-to-one mapping.
| "M3GNet-MatPES-r2SCAN-v2025.1-S": "M3GNet-MatPES-r2SCAN-v2025.1-PES", | ||
| "CHGNet-MatPES-PBE-v2025.2-M": "CHGNet-MatPES-PBE-2025.2.10-2.7M-PES", | ||
| "CHGNet-MatPES-r2SCAN-v2025.2-M": "CHGNet-MatPES-r2SCAN-2025.2.10-2.7M-PES", | ||
| "MACE-MP-PBE-0-S": "small", |
As I mentioned in my previous review, the mapping should not be from MACE-MP-PBE-0-S to small. What does small even mean? This is the argument to the MACE init method. In that case, you should parse "MACE-MP-PBE-0-S", note that the last letter is S, and use small as the input.
I know. I agree that it is much better if we can parse the naming pattern rather than mapping. The issue here is that the backend MACE library uses the model argument to load models, but the naming across different families is not consistent, so the same semantic "size" appears in different positions and formats.
For original MPtrj trained models: the model itself is directly [size], like small/medium/large;
For modified MPtrj trained models: the model is [size]-[version], like small-0b2;
For MPA or OMAT trained models, the model is [size]-[dataset]-[version], like small-omat-0;
For MATPES trained models, the model is [architecture]-[dataset]-[functional]-[version].
To parse these cases, we would first need to determine which model family a given canonical identifier belongs to, and then construct the model argument in whatever format the MACE backend expects for that specific family. I don't see a good way to parse all these types effectively; the implementation would be more complex than a direct mapping.
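To make the complexity concrete, here is a hedged sketch of what family-aware parsing might look like for the four patterns listed above. The canonical IDs, family-detection rules, and size codes here are illustrative assumptions, not the actual matcalc implementation:

```python
# Hypothetical family-aware construction of the MACE `model` argument.
# Canonical IDs, family rules, and size codes below are assumptions for
# illustration; real MACE model names may differ.
SIZE_MAP = {"S": "small", "M": "medium", "L": "large"}

def mace_backend_arg(cid: str) -> str:
    """Build the MACE backend model argument from a canonical ID of the
    form Architecture-Dataset-Functional-Version-Size."""
    arch, dataset, functional, version, size_code = cid.split("-")
    size = SIZE_MAP[size_code]
    if dataset == "MP" and version == "0":
        return size                                    # original MPtrj: "small"
    if dataset == "MP":
        return f"{size}-{version}"                     # revised MPtrj: "small-0b2"
    if dataset in ("MPA", "OMAT"):
        return f"{size}-{dataset.lower()}-{version}"   # e.g. "small-omat-0"
    return cid                                         # MATPES-style: full name
```

Even this sketch hard-codes family-detection heuristics (dataset and version checks) that would break if a new family appears, which is the point being made above.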
Even if you want to do mapping, the MACE-specific mappings can live within the MACE loading part of the code. We should not create chaos in the general ID-to-name mapping for the sake of one code. Anything that is specific to one model family should be isolated within that model family itself.
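The isolation being suggested could be sketched as follows; the table and function names are hypothetical, not the matcalc API:

```python
# Sketch of keeping MACE-specific name translation inside the MACE loader.
# The table and function names here are hypothetical placeholders.
MACE_ID_TO_NAME = {  # private to the MACE loader module
    "MACE-MP-PBE-0-S": "small",
    "MACE-MP-PBE-0-M": "medium",
}

def mace_model_arg(canonical_id: str) -> str:
    """Translate a canonical ID into the MACE backend's model argument.
    Other backends never see this table; unmapped IDs pass through."""
    return MACE_ID_TO_NAME.get(canonical_id, canonical_id)
```

The general resolver then only deals in canonical IDs, and each backend loader owns whatever quirks its library requires.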
Summary
This PR standardizes how MatCalc loads universal MLIPs across multiple backend libraries (MatGL, MACE, GRACE, etc.) as a solution to Issue #120.
Different backend libraries currently use inconsistent or non-canonical model names (e.g., "small-omat-0", "medium-mpa-0", "GRACE-1L-OMAT-medium-base", "TensorNet-MatPES-PBE-v2025.1-PES"), which leads to ambiguity, user confusion, and repeated conversion logic scattered across the codebase.
To fix this, the PR introduces a unified model naming convention. The canonical identifier format is:
Architecture-Dataset-Functional-Version-Size
Examples:
The alias conversion table (ID_TO_ALIAS) maps commonly used case-insensitive abbreviations or aliases (e.g., MACE-MP-0, MACE-MP-PBE-0, MACE-MP-0-M) entered by the user to the canonical identifiers used in MatCalc. The backend name conversion table (ID_TO_NAME) then maps these identifiers to the model names that each backend library actually uses, so that users can write:
Using an abbreviation will load the most advanced model in the model family.
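For example, several case-insensitive abbreviations can resolve to the same canonical identifier; the chosen target below is an assumption for demonstration only:

```python
# Demonstration that several user abbreviations resolve to one canonical ID.
# The mapping target here is an assumed example, not the shipped table.
ID_TO_ALIAS = {
    "mace-mp-0": "MACE-MP-PBE-0-M",
    "mace-mp-pbe-0": "MACE-MP-PBE-0-M",
    "mace-mp-0-m": "MACE-MP-PBE-0-M",
}

def to_canonical(name: str) -> str:
    """Map a user abbreviation to its canonical ID, case-insensitively;
    names that are not aliases pass through unchanged."""
    return ID_TO_ALIAS.get(name.lower(), name)
```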
Major changes:
Todos
If this is work in progress, what else needs to be done?
Checklist
ruff.
mypy.
duecredit @due.dcite decorators to reference relevant papers by DOI (example).
Tip: Install pre-commit hooks to auto-check types and linting before every commit: