Skip to content

Handle NAGLCharges#1206

Merged
mattwthompson merged 56 commits intomainfrom
naglcharges-handler
Aug 19, 2025
Merged

Handle NAGLCharges#1206
mattwthompson merged 56 commits intomainfrom
naglcharges-handler

Conversation

@j-wags
Copy link
Copy Markdown
Member

@j-wags j-wags commented Apr 23, 2025

Description

Implements NAGLCharges as currently drafted in SMIRNOFF EP 11.

This interfaces with a PR in the OpenFF Toolkit at openforcefield/openff-toolkit#2048.

The spec change also includes the definition of some new model fetching and verification behavior, which can be found at openforcefield/openff-nagl-models#61, but which isn't required for these tests to pass.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.89%. Comparing base (35def0c) to head (281c1a8).
⚠️ Report is 43 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1206      +/-   ##
==========================================
+ Coverage   93.87%   93.89%   +0.01%     
==========================================
  Files          72       72              
  Lines        6258     6275      +17     
==========================================
+ Hits         5875     5892      +17     
  Misses        383      383              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

match="all_permutations",
distance="0.8 * angstrom ** 1",
charge_increment1="0.0 * elementary_charge ** 1",
charge_increment1="0.123 * elementary_charge ** 1",
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needed to be a nonzero value for a nagl test, happy to make a separate fixture or put this in the test directly to avoid possible cross contamination.

@j-wags j-wags changed the title [WIP] Handle NAGLCharges Handle NAGLCharges Jul 9, 2025
numpy.testing.assert_allclose(expected_charges_unitless, assigned_charges_unitless)


class TestNAGLChargesErrorHandling:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these tests below here are largely AI-assisted and curated by me, which is to say that I'm not offended at all if you think some are only marginally valuable and deserve deletion.

@j-wags j-wags marked this pull request as ready for review July 9, 2025 23:41
@j-wags j-wags requested a review from mattwthompson as a code owner July 9, 2025 23:41
@j-wags j-wags assigned mattwthompson and unassigned j-wags Jul 10, 2025
Copy link
Copy Markdown
Member

@mattwthompson mattwthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good - I'm really happy that the implementation was relatively simple; since it "quacks" much like other charge assignment methods, the investment in the complexity that already exists in smirnoff/_nonbonded.py is finally paying off in a way.

Since this is such a critical piece of infrastructure, I wan to be more through than normal when adding it in. There are a few tests/categories of that I think are missing:

  • interactions with charge_from_molecules
  • charge assignment logging
  • making sure a system containing a large molecule takes a reasonable amount of time


hexane_diol.assign_partial_charges("openff-gnn-am1bcc-0.1.0-rc.3.pt")
# Leave the ToolkitAM1BCC tag in openff-2.1.0 to ensure that the NAGLCharges handler takes precedence
ff = ForceField("openff-2.1.0.offxml")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use 2.1.0 specifically? There are some other Sage version(s) already in fixtures and re-using those would cut down on some repetitive code

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! I'll remove usages of these where possible and replace with fixtures

Comment on lines +188 to +190

with pytest.raises(MissingPackageError, match="NAGL software isn't present"):
Interchange.from_smirnoff(force_field=ff, topology=hexane_diol.to_topology())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if a user did charge_from_molecules=[hexane_diol]?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should succeed, and I've added to this test to ensure it does.

Comment on lines +688 to +696
from openff.toolkit.utils.toolkits import GLOBAL_TOOLKIT_REGISTRY

partial_charge_method = parameter_handler.model_file
if "NAGL" not in GLOBAL_TOOLKIT_REGISTRY.__repr__():
raise MissingPackageError(
"The force field has a NAGLCharges section, but the NAGL software isn't "
"present in GLOBAL_TOOLKIT_REGISTRY",
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this block is doing the toolkit's job for it - what would it look like if this was removed? I have to assume the toolkit handles this error and it would bubble up to the user through Interchange in a legible way

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... This is masking a nasty side effect of shoving everything through Molecule.assign_partial_charges, which is that the Toolkit treats model names like any other charge method (that is, it doesn't know that the call is supposed to get routed to NAGLToolkitWrapper). If we didn't have this, the error raised here would be:

E       ValueError: No registered toolkits can provide the capability "assign_partial_charges" for args "()" and kwargs "{'molecule': Molecule with name '' and SMILES '[H][O][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[O][H]', 'partial_charge_method': 'openff-gnn-am1bcc-0.1.0-rc.3.pt', 'use_conformers': None, 'strict_n_conformers': False, 'normalize_partial_charges': True, '_cls': <class 'openff.toolkit.topology.molecule.Molecule'>}"
E       Available toolkits are: [ToolkitWrapper around The RDKit version 2025.03.4]
E        ToolkitWrapper around The RDKit version 2025.03.4 <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : partial_charge_method 'openff-gnn-am1bcc-0.1.0-rc.3.pt' is not available from RDKitToolkitWrapper. Available charge methods are {'mmff94': {}, 'gasteiger': {}}

Which doesn't tell the user the important thing they need to know in this case: they either need to install NAGL or get it in their ToolkitRegistry.

So I'm in favor of leaving this as-is.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this made me realize there's a family of edge cases to consider, where NAGL charge assignment fails for various reasons. For example:

  • If NAGL just doesn't cover some elements in the molecule, then lower-precedence charge models should be attempted (NOT the current behavior, I'm committing a test for this)
    • I wonder if there's already a problem with this sort of fallback if a force field only had ToolkitAM1BCC and CIMH?
    • The error message printed when this happens SHOULDN'T just be from the most recent charge method to fail, but instead should include reasons why EACH attempted charge method failed (otherwise the user might just see one error from CIMH when they really want to know why their NAGLCharges tag didn't work)
  • If the NAGL model file isn't found and can't be fetched, that should be an immediate failure (this is the case now)
  • If the NAGL hash comparison fails, that should be an immediate failure (this is the case now)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the highlighted code to be trying to do the job of making sure that NAGL is installed/available/etc. before running charge assignment on it. If we did s/NAGL/RDKit/ and s/charge assignment/conformer generation/ would you grant that this wiring is supposed to be handled by the toolkit?

I agree that error message is not useful for a user, but I would rather have the error bubble up in one place and it be handled as gracefully as possible. This may be a sizable block of a try followed by potentially many excepts but that feels safer and more maintainable than trying to handle this particular error here. I have yet to think through all of these failure modes, but I would rather the toolkit tell me NAGL isn't installed, a file isn't found, etc. and report that back to the user here. Probably some of these cases are already not reported to the user well and could be improved.

Put a different way, I'm concerned that this opens the door to Interchange being responsible for making sure charge assignment methods and optional dependencies are properly lined up whereas Molecule is the choke point of all paths.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did s/NAGL/RDKit/ and s/charge assignment/conformer generation/ would you grant that this wiring is supposed to be handled by the toolkit?

Yes.

I agree with this being resolved into a bunch of try/excepts, and trying not to make Interchange responsible for so many details. I'll give that a shot, there's a little magic in ToolkitRegistry.call (the raise_exception_types named argument) that may make this more straightforward.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome - I hope we can get this smoothness without a ton of effort or complexity here. (If not, a less elegant solution which gets things working okay may be appropriate!)

@mattwthompson mattwthompson added this to the 0.5.0 milestone Jul 10, 2025
Copy link
Copy Markdown
Member Author

@j-wags j-wags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes to self:

Tell matt that test_charge_assignment_logging layout is a pain to make changes to

Tell matt that almost all new tests are AI generated at my prompting and while I've reviewed and revised them they were way less work to make than they appear so I'm fine to just trash or consolidate any of them

The pytest.mark.slow decorator doesn't appear to be doing anything (the tests are run unconditionally and pytest raises an error if I pass it --runslow)

Review Codecov diff and see why its so sad https://app.codecov.io/gh/openforcefield/openff-interchange/pull/1206?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=checks&utm_campaign=pr+comments&utm_term=openforcefield

Copy link
Copy Markdown
Member

@mattwthompson mattwthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work - I found one detail which I think warrants further consideration, but all other comments are minor and non-blocking

Comment on lines +170 to +174
# No error should be raised if using charge_from_molecules
sage_with_nagl_charges.create_interchange(
topology=hexane_diol.to_topology(),
charge_from_molecules=[hexane_diol],
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth double-checking that the charges are correct (I think re-using code from a similar test would be fine). It's possible that something weird happens when ForceField.create_interchange is only given a registry with one toolkit

Comment on lines +234 to +237
@pytest.mark.xfail(
reason="charge assignment handler fallback behavior not yet implemented",
raises=ValueError,
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR I'm happy with pushing this implementation into the future, it's an edge case which a user ought to not come across without trying funky things

# Create a very large molecule
large_molecule = Molecule.from_smiles("C" * 200) # 200-carbon alkane chain

start_time = time.time()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't make a difference with 30 seconds as the threshold, and it's already probably "warmed up" from another test, but I wonder if the model is already loaded into memory (frequently a few seconds, compared to <1 seconds runtime) at this step

Comment on lines +191 to +192
def sage_with_nagl_charges(sage):
sage.get_parameter_handler(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def sage_with_nagl_charges(sage):
sage.get_parameter_handler(
def sage_nagl(sage):
sage.deregister_parameter_handler("ToolkitAM1BCC")

(blocking) this fixture should drop AM1-BCC from Sage

  • IIUC, that would get it to match what's going to be used in production
  • It's hard to imagine this breaking tests since AM1BCC should not be sniffed with the tests as written, but I would really like to avoid surprises in releases 2.3.0+ if this breaks anything
  • test_nagl_charges_precedence_over_am1bcc would break with this change, but I think the behavior (what happens when there are both? then read the title of the test back to oneself) is worth testing, so maybe just add ToolkitAM1BCCHandler back in that test only
  • I can take or leave the name change, I think there is value in differentiating "Sage which happens to use NAGL changes" from "Sage which really is based on NAGL charges"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, great catch. Implementing now.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented

@mattwthompson
Copy link
Copy Markdown
Member

Taking the liberty to click "Ready for review" since ... uh, I'm already doing that

@mattwthompson mattwthompson marked this pull request as ready for review August 7, 2025 16:47
Copy link
Copy Markdown
Member

@mattwthompson mattwthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, please nothing surprising seems to have come up.

I looked at the code coverage report and

  1. The non-test/non-tooling diff is quite small
  2. The coverage is great

Merge main in, update whatever docs you can think of, and I think we're good?

@mattwthompson mattwthompson removed their assignment Aug 12, 2025
@j-wags
Copy link
Copy Markdown
Member Author

j-wags commented Aug 19, 2025

@mattwthompson I've updated the releasehistory and think this is ready to merge. This is a big PR so I'd like to turn it over to you to push the merge button. I propose calling the release 0.4.5, since that saves me a package build (OFFTK is currently depending on interchange =0.4) but no big deal if you'd prefer to change the releasenotes and call it 0.5.0.

@mattwthompson mattwthompson merged commit 1354fc8 into main Aug 19, 2025
16 checks passed
@j-wags j-wags assigned mattwthompson and unassigned j-wags Aug 19, 2025
@mattwthompson mattwthompson deleted the naglcharges-handler branch January 5, 2026 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants