Fix residue naming #2042

Open
hannahbaumann wants to merge 12 commits into main from fix_residue_naming

Conversation

@hannahbaumann

@hannahbaumann hannahbaumann commented Jun 25, 2026

Contributor

Checklist

  • All new code is appropriately documented (user-facing code must have complete docstrings).
  • Added a news entry, or the changes are not user-facing.
  • Ran pre-commit: you can run pre-commit locally or comment on this PR with pre-commit.ci autofix.

Manual Tests: these are slow so don't need to be run every commit, only before merging and when relevant changes are made (generally at reviewer-discretion).

Developers certificate of origin

@hannahbaumann

Contributor Author

pre-commit.ci autofix

@codecov

codecov Bot commented Jun 29, 2026


Codecov Report

❌ Patch coverage is 99.20635% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 90.69%. Comparing base (97bef39) to head (c68b88e).

Files with missing lines | Patch % | Lines
...s/protocols/openmm_rfe/test_hybrid_top_protocol.py | 98.63% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2042      +/-   ##
==========================================
- Coverage   95.01%   90.69%   -4.33%     
==========================================
  Files         212      213       +1     
  Lines       20572    20693     +121     
==========================================
- Hits        19546    18767     -779     
- Misses       1026     1926     +900     
Flag | Coverage Δ
fast-tests | 90.69% <99.20%> (?)
slow-tests | ?


@hannahbaumann

Contributor Author

pre-commit.ci autofix

@hannahbaumann hannahbaumann changed the title from "[WIP] Fix residue naming" to "Fix residue naming" Jun 29, 2026
@hannahbaumann hannahbaumann linked an issue Jun 29, 2026 that may be closed by this pull request

@IAlibay IAlibay left a comment

Member

This looks good, thanks! A couple of things to fix, but otherwise it should be fine.

After this PR, will you be able to do the same for PlainMD, AFE, and SepTop?

Comment thread news/fix_unk_resnames.rst Outdated

**Changed:**

* Small molecules in RelativeHybridTopology topologies (including the output PDB) are now named LIG (alchemical ligand) and CF1, CF2… (cofactors) instead of UNK.

Member

What about if a residue name was already assigned?

Contributor Author

Added something!
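
For readers following this thread, here is a minimal sketch of one way the naming rule in the news entry could treat pre-assigned names. It is not the code in this PR: `assign_resnames` is a hypothetical helper, and it assumes that an unset name shows up as `None` or `UNK` and that any other name was chosen by the user and should be kept.

```python
from typing import List, Optional, Tuple


def assign_resnames(
    ligand_name: Optional[str],
    cofactor_names: List[Optional[str]],
    unset: str = "UNK",
) -> Tuple[str, List[str]]:
    """Hypothetical helper: choose residue names for small molecules.

    Assumption: a name of None or ``unset`` means "not assigned by the user".
    """

    def needs_default(name: Optional[str]) -> bool:
        return name is None or name == unset

    # The alchemical ligand defaults to LIG.
    ligand = "LIG" if needs_default(ligand_name) else ligand_name

    # Cofactors default to CF1, CF2, ... numbered by their position.
    cofactors = [
        f"CF{i}" if needs_default(name) else name
        for i, name in enumerate(cofactor_names, start=1)
    ]
    return ligand, cofactors
```

In this sketch the cofactor number follows the cofactor's position, so a user-named cofactor in the middle of the list leaves a gap in the CF numbering rather than shifting later names.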

Member

I would call this "offmolecule_utils" so that we know it's utilities for openff molecules and not rdkit molecules

Contributor Author

Done!

assert set(lig.indices).isdisjoint(cof.indices)


def test_get_metadata_inconsistent_warns(caplog):

Member

Can you move this and the next one to test_openmmutils instead?

Contributor Author

Done!

Comment thread src/openfe/tests/protocols/openmm_rfe/test_hybrid_top_protocol.py Outdated

try:
-    data = rmsd.gather_rms_data(pdb_file, trj_file)
+    data = rmsd.gather_rms_data(pdb_file, trj_file, ligand_selection="resname LIG")

Member

This will fail if someone provides their own residue names ahead of time (which is something we should encourage, i.e. power users should be rewarded, not punished).

There are two potential solutions here:

  1. (lowest cost / temporary fix) We just store the alchemical residue names in the output of the setup unit and then fetch them in the analysis unit.
  2. (higher cost / more complicated) We pass in the alchemical components & the comp resids through all the way to the analysis unit and then extract the relevant residue names from that.

For now, I would suggest we just pass the residue names and have the residue name kwarg be optional and/or have a default of LIG, that way we don't break backwards compatibility.

We can clean up after the fact.
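
A minimal sketch of option 1 under stated assumptions: the setup unit's outputs are treated as a plain dict, and the key `alchemical_resname` and the helper `ligand_selection_from_setup` are hypothetical names, not part of openfe. Only the `ligand_selection="resname LIG"` form comes from the diff above.

```python
def ligand_selection_from_setup(setup_outputs: dict, default: str = "LIG") -> str:
    """Hypothetical helper: build the ligand selection string for analysis.

    Assumption: the setup unit stored the alchemical residue name under the
    key "alchemical_resname". Older results without that key fall back to
    ``default``, which keeps backwards compatibility.
    """
    resname = setup_outputs.get("alchemical_resname", default)
    return f"resname {resname}"


# Results written before the change carry no residue name.
old_results = {}
# Results from a run where the user pre-assigned their own name.
new_results = {"alchemical_resname": "MOL"}

print(ligand_selection_from_setup(old_results))
print(ligand_selection_from_setup(new_results))
```

The returned string could then be passed as the `ligand_selection` keyword in the analysis unit instead of the hard-coded value.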

Member

Note that for hybrid topology, it's technically possible for both endstate molecules to have different names, so we should check that it works properly.

Collaborator

I think a 3rd option might be best, where we do away with tracking the residue names and just keep the alchemical atom indices around. This would mean moving away from the gather_rms_data function and using the new openfe-analysis API, but we plan on doing that anyway, so this could be a good time to do it?

This would fix the case pointed out above and the fact that if a user sets the alchemical ligands and the cofactors to have the same residue name (LIG), we will still have the same issue in the analysis unit.
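
To make the collision concrete, here is a toy sketch in plain Python. It does not use the openfe-analysis API; the atom table, `select_by_resname`, and `select_by_indices` are illustrative stand-ins.

```python
# Toy system as (atom index, residue name) pairs. The alchemical ligand is
# atoms 1-2; a cofactor on atoms 3-4 was also named LIG by the user.
atoms = [(0, "ALA"), (1, "LIG"), (2, "LIG"), (3, "LIG"), (4, "LIG")]
alchemical_indices = [1, 2]


def select_by_resname(atoms, resname):
    """Return the indices of every atom whose residue name matches."""
    return [index for index, name in atoms if name == resname]


def select_by_indices(atoms, indices):
    """Return only the requested atom indices, whatever their residue name."""
    wanted = set(indices)
    return [index for index, _ in atoms if index in wanted]


# A name-based selection also picks up the cofactor atoms.
print(select_by_resname(atoms, "LIG"))
# An index-based selection returns only the alchemical ligand.
print(select_by_indices(atoms, alchemical_indices))
```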

Collaborator

I wouldn't consider the above blocking, though. We could have this as a temporary fix and move to indices when we change the analysis for this protocol?

Contributor Author

Yes, I like the 3rd option as the long-term fix. For now I went ahead with option 1, but I'm happy to change it to a longer-term fix.

Re two ligands different residue names: I tested this out and it assigns the residue name from the stateA ligand to the hybrid topology (in the HTF). We could still check for both residue names since it wouldn't hurt, but may also confuse the user?
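
If checking both names were wanted, one possible shape is sketched below. `selection_for_endstates` is a hypothetical helper, and the `resname A or resname B` form assumes an MDAnalysis-style selection language behind `ligand_selection`.

```python
def selection_for_endstates(resname_a: str, resname_b: str) -> str:
    """Hypothetical helper: selection string covering both endstate names.

    Duplicate names are collapsed so the common case stays a single term.
    """
    # dict.fromkeys removes duplicates while preserving order.
    names = list(dict.fromkeys([resname_a, resname_b]))
    return " or ".join(f"resname {name}" for name in names)


print(selection_for_endstates("LIG", "LIG"))
print(selection_for_endstates("LGA", "LGB"))
```

Since the hybrid topology reportedly carries only the stateA name, the second term would match nothing in that case, which is harmless for the selection but may be the source of the user confusion mentioned above.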

Member

> I think a 3rd option might be best, where we do away with tracking the residue names and just keep the alchemical atom indices around.

This somewhat fits within the longer term idea of switching comp_resids to just comp_indices (where you track the indices of each component in the system). However, that's a lot bigger lift.

I don't mind just tracking the alchemical indices in the setup results - that's something we nearly already do in the AFE Protocol. But yeah, for the sake of incremental PRs = faster velocity, I would vote for the quick fix now, then the medium/long term fix after.

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
@jthorton

jthorton commented Jul 2, 2026

Collaborator

Not blocking: Does a user have a way of setting the residue_name via a CLI input? It seems like the atom metadata fields are not preserved on writing an openff molecule to file?

@hannahbaumann

Contributor Author

pre-commit.ci autofix

@github-actions

github-actions Bot commented Jul 2, 2026


No API break detected ✅

View workflow run

Griffe output
$ griffe check "openfe" -s src --no-inspection --no-color --verbose -a origin/main

$ griffe check "openfecli" -s src --no-inspection --no-color --verbose -a origin/main

@IAlibay

IAlibay commented Jul 2, 2026

Member

> Not blocking: Does a user have a way of setting the residue_name via a CLI input? It seems like the atom metadata fields are not preserved on writing an openff molecule to file?

I had forgotten that this was still an issue...

The problem is that the residue name is a per-atom PDBResidueInfo property in rdkit molecules, and a) gufe SMCs don't serialize that, and b) it doesn't get written out when you write out an SDF file. See: OpenFreeEnergy/gufe#327


Development

Successfully merging this pull request may close these issues.

Change ligand and cofactor residue names in the topology

3 participants