Fix residue naming#2042
Conversation
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2042 +/- ##
==========================================
- Coverage 95.01% 90.69% -4.33%
==========================================
Files 212 213 +1
Lines 20572 20693 +121
==========================================
- Hits 19546 18767 -779
- Misses 1026 1926 +900
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
IAlibay
left a comment
There was a problem hiding this comment.
This looks good thanks! - couple of things to fix but otherwise should be fine.
After this PR, will you be able to do the same for PlainMD, AFE, and SepTop?
|
|
||
| **Changed:** | ||
|
|
||
| * Small molecules in RelativeHybridTopology topologies (including the output PDB) are now named LIG (alchemical ligand) and CF1, CF2… (cofactors) instead of UNK. |
There was a problem hiding this comment.
What about if a residue name was already assigned?
There was a problem hiding this comment.
Added something!
There was a problem hiding this comment.
I would call this "offmolecule_utils" so that we know it's utilities for openff molecules and not rdkit molecules
| assert set(lig.indices).isdisjoint(cof.indices) | ||
|
|
||
|
|
||
| def test_get_metadata_inconsistent_warns(caplog): |
There was a problem hiding this comment.
Can you move this and the next one to test_openmmutils instead?
|
|
||
| try: | ||
| data = rmsd.gather_rms_data(pdb_file, trj_file) | ||
| data = rmsd.gather_rms_data(pdb_file, trj_file, ligand_selection="resname LIG") |
There was a problem hiding this comment.
This will fail is someone provides their own residue names ahead of time (which is something we should encourage - i.e. power users should be rewarded not punished).
There are two potential solutions here:
- (lowest cost / temporary fix) We just store the alchemical residue names in the output of the setup unit and then fetch them in the analysis unit.
- (higher cost / more complicated) We pass in the alchemical components & the comp resids through all the way to the analysis unit and then extract the relevant residue names from that.
For now, I would suggest we just pass the residue names and have the residue name kwarg be optional and/or have a default of LIG, that way we don't break backwards compatibility.
We can clean up after the fact.
There was a problem hiding this comment.
Note that for hybrid toplogy, it's technically possible for both endstate molecules to have different names, so we should check that it works properly.
There was a problem hiding this comment.
I think a 3rd option might be best, where we do away with tracking the residue names and just keep the alchemical atom indices around. This would mean moving away from the gather_rms_data function and using the new openfe-analaysis API but we plan on doing that anyway, so this could be a good time to do it?
This would fix the case pointed out above and the fact that if a user sets the alchemical ligands and the cofactors to have the same residue name (LIG), we will still have the same issue in the analysis unit.
There was a problem hiding this comment.
I wouldn't consider the above blocking though we could have this as a temp fix and move to indices when we change the analysis for this protocol?
There was a problem hiding this comment.
Yes, I like the 3rd option as the long term fix, for now I went ahead with option 1, but happy to change it to a longer term fix.
Re two ligands different residue names: I tested this out and it assigns the residue name from the stateA ligand to the hybrid topology (in the HTF). We could still check for both residue names since it wouldn't hurt, but may also confuse the user?
There was a problem hiding this comment.
I think a 3rd option might be best, where we do away with tracking the residue names and just keep the alchemical atom indices around.
This somewhat fits within the longer term idea of switching comp_resids to just comp_indices (where you track the indices of each component in the system). However, that's a lot bigger lift.
I don't mind just tracking the alchemical indices in the setup results - that's something we nearly already do in the AFE Protocol. But yeah, for the sake of incremental PRs = faster velocity, I would vote for the quick fix now, then the medium/long term fix after.
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
|
Not blocking: Does a user have a way of setting the residue_name via a CLI input? It seems like the atom metadata fields are not preserved on writing an openff molecule to file? |
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
|
No API break detected ✅ Griffe output |
I had forgotten that this was still an issue... The problem is that residue name is a per atom PDBResidueInfo property in rdkit molecules, and a) gufe SMCs don't serialize that, and b) it doesn't get written out when you write out an SDF file. See: OpenFreeEnergy/gufe#327 |
Checklist
newsentry, or the changes are not user-facing.pre-commit.ci autofix.Manual Tests: these are slow so don't need to be run every commit, only before merging and when relevant changes are made (generally at reviewer-discretion).
Developers certificate of origin