Skip to content

fix: convert enforce_subs strings to Mol objects in fuzzy_scaffolding#243

Open
rtmalikian wants to merge 1 commit into
datamol-io:mainfrom
rtmalikian:fix/enforce-subs-string-to-mol
Open

fix: convert enforce_subs strings to Mol objects in fuzzy_scaffolding#243
rtmalikian wants to merge 1 commit into
datamol-io:mainfrom
rtmalikian:fix/enforce-subs-string-to-mol

Conversation

@rtmalikian

Copy link
Copy Markdown

Summary

Fixes dm.scaffold.fuzzy_scaffolding() crashing when enforce_subs is passed as a list of strings (SMILES or SMARTS).

The function signature declares enforce_subs: Optional[List[str]], but line 227 passes each string element directly to rgp.GetSubstructMatch(frag), which expects an RDKit Mol object — raising ArgumentError on any string input.

Root Cause

# Before (line 227): frag is a str from enforce_subs
rgp.GetSubstructMatch(frag)  # ArgumentError: str is not a Mol

GetSubstructMatch requires an RDKit ROMol query, not a Python string. The parameter type hint says List[str], but the implementation assumed Mol objects.

Fix

Convert string substructures to RDKit Mol objects via Chem.MolFromSmarts() early in the function, before they're used in GetSubstructMatch:

enforce_subs_mols = []
for sub in enforce_subs:
    if isinstance(sub, str):
        mol_sub = Chem.MolFromSmarts(sub)
        if mol_sub is not None:
            enforce_subs_mols.append(mol_sub)
    else:
        enforce_subs_mols.append(sub)

Use enforce_subs_mols instead of enforce_subs in the GetSubstructMatch call.

Test Output

tests/test_scaffold.py::test_fuzzy_scaffolding PASSED
tests/test_scaffold.py::test_fuzzy_scaffolding_with_enforce_subs PASSED
2 passed in 32.89s

Before fix (demonstrating the bug):

ArgumentError: Python argument types in
    Mol.GetSubstructMatch(Mol, str)
did not match C++ signature:
    GetSubstructMatch(RDKit::ROMol self, RDKit::ROMol query, ...)

After fix:

Success with Mol arg: (0, 1, 2, 3, 4, 5)

Verification

  • Existing test_fuzzy_scaffolding still passes (no regression)
  • New test_fuzzy_scaffolding_with_enforce_subs passes with both SMILES and SMARTS strings
  • Confirmed original code raises ArgumentError with string args
  • Fix correctly converts strings to Mol via MolFromSmarts

Closes #119


About the Author: Raphael Malikian — Clinical AI Solutions Architect. I specialise in building and fixing AI/ML systems for healthcare, including vector databases, RAG pipelines, and clinical NLP. If you need help with your project or think I can add value to your organisation, feel free to reach out — I'd love to connect.

📧 rtmalikian@gmail.com
🔗 GitHub: https://github.com/rtmalikian
🔗 LinkedIn: http://www.linkedin.com/in/raphael-t-malikian-mbbs-bsc-hons-71075436a


Disclosure: This code was developed with assistance from mimo-2.5-pro (Xiaomi) via Hermes Agent (Nous Research). All changes were reviewed, tested against the actual codebase, and verified for correctness.

The enforce_subs parameter accepts List[str] (SMILES/SMARTS), but
GetSubstructMatch() requires an RDKit Mol object, not a string.
Convert string substructures to Mol objects via MolFromSmarts before
passing to GetSubstructMatch.

Fixes datamol-io#119
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rdkit.Chem.GetSubstructMatch called in dm.scaffold.fuzzy_scaffolding takes rdkit mol as input, not str

1 participant