not able to smirksify double bonds in different clusters for cis/trans butene #100

@wutobias

I am trying to use chemper.smirksify.SMIRKSifier to build a list of SMARTS patterns from clusters for cis- and trans-butene. In my clustering, the CC single bonds should be discriminated between cis- and trans-butene: within a molecule, both CC single bonds are in the same cluster, but the two molecules use different clusters for them (see CC_single_different below). When I run chemper.smirksify.SMIRKSifier to build the SMARTS list, I get the following error message:

ClusteringError: 
                      SMIRKSifier was not able to create SMIRKS for the provided
                      clusters with 5 layers. Try increasing the number of layers
                      or changing your clusters

I suppose this results from not finding matches with the reference types, but this is just a guess. Is there a way to build a SMARTS list with SMIRKSifier that discriminates between the two different CC single bonds? If not, where in the code would be a good place to start implementing that?

Here is an example that illustrates the above (see the attached Buten.zip for the RDKit molecules as JSON):

from chemper.mol_toolkits import mol_toolkit
from chemper.smirksify import SMIRKSifier, print_smirks
from rdkit import Chem

with open("./cis-Buten.json", "r") as fopen:
    cis_buten = Chem.JSONToMols(
        fopen.read()
    )[0]
with open("./trans-Buten.json", "r") as fopen:
    trans_buten = Chem.JSONToMols(
        fopen.read()
    )[0]


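# Each entry is (label, [atom-index tuples in cis-butene, atom-index tuples in trans-butene]),
# in the same order as the `molecules` list defined below.
# Here the CC single bonds go into 'cc_single1' for cis and 'cc_single2' for trans.
CC_single_different = [
    ('cc_single1', [[(0, 1), (2, 3)], []]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
                   [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
    ('cc_single2', [[], [(0, 1), (2, 3)]])
]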

CC_single_same = [
    ('cc_single', [[(0, 1), (2, 3)], [(0, 1), (2, 3)]]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
                   [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
]

molecules = [cis_buten, trans_buten]

### The following works nicely.
bond_smirksifier = SMIRKSifier(
    molecules, 
    CC_single_same, 
    max_layers=5,
    strict_smirks=True,
    verbose=False
)
smirks3k = bond_smirksifier.reduce(max_its=100, verbose=False)
print_smirks(smirks3k)

### The following does not work: it raises the ClusteringError shown above.
bond_smirksifier = SMIRKSifier(
    molecules, 
    CC_single_different, 
    max_layers=5,
    strict_smirks=True,
    verbose=False
)
smirks3k = bond_smirksifier.reduce(max_its=100, verbose=False)
print_smirks(smirks3k)
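
To rule out a mistake in the cluster definitions themselves, a quick consistency check along the following lines can be run before calling SMIRKSifier. This is only a plain-Python sketch and not part of ChemPer: summarize_clusters is a hypothetical helper, and the cluster list is copied from the example above. It counts the entries per molecule for each cluster and reports any bond assigned to more than one cluster.

```python
from collections import defaultdict


def summarize_clusters(cluster_list, n_molecules):
    """Summarize a list of (label, [entries for mol 0, entries for mol 1, ...]).

    Hypothetical helper, not a ChemPer function. Returns (counts, duplicates):
      counts[label]                -> number of entries per molecule
      duplicates[(mol_idx, atoms)] -> labels of entries found in more than one cluster
    """
    counts = {}
    owners = defaultdict(list)
    for label, per_molecule in cluster_list:
        if len(per_molecule) != n_molecules:
            raise ValueError(
                f"{label}: expected {n_molecules} lists, got {len(per_molecule)}"
            )
        counts[label] = [len(entries) for entries in per_molecule]
        for mol_idx, entries in enumerate(per_molecule):
            for atoms in entries:
                atoms = tuple(atoms)
                # Treat (i, j) and (j, i) as the same bond.
                key = (mol_idx, min(atoms, atoms[::-1]))
                owners[key].append(label)
    duplicates = {k: v for k, v in owners.items() if len(v) > 1}
    return counts, duplicates


# Cluster list copied from the example; molecule 0 is cis, molecule 1 is trans.
CH_BONDS = [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]
CC_single_different = [
    ('cc_single1', [[(0, 1), (2, 3)], []]),
    ('ch_single', [CH_BONDS, CH_BONDS]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
    ('cc_single2', [[], [(0, 1), (2, 3)]]),
]

counts, duplicates = summarize_clusters(CC_single_different, n_molecules=2)
for label, per_mol in counts.items():
    print(label, per_mol)
print("bonds in more than one cluster:", duplicates)
```

For the clusters above, every bond ends up in exactly one cluster, and cc_single1 and cc_single2 are each populated in only one of the two molecules, so the cluster list itself looks internally consistent.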
