osmordred: add rdkit217 descriptor set (217 standard RDKit descriptors) #34

Open
guillaume-osmo wants to merge 1 commit into bp-kelley:osmordred from guillaume-osmo:osmordred-rdkit217

@guillaume-osmo

Adds the rdkit217 descriptor set — 217 standard RDKit physicochemical/topological descriptors computed in C++, in the exact order of Python's Descriptors._descList.

Independent of the osmordredv3 PR and the smarts291 PR; branches off osmordred.

Contents

  • Code/GraphMol/Descriptors/rdkit217/{RDKit217Descriptors.cpp,.h,test_rdkit217.cpp}
  • Osmordred.h: declares calcEState_VSA / calcVSA_EState (defined in OsmordredBasicPhyschemCountsRules.cpp) so rdkit217 can reuse the MOE-VSA terms.
  • CMake wiring + Python wrapper bindings (ExtractRDKitDescriptors, ExtractRDKitDescriptorsBatch, ExtractRDKitDescriptorsFromMolsBatch, GetRDKit217DescriptorNames) under RDK_BUILD_OSMORDRED.
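The description above promises values in a fixed order plus a separate names accessor. One way to keep the two aligned, sketched here under assumptions and not taken from the PR's code, is to store each name next to its calculator in a single table. `ToyMol`, `DescriptorEntry`, `descriptorTable`, and the two toy descriptors are hypothetical placeholders; the real implementation works on RDKit molecules and may be organized quite differently.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a molecule; real code would take an RDKit ROMol.
struct ToyMol {
  int heavyAtoms;
  int rings;
};

// Each row pairs a descriptor name with its calculator, so the name list
// and the value vector are always produced in the same order.
struct DescriptorEntry {
  std::string name;
  std::function<double(const ToyMol &)> calc;
};

// Illustrative entries only; these are not the rdkit217 descriptors.
const std::vector<DescriptorEntry> &descriptorTable() {
  static const std::vector<DescriptorEntry> table = {
      {"ToyHeavyAtomCount",
       [](const ToyMol &m) { return static_cast<double>(m.heavyAtoms); }},
      {"ToyRingCount",
       [](const ToyMol &m) { return static_cast<double>(m.rings); }},
  };
  return table;
}

// Names in table order.
std::vector<std::string> descriptorNames() {
  std::vector<std::string> names;
  for (const auto &entry : descriptorTable()) names.push_back(entry.name);
  return names;
}

// Values for one molecule, in the same order as descriptorNames().
std::vector<double> computeDescriptors(const ToyMol &mol) {
  std::vector<double> values;
  for (const auto &entry : descriptorTable()) values.push_back(entry.calc(mol));
  assert(values.size() == descriptorTable().size());
  return values;
}

// Batch variant: one row of values per input molecule.
std::vector<std::vector<double>> computeDescriptorsBatch(
    const std::vector<ToyMol> &mols) {
  std::vector<std::vector<double>> rows;
  rows.reserve(mols.size());
  for (const auto &mol : mols) rows.push_back(computeDescriptors(mol));
  return rows;
}
```

With this layout, adding or reordering a descriptor touches one table row, and a test can compare `descriptorNames()` against a reference list to catch ordering drift.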

Assisted by Claude

Adds RDKit217Descriptors: 217 standard RDKit physicochemical/topological
descriptors computed in C++, in the exact order of Python's
Descriptors._descList. Exposed via ExtractRDKitDescriptors,
ExtractRDKitDescriptorsBatch, ExtractRDKitDescriptorsFromMolsBatch, and
GetRDKit217DescriptorNames.

- Code/GraphMol/Descriptors/rdkit217/{RDKit217Descriptors.cpp,.h,test_rdkit217.cpp}
- Osmordred.h: declare calcEState_VSA / calcVSA_EState (defined in
  OsmordredBasicPhyschemCountsRules.cpp) so rdkit217 can reuse the MOE-VSA terms.
- CMake wiring + Python wrapper bindings, all under RDK_BUILD_OSMORDRED.

Independent of the osmordredv3 PR and the smarts291 PR; branches off osmordred.

Assisted by Claude

@bp-kelley

Owner

We already have a way of doing this:

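    // Needs <GraphMol/Descriptors/Property.h>, <GraphMol/SmilesParse/SmilesParse.h>
    // and <memory>; Properties lives in RDKit::Descriptors.
    Properties sink;
    std::vector<std::string> names = sink.getPropertyNames();
    // SmilesToMol returns an owning raw pointer, so hold it in a unique_ptr.
    std::unique_ptr<RWMol> mol(SmilesToMol("C1CCC2(C1)CC1CCC2CC1"));
    std::vector<double> props = sink.computeProperties(*mol);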

It might be worth checking whether it covers all 217 descriptors, though; some may be missing.
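The coverage check suggested here could be done with a small helper like the one below. It is a sketch using only the standard library: `missingDescriptors` is a hypothetical name, and in real use `available` would come from `Properties::getPropertyNames()` while `wanted` would be the 217 target names (for example, whatever `GetRDKit217DescriptorNames` returns, whose exact signature is not shown in this thread).

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns the entries of `wanted` that do not appear in `available`,
// preserving the order of `wanted`. Matching is by exact string equality.
std::vector<std::string> missingDescriptors(
    const std::vector<std::string> &available,
    const std::vector<std::string> &wanted) {
  std::unordered_set<std::string> have(available.begin(), available.end());
  std::vector<std::string> missing;
  for (const auto &name : wanted) {
    if (have.find(name) == have.end()) missing.push_back(name);
  }
  return missing;
}
```

Because the comparison is an exact string match, it only gives a meaningful answer if both lists spell descriptor names the same way. If the two registries use different naming conventions, which is worth verifying first, a name mapping would be needed before the check.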

@guillaume-osmo

Author

The idea was to port all of the descriptors into C++ itself and then build a fast generator for them.
