Add chemical clustering code #4

Open
apayne97 wants to merge 10 commits into main from add-chemical-series-clustering

apayne97 commented Apr 15, 2024

Collaborator

Intro

The idea is simple: use a variant of hierarchical clustering in which, instead of a distance metric computed once up front, a maximum common substructure (MCS) algorithm finds the number of atoms in the common substructure, and that count serves as the similarity measure.
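A minimal sketch of that loop, to make the idea concrete. It is not the code in this PR: `toy_common`, `mcs_cluster`, and `min_size` are hypothetical names, and the "MCS" here is just the longest common substring of two strings, a chemistry-unaware stand-in so the example runs with the standard library only. With RDKit, something like `rdFMCS.FindMCS` (which reports `numAtoms` and accepts a `timeout`) could play that role instead. The point is that after each merge the new cluster is represented by its common core, so similarities are recomputed against that core rather than read from a fixed matrix.

```python
from difflib import SequenceMatcher
from itertools import combinations


def toy_common(a: str, b: str) -> str:
    """Toy stand-in for an MCS search: longest common substring.

    NOT chemistry-aware. A real version would run an MCS search on
    molecule objects and return the shared substructure.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def mcs_cluster(items, common=toy_common, min_size=3):
    """Greedy agglomerative clustering driven by common-core size.

    Each cluster is a dict with its current 'core' and its 'members'.
    Merging stops once the best remaining core is smaller than min_size.
    """
    clusters = [{"core": x, "members": [x]} for x in items]
    while len(clusters) > 1:
        best = None  # (core, i, j) for the pair sharing the largest core
        for i, j in combinations(range(len(clusters)), 2):
            core = common(clusters[i]["core"], clusters[j]["core"])
            if best is None or len(core) > len(best[0]):
                best = (core, i, j)
        core, i, j = best
        if len(core) < min_size:
            break  # nothing left shares a large enough core
        merged = {
            "core": core,  # the merged cluster is represented by its core
            "members": clusters[i]["members"] + clusters[j]["members"],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    # SMILES-like strings used purely as text for the toy matcher.
    for c in mcs_cluster(["c1ccccc1O", "c1ccccc1N", "CCCCO", "CCCCN"]):
        print(c["core"], c["members"])
```

The pairwise search is recomputed on every pass, which is quadratic per merge. That is fine for a sketch, but a real MCS call is expensive, so caching results for unchanged pairs would likely matter.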

In a sense, I'm trying to match a medicinal chemist's intuition of what the scaffolds are for a set of heterogeneous hits you might get from a screen. I've found that doing this with fingerprint distances and similar approaches doesn't really work the way I'd like.

I think this is similar to how MedChemica's MCPairs algorithm works, so I might end up just recommending that instead, but I think this will be a good exercise and potentially useful to a few different projects I'm working on.

To-Do

  • add a way to traverse backward through the clusters to collect all the molecules that belong to each cluster
  • add the final while loop
  • add a filtering step to avoid calculating MCSs that will time out ("truncate") anyway
  • add Bemis-Murcko scaffold clustering
  • add tests and a good test case of molecules that isn't a proprietary (?) molecule screen
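For the first to-do item, one possible shape for the backward traversal is sketched below. It assumes a hypothetical record format that is not taken from this PR: a dict mapping each merged cluster id to the two ids it was formed from, with any id missing from the dict treated as an original molecule.

```python
def collect_leaves(merges, root):
    """Return the original molecule ids under `root`.

    `merges` maps a merged cluster id to the pair of ids it was built
    from (hypothetical format); ids absent from `merges` are leaves.
    """
    leaves, stack = [], [root]
    while stack:
        node = stack.pop()
        if node in merges:
            left, right = merges[node]
            # Push right first so the left branch is visited first.
            stack.extend((right, left))
        else:
            leaves.append(node)
    return leaves


# Hypothetical merge history: m0 and m1 merge into c4, then c4 and m2 into c5.
merges = {"c4": ("m0", "m1"), "c5": ("c4", "m2")}
members = collect_leaves(merges, "c5")
```

An explicit stack is used instead of recursion so that deep merge trees from large screens do not hit Python's recursion limit.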

@codecov-commenter

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️
