Getting Involved and Commenting by PowersPope · Pull Request #8 · daenuprobst/molzip

PowersPope · 2023-07-17T02:15:06Z

Hey Daniel,

I added some comments and docstrings for the functions within gzip_regressor.py to help people (people being myself) understand what is happening, and thought I would at least upload them. I plan to go through everything and write out comments/docstrings.

I think this looks like a cool project and would love to contribute. I am a PhD student (finishing up my first year) at the University of Oregon. I work on ML and computational methods for macrocyclic peptide design. I'm still relatively new to the ML space, so I would love to contribute any way that I can!

My biggest confusion with this repo is where exactly the data is being pulled from. I see that main can run without any data in the repo, so I am guessing it is being pulled from some hosting site, though I am having trouble finding where that happens within main.

Looking forward to helping out!

janweinreich · 2023-07-17T07:19:28Z

Hey (I am not the owner of the repo, but I can answer the second part of your question): have a look at
https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html

It is used in `molzip/main.py` (line 137 in 6247efd, `def molnet_loader(`) for the benchmark tests :)
This is nice because you just have to `pip install deepchem`, and you avoid what causes 90% of the problems in ML: data preprocessing.
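For anyone else looking for where the data comes from, here is a minimal sketch of pulling a MoleculeNet dataset through DeepChem. It is not the repo's `molnet_loader`; the function name `load_smiles_splits`, the default dataset `delaney`, and the choice of a random split are assumptions made for illustration. It relies on the DeepChem loaders returning `(tasks, (train, valid, test), transformers)` and on each dataset's `ids` holding the SMILES strings. The `"Raw"` featurizer also needs RDKit installed, and the first call downloads the data.

```python
def load_smiles_splits(dataset_name: str = "delaney"):
    """Return (tasks, splits), where splits maps a split name to (smiles, labels).

    Hypothetical helper for illustration; not part of molzip.
    """
    # Imported lazily so that defining this function does not require deepchem.
    import deepchem as dc

    # MoleculeNet loaders are exposed as dc.molnet.load_<name>, e.g. load_delaney.
    loader = getattr(dc.molnet, f"load_{dataset_name}")

    # "Raw" keeps the molecules unfeaturized; the SMILES strings live in `ids`.
    tasks, (train, valid, test), _transformers = loader(
        featurizer="Raw", splitter="random"
    )

    splits = {
        name: (list(ds.ids), ds.y)
        for name, ds in (("train", train), ("valid", valid), ("test", test))
    }
    return tasks, splits
```

Calling `load_smiles_splits("delaney")` should then give the SMILES strings and labels for each split without any files living in the repo. Note that the loaders apply their default label transformers unless told otherwise.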

PowersPope · 2023-07-17T14:21:02Z

Hey thanks for the explanation! I knew it must be pulling from some sort of API, but couldn't figure it out. That's super clever. Thanks for the info! :)

…saw that neither Lasso regression nor a pseudo onehot encoding for each SMILES string, combined with kNN, was very effective. It could also be the approach I am currently applying. I need to look into applying the onehot to classification.

PowersPope · 2023-07-19T07:09:43Z

I was looking for something to try out and figured I would give Lasso Regression a go and add in some extra features. I didn't know if you wanted to include any other features besides just NCD. If you don’t, then we don’t need to incorporate the onehot function at all. It was more for my curiosity.

Overall it seemed like Lasso Regression was not helpful. The model did worse for both approaches that I tried. I haven’t used Lasso Regression in the past, so it is possible that I incorporated it incorrectly.

Two approaches that I used:

1. Incorporated a non-kNN Lasso Regression and a k-NN Lasso Regression. However, I only included the scores from the non-kNN Lasso Regression. Both of them were similar and not very good.
2. I incorporated a onehot vector to the kNN regression task. I am pretty sure this is not good practice. I still wanted to see what would happen. It actually did better than approach 1. However, it still wasn’t very good.

All my functions are included in gzip_lasso_regressor.py. I also included a LASSO_REGRESSION.md file, which contains the results of the two tests. I had a couple of thoughts on other approaches that could work better than this, though I would love to hear your thoughts on the direction you think this project should go.

I’m looking forward to helping. Let me know if you think I should fix/change anything.
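For context on the NCD baseline that the Lasso and onehot experiments are being compared against, here is a minimal sketch of gzip-based normalized compression distance (NCD) feeding a plain kNN regressor. It is an illustration under assumptions, not the code in gzip_regressor.py: the function names `ncd` and `knn_regress`, the space used to join the two strings, and the unweighted mean over neighbors are all choices made here.

```python
import gzip


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings, using gzip."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    # Compress the concatenation; similar strings compress well together.
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_regress(train_smiles, train_y, query, k=3):
    """Predict a value for `query` as the mean label of its k nearest neighbors."""
    # Pair each training label with its distance to the query, nearest first.
    ranked = sorted((ncd(query, s), y) for s, y in zip(train_smiles, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)
```

A toy call (the labels here are made up purely to show the shape of the inputs) would look like `knn_regress(["CCO", "CCCO", "c1ccccc1"], [0.1, 0.2, 1.5], "CCCCO", k=2)`. Extra features such as a onehot vector would have to be combined with these distances somehow, which is where the experiments above come in.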

Added in explanatory comments and function doc strings for the first …

de9f7de

…two functions and a couple following ones in gzip-regressor

PowersPope added 2 commits July 25, 2023 21:37

DATA - Make Gzip Varying Compression Test Graphs

e9bea0e

* Generated RMSE valid graph with error bars `RMSE_VALID_CLASSIFICATION.png`
* Generated RMSE test graph with error bars `RMSE_TEST_CLASSIFICATION.png`
* Uploaded dataset of total run. Formatted into CSV format.

Merge branch 'main' of github.com:PowersPope/molzip

fe0c0c8

PowersPope wants to merge 4 commits into daenuprobst:main from PowersPope:main
