Getting Involved and Commenting #8
…two functions and a couple following ones in gzip-regressor
|
Hey (I am not the owner of the repo, but I can answer the 2nd part of your question): it is used in Line 137 in 6247efd for the benchmark tests :) This is nice because you just have to `pip install deepchem` and you avoid what causes 90% of the problems in ML: data preprocessing. |
|
Hey thanks for the explanation! I knew it must be pulling from some sort of API, but couldn't figure it out. That's super clever. Thanks for the info! :) |
…saw that neither Lasso regression nor a pseudo one-hot encoding of each SMILES string, combined with kNN, was very effective. It could also be the approach I am currently applying. I need to look into applying the one-hot encoding to classification.
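The commit message does not say how the pseudo one-hot encoding was built. One plausible reading, sketched here purely as an assumption, is a character-presence vector: each SMILES string becomes a 0/1 vector marking which characters of a shared vocabulary it contains. The helper names `build_vocab` and `encode_presence` and the toy strings are hypothetical and do not come from the repo.

```python
from typing import List


def build_vocab(smiles_list: List[str]) -> List[str]:
    """Collect every distinct character seen in the SMILES strings, sorted."""
    return sorted({ch for smiles in smiles_list for ch in smiles})


def encode_presence(smiles: str, vocab: List[str]) -> List[int]:
    """Return a 0/1 vector: 1 where the vocab character occurs in the string."""
    present = set(smiles)
    return [1 if ch in present else 0 for ch in vocab]


# Toy example (hypothetical data, not from the benchmark sets).
toy_smiles = ["CCO", "CCN"]
vocab = build_vocab(toy_smiles)
vectors = [encode_presence(s, vocab) for s in toy_smiles]
print(vocab, vectors)
```

Vectors like these discard character order and counts, which may be one reason such features add little on top of a compression-based distance.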
|
I was looking for something to try out and figured I would give Lasso regression a go and add in some extra features. I didn't know if you wanted to include any other features besides just NCD. If you don't, then we don't need to incorporate the one-hot function at all; it was more for my curiosity. Overall, Lasso regression was not helpful: the model did worse with both approaches that I tried. I haven't used Lasso regression before, so it is possible that I incorporated it incorrectly. Two approaches that I used:
All my functions are included in the … I'm looking forward to helping. Let me know if you think I should fix/change anything. |
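For readers following the thread: NCD refers to the normalized compression distance. Below is a minimal sketch of how a gzip-based NCD can feed a kNN regressor. It is not the code from `gzip_regressor.py`; the function names (`compressed_len`, `ncd`, `knn_predict`), the toy SMILES strings, and the target values are all assumptions made for illustration.

```python
import gzip
from typing import List


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_predict(query: str, train_smiles: List[str],
                train_targets: List[float], k: int = 3) -> float:
    """Average the targets of the k training strings closest to query."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Pair each distance with its target, then sort by distance only.
    scored = sorted(
        ((ncd(query, s), t) for s, t in zip(train_smiles, train_targets)),
        key=lambda pair: pair[0],
    )
    nearest = scored[:k]
    return sum(t for _, t in nearest) / len(nearest)


# Toy usage with made-up target values.
train_smiles = ["CCO", "CCCO", "c1ccccc1"]
train_targets = [1.0, 2.0, 3.0]
print(knn_predict("CCCCO", train_smiles, train_targets, k=2))
```

Any extra feature, such as a one-hot vector, would have to be combined with this distance somehow (for example by concatenating feature vectors for a separate model), which is the design question raised in the comment above.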
* Generated RMSE validation graph with error bars `RMSE_VALID_CLASSIFICATION.png`
* Generated RMSE test graph with error bars `RMSE_TEST_CLASSIFICATION.png`
* Uploaded the dataset for the full run, formatted as CSV.
Hey Daniel,
I added some comments and docstrings for the functions within `gzip_regressor.py` to help people (people being myself) understand what is happening. Thought I would upload it at least. I think I will go through everything and write out comments/docstrings.
I think this looks like a cool project and would love to help contribute. I am a PhD student (finishing up my first year) at the University of Oregon. I work on ML and computational methods for macrocyclic peptide design. I'm still relatively new to the ML space, so I would love to contribute any way that I can!
My biggest confusion with this repo is where exactly the data is being pulled from. I see that main can run without any data in the repo. I am guessing it is being pulled from some hosting site? Though I am having trouble finding it within main.
Looking forward to helping out!