Stephanie Shishis LCR Assignment-1 Completed #1
PatelVishakh left a comment
Assignment 1: Complete. Great work! Suggested changes:
When answering questions, rather than typing the answer in as a comment after manually reading the output, you should automate it. For example:
```python
# Number of observations (rows)
num_observations = wine_df.shape[0]
print(f"Number of observations: {num_observations}")
```
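The same pattern extends to the other data-inspection answers. A minimal sketch, assuming `wine_df` holds the wine data with the response in a column named `class`. Here it is rebuilt from scikit-learn's bundled `load_wine` dataset purely for illustration, so the loading step may differ from the assignment's.

```python
from sklearn.datasets import load_wine

# Assumption: rebuild wine_df with a "class" response column from scikit-learn's
# bundled copy of the wine data (the assignment may load it differently).
wine_df = load_wine(as_frame=True).frame.rename(columns={"target": "class"})

# Number of observations (rows) and variables (columns)
num_observations = wine_df.shape[0]
num_variables = wine_df.shape[1]

# Predictors are every column except the response
num_predictors = wine_df.drop(columns="class").shape[1]

print(f"Number of observations: {num_observations}")
print(f"Number of variables: {num_variables}")
print(f"Number of predictor variables: {num_predictors}")
```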
Q1) III) The variable type is categorical. In a data science setting, this question asks whether the variable is numerical or categorical (integer, continuous, and ordinal are finer distinctions) in order to decide whether classification or regression methods should be used. The complete statement should be: "Class is a categorical variable, represented here by integers (0, 1, 2) and stored in our DataFrame as int64."
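The variable-type statement can also come from code rather than being typed in. A sketch, again assuming `wine_df` with a `class` column (rebuilt from `load_wine` for illustration): it reports the stored dtype and the distinct labels, then optionally converts the column to pandas' `category` dtype so its categorical nature is explicit.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Assumption: wine_df with a "class" response column, rebuilt from scikit-learn's copy of the data
wine_df = load_wine(as_frame=True).frame.rename(columns={"target": "class"})

# Report the stored dtype and the distinct labels instead of reading them off printed output
class_dtype = wine_df["class"].dtype
class_labels = sorted(wine_df["class"].unique().tolist())
print(f"'class' is stored as {class_dtype} with labels {class_labels}")

# Optional: mark the integer codes as categories so pandas treats them as labels, not quantities
wine_df["class"] = wine_df["class"].astype("category")
is_categorical = isinstance(wine_df["class"].dtype, pd.CategoricalDtype)
print(f"'class' is categorical: {is_categorical}")
```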
Q4) I) When using results from other code sections, rather than typing them in after manually reading the output, you should reference them in code, specifically the best n_neighbors value.
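A minimal sketch of pulling the tuned value from the fitted search object instead of retyping it. It assumes a scikit-learn `GridSearchCV` over `KNeighborsClassifier` on scikit-learn's bundled wine data; the 75/25 split, the 1 to 50 neighbour grid, `cv=10` and `random_state=123` are illustrative choices, not necessarily the assignment's settings.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: predictors X and response y from scikit-learn's bundled wine data
X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=123)

# Standardize using statistics from the training set only
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

param_grid = {"n_neighbors": list(range(1, 51))}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10)
grid.fit(X_train_std, y_train)

# Read the tuned value from the search object rather than copying it by hand
best_k = grid.best_params_["n_neighbors"]
print(f"Best n_neighbors: {best_k}")

# Reuse the variable when fitting the final model
knn = KNeighborsClassifier(n_neighbors=best_k).fit(X_train_std, y_train)
test_accuracy = knn.score(X_test_std, y_test)
print(f"Test accuracy with n_neighbors={best_k}: {test_accuracy:.3f}")
```

Because `best_k` is a variable, the printed answer and the final model stay in sync if the grid or the data change.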
Vishakh Patel [LS]
UofT-DSI | LCR - Assignment 1
What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)
The changes I was trying to make were, first, to inspect the data by adding code to find properties such as the number of rows and columns, how many predictor variables there are, and which variable types exist. I also pre-processed the data by standardizing it and splitting it into training and testing sets. Lastly, I initialized the model, ran a grid search to tune it, and fitted the KNN model.
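The preprocessing, grid search and fitting steps can also be chained into one scikit-learn `Pipeline`. This is a swapped-in variant, not the approach used in this PR: it keeps the `StandardScaler` inside the grid search, so the scaler is re-fitted on each cross-validation training fold. The bundled wine data, the 75/25 stratified split, the 1 to 50 neighbour grid and `random_state=123` are illustrative assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: predictors X and response y from scikit-learn's bundled wine data
X, y = load_wine(return_X_y=True, as_frame=True)

# Split first so the test set plays no part in standardization or tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=123
)

# Standardization + KNN as one estimator; make_pipeline names the KNN step "kneighborsclassifier"
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {"kneighborsclassifier__n_neighbors": list(range(1, 51))}

grid = GridSearchCV(pipe, param_grid, cv=10)
grid.fit(X_train, y_train)  # default refit=True re-fits the best pipeline on all of X_train

best_k = grid.best_params_["kneighborsclassifier__n_neighbors"]
test_accuracy = grid.score(X_test, y_test)  # accuracy of the refitted best pipeline
print(f"Best n_neighbors: {best_k}, test accuracy: {test_accuracy:.3f}")
```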
What did you learn from the changes you have made?
I learned why standardization matters for distance-based models such as KNN (otherwise, features measured on larger scales dominate the distance) and how grid search can be used to find the best hyperparameter value.
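A small, standard-library-only illustration of why scale matters for KNN. The points and their two features (one in the hundreds or thousands, one below 1) are made up for the example; the z-scores use the population standard deviation, which is what scikit-learn's `StandardScaler` uses.

```python
import math
import statistics

# Hypothetical data: (feature on a large scale, feature on a small scale)
query = (1000.0, 0.10)
a = (1010.0, 0.90)  # close on the large-scale feature, far on the small one
b = (1050.0, 0.12)  # farther on the large-scale feature, close on the small one
others = [(500.0, 0.50), (1500.0, 0.50)]  # extra rows so each column has a realistic spread
rows = [query, a, b] + others


def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def standardize(data):
    """Rescale every column to mean 0 and population standard deviation 1."""
    cols = list(zip(*data))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in data]


def nearest(q, cand_a, cand_b):
    return "a" if euclidean(q, cand_a) < euclidean(q, cand_b) else "b"


raw_nearest = nearest(query, a, b)
q_std, a_std, b_std = standardize(rows)[:3]
std_nearest = nearest(q_std, a_std, b_std)

print(f"Nearest to query before scaling: {raw_nearest}, after scaling: {std_nearest}")
# → Nearest to query before scaling: a, after scaling: b
```

Before scaling, the 40-unit gap on the first feature swamps everything else; after scaling, both features contribute in comparable units and the neighbour changes.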
Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?
For Question 2, I could have passed a single array to train_test_split instead of two separate arrays. With one input, it would have returned two outputs (train, test) instead of four. However, I would then have had to separate the predictors and the response variable again before running the grid search.
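The two options can be compared side by side. A sketch, assuming `wine_df` with a `class` response column (rebuilt from scikit-learn's `load_wine` for illustration); `test_size=0.25` and `random_state=123` are arbitrary choices. With the same `random_state` and the same number of rows, both calls select the same rows.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Assumption: wine_df has the predictors plus a "class" response column
wine_df = load_wine(as_frame=True).frame.rename(columns={"target": "class"})

# Two inputs (X, y) -> four outputs
X = wine_df.drop(columns="class")
y = wine_df["class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=123)

# One input (the whole DataFrame) -> two outputs, ordered (train, test)
train_df, test_df = train_test_split(wine_df, test_size=0.25, random_state=123)

# ...but predictors and response then have to be separated again before the grid search
X_train_alt = train_df.drop(columns="class")
y_train_alt = train_df["class"]
```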
Were there any challenges? If so, what issue(s) did you face? How did you overcome it?
My biggest issues were syntax errors and using incorrect variable names. I also had to go back to the live coding scripts to remember the correct formatting.
How were these changes tested?
The changes were tested by running each of the code blocks and ensuring no error was thrown.
A reference to a related issue in your repository (if applicable)
N/A