Initial commit of assignment 1 #1
PatelVishakh
left a comment
Assignment 1: Pending Resubmission. A few changes are needed. Required changes:
Q1)III) The type of variable is categorical. In a data science setting, this question asks whether the variable is numerical or categorical (integer, continuous, and ordinal are finer distinctions) in order to decide whether classification or regression methods should be used. The complete statement should be: Class is a categorical variable, represented here as integers (0, 1, 2) and stored in our DataFrame as int64.
Q2)I) The more precise explanation is that KNN relies heavily on comparing distances between data points to make its predictions. Variables with large scales therefore dominate the distance computation and distort the resulting estimates.
Q2)II) We do not standardize the variable Class, not only because it is a categorical variable, but also because it is the quantity of interest: if we standardized it, we would have to rescale our predictions for them to be interpretable.
Q2)IV) You need to use predictors_standardized rather than the original dataset. Using the original dataset has caused incorrect models and output for 3) and 4).
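To illustrate the point in Q2)I) and Q2)IV), here is a minimal sketch of how an unscaled predictor can change a nearest-neighbour prediction. It uses only the Python standard library and made-up numbers, not the assignment dataset; `train`, `labels`, `query`, `nearest_label`, and `standardize` are hypothetical names for this illustration and are not part of the assignment code.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data (not the assignment dataset): each row is
# (small-scale feature, large-scale feature). The class follows the first feature.
train = [(12.0, 500.0), (14.0, 520.0), (12.1, 900.0), (14.1, 880.0)]
labels = [0, 1, 0, 1]
query = (12.05, 700.0)


def nearest_label(rows, row_labels, point):
    """Return the label of the training row closest to `point` (1-nearest neighbour)."""
    distances = [math.dist(row, point) for row in rows]
    return row_labels[distances.index(min(distances))]


def standardize(rows, reference):
    """Z-score each column of `rows` using the column means and standard deviations of `reference`."""
    columns = list(zip(*reference))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


# On the raw data, the large-scale second feature dominates the distance.
raw_prediction = nearest_label(train, labels, query)

# After standardizing (using the training statistics for the query too),
# both features contribute on a comparable scale.
train_std = standardize(train, train)
query_std = standardize([query], train)[0]
scaled_prediction = nearest_label(train_std, labels, query_std)

print("raw:", raw_prediction, "standardized:", scaled_prediction)
# prints: raw: 1 standardized: 0
```

In this toy case the query sits next to the class 0 rows on the first feature, yet the raw distances pick a class 1 row because of small differences in the second feature. The same reasoning is why the model should be fitted on predictors_standardized.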
Vishakh Patel [LS]
Merge changes from main
Confirming I have corrected the listed questions @PatelVishakh.
PatelVishakh
left a comment
Assignment 1 Complete! Good Changes!
Vishakh Patel
UofT-DSI | LCR - Assignment 1
What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)
Added code to complete assignment 1
What did you learn from the changes you have made?
The biggest learning was applying pandas and KNN.
Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?
I did a couple of the steps differently from what was specified in part 1 because I was going through the pandas documentation and wanted to compare a few functions that were quite similar (info vs. describe vs. size, etc.).
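A minimal sketch of that kind of comparison, assuming a small made-up DataFrame rather than the assignment data (the name `df` and its columns are hypothetical):

```python
import pandas as pd

# Hypothetical toy frame, used only to compare the three inspection tools.
df = pd.DataFrame({"alcohol": [12.0, 14.0, 12.1], "class": [0, 1, 0]})

# info() prints column names, non-null counts, and dtypes; it returns None.
df.info()

# describe() returns a DataFrame of summary statistics for the numeric columns.
summary = df.describe()

# size is an attribute, not a method: the total number of cells (rows * columns).
n_cells = df.size

# shape gives the (rows, columns) pair that size is derived from.
n_rows, n_cols = df.shape

print(summary)
print(n_cells, n_rows, n_cols)
```

In short, info is about structure and dtypes, describe is about the distribution of values, and size is a single count.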
Were there any challenges? If so, what issue(s) did you face? How did you overcome it?
Aside from the amount of time it took me to figure out that the "y" variable in the KNN functions had to be lowercase, no major issues that couldn't be solved by reading and following the error messages.
How were these changes tested?
All code was tested after each new addition. Debug steps were added frequently during testing to ensure that intermediate outputs matched the expected outcomes.
Checklist