Initial commit of assignment 1 #1

Open
jnahmiach wants to merge 4 commits into main from assignment-1

Conversation

@jnahmiach

Owner

UofT-DSI | LCR - Assignment 1

What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)

Added code to complete assignment 1

What did you learn from the changes you have made?

The biggest learning was applying pandas and KNN.

Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?

I did a couple of the steps differently from what part 1 specified, because I was going through the pandas documentation and wanted to compare a few functions that were quite similar (info vs. describe vs. size, etc.).

Were there any challenges? If so, what issue(s) did you face? How did you overcome it?

Aside from the amount of time it took me to figure out that the "y" variable in the KNN functions had to be lowercase, no major issues that couldn't be solved by reading and following the error messages.

How were these changes tested?

All code was tested after each new addition. Debug steps were added frequently during testing to ensure intermediate outputs matched the expected outcomes.

Checklist

  • I can confirm that my changes are working as intended

@PatelVishakh PatelVishakh left a comment

Assignment 1: Pending resubmission. A few changes are needed. Required changes:

Q1)III) The variable type is categorical. In a data science setting, this question asks whether the variable is numerical or categorical (integer, continuous, and ordinal are finer distinctions) in order to decide whether classification or regression methods should be used. The complete statement should be: Class is a categorical variable, represented here as integers (0, 1, 2) and stored in our DataFrame as int64.
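To illustrate the distinction between storage type and variable type, here is a minimal sketch. The DataFrame and the column name `class` are hypothetical stand-ins, not the assignment's actual data; the point is only that an int64 column can still hold category labels.

```python
import pandas as pd

# Hypothetical data: labels 0, 1, 2 stored as 64-bit integers
df = pd.DataFrame({"class": pd.Series([0, 1, 2, 1, 0], dtype="int64")})

# The storage dtype is numeric...
storage_dtype = str(df["class"].dtype)

# ...but the values are labels, so the column can be viewed as categorical
as_category = df["class"].astype("category")
category_dtype = str(as_category.dtype)
labels = list(as_category.cat.categories)

print(storage_dtype, category_dtype, labels)
```

Because the labels have no meaningful arithmetic (class 2 is not "twice" class 1), this is a classification target rather than a regression target.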

Q2)I) The more precise explanation is that KNN relies heavily on comparing distances between data points to fit a model to the data. Hence, variables with large scales dominate the distance computation and distort the resulting estimates.
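The effect of scale on distance can be sketched with a small example. The points, feature names, and the ranges used for rescaling below are all made up for illustration; they are not taken from the assignment.

```python
import math

# Hypothetical points: (feature_a on a 0-1 scale, feature_b on a 0-1000 scale)
query = (0.10, 500.0)
near_in_a = (0.12, 560.0)  # almost identical in feature_a
far_in_a = (0.90, 510.0)   # very different in feature_a

# On the raw scales, feature_b dominates the Euclidean distance
raw_near = math.dist(query, near_in_a)
raw_far = math.dist(query, far_in_a)

# Assumed feature ranges, used only to put both features on a comparable scale
RANGE_A, RANGE_B = 1.0, 1000.0

def rescale(point):
    """Divide each feature by its assumed range."""
    return (point[0] / RANGE_A, point[1] / RANGE_B)

scaled_near = math.dist(rescale(query), rescale(near_in_a))
scaled_far = math.dist(rescale(query), rescale(far_in_a))

print(raw_near > raw_far)        # far_in_a looks closer on raw scales
print(scaled_near < scaled_far)  # near_in_a is closer once scales are comparable
```

The nearest neighbour flips depending on scaling, which is why KNN predictions can change when predictors are left on their original scales.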

Q2)II) We do not standardize the variable Class, not only because it is a categorical variable, but also because it is the quantity of interest, and we would have to rescale our predictions for them to be interpretable.

Q2)IV) You need to use predictors_standardized rather than the original dataset. Using the original data has caused incorrect models/output for 3) and 4).
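A minimal sketch of the intended flow, assuming scikit-learn's `StandardScaler` and `KNeighborsClassifier`. Only the name `predictors_standardized` comes from the review; the data, column names, and `n_neighbors=3` are hypothetical. The target is left unscaled and the model is fit on the standardized predictors.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two predictors on very different scales, plus a 0/1/2 label
df = pd.DataFrame({
    "feature_a": [0.10, 0.20, 0.15, 0.80, 0.90, 0.85, 0.50, 0.45, 0.55],
    "feature_b": [100.0, 900.0, 500.0, 120.0, 880.0, 510.0, 130.0, 870.0, 490.0],
    "class": [0, 0, 0, 1, 1, 1, 2, 2, 2],
})

predictors = df.drop(columns="class")
y = df["class"]  # the target stays as-is; it is not standardized

# Standardize predictors only (zero mean, unit variance per column)
scaler = StandardScaler()
predictors_standardized = pd.DataFrame(
    scaler.fit_transform(predictors),
    columns=predictors.columns,
    index=predictors.index,
)

# Fit on the standardized predictors, not on the original DataFrame
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(predictors_standardized, y)
predictions = knn.predict(predictors_standardized)
```

In a workflow with a train/test split, the scaler would normally be fit on the training portion only and then applied to the test portion.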

Vishakh Patel [LS]

@jnahmiach

jnahmiach commented May 21, 2026

Owner Author

Confirming that I have corrected the listed questions, @PatelVishakh.

@PatelVishakh PatelVishakh left a comment

Assignment 1 Complete! Good Changes!

Vishakh Patel
