This project represents a deep dive into the "engine" of Machine Learning. Beyond just using libraries, I focused on implementing algorithms from scratch, mastering optimization techniques, and tackling the core ML challenge: overfitting.
- Custom ML Implementation: Building regression models from scratch using NumPy.
- Optimization: Stochastic Gradient Descent (SGD) and Analytical closed-form solutions.
- Regularization: Mastering L1 (Lasso), L2 (Ridge), and ElasticNet to control model complexity.
- Advanced Preprocessing: Feature normalization (`MinMaxScaler`, `StandardScaler`) and outlier handling.
- Feature Engineering: Parsing complex text data into binary features and exploring polynomial transformations.
I derived the analytical solution for linear regression in vector form and explored how L1 and L2 penalties transform the loss function. This stage was crucial for understanding why Lasso acts as a feature selector: its penalty drives some weights exactly to zero, while Ridge only shrinks them.
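For reference, here is a minimal NumPy sketch of that closed-form (normal-equation) solution with an optional L2 term; the function name and the convention that a bias column is already appended to `X` are my own assumptions, not necessarily the notebook's exact code:

```python
import numpy as np

def closed_form_solution(X, y, alpha=0.0):
    """Analytical solution for linear/ridge regression.

    alpha = 0 gives ordinary least squares: w = (X^T X)^{-1} X^T y.
    alpha > 0 adds the L2 penalty:          w = (X^T X + alpha * I)^{-1} X^T y.
    Assumes a bias column of ones has already been appended to X.
    """
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)
```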
I moved beyond sklearn by implementing my own LinearRegression class:
- Developed SGD with deterministic behavior for reproducibility.
- Coded the R² (Coefficient of Determination) manually to deeply understand variance explanation.
- Implemented Ridge, Lasso, and ElasticNet by extending the loss function.

Comparing my custom code against `sklearn` confirmed the accuracy of my mathematical logic; a simplified sketch of the class is shown below.
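The sketch below assumes a mini-batch SGD update with optional L1/L2 penalties; the class name, hyperparameters, and API are hypothetical, and the notebook's actual implementation may differ:

```python
import numpy as np

class MyLinearRegression:
    """Illustrative SGD-trained linear regression with ElasticNet-style penalties."""

    def __init__(self, lr=0.01, n_epochs=100, l1=0.0, l2=0.0, batch_size=32, random_state=42):
        self.lr = lr
        self.n_epochs = n_epochs
        self.l1 = l1
        self.l2 = l2
        self.batch_size = batch_size
        self.random_state = random_state  # fixed seed -> deterministic, reproducible runs

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        n_samples, n_features = X.shape
        self.w_ = np.zeros(n_features)
        self.b_ = 0.0
        for _ in range(self.n_epochs):
            order = rng.permutation(n_samples)  # shuffle once per epoch
            for start in range(0, n_samples, self.batch_size):
                idx = order[start:start + self.batch_size]
                Xb, yb = X[idx], y[idx]
                error = Xb @ self.w_ + self.b_ - yb
                # MSE gradient plus L1/L2 penalty gradients (ElasticNet when both > 0)
                grad_w = ((2 / len(idx)) * Xb.T @ error
                          + self.l1 * np.sign(self.w_) + 2 * self.l2 * self.w_)
                grad_b = (2 / len(idx)) * error.sum()
                self.w_ -= self.lr * grad_w
                self.b_ -= self.lr * grad_b
        return self

    def predict(self, X):
        return X @ self.w_ + self.b_

    def r2_score(self, X, y):
        # R^2 = 1 - SS_res / SS_tot: the share of target variance explained by the model
        residual = y - self.predict(X)
        ss_res = np.sum(residual ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1 - ss_res / ss_tot
```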
Data processing became more granular. I extracted the top 20 most frequent apartment highlights (e.g., 'Elevator', 'FitnessCenter') from raw text features and transformed them into binary flags, expanding the dataset to 22 high-impact features.
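As a rough pandas sketch of this kind of binary-flag expansion (the column name, separator, and `has_` prefix are assumptions for illustration, not the notebook's exact parsing):

```python
import pandas as pd

# Toy data: a raw text column listing apartment highlights
df = pd.DataFrame({"features": ["Elevator; FitnessCenter", "Pool; Elevator", ""]})

# Count individual highlights and keep the 20 most frequent ones
exploded = (
    df["features"]
    .str.split(";")
    .explode()
    .str.strip()
    .replace("", pd.NA)
    .dropna()
)
top_20 = exploded.value_counts().head(20).index

# One binary flag column per frequent highlight
for highlight in top_20:
    df[f"has_{highlight}"] = df["features"].str.contains(highlight, regex=False).astype(int)
```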
I explored why linear models are sensitive to feature scales. By manually implementing MinMaxScaler and StandardScaler, I observed how normalization accelerates gradient descent convergence and makes model coefficients truly interpretable.
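As a minimal sketch, here are stateless versions of both scalers (unlike sklearn's, they do not store the fit statistics needed to transform a test split consistently):

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature to [0, 1]: (x - min) / (max - min)."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def standard_scale(X):
    """Center each feature to zero mean and unit variance: (x - mean) / std."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```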
To see theory in action, I intentionally overfitted a model using 10th-degree polynomial features. This experiment vividly demonstrated how weights explode during overfitting and how regularization (tuning the Alpha parameter) effectively "tames" the model, restoring its generalization power.
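A small reproduction of that experiment, sketched with sklearn on toy sine data (the notebook's actual data and alpha values may differ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

# Toy 1-D data with noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.2, size=30)

# 10th-degree polynomial expansion invites overfitting
X_poly = PolynomialFeatures(degree=10, include_bias=False).fit_transform(x)

unregularized = LinearRegression().fit(X_poly, y)
regularized = Ridge(alpha=1.0).fit(X_poly, y)

# Exploding weights are the signature of overfitting; the L2 penalty tames them
print("max |w| without regularization:", np.abs(unregularized.coef_).max())
print("max |w| with Ridge(alpha=1.0): ", np.abs(regularized.coef_).max())
```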
The project concluded with a rigorous comparison of all models, including naive baselines. I investigated advanced tricks like Target Log-transformation to handle skewed distributions and learned the critical distinction of why outliers should only be removed from training data.
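In sketch form, assuming `X_train`, `y_train`, `X_test`, and a regressor `model` already exist (the quantile cutoff is purely illustrative):

```python
import numpy as np

# Outlier filtering is applied to the training split only, so the test
# distribution stays untouched and the evaluation remains honest.
mask = y_train < np.quantile(y_train, 0.99)   # hypothetical 99th-percentile cutoff
X_train_clean, y_train_clean = X_train[mask], y_train[mask]

# Skewed target: train on log1p(y), invert predictions with expm1
model.fit(X_train_clean, np.log1p(y_train_clean))
y_pred = np.expm1(model.predict(X_test))      # back to the original target scale
```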
This module was a turning point: I transitioned from "tuning parameters" to understanding the geometry and logic of the learning process. I now possess a clear intuition of what happens to data the moment it enters a model.
- Clone the repository:

  ```bash
  git clone https://github.com/knight99rus/ML2_Supervised_learning.git
  cd ML2_Supervised_learning
  ```

- Create and activate a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # For Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install jupyter pandas numpy scikit-learn
  ```

- Download data:
  - Read the task on the Kaggle competition page.
  - Download the `test.json` file.

- Launch Jupyter Notebook:

  ```bash
  jupyter notebook
  ```

  Open and execute the cells in the `project02.ipynb` file.