ML2: Linear Models and Regularization

This project is a deep dive into the "engine" of machine learning. Rather than just calling library functions, I focused on implementing algorithms from scratch, mastering optimization techniques, and tackling the core ML challenge: overfitting.

Topics

  • Custom ML Implementation: Building regression models from scratch using NumPy.
  • Optimization: Stochastic Gradient Descent (SGD) and analytical closed-form solutions.
  • Regularization: Mastering L1 (Lasso), L2 (Ridge), and ElasticNet to control model complexity.
  • Advanced Preprocessing: Feature normalization (MinMaxScaler, StandardScaler) and outlier handling.
  • Feature Engineering: Parsing complex text data into binary features and exploring polynomial transformations.

Roadmap

1. The Mathematics Under the Hood

I derived the analytical solution for linear regression in vector form and explored how L1 and L2 penalties transform the loss function. This stage was crucial for understanding why Lasso acts as a feature selector by shrinking weights to zero.
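As a rough sketch of the closed-form solution discussed above (function names here are illustrative, not the repo's actual code), ordinary least squares solves the normal equations, and Ridge only adds an L2 term to the Gram matrix:

```python
import numpy as np

def fit_ols(X, y):
    # Closed-form OLS: w = (X^T X)^{-1} X^T y, with a bias column prepended.
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

def fit_ridge(X, y, alpha=1.0):
    # Ridge adds alpha * I to the Gram matrix (intercept left unpenalized).
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(X_b.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X_b.T @ X_b + penalty, X_b.T @ y)
```

Lasso has no closed form (the L1 penalty is not differentiable at zero), which is exactly why it can push weights all the way to zero and act as a feature selector.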

2. Building Models from Scratch

I moved beyond sklearn by implementing my own LinearRegression class:

  • Developed SGD with a fixed random seed for deterministic, reproducible training runs.
  • Coded the R² (coefficient of determination) metric by hand to understand how it measures explained variance.
  • Implemented Ridge, Lasso, and ElasticNet by extending the loss function; comparing my custom models against sklearn confirmed the mathematical logic was correct.
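A minimal sketch of what such a class might look like (hypothetical names and hyperparameters, not the repo's exact implementation), covering seeded SGD, an optional L2 penalty, and a hand-coded R²:

```python
import numpy as np

class MyLinearRegression:
    """Mini-batch SGD linear regression with optional L2 regularization."""

    def __init__(self, lr=0.01, n_epochs=100, batch_size=32, l2=0.0, seed=42):
        self.lr, self.n_epochs, self.batch_size = lr, n_epochs, batch_size
        self.l2, self.seed = l2, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)  # fixed seed -> deterministic SGD
        n, d = X.shape
        self.w_, self.b_ = np.zeros(d), 0.0
        for _ in range(self.n_epochs):
            idx = rng.permutation(n)  # reshuffle every epoch, reproducibly
            for start in range(0, n, self.batch_size):
                batch = idx[start:start + self.batch_size]
                err = X[batch] @ self.w_ + self.b_ - y[batch]
                grad_w = 2 * X[batch].T @ err / len(batch) + 2 * self.l2 * self.w_
                grad_b = 2 * err.mean()
                self.w_ -= self.lr * grad_w
                self.b_ -= self.lr * grad_b
        return self

    def predict(self, X):
        return X @ self.w_ + self.b_

    def score(self, X, y):
        # R^2 = 1 - SS_res / SS_tot: the share of variance the model explains.
        residual = y - self.predict(X)
        return 1 - (residual ** 2).sum() / ((y - y.mean()) ** 2).sum()
```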

3. Strategic Feature Engineering

Data processing became more granular. I extracted the top 20 most frequent apartment highlights (e.g., 'Elevator', 'FitnessCenter') from raw text features and transformed them into binary flags, expanding the dataset to 22 high-impact features.
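The idea can be sketched as follows (assuming, hypothetically, that the raw highlights live in a comma-separated string column; the repo's actual parsing may differ):

```python
import pandas as pd
from collections import Counter

def top_k_binary_flags(series, k=20):
    """Turn comma-separated highlight strings into binary columns for the top-k tokens."""
    # Tokenize each row, dropping empty fragments and stray whitespace.
    tokens = series.fillna("").apply(
        lambda s: [t.strip() for t in s.split(",") if t.strip()]
    )
    counts = Counter(t for row in tokens for t in row)
    top = [name for name, _ in counts.most_common(k)]
    # One 0/1 column per frequent highlight.
    return pd.DataFrame(
        {f"has_{name}": tokens.apply(lambda row: int(name in row)) for name in top}
    )
```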

4. Normalization & Scalability

I explored why linear models are sensitive to feature scales. By manually implementing MinMaxScaler and StandardScaler, I observed how normalization accelerates gradient descent convergence and makes model coefficients truly interpretable.
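The two scalers reduce to a few lines of NumPy (a sketch mirroring sklearn's MinMaxScaler and StandardScaler behavior; in practice the statistics should be computed on the training split only and reused on the test split):

```python
import numpy as np

def minmax_scale(X):
    # Maps each column to [0, 1], like MinMaxScaler.
    mn, mx = X.min(axis=0), X.max(axis=0)
    span = np.where(mx - mn == 0, 1, mx - mn)  # guard against constant columns
    return (X - mn) / span

def standard_scale(X):
    # Zero mean and unit variance per column, like StandardScaler.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / np.where(sigma == 0, 1, sigma)
```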

5. Mastering Overfitting

To see theory in action, I intentionally overfitted a model using 10th-degree polynomial features. This experiment vividly demonstrated how weights explode during overfitting and how regularization (tuning the Alpha parameter) effectively "tames" the model, restoring its generalization power.
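The experiment can be reproduced with a few lines of sklearn (synthetic data here; the notebook uses the apartment dataset, and the alpha value is illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-1, 1, size=(30, 1)), axis=0)
y = np.sin(3 * X).ravel() + rng.normal(0, 0.1, 30)  # noisy nonlinear target

# Degree-10 polynomial features: unregularized weights blow up to fit the noise.
plain = make_pipeline(PolynomialFeatures(10, include_bias=False),
                      LinearRegression()).fit(X, y)
# The same features with an L2 penalty: weights stay tame.
ridge = make_pipeline(PolynomialFeatures(10, include_bias=False),
                      Ridge(alpha=1.0)).fit(X, y)

w_plain = plain.named_steps["linearregression"].coef_
w_ridge = ridge.named_steps["ridge"].coef_
print(np.abs(w_plain).max(), np.abs(w_ridge).max())
```

Increasing alpha shrinks the coefficient norm further; sweeping it is how the "taming" effect was tuned.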


Results

The project concluded with a rigorous comparison of all models against naive baselines. I investigated advanced tricks such as log-transforming the target to handle skewed distributions, and learned why outliers should be removed from the training data only, never from the test set: the test set must reflect the data the model will actually face.
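Both ideas fit in a short sketch (synthetic prices with an artificial outlier; quantile thresholds are illustrative):

```python
import numpy as np

# Hypothetical price targets: heavy right skew plus one extreme outlier.
rng = np.random.default_rng(1)
y_train = np.concatenate([rng.lognormal(11, 0.4, size=500), [5e7]])

# Drop outliers from the TRAIN split only; a test split would stay untouched.
lo, hi = np.quantile(y_train, [0.01, 0.99])
y_train_clean = y_train[(y_train >= lo) & (y_train <= hi)]

# Log-transform tames the remaining skew; invert with expm1 after predicting.
y_log = np.log1p(y_train_clean)
preds_back = np.expm1(y_log)  # round-trips to the original price scale
```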

This module was a turning point: I transitioned from "tuning parameters" to understanding the geometry and logic of the learning process. I now possess a clear intuition of what happens to data the moment it enters a model.

How to Run the Project

  1. Clone the repository:

    git clone https://github.com/knight99rus/ML2_Supervised_learning.git
    cd ML2_Supervised_learning
  2. Create and activate a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # For Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install jupyter pandas numpy scikit-learn
  4. Download data:

  5. Launch Jupyter Notebook:

    jupyter notebook

    Open and execute the cells in the project02.ipynb file.

About

Supervised learning techniques, focusing on linear models, regularization, overfitting, underfitting, and metrics for evaluating model quality
