Kenuuey/linear-models

🧠 Linear Models

Summary

This project applies supervised learning with linear regression models to predict apartment rental prices.
Using the Kaggle RentHop apartment listings, it demonstrates preprocessing, feature engineering, model training, evaluation, and the handling of overfitting and underfitting.


📘 Project Overview

The goal of this project is to understand the linear modeling pipeline, from data preprocessing and exploration to training and evaluating regression models.
The final task is to predict apartment rental prices based on features like the number of bedrooms, bathrooms, and interest level.
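
As a rough illustration of that task, the sketch below fits a plain linear regression on a handful of made-up listings. The column names (`bedrooms`, `bathrooms`, `interest_level`, `price`), the sample values, and the ordinal encoding of the interest level are assumptions for the example, not the notebook's actual code.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical rows; values and column names are assumed for illustration.
listings = pd.DataFrame({
    "bedrooms": [0, 1, 1, 2, 2, 3],
    "bathrooms": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0],
    "interest_level": ["low", "medium", "high", "low", "medium", "low"],
    "price": [2100, 2600, 2400, 3400, 3900, 5200],
})

# Assumed ordinal encoding of the categorical interest level.
interest_order = {"low": 0, "medium": 1, "high": 2}
X = listings[["bedrooms", "bathrooms"]].assign(
    interest_level=listings["interest_level"].map(interest_order)
)
y = listings["price"]

# Fit ordinary least squares and inspect one coefficient per feature.
model = LinearRegression().fit(X, y)
predicted = model.predict(X)
print(dict(zip(X.columns, model.coef_.round(1))))
```

Each coefficient reads as the change in predicted rent per unit change of that feature, with the other features held fixed.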


📊 Topics Covered

  • Machine Learning Theory
    • Linear Regression
    • Overfitting vs. Underfitting
    • Regularization (L1, L2, ElasticNet)
  • Data Analysis
    • Data loading and exploration
    • Outlier detection and handling
    • Data visualization (histograms, boxplots, scatterplots, heatmaps)
  • Feature Engineering
    • Creating new features (squared terms, top features from listing highlights)
    • Using PolynomialFeatures
  • Model Training & Evaluation
    • Linear Regression
    • Ridge, Lasso, ElasticNet
    • Decision Tree Regressor
    • Baseline Models (Mean & Median)
  • Model Metrics
    • Mean Absolute Error (MAE)
    • Root Mean Squared Error (RMSE)
    • R² coefficient
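
The three metrics and the two baseline models listed above can be sketched directly in NumPy. The rent values are invented for illustration, and the helper names `mae`, `rmse`, and `r2` are hypothetical; in practice the equivalent functions from `sklearn.metrics` would likely be used.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more than MAE."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """R²: 1 minus residual sum of squares over total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical monthly rents (illustrative numbers, not from the dataset).
y_train = np.array([2000.0, 2500.0, 3000.0, 3500.0, 9000.0])
y_test = np.array([2200.0, 2800.0, 3300.0, 4100.0])

# Baselines: always predict the training mean or the training median.
mean_pred = np.full_like(y_test, y_train.mean())
median_pred = np.full_like(y_test, np.median(y_train))

for name, pred in [("mean", mean_pred), ("median", median_pred)]:
    print(name, mae(y_test, pred), rmse(y_test, pred), r2(y_test, pred))
```

Because the training rents include one extreme value, the mean baseline is pulled upward while the median baseline is not, which is one reason to report both.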

🧩 Tools & Libraries

  • Python 3
  • Jupyter Notebook
  • Pandas, NumPy
  • Matplotlib, Seaborn
  • Scikit-learn

🎯 Project Goal

To develop an understanding of:

  • Analyzing and visualizing data
  • Training linear models and regularized versions
  • Evaluating and interpreting regression models
  • Handling underfitting with polynomial features and overfitting with regularization

📂 Project Structure

Linear_Models/
│
├── data/ # Dataset (from Kaggle)
├── notebooks/ # Jupyter notebooks
│ └── linear_models.ipynb
├── src/ # Python scripts
│ ├── LinearRegression.py
│ └── ...
├── requirements.txt
└── README.md

⚙️ Virtual Environment Setup

Windows

python -m venv lm_env
lm_env\Scripts\activate

macOS / Linux

python3 -m venv lm_env
source lm_env/bin/activate

Install dependencies

pip install --upgrade pip
pip install -r requirements.txt

Add environment to Jupyter

pip install ipykernel
python -m ipykernel install --user --name=lm_env --display-name "Python (lm_env)"
  • After this, select Python (lm_env) in VS Code or Jupyter Notebook as the kernel.

🏁 Deliverables

  • Jupyter Notebook with:
    • Data exploration and visualization
    • Preprocessing and feature engineering
    • Model training and evaluation
    • Comparisons between Linear Regression, Ridge, Lasso, ElasticNet, Decision Tree, and naive baselines

🔑 Key Takeaways

  • Polynomial features can improve training performance but may overfit.
  • Regularization (L1, L2) stabilizes high-degree polynomial models.
  • Removing outliers only from the training set (leaving the test set untouched) prevents the model from being biased toward extreme values.
  • Naive models (mean and median predictors) provide a simple baseline for comparison.
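
The first two takeaways can be illustrated with a small sketch on synthetic data. The data, the degree of 8, and `alpha=1.0` are assumptions chosen for the example, and `poly_model` is a hypothetical helper, not part of the project code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data: a linear trend plus noise (not the RentHop dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 4.0, size=(40, 1))
y = 1500.0 + 600.0 * X[:, 0] + rng.normal(0.0, 200.0, size=40)

def poly_model(regressor, degree=8):
    """Hypothetical helper: polynomial expansion, scaling, then a regressor."""
    return make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        regressor,
    )

# Same high-degree features, with and without an L2 penalty.
ols = poly_model(LinearRegression()).fit(X, y)
ridge = poly_model(Ridge(alpha=1.0)).fit(X, y)

# The last pipeline step is the fitted regressor.
ols_norm = float(np.linalg.norm(ols[-1].coef_))
ridge_norm = float(np.linalg.norm(ridge[-1].coef_))
print(f"coefficient norm: OLS={ols_norm:.1f}, Ridge={ridge_norm:.1f}")
```

The unregularized fit tends to produce very large coefficients that chase the noise, while the Ridge penalty keeps them small. Comparing both models on a held-out split would show how that affects generalization.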

📌 Additional

  • Log-transform the target variable when its distribution is skewed.
  • Experiment with polynomial degrees and regularization parameters.
  • Compare feature normalization methods (MinMaxScaler vs. StandardScaler).
  • Implement batch-trained Linear Regression from scratch.
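
For the last item, a minimal sketch of a from-scratch implementation is shown below. It assumes "batch-trained" means full-batch gradient descent on the mean squared error; the class name `BatchGDLinearRegression`, the learning rate, and the iteration count are hypothetical choices, and the contents of `src/LinearRegression.py` may differ.

```python
import numpy as np

class BatchGDLinearRegression:
    """Linear regression fitted by full-batch gradient descent on MSE."""

    def __init__(self, lr=0.1, n_iter=1000):
        self.lr = lr          # step size (assumed value)
        self.n_iter = n_iter  # number of full passes over the data

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.coef_ = np.zeros(n_features)
        self.intercept_ = 0.0
        for _ in range(self.n_iter):
            # Residuals over the whole training set (one "batch").
            residual = X @ self.coef_ + self.intercept_ - y
            # Gradients of the mean squared error.
            grad_w = (2.0 / n_samples) * (X.T @ residual)
            grad_b = 2.0 * residual.mean()
            self.coef_ -= self.lr * grad_w
            self.intercept_ -= self.lr * grad_b
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_

# Noise-free synthetic data with known weights, to check convergence.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

model = BatchGDLinearRegression().fit(X, y)
print(model.coef_, model.intercept_)
```

Gradient descent is sensitive to feature scale, so on real listings the features would likely need standardizing first. A mini-batch variant would compute the same gradients on a random subset of rows at each step.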
