This project focuses on supervised learning using linear regression models for predicting apartment rental prices.
It demonstrates preprocessing, feature engineering, model training, evaluation, and handling overfitting/underfitting using Kaggle RentHop apartment listings.
The goal of this project is to understand the full linear modeling pipeline, from data preprocessing and exploration to training and evaluating regression models.
The final task is to predict apartment rental prices based on features like the number of bedrooms, bathrooms, and interest level.
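
As a rough illustration of the task, the sketch below fits a plain linear regression on a tiny hand-made table. The rows are invented for the example and are not taken from the RentHop data. The column names (`bedrooms`, `bathrooms`, `interest_level`, `price`) and the ordinal 0/1/2 encoding of interest level are assumptions about how the data might be prepared, not a description of what the notebook does.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented rows for illustration only (not real RentHop listings).
listings = pd.DataFrame({
    "bedrooms":       [0, 1, 1, 2, 2, 3, 3, 4],
    "bathrooms":      [1, 1, 1, 1, 2, 2, 2, 3],
    "interest_level": ["low", "medium", "high", "low", "medium", "low", "high", "low"],
    "price":          [2100, 2600, 2400, 3300, 3700, 4500, 4200, 6000],
})

# Assumed encoding: treat interest level as an ordered category.
level_to_int = {"low": 0, "medium": 1, "high": 2}
X = listings[["bedrooms", "bathrooms"]].copy()
X["interest_level"] = listings["interest_level"].map(level_to_int)
y = listings["price"]

model = LinearRegression().fit(X, y)

# One coefficient per feature, in the same order as X.columns.
coefficients = dict(zip(X.columns, model.coef_))

# Predict the price of a hypothetical 2-bedroom, 1-bathroom, medium-interest listing.
new_listing = pd.DataFrame({"bedrooms": [2], "bathrooms": [1], "interest_level": [1]})
predicted_price = float(model.predict(new_listing)[0])
print(coefficients, predicted_price)
```

With real data the same pattern applies, but the model should be fitted on a training split and judged on held-out listings rather than on the rows it was trained on.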
- Machine Learning Theory
  - Linear Regression
  - Overfitting vs. Underfitting
  - Regularization (L1, L2, ElasticNet)
- Data Analysis
  - Data loading and exploration
  - Outlier detection and handling
  - Data visualization (histograms, boxplots, scatterplots, heatmaps)
- Feature Engineering
  - Creating new features (squared terms, top features from listing highlights)
  - Using `PolynomialFeatures`
- Model Training & Evaluation
  - Linear Regression
  - Ridge, Lasso, ElasticNet
  - Decision Tree Regressor
  - Baseline Models (Mean & Median)
- Model Metrics
  - Mean Absolute Error (MAE)
  - Root Mean Squared Error (RMSE)
  - R² coefficient
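
The baseline models and metrics listed above can be sketched as follows. This is a minimal example on synthetic data, not the project's actual evaluation code: the data-generating formula, the `evaluate` helper, and the 80/20 split are assumptions made for illustration. scikit-learn's `DummyRegressor` stands in for the mean and median baselines.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: price depends linearly on two features plus noise.
rng = np.random.default_rng(0)
n = 500
bedrooms = rng.integers(0, 5, size=n)
bathrooms = rng.integers(1, 4, size=n)
X = np.column_stack([bedrooms, bathrooms])
y = 1500 + 800 * bedrooms + 600 * bathrooms + rng.normal(0, 300, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def evaluate(model):
    """Fit on the training split and return MAE, RMSE and R² on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "MAE": float(mean_absolute_error(y_test, pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": float(r2_score(y_test, pred)),
    }


results = {
    "mean baseline": evaluate(DummyRegressor(strategy="mean")),
    "median baseline": evaluate(DummyRegressor(strategy="median")),
    "linear regression": evaluate(LinearRegression()),
}

for name, scores in results.items():
    print(f"{name:>18}: " + ", ".join(f"{k}={v:.3f}" for k, v in scores.items()))
```

A constant prediction cannot score above zero R² on the split it is evaluated on, so the baselines give a floor that any useful model should clear on every metric.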
- Python 3
- Jupyter Notebook
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
To develop an understanding of:
- Analyzing and visualizing data
- Training linear models and regularized versions
- Evaluating and interpreting regression models
- Handling underfitting with polynomial features and overfitting with regularization
Linear_Models/
│
├── data/               # Dataset (from Kaggle)
├── notebooks/          # Jupyter notebooks
│   └── linear_models.ipynb
├── src/                # Python scripts
│   ├── LinearRegression.py
│   └── ...
├── requirements.txt
└── README.md
Windows:

    python -m venv lm_env
    lm_env\Scripts\activate

macOS/Linux:

    python3 -m venv lm_env
    source lm_env/bin/activate

Install dependencies:

    pip install --upgrade pip
    pip install -r requirements.txt

Register the Jupyter kernel:

    pip install ipykernel
    python -m ipykernel install --user --name=lm_env --display-name "Python (lm_env)"

- After this, select Python (lm_env) in VS Code or Jupyter Notebook as the kernel.
- Jupyter Notebook with:
  - Data exploration and visualization
  - Preprocessing and feature engineering
  - Model training and evaluation
  - Comparisons between Linear Regression, Ridge, Lasso, ElasticNet, Decision Tree, and naive baselines
- Polynomial features can improve training performance but may overfit.
- Regularization (L1, L2) stabilizes high-degree polynomial models.
- Removing outliers from the training set only (leaving the test set untouched) keeps the model from being biased toward extreme values.
- Naive models (mean and median predictions) provide a simple baseline for comparison.
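
The interaction between polynomial degree and regularization noted above can be explored with a small experiment. The sketch below uses synthetic one-dimensional data rather than the rental listings. The degree (12), the penalty strength (`alpha=1.0`), and the `poly_model` and `rmse` helpers are arbitrary choices for illustration, not the settings used in the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic 1-D data: a smooth curve plus noise, with few training points.
rng = np.random.default_rng(42)
x_train = rng.uniform(-1, 1, size=(30, 1))
x_test = rng.uniform(-1, 1, size=(200, 1))


def target(x):
    return np.sin(3 * x).ravel()


y_train = target(x_train) + rng.normal(0, 0.2, size=30)
y_test = target(x_test) + rng.normal(0, 0.2, size=200)


def rmse(model, x, y):
    return float(np.sqrt(mean_squared_error(y, model.predict(x))))


def poly_model(degree, regressor):
    # Expand features, standardize them, then fit the given linear regressor.
    return make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        regressor,
    )


models = {
    "degree 1, no penalty": poly_model(1, LinearRegression()),
    "degree 12, no penalty": poly_model(12, LinearRegression()),
    "degree 12, ridge": poly_model(12, Ridge(alpha=1.0)),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = (rmse(model, x_train, y_train), rmse(model, x_test, y_test))
    print(f"{name:>22}: train RMSE={scores[name][0]:.3f}, test RMSE={scores[name][1]:.3f}")
```

Raising the degree can only lower the training error of an unpenalized fit, and adding a penalty can only raise it. Whether the test error improves depends on the data, which is why the gap between the train and test columns is the number to watch.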
- Log-transform the target variable to reduce skew in its distribution.
- Experiment with polynomial degrees and regularization parameters.
- Compare feature normalization methods (MinMaxScaler vs. StandardScaler).
- Implement batch-trained Linear Regression from scratch.
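
For the last item, one possible starting point is full-batch gradient descent on mean squared error. The sketch below is only an outline of that idea: the class name `BatchLinearRegression`, its hyperparameters, and the synthetic data are hypothetical and are not taken from `src/LinearRegression.py`. It assumes the features are already standardized so that a fixed learning rate converges.

```python
import numpy as np


class BatchLinearRegression:
    """Linear regression fitted with full-batch gradient descent on MSE."""

    def __init__(self, learning_rate=0.1, n_iterations=2000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.coef_ = np.zeros(n_features)
        self.intercept_ = 0.0
        for _ in range(self.n_iterations):
            residual = X @ self.coef_ + self.intercept_ - y
            # Gradients of the mean squared error over the whole batch.
            grad_coef = 2.0 / n_samples * (X.T @ residual)
            grad_intercept = 2.0 * residual.mean()
            self.coef_ -= self.learning_rate * grad_coef
            self.intercept_ -= self.learning_rate * grad_intercept
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_


# Synthetic, roughly standardized features (assumption for this sketch).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.1, size=200)

gd_model = BatchLinearRegression().fit(X, y)

# Closed-form least squares on the same data, for comparison.
X_design = np.column_stack([np.ones(len(X)), X])
closed_form, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print("gradient descent:", gd_model.intercept_, gd_model.coef_)
print("closed form:     ", closed_form[0], closed_form[1:])
```

Comparing against the closed-form solution is a cheap sanity check: if the two disagree, the learning rate or iteration count is likely wrong, or the features need scaling.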