Lightleonardo/Linear_regression_evaluation

Simple Linear Regression Evaluation

This project contains a Jupyter Notebook that explores a marketing and sales dataset, identifies the strongest advertising channel for predicting sales, and fits a simple linear regression model using Ordinary Least Squares (OLS).

The analysis is implemented in Linear_Regression_analysis.ipynb and includes data inspection, missing-value handling, exploratory visualizations, correlation analysis, model fitting, regression diagnostics, and a plain-language interpretation of the results.

Project Overview

  • Loads the marketing_and_sales_data_evaluate_lr.csv dataset
  • Reviews the dataset structure, data types, and summary statistics
  • Fills missing numeric values with the median
  • Generates exploratory plots for variable distributions and channel-vs-sales relationships
  • Builds a correlation heatmap to compare predictors
  • Selects the strongest single predictor for sales
  • Fits an OLS simple linear regression model
  • Produces regression diagnostics and an interpretation summary
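The imputation and predictor-selection steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `impute_median` and `strongest_predictor` are hypothetical, and the data is a small synthetic stand-in rather than marketing_and_sales_data_evaluate_lr.csv.

```python
import numpy as np
import pandas as pd


def impute_median(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values in numeric columns with each column's median."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


def strongest_predictor(df: pd.DataFrame, target: str = "Sales") -> str:
    """Return the numeric column with the largest absolute Pearson correlation to the target."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.abs().idxmax()


# Synthetic stand-in data (an assumption for illustration, not the project's dataset).
rng = np.random.default_rng(0)
n = 200
tv = rng.uniform(10, 100, n)
demo = pd.DataFrame(
    {
        "TV": tv,
        "Radio": 0.2 * tv + rng.normal(0, 5, n),
        "Social_Media": rng.uniform(0, 10, n),
        "Sales": 3.5 * tv + rng.normal(0, 5, n),
    }
)
demo.loc[[3, 17], "TV"] = np.nan  # inject a couple of missing values

clean = impute_median(demo)
best = strongest_predictor(clean)
print(best)
```

With the real dataset, the same two helpers could be applied to the frame returned by `pd.read_csv("marketing_and_sales_data_evaluate_lr.csv")`, assuming the column names listed under Key Findings.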

Key Findings

  • Dataset size: 4,572 rows and 4 columns
  • Variables: TV, Radio, Social_Media, Sales
  • Best predictor of Sales: TV
  • Correlation with Sales:
    • TV: 0.9966
    • Radio: 0.8674
    • Social_Media: 0.5281
  • Final model formula: Sales ~ TV
  • Model performance:
    • R-squared: 0.9933
    • Coefficient for TV: 3.5545
    • Intercept: 0.2923
    • P-value: reported as 0.000000 (below 0.0000005 after rounding to six decimals)
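For readers who want to see where numbers like these come from, here is a hedged sketch of fitting `Sales ~ TV` with the statsmodels formula API and reading off R-squared, the coefficients, and the p-value. It runs on synthetic data generated for illustration, so its output will not match the figures above; the notebook's own code may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption, not the project's dataset).
rng = np.random.default_rng(42)
tv = rng.uniform(10, 100, 200)
demo = pd.DataFrame({"TV": tv, "Sales": 0.3 + 3.5 * tv + rng.normal(0, 2, 200)})

# Ordinary Least Squares with a single predictor.
model = smf.ols("Sales ~ TV", data=demo).fit()

print(f"R-squared:      {model.rsquared:.4f}")
print(f"TV coefficient: {model.params['TV']:.4f}")
print(f"Intercept:      {model.params['Intercept']:.4f}")
print(f"P-value (TV):   {model.pvalues['TV']:.6f}")
```

Calling `model.summary()` prints the full regression table, including standard errors and confidence intervals.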

Interpretation: for every $1 increase in the TV advertising budget, predicted sales increase by about $3.55 on average, according to this fitted simple linear regression model.
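As a worked example of that interpretation, the reported intercept and TV coefficient can be plugged into the fitted line, Sales = 0.2923 + 3.5545 × TV. The function name `predict_sales` is hypothetical, and the inputs are assumed to be in the same units as the dataset's TV column.

```python
INTERCEPT = 0.2923  # intercept reported under Key Findings
TV_COEF = 3.5545    # TV coefficient reported under Key Findings


def predict_sales(tv_budget: float) -> float:
    """Predicted Sales for a given TV budget, using the reported coefficients."""
    return INTERCEPT + TV_COEF * tv_budget


base = predict_sales(10)
bumped = predict_sales(11)
print(f"TV=10 gives {base:.4f}")
print(f"TV=11 gives {bumped:.4f}")
print(f"difference: {bumped - base:.4f}")  # equals the TV coefficient
```

The difference between two predictions one unit apart is always the slope, which is what the interpretation sentence expresses.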

Project Files

  • Linear_Regression_analysis.ipynb - main notebook for the full analysis
  • marketing_and_sales_data_evaluate_lr.csv - dataset used in the notebook
  • plot_distributions.png - variable distribution plots
  • plot_scatter.png - scatter plots of marketing channels against sales
  • plot_correlation_heatmap.png - feature correlation heatmap
  • plot_regression_line.png - fitted regression line plot
  • plot_diagnostics.png - regression diagnostic plots

Environment Setup

1. Create a virtual environment

python -m venv venv

2. Activate the virtual environment

On Windows:

venv\Scripts\activate

On macOS/Linux:

source venv/bin/activate

3. Install required packages

pip install notebook pandas numpy matplotlib seaborn statsmodels scipy

4. Launch Jupyter Notebook

jupyter notebook

Then open Linear_Regression_analysis.ipynb.

How To Run

  1. Activate your virtual environment.
  2. Launch Jupyter Notebook.
  3. Open Linear_Regression_analysis.ipynb.
  4. Run the notebook cells from top to bottom.

Notes

  • The notebook imputes missing values using the median for each numeric column.
  • Diagnostic plots are included to check linearity, residual normality, and homoscedasticity.
  • This project focuses on simple linear regression using a single best predictor rather than a multiple regression model.
