This project contains a Jupyter Notebook that explores a marketing and sales dataset, identifies the strongest advertising channel for predicting sales, and fits a simple linear regression model using Ordinary Least Squares (OLS).
The analysis is implemented in Linear_Regression_analysis.ipynb and includes data inspection, missing-value handling, exploratory visualizations, correlation analysis, model fitting, regression diagnostics, and a plain-language interpretation of the results.
- Loads the marketing_and_sales_data_evaluate_lr.csv dataset
- Reviews the dataset structure, data types, and summary statistics
- Fills missing numeric values with the median
- Generates exploratory plots for variable distributions and channel-vs-sales relationships
- Builds a correlation heatmap to compare predictors
- Selects the strongest single predictor for sales
- Fits an OLS simple linear regression model
- Produces regression diagnostics and an interpretation summary
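
The imputation, predictor selection, and model fitting steps listed above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes pandas and statsmodels are installed, and it builds a small synthetic DataFrame with the same four column names instead of reading the real CSV. Picking the predictor by largest absolute correlation is also an assumption about how "strongest" is defined.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the CSV (assumption: same four columns as the real data).
rng = np.random.default_rng(0)
n = 200
tv = rng.uniform(10, 100, n)
df = pd.DataFrame({
    "TV": tv,
    "Radio": 0.3 * tv + rng.normal(0, 5, n),
    "Social_Media": 0.05 * tv + rng.normal(0, 2, n),
    "Sales": 3.5 * tv + rng.normal(0, 5, n),
})
df.loc[[3, 17], "TV"] = np.nan   # inject a few missing values
df.loc[[5], "Sales"] = np.nan

# Fill missing numeric values with each column's median.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Rank predictors by absolute correlation with Sales and keep the strongest.
correlations = df.corr()["Sales"].drop("Sales")
best_predictor = correlations.abs().idxmax()

# Fit a simple linear regression with OLS on that single predictor.
model = smf.ols(f"Sales ~ {best_predictor}", data=df).fit()
print(best_predictor, round(model.rsquared, 4))
```

To run the same steps on the real data, replace the synthetic DataFrame with pd.read_csv("marketing_and_sales_data_evaluate_lr.csv").
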
- Dataset size: 4,572 rows and 4 columns
- Variables: TV, Radio, Social_Media, Sales
- Best predictor of Sales: TV
- Correlation with Sales:
  - TV: 0.9966
  - Radio: 0.8674
  - Social_Media: 0.5281
- Final model formula: Sales ~ TV
- Model performance:
  - R-squared: 0.9933
  - Coefficient for TV: 3.5545
  - Intercept: 0.2923
  - P-value: 0.000000
Interpretation: for every $1 increase in TV advertising budget, predicted sales increase by about $3.55, based on this fitted simple linear regression model.
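
As a worked example of that interpretation, the reported intercept and coefficient can be plugged into the fitted line directly. The helper name predict_sales is hypothetical and used only for illustration; the two constants are the values reported above.

```python
# Coefficients as reported for the fitted model Sales ~ TV.
INTERCEPT = 0.2923
TV_COEFFICIENT = 3.5545


def predict_sales(tv_budget: float) -> float:
    """Return predicted sales for a TV budget, in the dataset's units."""
    return INTERCEPT + TV_COEFFICIENT * tv_budget


# The slope is the change in predicted sales per one-unit change in TV budget.
delta = predict_sales(11.0) - predict_sales(10.0)
print(round(predict_sales(10.0), 4))  # 35.8373
print(round(delta, 4))                # 3.5545
```
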
- Linear_Regression_analysis.ipynb: main notebook for the full analysis
- marketing_and_sales_data_evaluate_lr.csv: dataset used in the notebook
- plot_distributions.png: variable distribution plots
- plot_scatter.png: scatter plots of marketing channels against sales
- plot_correlation_heatmap.png: feature correlation heatmap
- plot_regression_line.png: fitted regression line plot
- plot_diagnostics.png: regression diagnostic plots

Create a virtual environment:

    python -m venv venv

On Windows:

    venv\Scripts\activate

On macOS/Linux:

    source venv/bin/activate

Install the dependencies:

    pip install notebook pandas numpy matplotlib seaborn statsmodels scipy

Start Jupyter:

    jupyter notebook

Then open Linear_Regression_analysis.ipynb.

- Activate your virtual environment.
- Launch Jupyter Notebook.
- Open Linear_Regression_analysis.ipynb.
- Run the notebook cells from top to bottom.
- The notebook imputes missing values using the median for each numeric column.
- Diagnostic plots are included to check linearity, residual normality, and homoscedasticity.
- This project focuses on simple linear regression using a single best predictor rather than a multiple regression model.
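
The diagnostic plots can be complemented with numerical checks. The sketch below is an assumption about how one might do that, not something the notebook is known to run: it fits Sales ~ TV on synthetic data, then applies the Shapiro-Wilk test (residual normality) from scipy and the Breusch-Pagan test (homoscedasticity) from statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

# Synthetic data with a linear signal and constant-variance noise (assumption).
rng = np.random.default_rng(1)
tv = rng.uniform(10, 100, 300)
df = pd.DataFrame({"TV": tv, "Sales": 0.3 + 3.5 * tv + rng.normal(0, 3, 300)})

model = smf.ols("Sales ~ TV", data=df).fit()
residuals = model.resid

# Shapiro-Wilk: the null hypothesis is that the residuals are normally distributed.
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Breusch-Pagan: the null hypothesis is constant residual variance.
bp_lm, bp_lm_p, bp_f, bp_f_p = het_breuschpagan(residuals, model.model.exog)

print(f"Shapiro-Wilk p-value:  {shapiro_p:.4f}")
print(f"Breusch-Pagan p-value: {bp_lm_p:.4f}")
```

A small p-value in either test suggests the corresponding assumption may not hold. With several thousand rows, even minor departures can produce small p-values, so the plots remain the primary check.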