This repository contains a straightforward implementation of a Simple Linear Regression model using Python. The primary goal of this project is not to solve a complex real-world problem, but to serve as a clear, educational demonstration of the fundamentals of statistical modeling and of how to interpret a model's results using the statsmodels library.
Linear Regression is one of the most fundamental algorithms in statistics and machine learning. This project walks through the essential steps of building a regression model:
- Defining the independent (X) and dependent (Y) variables.
- Adding a constant to the independent variable to account for the model's intercept.
- Fitting an Ordinary Least Squares (OLS) model to the data.
- Generating and interpreting a comprehensive summary of the model's performance.
This serves as a foundational exercise for anyone learning data science or statistical analysis.
- Language: Python
- Libraries:
  - numpy: For numerical operations and array management.
  - statsmodels: A powerful Python module for statistical modeling and econometrics.
To run this project on your local machine, please follow these steps:
1. Clone the repository:
git clone https://github.com/ifanhakm/nama-repository-anda.git
cd nama-repository-anda
2. Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
3. Install the required libraries:
pip install numpy statsmodels
4. Execute the Python script:
python nama_file_anda.py

Running the script will print a detailed OLS Regression Results summary to your console. This summary provides crucial statistical information about the model, including:
- R-squared: The proportion of the variance in the dependent variable that the model explains, between 0 and 1 for an OLS model with an intercept; higher values indicate a closer fit.
- coef: The estimated coefficients for the constant (intercept) and the independent variable (slope).
- P>|t|: The p-value of the t-test of the null hypothesis that the coefficient equals zero. A low p-value (typically < 0.05) means this hypothesis is rejected at that significance level, so the variable is considered a statistically significant predictor.
- Confidence Interval ([0.025 0.975]): The 95% confidence interval for each coefficient, a range of plausible values for the true coefficient given the data.
This output is key to evaluating the model's validity and understanding the relationship between the variables.