📈 Retail Sales Forecasting

This project demonstrates an end-to-end ML workflow, from raw data ingestion and feature engineering to model evaluation and an interactive Streamlit dashboard.

🚀 Project Overview

Retail sales data is highly volatile and shaped by temporal patterns. This project predicts daily revenue from historical transaction data using lag, rolling, and momentum features, and the final XGBoost model reaches an R² of 0.933.

🔑 Key Highlights

Leakage-free ML pipeline
Strong feature engineering (lags, rolling stats, momentum)
Multiple ML models compared
Interactive Streamlit UI
Clean, modular, production-style codebase

🧠 Models Used

| Model | Purpose |
|---|---|
| Random Forest Regressor | Baseline ensemble model |
| XGBoost Regressor | Final high-performance model |

Feature engineering improved performance more than adding further models did.

📊 Final Model Performance

| Model | RMSE | MAE | R² |
|---|---|---|---|
| Random Forest | 279.53 | 175.40 | 0.909 |
| XGBoost (Best) | 240.12 | 153.10 | 0.933 |

✅ XGBoost explains 93.3% of daily revenue variance
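The three metrics above are listed as living in `src/metrics.py`. The NumPy sketch below shows one standard way to compute them; the function names `rmse`, `mae`, and `r2` are hypothetical and may not match the project's own code. R² is `1 - SS_res / SS_tot`, which is why an R² of 0.933 reads as "93.3% of variance explained".

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error: penalises large misses more heavily than MAE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred) -> float:
    """Mean absolute error, expressed in the same units as daily revenue."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation around the mean
    return float(1.0 - ss_res / ss_tot)
```

For example, `rmse([100, 200, 300], [110, 190, 310])` and `mae(...)` on the same inputs both give 10.0, while `r2(...)` gives 0.985.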

🏗️ Project Architecture

retail-sale-forcasting/
│
├── app/
│   └── streamlit_app.py          # Streamlit dashboard
│
├── src/
│   ├── data_loader.py            # Load & aggregate raw data
│   ├── feature_engineering.py    # Lag, rolling, momentum features
│   ├── train_ml_models.py        # Model training
│   ├── evaluate_models.py        # Model evaluation
│   └── metrics.py                # RMSE, MAE, R²
│
├── data/
│   ├── raw/
│   │   └── retail_sales_dataset.csv
│   └── processed/
│       ├── daily_data.csv
│       ├── daily_data_features.csv
│       ├── X_train.csv
│       ├── X_test.csv
│       ├── y_train.csv
│       └── y_test.csv
│
├── models/
│   ├── random_forest.pkl
│   └── xgboost.pkl
│
├── requirements.txt
└── README.md
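`data_loader.py` is described as loading and aggregating raw transactions into `daily_data.csv`. A minimal pandas sketch of that step follows. It assumes the raw CSV has `Date`, `Transaction ID`, `Quantity`, and `Total Amount` columns (an assumption about the dataset, not confirmed here), and `aggregate_daily` is a hypothetical name.

```python
import pandas as pd


def aggregate_daily(raw: pd.DataFrame) -> pd.DataFrame:
    """Collapse transaction-level rows into one row per calendar day.

    Assumed input columns: Date, Transaction ID, Quantity, Total Amount.
    """
    df = raw.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    daily = (
        df.groupby(df["Date"].dt.normalize())
        .agg(
            revenue=("Total Amount", "sum"),         # daily revenue (the target)
            transactions=("Transaction ID", "count"),
            quantity=("Quantity", "sum"),
        )
        .sort_index()
    )
    # Days without any sales would otherwise be missing rows; fill them with 0 so
    # that shift()/rolling() later count calendar days rather than rows.
    full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    daily = daily.reindex(full_range, fill_value=0)
    daily.index.name = "date"
    return daily
```

Filling gaps with zero is a design choice: it treats a missing day as "no sales" so that a 7-day lag really means seven calendar days back.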

🔧 Feature Engineering Strategy

To capture real retail behavior, the following features were engineered:

✅ Memory (Lags)

Revenue lags: 1, 7, 14, 30 days
Transaction & quantity lags

✅ Trend & Momentum

Revenue difference (1-day, 7-day)
Percentage change (7-day)
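A plain `revenue.diff(1)` on row t would include day t's own revenue, which is the value being predicted. One leakage-safe way to build the momentum features is to shift by one day first, as in this sketch (`add_momentum_features` is a hypothetical name; whether the project shifts in exactly this way is not stated).

```python
import numpy as np
import pandas as pd


def add_momentum_features(daily: pd.DataFrame) -> pd.DataFrame:
    """Trend/momentum features that never read the current day's revenue."""
    out = daily.copy()
    # The latest value known before day t is day t-1, so every feature is
    # derived from this shifted series rather than from revenue itself.
    past = out["revenue"].shift(1)
    out["revenue_diff_1"] = past.diff(1)  # rev[t-1] - rev[t-2]
    out["revenue_diff_7"] = past.diff(7)  # rev[t-1] - rev[t-8]
    # 7-day percentage change; a zero-revenue base day yields ±inf, mapped to NaN.
    out["revenue_pct_change_7"] = (past / past.shift(7) - 1).replace(
        [np.inf, -np.inf], np.nan
    )
    return out
```

A quick leakage check is to change the last day's revenue and confirm that the features on that row do not change.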

✅ Volatility

Rolling standard deviation (7, 14, 30 days)

✅ Smoothed Baselines

Rolling mean revenue (7, 14, 30 days)
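The volatility and smoothed-baseline features are both rolling-window statistics. The sketch below computes them over the previous 7, 14, and 30 days, shifting by one day so that the window ends at t-1. `add_rolling_features` is a hypothetical name, and requiring full windows (`min_periods=w`) is an assumption.

```python
import pandas as pd


def add_rolling_features(daily: pd.DataFrame, windows=(7, 14, 30)) -> pd.DataFrame:
    """Rolling mean (baseline) and std (volatility) of revenue over past windows."""
    out = daily.copy()
    past = out["revenue"].shift(1)  # exclude day t so the target never enters its own feature
    for w in windows:
        roll = past.rolling(window=w, min_periods=w)  # NaN until a full window exists
        out[f"revenue_roll_mean_{w}"] = roll.mean()
        out[f"revenue_roll_std_{w}"] = roll.std()  # sample std (ddof=1)
    return out
```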

All features use past values only → no data leakage.
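Past-only features are one half of a leakage-free setup; the other half is a time-aware train/test split, which produces the `X_train`/`X_test`/`y_train`/`y_test` files. The sketch below splits chronologically instead of shuffling. The 80/20 ratio, the function name `chronological_split`, and dropping same-day aggregates (`transactions`, `quantity`) from the inputs are all assumptions.

```python
import pandas as pd


def chronological_split(
    features: pd.DataFrame,
    target_col: str = "revenue",
    test_fraction: float = 0.2,
    same_day_cols=("transactions", "quantity"),
):
    """Train on the earliest rows and test on the most recent ones."""
    # The first rows have NaN lags / rolling stats; drop them, then keep time order.
    data = features.dropna().sort_index()
    cutoff = int(len(data) * (1 - test_fraction))
    train, test = data.iloc[:cutoff], data.iloc[cutoff:]
    # Same-day totals are only known after the day ends, so they are removed
    # from the inputs together with the target itself.
    drop = [target_col, *same_day_cols]
    X_train = train.drop(columns=drop, errors="ignore")
    X_test = test.drop(columns=drop, errors="ignore")
    return X_train, X_test, train[target_col], test[target_col]
```

Every test date falls after every training date, which mirrors how the model would be used to forecast future days.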

🖥️ Streamlit Dashboard

The Streamlit UI provides:

📈 Historical daily revenue visualization
📊 Model comparison table
🏆 Automatic best model selection
💼 Business-friendly interpretation of results
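The README does not say which metric drives the automatic best-model selection. A plausible rule, assumed here, is lowest RMSE (or highest R²). The hypothetical helper below applies it to the scores from the performance table.

```python
from typing import Dict


def select_best_model(results: Dict[str, Dict[str, float]], metric: str = "rmse") -> str:
    """Return the model name with the best score for the chosen metric."""
    if metric not in {"rmse", "mae", "r2"}:
        raise ValueError(f"unsupported metric: {metric}")
    if metric == "r2":
        # Higher R² is better.
        return max(results, key=lambda name: results[name][metric])
    # Lower RMSE / MAE is better.
    return min(results, key=lambda name: results[name][metric])


results = {
    "Random Forest": {"rmse": 279.53, "mae": 175.40, "r2": 0.909},
    "XGBoost": {"rmse": 240.12, "mae": 153.10, "r2": 0.933},
}
print(select_best_model(results))  # XGBoost
```

Here all three metrics agree, but making the metric an explicit argument keeps the choice visible if they ever disagree.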

▶️ Run the dashboard

streamlit run app/streamlit_app.py

⚙️ Installation & Setup

1️⃣ Clone the repository

git clone <repo-url>
cd retail-sale-forcasting

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Run pipeline (optional)

python src/data_loader.py
python src/feature_engineering.py
python src/train_ml_models.py
python src/evaluate_models.py
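Each step reads the files written by the previous one, so the scripts must run in the order shown. This hypothetical helper (not part of the repository) runs them sequentially from the repository root and stops at the first failure, so later steps never run on stale outputs.

```python
import subprocess
import sys

PIPELINE = [
    "src/data_loader.py",
    "src/feature_engineering.py",
    "src/train_ml_models.py",
    "src/evaluate_models.py",
]


def run_pipeline(scripts=PIPELINE):
    """Run each script with the current interpreter; return the scripts that completed."""
    completed = []
    for script in scripts:
        print(f"Running {script}")
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the repository root runs the same four commands as above.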

💼 Business Interpretation

XGBoost predicts daily revenue with a mean absolute error (MAE) of about 153 revenue units
High accuracy achieved through temporal feature engineering
Suitable for:
- Sales planning
- Inventory optimization
- Revenue trend analysis

🧰 Tech Stack

Language: Python
Libraries: Pandas, NumPy, Scikit-learn, XGBoost
Visualization: Streamlit
ML Techniques: Feature Engineering, Ensemble Learning, Time-aware Validation
