Skip to content

Harh2646/RetailLens

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RetailLens — Sales Performance & Market Intelligence

I built this project to understand how SQL and Python connect to real business decisions. I wanted to go beyond just learning syntax and actually find something meaningful in a dataset. I wanted to build something where the analysis actually led somewhere — where the numbers told you what was wrong and what to do about it.

So I picked the Superstore dataset (classic, I know, but it's actually quite rich once you dig in) and decided to go end to end — raw CSV to SQL to Python EDA to a proper Power BI dashboard. No shortcuts.

---

What this project covers

The core question I wanted to answer: where is this retail business actually making money, and where is it quietly losing it?

Turns out the answer is more interesting than you'd expect. The Furniture category looks fine on the surface — decent sales, high transaction values. But once you run the discount analysis, it's sitting at an average 35% discount rate with negative margins on a huge chunk of orders. The business is essentially paying customers to buy furniture.

That kind of finding is what I wanted the dashboard to surface clearly, not bury in a table.

---

Tech Stack

  • PostgreSQL — all the heavy analytical work happens here. 7 SQL scripts covering regional performance, product margins, YoY trends, discount impact, customer frequency, and shipping analysis.
  • Python (pandas, matplotlib, seaborn) — EDA to validate the SQL findings and explore distributions before building visuals
  • Power BI — 4-page interactive dashboard with slicers, maps, and DAX measures

---

The Findings

1. Discounting is destroying Furniture margins
Orders with 30%+ discount are loss-making on average. Furniture has the highest average discount of any category at ~35%. The Tables sub-category is net negative overall — it loses money across all orders combined.

2. West leads both revenue ($725K) and margin (14.94%). Central has the weakest margin at 7.9% despite being third in revenue.
If you're optimizing for top-line growth, focus West. If you're optimizing for profitability, Central is actually your healthiest region. These require different strategies.

3. Q4 revenue spikes don't translate to profit
The holiday season drives volume, but margins compress at the same time. Almost certainly because of promotional discounting. The business is growing revenue in Q4 at the cost of its annual profit.

4. First Class shipping has the best margin at 13.93%. Standard Class dominates volume with 57% of all profit.
Counter-intuitive, but customers using Same Day shipping are probably buying high-value Technology items where discounts are lower. Worth investigating.

---

## Dashboard preview

![Monthly Revenue Trend](screenshots/eda_monthly_trend.png)

![Regional Performance](screenshots/eda_regional.png)

![Discount vs Profit](screenshots/eda_discount_impact.png)

![Category Breakdown](screenshots/eda_category_top.png)

Project Structure

RetailLens/
│
├── README.md
├── requirements.txt
│
├── data/
│   ├── superstore\_sample.csv      ← 200-row sample (full data on Kaggle)
│   └── processed/                 ← cleaned CSVs exported from Python
│
├── sql/
│   ├── 00\_setup.sql               ← table creation + data loading
│   ├── 01\_regional\_performance.sql
│   ├── 02\_product\_analysis.sql
│   ├── 03\_monthly\_trends.sql
│   ├── 04\_discount\_impact.sql
│   ├── 05\_customer\_segments.sql
│   ├── 06\_shipping\_analysis.sql
│   └── 07\_executive\_summary\_view.sql
│
├── notebooks/
│   └── eda.ipynb                  ← full exploratory analysis
│
├── dashboard/
│   ├── RetailLens.pbix            ← Power BI file
│   └── Notes.md          ← how to rebuild the dashboard
│
├── screenshots/
│   ├── page1\_overview.png
│   ├── page2\_regional.png
│   ├── page3\_products.png
│   └── page4\_trends.png
│
└── docs/
    ├── SETUP.md                   ← full setup instructions
    

---

Dashboard Preview

*(Add screenshots here once Power BI dashboard is built)*
Page 1 — Executive Overview
Page 2 — Regional Intelligence
Page 3 — Product Analysis
Page 4 — Trends & Shipping

---

Running This Locally

Full instructions are in docs/SETUP.md. Short version:

# 1. Install Python dependencies
pip install -r requirements.txt

# 2. Set up PostgreSQL database
# Create a DB called 'retailens', then run sql/00\_setup.sql

# 3. Load the dataset
# Download from Kaggle (link below) and import into PostgreSQL

# 4. Run SQL queries in order from the sql/ folder

# 5. Open notebooks/eda.ipynb in Jupyter and run all cells

# 6. Open dashboard/RetailLens.pbix in Power BI Desktop

Dataset: Sample Superstore — Kaggle

---

What I'd Build Next

  • Customer lifetime value segmentation (this feeds naturally into the next project, PulseIQ)
  • A simple linear regression forecast for next quarter revenue by region
  • Automated daily refresh using Python + scheduled PostgreSQL queries
  • An anomaly detection alert when weekly margin drops below 8%

---

A Note on the SQL

I wrote all queries to run in PostgreSQL. Most will work in MySQL or SQLite with small changes (mainly date functions and EXTRACT syntax). The window functions in queries 03 and 06 require a modern SQL engine — they won't run in very old MySQL versions.

If you're using this to learn SQL, I'd suggest running query 04 (discount impact) first — it has the most interesting business interpretation and introduces CASE WHEN bucketing which comes up constantly in real DA work.

---

Dataset credit: Tableau's Sample Superstore dataset, widely used in data analytics education.

About

End-to-end retail sales analytics — PostgreSQL, Python EDA, and Power BI dashboard. Found discount breakeven at 20% and 3 loss-making sub-categories in Superstore data.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Contributors