Customer segmentation in the retail business is vital for improving customer engagement and profitability.
This project analyzes transactional data from 1 December 2010 to 9 December 2011 to identify distinct customer groups based on purchasing behavior.
Segmentation helps businesses personalize marketing strategies, retain customers, and optimize revenue.
- Analyze historical customer transactions to understand spending patterns.
- Segment customers using analytical and machine learning techniques.
- Derive actionable insights to guide targeted marketing and promotions.
- Online_Retail.csv -> CSV file containing more than 500,000 transactions recorded between 1 December 2010 and 9 December 2011 for a UK-based online retailer
- Jupyter notebook -> Notebook containing clean, commented code
- Report 1: Applications of Data Science in E-Commerce -> Report on how data science can be applied in the e-commerce market
- Report 2: Final Report -> Insights of the analysis and recommendations for the company
- Importing and understanding the dataset
- Data cleaning and preprocessing
- Exploratory data analysis (EDA)
- Feature engineering for customer behavior
- Applying clustering algorithms for segmentation
- Visualizing and interpreting results
- Programming Language: Python
- Libraries Used:
  - pandas
  - numpy
  - matplotlib
  - seaborn
  - scikit-learn
  - warnings (for suppressing runtime warnings)
Dataset: Online Retail Dataset
Source: UCI Machine Learning Repository
- Contains all 541,909 transactions between 1st December 2010 and 9th December 2011 for a UK-based online retailer.
- Each row represents a unique product purchase by a customer.
- Key columns:
  - `InvoiceNo` – Invoice number
  - `StockCode` – Product code
  - `Description` – Product name
  - `Quantity` – Number of products purchased
  - `InvoiceDate` – Date of purchase
  - `UnitPrice` – Price per unit
  - `CustomerID` – Unique ID per customer
  - `Country` – Customer location
- Removed missing customer IDs and invalid transactions.
- Filtered out canceled orders.
- Handled outliers in `Quantity` and `UnitPrice`.
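
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `clean_transactions`, the "C" prefix test for canceled invoices, and the 99th-percentile cap for outliers are all assumptions made for the example.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw transaction table (illustrative only)."""
    # Keep only rows that can be attributed to a customer.
    has_customer = df["CustomerID"].notna()
    # Assumption: canceled invoices are marked with a leading "C" in InvoiceNo.
    not_canceled = ~df["InvoiceNo"].astype(str).str.startswith("C")
    # Treat non-positive quantities or prices as invalid transactions.
    valid_values = (df["Quantity"] > 0) & (df["UnitPrice"] > 0)

    out = df[has_customer & not_canceled & valid_values].copy()

    # Assumption: cap extreme values at the 99th percentile of each column.
    for col in ["Quantity", "UnitPrice"]:
        cap = out[col].quantile(0.99)
        out[col] = out[col].clip(upper=cap)
    return out.reset_index(drop=True)


# Tiny made-up sample, not taken from Online_Retail.csv.
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536380", "536381"],
    "Quantity": [6, -1, 2, 0],
    "UnitPrice": [2.55, 27.50, 3.39, 1.25],
    "CustomerID": [17850.0, 14527.0, None, 13047.0],
})
cleaned = clean_transactions(raw)
print(cleaned)
```

Only the first sample row survives: the second is a cancellation, the third has no customer ID, and the fourth has a zero quantity.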
- Analyzed customer purchasing frequency and total revenue.
- Identified top-performing products and countries.
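
As a rough sketch of the analysis described above, the aggregations below compute order counts and revenue per customer, then rank products and countries by revenue. The sample rows and the derived `Revenue` column are made up for illustration and are not taken from the project notebook.

```python
import pandas as pd

# Made-up transactions using the dataset's column names.
tx = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3"],
    "CustomerID": [1, 1, 1, 2],
    "Description": ["MUG", "LAMP", "MUG", "LAMP"],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom", "France"],
    "Quantity": [2, 1, 4, 3],
    "UnitPrice": [2.0, 10.0, 2.0, 10.0],
})

# Assumption: revenue per line item is quantity times unit price.
tx["Revenue"] = tx["Quantity"] * tx["UnitPrice"]

# Purchase frequency (distinct invoices) and total revenue per customer.
per_customer = tx.groupby("CustomerID").agg(
    orders=("InvoiceNo", "nunique"),
    revenue=("Revenue", "sum"),
)

# Products and countries ranked by total revenue.
top_products = tx.groupby("Description")["Revenue"].sum().nlargest(2)
top_countries = tx.groupby("Country")["Revenue"].sum().sort_values(ascending=False)

print(per_customer)
print(top_products)
print(top_countries)
```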
- Applied K-Means clustering to segment customers.
- Determined the optimal number of clusters using the Elbow Method.
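
A minimal sketch of the Elbow Method with scikit-learn's `KMeans` is shown below. It runs on synthetic data rather than the real customer features, and the feature scaling, the range of `k`, and the `random_state` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for per-customer features (recency, frequency, monetary):
# four well-separated groups of 50 customers each.
centers = np.array([
    [10, 20, 5000],
    [60, 8, 1500],
    [150, 1, 200],
    [330, 2, 300],
], dtype=float)
X = np.vstack([c + rng.normal(scale=[5, 1, 50], size=(50, 3)) for c in centers])

# K-Means is distance-based, so put the features on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

# Elbow Method: record the within-cluster sum of squares (inertia) for each k.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")

# Fit the final model with the k chosen from the elbow (4 in this project).
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_scaled)
```

Plotting `inertias` against `k` with matplotlib gives the elbow curve. The "elbow" is the point after which adding clusters reduces inertia only marginally.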
- Plotted customer segments to interpret behaviors.
- Compared average RFM scores across segments.
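
One way to derive Recency, Frequency, and Monetary values and compare them across segments is sketched below. The snapshot date, the sample transactions, and the hard-coded `Segment` labels are hypothetical. The project may use binned RFM scores rather than the raw values computed here.

```python
import pandas as pd

# Made-up transactions using the dataset's column names.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "InvoiceNo": ["A1", "A2", "A3", "A4"],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-08", "2011-06-01", "2011-01-15"]
    ),
    "Quantity": [2, 4, 3, 1],
    "UnitPrice": [5.0, 5.0, 10.0, 2.0],
})
tx["Revenue"] = tx["Quantity"] * tx["UnitPrice"]

# Assumption: measure recency from the day after the last recorded purchase.
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),  # days since last order
    Frequency=("InvoiceNo", "nunique"),                            # distinct invoices
    Monetary=("Revenue", "sum"),                                   # total spend
)

# Hypothetical cluster labels; in practice these come from the fitted K-Means model.
rfm["Segment"] = [0, 2, 3]

# Average RFM values per segment.
segment_profile = rfm.groupby("Segment")[["Recency", "Frequency", "Monetary"]].mean()
print(segment_profile)
```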
- The K-Means algorithm grouped customers into 4 segments:
- Cluster 0: High-spending loyal customers
- Cluster 1: Medium-frequency, average spenders
- Cluster 2: Low-value one-time buyers
- Cluster 3: Inactive or lost customers
- The Elbow Method suggested 4 as the optimal number of clusters.
- Average Recency, Frequency, and Monetary scores were highest for Cluster 0.
Elbow curve for optimal K value
Segments Plot
- Implement advanced clustering methods (DBSCAN, hierarchical).
- Integrate real-time customer segmentation.
- Build a dashboard for live customer tracking.
Contributions are welcome!
To contribute:
- Fork this repository
- Create a new branch (`feature-branch-name`)
- Commit your changes
- Open a Pull Request
This project is licensed under the MIT License – see the LICENSE file for details.
- UCI Machine Learning Repository for providing the dataset.
- Open-source contributors for Python libraries used in this project.
