Customer Segmentation using Clustering Algorithms

Overview

This notebook demonstrates customer segmentation using various clustering algorithms such as K-Means, DBSCAN, and Agglomerative Clustering. The segmentation is based on a dataset with various customer features such as balance, purchase frequency, cash advance usage, and credit limit. The goal is to group customers into meaningful clusters that can help in profiling customer behavior and tailoring marketing strategies.

Dataset

The dataset contains customer financial information, including:

BALANCE: The account balance of the customer.
BALANCE_FREQUENCY: How frequently the balance is updated.
PURCHASES: Total purchases made by the customer.
ONEOFF_PURCHASES: One-time purchases made by the customer.
INSTALLMENTS_PURCHASES: Purchases made in installments.
CASH_ADVANCE: Cash advances taken by the customer.
CREDIT_LIMIT: The credit limit of the customer.
PAYMENTS: Total payments made by the customer.
MINIMUM_PAYMENTS: Minimum payments made by the customer.
PRC_FULL_PAYMENT: Percentage of full payments made by the customer.
TENURE: The duration of the customer's account.

Preprocessing

Missing Value Imputation: SimpleImputer is used to fill missing values in the dataset using the mean strategy.
Feature Scaling: StandardScaler is applied to ensure features are on the same scale, which is crucial for clustering.
Outlier and Distribution Analysis: Histograms, boxplots, and correlation heatmaps are generated to visualize feature distributions and relationships.

Clustering Algorithms

K-Means Clustering:
- Optimal number of clusters determined using the Elbow method.
- Silhouette score used to evaluate cluster quality.
- Customer profiling based on clusters formed by K-Means.
DBSCAN Clustering:
- Density-based clustering algorithm.
- Noise points are identified and excluded from cluster evaluation.
- Silhouette score calculated (if multiple clusters are formed).
Agglomerative Clustering:
- Hierarchical clustering using Euclidean distance and Ward linkage.
- Silhouette score used to assess cluster separation.

Dimensionality Reduction

PCA (Principal Component Analysis):
- Data is reduced to 2 dimensions to visualize clusters formed by the algorithms.
t-SNE (t-distributed Stochastic Neighbor Embedding):
- Non-linear dimensionality reduction technique used for visualizing clusters in 2D space.

Customer Profiling

Each cluster formed by K-Means is analyzed based on average values of key features like balance, purchases, credit limit, and payments.
Insights into customer behavior such as high spenders, low-frequency payers, and cash advance users are provided.
Marketing strategies are suggested for each customer group.

Visualizations

Histograms: Display feature distributions.
Boxplots: Visualize the spread and outliers for each feature.
Correlation Heatmaps: Show relationships between features.
Scatter Plots: Visualize clusters using PCA and t-SNE.

Usage

Install the required libraries:

pip install pandas numpy scikit-learn matplotlib seaborn

Run the notebook in a Jupyter environment:

jupyter notebook customer_segmentation.ipynb

Clustering Results

Silhouette scores are calculated for each clustering method to evaluate the quality of clusters.
The notebook provides detailed visualizations and insights into each cluster’s characteristics.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
customer_segmentation.ipynb		customer_segmentation.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Customer Segmentation using Clustering Algorithms

Overview

Dataset

Preprocessing

Clustering Algorithms

Dimensionality Reduction

Customer Profiling

Visualizations

Usage

Clustering Results

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Customer Segmentation using Clustering Algorithms

Overview

Dataset

Preprocessing

Clustering Algorithms

Dimensionality Reduction

Customer Profiling

Visualizations

Usage

Clustering Results

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages