This repository contains a simple exploratory data analysis workflow using Python. The project demonstrates how to inspect, clean, and visualize a structured dataset using pandas, NumPy, seaborn, and matplotlib.
The notebook was developed early in my journey learning data analysis and machine learning, and it is maintained as a simple demonstration of foundational exploratory data analysis skills.
Exploratory Data Analysis (EDA) is an important step in any data science or machine learning workflow. It helps analysts understand the structure of a dataset, detect missing values, identify outliers, explore relationships between variables, and generate insights before model development.
In this project, a customer marketing campaign dataset is analyzed to explore customer attributes and response patterns. The analysis includes data cleaning, missing value handling, univariate analysis, bivariate analysis, categorical analysis, and basic visualization.
The main objectives of this project are to:
- load and inspect a structured dataset
- clean unnecessary or redundant columns
- handle missing values
- separate combined columns into meaningful features
- explore numerical and categorical variables
- visualize feature distributions
- examine relationships between customer attributes and response outcomes
- demonstrate basic exploratory data analysis using Python
EDA_Demo/
├── EDA.ipynb
├── Marketing_Analysis.csv
├── README.md
├── requirements.txt
├── .gitignore
└── LICENSE
The dataset file may not be included in every version of the repository. If it is missing, place Marketing_Analysis.csv in the root directory before running the notebook.
EDA.ipynb: the Jupyter notebook containing the exploratory data analysis workflow. It covers data loading, data cleaning, missing value handling, feature separation, and visual exploration of numerical and categorical variables.
Marketing_Analysis.csv: the dataset used in the notebook. The notebook expects this file to be in the repository root directory.
requirements.txt: lists the Python packages required to run the notebook.
README.md: provides an overview of the project, usage instructions, limitations, and future improvement ideas.
The notebook follows a basic EDA workflow.
The project uses common Python data analysis and visualization libraries, including:
- pandas
- NumPy
- seaborn
- matplotlib
The dataset is loaded using pandas. The notebook uses skiprows=2 because the first two rows of the original file are not needed for analysis.
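The loading step can be sketched as follows. This is a minimal illustration, not the notebook's exact code: an in-memory string stands in for Marketing_Analysis.csv, and the two banner lines and the column names are assumptions chosen only to show what skiprows=2 does.

```python
import io

import pandas as pd

# Hypothetical stand-in for Marketing_Analysis.csv: two non-data lines,
# followed by the real header row and a few records. Column names are assumptions.
raw_csv = """banner line 1
banner line 2
customerid,age,salary,balance,response
1,58,100000,2143,no
2,44,60000,29,yes
"""

# skiprows=2 discards the first two lines, so the third line is parsed as the header.
df = pd.read_csv(io.StringIO(raw_csv), skiprows=2)
print(df.columns.tolist())
```

In the notebook itself, the path "Marketing_Analysis.csv" would take the place of io.StringIO(raw_csv).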
The cleaning steps include:
- removing unnecessary columns
- separating combined columns into individual variables
- checking missing values
- handling missing values in selected columns
- preparing the dataset for analysis
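The column cleanup and feature separation steps might look like the sketch below. The column names are hypothetical: "customerid" stands in for a column that is not needed for analysis, and "jobedu" stands in for a combined column holding job and education values separated by a comma. The actual columns in the dataset may differ.

```python
import pandas as pd

# Hypothetical sample; "customerid" and "jobedu" are assumed column names.
df = pd.DataFrame({
    "customerid": [1, 2, 3],
    "age": [58, 44, 33],
    "jobedu": ["management,tertiary", "technician,secondary", "entrepreneur,secondary"],
})

# Remove a column that carries no analytical value.
df = df.drop(columns=["customerid"])

# Split the combined column on the first comma into two separate features.
parts = df["jobedu"].str.split(",", n=1, expand=True)
df["job"] = parts[0]
df["education"] = parts[1]

# The combined column is now redundant.
df = df.drop(columns=["jobedu"])
print(df)
```

Splitting with n=1 keeps everything after the first comma in the second feature, which is a defensive choice in case a value itself contains a comma.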
The notebook checks for missing values and handles them based on the nature of the affected variables. For example, missing values in categorical columns may be filled using the mode, while rows with missing target response values may be removed.
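A minimal sketch of the two strategies mentioned above (mode imputation for a categorical column and dropping rows with a missing target) is shown here. The sample data and the column names "month" and "response" are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps; column names are assumptions.
df = pd.DataFrame({
    "month": ["may", None, "may", "jun", np.nan],
    "age": [58, 44, 33, 47, 35],
    "response": ["no", "yes", None, "no", "no"],
})

# Count missing values per column before handling them.
print(df.isnull().sum())

# Fill the categorical column with its mode (most frequent non-missing value).
df["month"] = df["month"].fillna(df["month"].mode()[0])

# Drop rows where the target response is missing, since they cannot be analyzed.
df = df.dropna(subset=["response"])
print(df)
```

Imputing the target would invent outcomes, which is why rows with a missing response are dropped rather than filled.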
Univariate analysis is performed to understand individual variables.
Examples include:
- job category distribution
- education distribution
- salary summary statistics
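The univariate examples above can be reproduced with value_counts and describe. The small DataFrame below is hypothetical and only illustrates the calls.

```python
import pandas as pd

# Hypothetical sample; values are illustrative only.
df = pd.DataFrame({
    "job": ["management", "technician", "management", "blue-collar", "management"],
    "education": ["tertiary", "secondary", "tertiary", "primary", "secondary"],
    "salary": [100000, 60000, 100000, 20000, 100000],
})

# Share of customers in each job category (proportions sum to 1).
job_share = df["job"].value_counts(normalize=True)
print(job_share)

# Raw counts per education level.
edu_counts = df["education"].value_counts()
print(edu_counts)

# Summary statistics: count, mean, std, min, quartiles, max.
salary_stats = df["salary"].describe()
print(salary_stats)
```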
Bivariate analysis is used to explore relationships between two variables.
Examples include:
- salary versus balance
- age versus balance
- salary grouped by response
- response differences across customer groups
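For numerical pairs such as salary versus balance, a scatter plot is usually paired with a correlation coefficient, and a numerical variable can be compared across response groups with groupby. This is a sketch on hypothetical data; the notebook may use plots rather than these exact summaries.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumptions.
df = pd.DataFrame({
    "age": [25, 35, 45, 55, 65],
    "salary": [20000, 40000, 60000, 80000, 100000],
    "balance": [100, 300, 500, 700, 900],
    "response": ["no", "yes", "no", "yes", "yes"],
})

# Pearson correlation between pairs of numerical variables.
salary_balance_corr = df["salary"].corr(df["balance"])
age_balance_corr = df["age"].corr(df["balance"])
print(salary_balance_corr, age_balance_corr)

# Mean and median salary for responders vs. non-responders.
salary_by_response = df.groupby("response")["salary"].agg(["mean", "median"])
print(salary_by_response)
```

Reporting the median alongside the mean helps when salary or balance contains outliers.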
Categorical analysis is performed to examine how response rates vary across groups such as marital status and loan status.
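One common way to compute a response rate per group is to encode the response as 1/0 and take the group mean. The sketch below assumes a "response" column with "yes"/"no" values and hypothetical "marital" and "loan" columns.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumptions.
df = pd.DataFrame({
    "marital": ["married", "single", "married", "divorced", "single", "married"],
    "loan": ["no", "yes", "no", "no", "no", "yes"],
    "response": ["yes", "no", "no", "yes", "yes", "no"],
})

# Encode the response as 1/0 so that a group mean equals the response rate.
df["response_flag"] = (df["response"] == "yes").astype(int)

rate_by_marital = df.groupby("marital")["response_flag"].mean()
rate_by_loan = df.groupby("loan")["response_flag"].mean()
print(rate_by_marital)
print(rate_by_loan)
```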
The notebook includes visualizations such as:
- bar plots
- pie charts
- scatter plots
- pair plots
- heatmaps
- box plots
- count plots
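As an illustration of two of the plot types above, the sketch below draws a bar plot of category counts and a correlation heatmap using plain matplotlib. The notebook may use seaborn helpers instead; matplotlib is used here only to keep the example minimal, and the data and column names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; column names and values are assumptions.
df = pd.DataFrame({
    "job": ["management", "technician", "management", "blue-collar", "technician"],
    "age": [58, 44, 33, 47, 35],
    "salary": [100000, 60000, 100000, 20000, 60000],
    "balance": [2143, 29, 2, 1506, 1],
})

job_counts = df["job"].value_counts()
corr = df[["age", "salary", "balance"]].corr()

fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

# Bar plot of how many customers fall in each job category.
ax_bar.bar(job_counts.index, job_counts.values)
ax_bar.set_title("Job category counts")
ax_bar.tick_params(axis="x", labelrotation=45)

# Heatmap of the correlation matrix, with the color scale fixed to [-1, 1].
im = ax_heat.imshow(corr.values, vmin=-1, vmax=1, cmap="coolwarm")
ax_heat.set_xticks(range(len(corr.columns)))
ax_heat.set_xticklabels(corr.columns)
ax_heat.set_yticks(range(len(corr.columns)))
ax_heat.set_yticklabels(corr.columns)
ax_heat.set_title("Correlation heatmap")
fig.colorbar(im, ax=ax_heat)

fig.tight_layout()
```

Inside Jupyter, the matplotlib.use("Agg") line can be omitted so the figure displays inline.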
Clone the repository:
git clone https://github.com/CodeeSam/EDA_Demo.git
cd EDA_Demo
Install the required dependencies:
pip install -r requirements.txt
Open the notebook:
jupyter notebook EDA.ipynb
Run the notebook cells in order.
The main Python packages used in this project are listed in requirements.txt. A typical requirements.txt file may include:
pandas
numpy
seaborn
matplotlib
jupyter
The notebook expects a file named:
Marketing_Analysis.csv
This repository is one of my early exploratory data analysis practice projects. It is maintained as part of my personal archive of data science and machine learning study projects.
The project is intended to demonstrate foundational EDA skills rather than advanced statistical modeling or machine learning.
Some limitations of this project include:
- The repository currently focuses on exploratory analysis only.
- No predictive machine learning model is included.
- The workflow is notebook-based and not modularized into scripts.
- The analysis depends on the availability and structure of the original dataset.
- Some visualizations may require further formatting for publication-level presentation.
This type of project can be useful as a starting point for:
- exploratory data analysis practice
- marketing analytics
- customer behavior analysis
- data visualization learning
- beginner-level data science training
- preparing datasets for machine learning workflows
Samson Ayorinde Oni
Data Science | Machine Learning | Computational Research
This repository is released under the MIT License.