CodeeSam/EDA_Demo

Exploratory Data Analysis Demo

This repository contains a simple exploratory data analysis workflow using Python. The project demonstrates how to inspect, clean, and visualize a structured dataset using pandas, NumPy, seaborn, and matplotlib.

The notebook was developed early in my study of data analysis and machine learning, and is maintained as a simple demonstration of foundational exploratory data analysis skills.

Project Overview

Exploratory Data Analysis is an important step in any data science or machine learning workflow. It helps analysts understand the structure of a dataset, detect missing values, identify outliers, explore variable relationships, and generate insights before model development.

In this project, a marketing/customer campaign dataset is analyzed to explore customer attributes and response patterns. The analysis includes data cleaning, missing value handling, univariate analysis, bivariate analysis, categorical analysis, and basic visualization.

Objectives

The main objectives of this project are to:

  • load and inspect a structured dataset
  • clean unnecessary or redundant columns
  • handle missing values
  • separate combined columns into meaningful features
  • explore numerical and categorical variables
  • visualize feature distributions
  • examine relationships between customer attributes and response outcomes
  • demonstrate basic exploratory data analysis using Python

Repository Structure

EDA_Demo/
├── EDA.ipynb
├── Marketing_Analysis.csv
├── README.md
├── requirements.txt
├── .gitignore
└── LICENSE

Depending on the current version of the repository, the dataset file may not be included. If the dataset is not included, users should place Marketing_Analysis.csv in the root directory before running the notebook.

Files Description

EDA.ipynb

This Jupyter notebook contains the exploratory data analysis workflow. It includes data loading, data cleaning, missing value handling, feature separation, and visual exploration of numerical and categorical variables.

Marketing_Analysis.csv

This is the dataset used in the notebook. The notebook expects this file to be available in the repository root directory.

requirements.txt

This file lists the Python packages required to run the notebook.

README.md

This file provides an overview of the project, usage instructions, limitations, and future improvement ideas.

Analysis Workflow

The notebook follows a basic EDA workflow.

1. Importing Libraries

The project uses common Python data analysis and visualization libraries, including:

  • pandas
  • NumPy
  • seaborn
  • matplotlib

2. Loading the Dataset

The dataset is loaded using pandas. The notebook uses skiprows=2 because the first two rows of the original file are not needed for analysis.
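The loading step can be sketched as follows. This is a minimal illustration, not the notebook's exact cell: the in-memory sample text and its column names are hypothetical stand-ins for Marketing_Analysis.csv, and only the `skiprows=2` argument comes from the description above.

```python
import io

import pandas as pd

# Hypothetical stand-in for Marketing_Analysis.csv: two leading rows that are
# not part of the table, followed by the real header row and the data rows.
SAMPLE_CSV = """banking marketing,,,
Customer details,,,
customerid,age,salary,response
1,58,100000,no
2,44,60000,yes
"""


def load_dataset(source):
    """Read the CSV, skipping the first two rows so the third row becomes the header."""
    return pd.read_csv(source, skiprows=2)


df = load_dataset(io.StringIO(SAMPLE_CSV))

# Quick inspection of what was loaded.
print(df.shape)
print(df.columns.tolist())
print(df.head())
```

In the notebook itself, the same call would take the file path instead, for example `load_dataset("Marketing_Analysis.csv")`.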

3. Data Cleaning

The cleaning steps include:

  • removing unnecessary columns
  • separating combined columns into individual variables
  • checking missing values
  • handling missing values in selected columns
  • preparing the dataset for analysis
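As a rough sketch of the first two cleaning steps, the example below drops a redundant column and splits a combined column into two features. The column names `customerid` and `jobedu`, and the comma separator, are assumptions made for illustration; the actual names in the dataset may differ.

```python
import pandas as pd

# Hypothetical columns: "customerid" stands in for a redundant identifier and
# "jobedu" for a combined "job,education" field.
raw = pd.DataFrame(
    {
        "customerid": [1, 2, 3],
        "age": [58, 44, 33],
        "jobedu": ["management,tertiary", "technician,secondary", "services,primary"],
    }
)


def clean(frame):
    """Drop the redundant column and split the combined column into two features."""
    out = frame.drop(columns=["customerid"])
    # Split on the first comma only, producing one column per part.
    parts = out["jobedu"].str.split(",", n=1, expand=True)
    out["job"] = parts[0]
    out["education"] = parts[1]
    return out.drop(columns=["jobedu"])


cleaned = clean(raw)
print(cleaned)
```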

4. Missing Value Handling

The notebook checks for missing values and handles them based on the nature of the affected variables. For example, missing values in categorical columns may be filled using the mode, while rows with missing target response values may be removed.
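The two strategies mentioned above might look like this in pandas. It is a sketch under stated assumptions: `month` is a hypothetical categorical column and `response` is assumed to be the name of the target column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing categorical value and one missing target.
data = pd.DataFrame(
    {
        "month": ["may", "may", np.nan, "jun"],
        "response": ["no", "yes", "no", np.nan],
    }
)


def handle_missing(frame):
    """Fill a categorical column with its mode and drop rows lacking a target."""
    out = frame.copy()
    # mode() returns a Series because ties are possible; take the first value.
    month_mode = out["month"].mode()[0]
    out["month"] = out["month"].fillna(month_mode)
    # A row without a target value cannot be used to study response patterns.
    out = out.dropna(subset=["response"])
    return out.reset_index(drop=True)


print(data.isnull().sum())  # missing values per column before handling
filled = handle_missing(data)
print(filled)
```

Filling with the mode keeps the row at the cost of slightly inflating the most common category, whereas dropping rows is the safer choice for the target, since an imputed response would distort any response-rate comparison.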

5. Univariate Analysis

Univariate analysis is performed to understand individual variables.

Examples include:

  • job category distribution
  • education distribution
  • salary summary statistics
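A minimal sketch of these univariate summaries is shown below. The data is made up, and the column names `job`, `education`, and `salary` are assumed from the examples listed above.

```python
import pandas as pd

# Hypothetical sample of the cleaned dataset.
customers = pd.DataFrame(
    {
        "job": ["management", "technician", "management", "services"],
        "education": ["tertiary", "secondary", "tertiary", "primary"],
        "salary": [100000, 60000, 100000, 70000],
    }
)

# Share of customers in each category (proportions sum to 1).
job_share = customers["job"].value_counts(normalize=True)
education_share = customers["education"].value_counts(normalize=True)

# Count, mean, standard deviation, min, quartiles, and max of a numeric column.
salary_summary = customers["salary"].describe()

print(job_share)
print(education_share)
print(salary_summary)
```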

6. Bivariate Analysis

Bivariate analysis is used to explore relationships between two variables.

Examples include:

  • salary versus balance
  • age versus balance
  • salary grouped by response
  • response differences across customer groups
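The comparisons listed above can be sketched with a correlation matrix for numeric pairs and a grouped summary for a numeric variable against the response. The values are invented for illustration and the column names are assumptions.

```python
import pandas as pd

# Hypothetical sample of the cleaned dataset.
customers = pd.DataFrame(
    {
        "age": [25, 35, 45, 55],
        "salary": [20000, 50000, 60000, 100000],
        "balance": [100, 500, 700, 1500],
        "response": ["no", "no", "yes", "yes"],
    }
)

# Numeric versus numeric: pairwise Pearson correlation.
corr = customers[["age", "salary", "balance"]].corr()

# Numeric versus categorical: salary summarised within each response group.
salary_by_response = customers.groupby("response")["salary"].agg(["mean", "median"])

print(corr)
print(salary_by_response)
```

Reporting the median alongside the mean is a deliberate choice here, since salary and balance columns are often skewed and the mean alone can be misleading.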

7. Categorical Analysis

Categorical analysis is performed to examine how response rates vary across groups such as marital status and loan status.
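One way to compute a response rate per group is to convert the response into a 0/1 flag and take its mean within each category. The sketch below assumes the response is stored as the strings "yes" and "no"; `response_flag` is a helper column introduced only for this example.

```python
import pandas as pd

# Hypothetical sample of the cleaned dataset.
customers = pd.DataFrame(
    {
        "marital": ["married", "single", "married", "single", "divorced", "married"],
        "loan": ["no", "no", "yes", "no", "yes", "no"],
        "response": ["yes", "no", "no", "yes", "no", "yes"],
    }
)

# The mean of a 0/1 flag is the proportion of positive responses.
customers["response_flag"] = (customers["response"] == "yes").astype(int)

rate_by_marital = customers.groupby("marital")["response_flag"].mean()
rate_by_loan = customers.groupby("loan")["response_flag"].mean()

print(rate_by_marital)
print(rate_by_loan)
```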

8. Visualization

The notebook includes visualizations such as:

  • bar plots
  • pie charts
  • scatter plots
  • pair plots
  • heatmaps
  • box plots
  • count plots
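Two of these plot types, a count plot and a box plot, can be sketched with seaborn as shown below. The data and column names are hypothetical, and the `plot_overview` helper is not part of the notebook. The `Agg` backend line is only there so the sketch runs without a display; it should be removed when running inside Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample of the cleaned dataset.
customers = pd.DataFrame(
    {
        "marital": ["married", "single", "married", "single", "divorced", "married"],
        "salary": [100000, 60000, 70000, 20000, 55000, 120000],
        "response": ["yes", "no", "no", "yes", "no", "yes"],
    }
)


def plot_overview(frame):
    """Draw a count plot and a box plot side by side and return the figure."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    # Count plot: number of rows in each category.
    sns.countplot(data=frame, x="marital", ax=axes[0])
    axes[0].set_title("Customers by marital status")
    # Box plot: distribution of a numeric variable within each group.
    sns.boxplot(data=frame, x="response", y="salary", ax=axes[1])
    axes[1].set_title("Salary by response")
    fig.tight_layout()
    return fig, axes


fig, axes = plot_overview(customers)
```

In a notebook the figure is displayed automatically after the cell runs; in a script, `fig.savefig(...)` or `plt.show()` would be needed.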

How to Run the Project

Clone the repository:

git clone https://github.com/CodeeSam/EDA_Demo.git
cd EDA_Demo

Install the required dependencies:

pip install -r requirements.txt

Open the notebook:

jupyter notebook EDA.ipynb

Run the notebook cells in order.

Requirements

The main Python packages used in this project are listed below. A typical requirements.txt file may contain the same entries:

pandas
numpy
seaborn
matplotlib
jupyter

Important Note on Dataset Availability

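The notebook expects a file named Marketing_Analysis.csv in the repository root directory. If the file is not included in the repository, place it there before running the notebook.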
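A small check like the one below can be run before the loading step to confirm the file is where the notebook expects it. This is a sketch only: `find_dataset` is a hypothetical helper, not part of the notebook, and the temporary directory is used purely to demonstrate both outcomes.

```python
import tempfile
from pathlib import Path

DATASET_NAME = "Marketing_Analysis.csv"


def find_dataset(root="."):
    """Return the dataset path if the file exists under root, otherwise None."""
    path = Path(root) / DATASET_NAME
    return path if path.is_file() else None


# Demonstration in a throwaway directory standing in for the repository root.
with tempfile.TemporaryDirectory() as tmp:
    missing = find_dataset(tmp)  # the file has not been placed yet
    (Path(tmp) / DATASET_NAME).write_text("placeholder\n")
    found = find_dataset(tmp)  # the file is now present

print(missing)
print(found.name)
```

In the notebook, calling `find_dataset()` with no argument checks the current working directory, which is the repository root when Jupyter is started from there.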

Project Note

This repository is one of my early exploratory data analysis practice projects. It is maintained as part of my archive of data science and machine learning coursework.

The project is intended to demonstrate foundational EDA skills rather than advanced statistical modeling or machine learning.

Limitations

Some limitations of this project include:

  • The repository currently focuses on exploratory analysis only.
  • No predictive machine learning model is included.
  • The workflow is notebook-based and not modularized into scripts.
  • The analysis depends on the availability and structure of the original dataset.
  • Some visualizations may require further formatting for publication-level presentation.

Applications

This type of project can be useful as a starting point for:

  • exploratory data analysis practice
  • marketing analytics
  • customer behavior analysis
  • data visualization learning
  • beginner-level data science training
  • preparing datasets for machine learning workflows

Author

Samson Ayorinde Oni
Data Science | Machine Learning | Computational Research

License

This repository is released under the MIT License.

About

A Demo on Exploratory Data Analysis
