Welcome to the repository for Advanced Exploratory Data Analysis (EDA) on synthetic datasets. This project covers data visualization, distribution analysis, correlation checks, and outlier handling, and walks through each step of a modern EDA workflow.
This project uses a synthetic dataset to demonstrate:
Univariate and multivariate analysis
Visualization techniques using libraries like Matplotlib and Seaborn
Correlation and distribution exploration
Handling of missing data and outliers
Deriving insights to assess the data's readiness for machine learning models
The analysis is fully documented in a Jupyter Notebook for easy readability and reproducibility.
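The steps listed above can be sketched in a few lines of Pandas and NumPy. This is a minimal illustration, not the notebook's actual code: the column names (`age`, `income`, `spend`), the distribution parameters, the median imputation, and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic dataset; names and parameters are illustrative only.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.lognormal(mean=10, sigma=0.5, size=n),
})
# A column that depends linearly on age, plus noise.
df["spend"] = 50 + 2.0 * df["age"] + rng.normal(0, 5, n)

# Inject missing values and a few extreme values so there is something to handle.
df.loc[rng.choice(n, 20, replace=False), "income"] = np.nan
df.loc[rng.choice(n, 5, replace=False), "spend"] = 1000.0

# Univariate analysis: summary statistics and skewness per column.
summary = df.describe()
skewness = df.skew()

# Missing data: count per column, then impute with the column median (one option among many).
missing_counts = df.isna().sum()
df_filled = df.fillna(df.median())

# Multivariate analysis: pairwise Pearson correlation.
corr = df_filled.corr()

def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Outlier handling: flag and drop rows whose spend falls outside the IQR fences.
outliers = iqr_outlier_mask(df_filled["spend"])
df_clean = df_filled[~outliers]

print(missing_counts)
print(corr.round(2))
print(f"Rows flagged as outliers: {int(outliers.sum())}")
```

Dropping flagged rows is only one way to treat outliers. Capping them at the fences or applying a transform such as a log may suit other datasets better.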
Data.ipynb: Jupyter notebook containing all EDA steps, visualizations, and insights.
For a detailed walkthrough and insights behind the decisions made in this analysis, check out the blog post here: https://medium.com/@sachinsmanoj02/advanced-exploratory-data-analysis-eda-on-synthetic-datasets-d7c82bf78a14
Tools & Libraries Used
Python
Pandas
NumPy
Seaborn
Matplotlib
Scikit-learn (for preprocessing)
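As a rough sketch of what Scikit-learn preprocessing can look like after EDA, the example below chains median imputation and standard scaling. The toy array and the choice of `SimpleImputer` and `StandardScaler` are assumptions for illustration; the notebook may use different steps.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature matrix with one missing value (hypothetical data).
X = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 600.0],
    [4.0, 400.0],
])

# Fill missing values with the column median, then scale each column
# to zero mean and unit variance.
prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_ready = prep.fit_transform(X)

print(X_ready.round(3))
```

Fitting the pipeline on training data only and reusing it on new data keeps the imputation and scaling statistics from leaking across splits.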