- Language: Python 3.12 | R 4.4
- Core Libraries: pandas, numpy, matplotlib, seaborn
- Optional/Advanced: plotly, ydata-profiling
Before diving into analysis, the data was prepared using the following steps:
- Handling Missing Values: [e.g., dropped nulls, imputed with median, filled via KNN]
- Duplicate Removal: [e.g., removed X duplicated rows]
- Outlier Treatment: [e.g., capped extreme values or removed anomalies]
- Data Type Conversion: [e.g., string dates cast to datetime]
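
The cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual pipeline: the toy data, the column names (`order_date`, `amount`), the `clean` helper, the choice of median imputation, and the 1.5 × IQR capping rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
raw = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-05", "2024-02-10", "2024-03-15", "2024-04-20"],
    "amount": [100.0, 100.0, np.nan, 120.0, 10_000.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps in the order listed above."""
    out = df.copy()

    # 1. Missing values: impute the numeric column with its median.
    out["amount"] = out["amount"].fillna(out["amount"].median())

    # 2. Duplicates: drop rows that are identical across all columns.
    out = out.drop_duplicates()

    # 3. Outliers: cap values outside 1.5 * IQR of the quartiles (assumed rule).
    q1, q3 = out["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["amount"] = out["amount"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    # 4. Type conversion: cast string dates to datetime.
    out["order_date"] = pd.to_datetime(out["order_date"], format="%Y-%m-%d")

    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned.dtypes)
print(cleaned)
```

Recording the number of rows dropped and values capped at each step (for example, `len(raw) - len(cleaned)`) gives the figures needed to fill in the bracketed placeholders.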
Here are the primary observations uncovered during the exploration process:
- Insight 1: [e.g., Sales peak during Q4, specifically in November.]
- Insight 2: [e.g., Strong positive correlation (r = 0.85) between feature A and feature B.]
- Insight 3: [e.g., Uneven distribution in the target variable; data is imbalanced.]
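
Each example insight above corresponds to a short check that can back up the claim with a number. The sketch below uses synthetic data, so the column names (`date`, `sales`, `feature_a`, `feature_b`, `target`) and every value are hypothetical and stand in for the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Insight 1 (seasonality): total sales per calendar month, then the peak month.
# Synthetic monthly figures with an artificial November peak.
sales = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=12, freq="MS"),
    "sales": [90, 85, 95, 100, 105, 110, 100, 95, 105, 130, 180, 160],
})
monthly = sales.groupby(sales["date"].dt.month)["sales"].sum()
peak_month = monthly.idxmax()

# Insight 2 (correlation): Pearson r between two numeric features.
n = 200
feature_a = rng.normal(size=n)
features = pd.DataFrame({
    "feature_a": feature_a,
    "feature_b": 2.0 * feature_a + rng.normal(scale=0.5, size=n),
    "target": rng.choice([0, 1], size=n, p=[0.9, 0.1]),
})
r = features["feature_a"].corr(features["feature_b"])  # Pearson by default

# Insight 3 (imbalance): class shares and majority-to-minority ratio.
class_share = features["target"].value_counts(normalize=True)
imbalance_ratio = class_share.max() / class_share.min()

print(f"Peak month: {peak_month}")
print(f"Pearson r: {r:.2f}")
print(class_share)
print(f"Imbalance ratio: {imbalance_ratio:.1f}")
```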
Below is a summary of the visualizations used to understand the data's distribution and relationships:
- Univariate Analysis: Histograms and box plots to check individual feature distributions and skewness.
- Bivariate Analysis: Scatter plots and violin plots to identify relationships between the target variable and features.
- Multivariate Analysis: Correlation heatmaps to detect multicollinearity among numerical variables.
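
As a rough sketch of the three levels of analysis above, the following draws one figure with a histogram, box plot, scatter plot, violin plot, and correlation heatmap. It uses plain matplotlib on synthetic data; the column names are hypothetical, and seaborn's `histplot`, `violinplot`, and `heatmap` are higher-level alternatives that could be swapped in.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data; all column names are illustrative only.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
df = pd.DataFrame({
    "feature_a": x,
    "feature_b": 0.8 * x + rng.normal(scale=0.6, size=n),
    "feature_c": rng.exponential(size=n),  # right-skewed on purpose
})
df["target"] = (df["feature_a"] > 0).astype(int)

fig, axes = plt.subplots(2, 3, figsize=(14, 8))

# Univariate: histogram and box plot of a single feature.
skew = df["feature_c"].skew()
axes[0, 0].hist(df["feature_c"].to_numpy(), bins=30)
axes[0, 0].set_title(f"feature_c histogram (skew = {skew:.2f})")
axes[0, 1].boxplot(df["feature_c"].to_numpy())
axes[0, 1].set_title("feature_c box plot")

# Bivariate: scatter of two features, and a feature's distribution per target class.
axes[0, 2].scatter(df["feature_a"], df["feature_b"], s=8, alpha=0.6)
axes[0, 2].set_title("feature_a vs feature_b")
groups = [df.loc[df["target"] == k, "feature_b"].to_numpy() for k in (0, 1)]
axes[1, 0].violinplot(groups, positions=[0, 1])
axes[1, 0].set_xticks([0, 1])
axes[1, 0].set_title("feature_b by target class")

# Multivariate: correlation heatmap of the numeric features.
corr = df[["feature_a", "feature_b", "feature_c"]].corr()
im = axes[1, 1].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 1].set_xticks(range(len(corr)))
axes[1, 1].set_xticklabels(corr.columns, rotation=45)
axes[1, 1].set_yticks(range(len(corr)))
axes[1, 1].set_yticklabels(corr.columns)
axes[1, 1].set_title("Correlation heatmap")
fig.colorbar(im, ax=axes[1, 1])

axes[1, 2].axis("off")  # unused panel
fig.tight_layout()
# fig.savefig("eda_overview.png", dpi=150)  # hypothetical output path
```

Pairs of features with absolute correlation close to 1 in the heatmap are the ones to examine for multicollinearity before modeling.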