This project is an Exploratory Data Analysis (EDA) of data center energy efficiency, examining the relationship between workload, temperature, and electrical power consumption.
Originally written in an academic setting and later converted into a clean Python script, this repository demonstrates skills in data cleaning, statistical inference, and graphical visualization applied to infrastructure metrics.
- Automatic Extraction: Autonomously consumes the `programmer3/data-center-cold-source-control-dataset` dataset via the Kaggle API.
- Treatment and Normalization: Renames extracted columns to clear English standards (`workload`, `temperature`, `power_consumption`, `cooling_parameters`), converts data to numeric types, and handles null values using robust medians.
- Outlier Analysis: Identifies critical peaks in temperature and processing using the Interquartile Range (IQR) technique, removing anomalies and sensor noise.
- Relevant Feature Engineering:
- `esforco_energia` (energy effort): Energy cost relative to workload.
- `risco_termico` (thermal risk): Thermal evaluation versus cooling parameters.
- Visual Insights (Charts):
- Population histograms.
- Comparative boxplots showing data before and after outlier removal.
- Variable correlation heatmap.
- Scatterplot between workload and power.
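
The cleaning, outlier, and feature steps listed above can be sketched roughly as follows. This is a minimal illustration on made-up toy data, not the code in `main.py`. The helper names (`fill_with_median`, `remove_iqr_outliers`, `add_features`), the conventional 1.5 × IQR fence, and the two ratio formulas are assumptions: the list only describes the features as "relative to" and "versus", and the sketch treats `cooling_parameters` as a single numeric column.

```python
import pandas as pd


def fill_with_median(df, columns):
    """Coerce columns to numeric and fill missing values with the column median."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
        out[col] = out[col].fillna(out[col].median())
    return out


def remove_iqr_outliers(df, columns, k=1.5):
    """Keep rows inside [Q1 - k*IQR, Q3 + k*IQR] for every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


def add_features(df):
    """Add the two engineered columns (ratio definitions are assumptions)."""
    out = df.copy()
    # Assumed: power drawn per unit of workload.
    out["esforco_energia"] = out["power_consumption"] / out["workload"]
    # Assumed: temperature relative to the cooling setting.
    out["risco_termico"] = out["temperature"] / out["cooling_parameters"]
    return out


# Toy data (invented for illustration): one missing workload, one temperature spike.
raw = pd.DataFrame({
    "workload": ["40", "45", "50", "55", None, "60"],
    "temperature": [22.0, 23.0, 24.0, 25.0, 24.0, 95.0],
    "power_consumption": [200, 225, 250, 275, 260, 300],
    "cooling_parameters": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
})

filled = fill_with_median(raw, list(raw.columns))
clean = remove_iqr_outliers(filled, ["workload", "temperature"])
features = add_features(clean)
print(features)
```

On this toy frame the missing workload is filled with the column median and the row with the temperature spike is dropped by the IQR filter. With real data, a zero in `workload` or `cooling_parameters` would produce infinite ratios, so a guard may be needed before dividing.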
├── main.py # Main Python script containing data routine, engineering, and plotting
└── .github/ # CI workflows and issue templates
Install the main dependencies and run the script:
pip install pandas numpy matplotlib seaborn scikit-learn kagglehub
python main.py

Analysis developed for exploratory purposes in green IT management and operational stability.