This script compares two CSV files (supermarket_sales_dirty.csv and supermarket_sales_cleaned.csv) and creates a PDF report showing the top 5 biggest changes.
- Python 3.14 or higher
- uv package manager (fast alternative to pip)
- Install uv (if not already installed):
    curl -LsSf https://astral.sh/uv/install.sh | sh
  Or see the uv installation guide.
- Clone or download this project folder.
- Open a terminal in the project folder.
- Run the script using uv:
    uv run main.py
  uv will automatically install the required dependencies (pandas, matplotlib) and then execute the script.
Place these two CSV files inside a datasets/ folder (relative to the script):
- datasets/supermarket_sales_dirty.csv
- datasets/supermarket_sales_cleaned.csv
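To confirm the inputs are where the script expects them, a small pre-flight check along these lines may help. It is only a sketch: the helper name `missing_inputs` is hypothetical and not part of main.py, and it assumes the datasets/ folder sits next to the script.

```python
from pathlib import Path

# File names taken from this README; the folder layout is an assumption.
EXPECTED_FILES = (
    "supermarket_sales_dirty.csv",
    "supermarket_sales_cleaned.csv",
)


def missing_inputs(base_dir: Path) -> list[Path]:
    """Return the expected CSV paths under base_dir/datasets that do not exist."""
    datasets_dir = base_dir / "datasets"
    return [
        datasets_dir / name
        for name in EXPECTED_FILES
        if not (datasets_dir / name).is_file()
    ]
```

Calling `missing_inputs(Path(__file__).resolve().parent)` from a script in the project folder returns an empty list when both files are in place.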
The script generates a file named report.pdf in the same folder. It contains:
- A table with the top 5 changed values (old value → new value, plus the difference).
- A horizontal bar chart of the absolute changes.
- Both CSV files should have the same number of rows. If they differ, the script trims the comparison to the length of the shorter file, so any extra rows are ignored.
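The comparison described above can be approximated with pandas as follows. This is a minimal sketch, not the actual code in main.py: the function name `top_changes`, the positional row pairing, and the choice to skip non-numeric cells are all assumptions made for illustration.

```python
import pandas as pd


def top_changes(dirty: pd.DataFrame, cleaned: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n largest numeric changes between two row-aligned frames."""
    # Trim both frames to the shorter length so rows pair up by position.
    rows = min(len(dirty), len(cleaned))
    dirty = dirty.iloc[:rows].reset_index(drop=True)
    cleaned = cleaned.iloc[:rows].reset_index(drop=True)

    records = []
    # Only columns present in both files can be compared.
    for col in dirty.columns.intersection(cleaned.columns):
        old = pd.to_numeric(dirty[col], errors="coerce")
        new = pd.to_numeric(cleaned[col], errors="coerce")
        diff = new - old
        # Skip cells that are non-numeric on either side (NaN) or unchanged.
        for row in diff[diff.notna() & (diff != 0)].index:
            records.append(
                {
                    "row": row,
                    "column": col,
                    "old": old[row],
                    "new": new[row],
                    "difference": diff[row],
                }
            )

    result = pd.DataFrame(
        records, columns=["row", "column", "old", "new", "difference"]
    )
    result["abs_change"] = result["difference"].abs()
    # Largest absolute change first, keeping only the top n.
    return result.sort_values("abs_change", ascending=False).head(n).reset_index(drop=True)
```

With the two files loaded via `pd.read_csv("datasets/supermarket_sales_dirty.csv")` and `pd.read_csv("datasets/supermarket_sales_cleaned.csv")`, `top_changes(dirty, cleaned)` yields a frame whose `old`, `new`, and `difference` columns correspond to the report table, and whose `abs_change` column corresponds to the bar chart values.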