This tool cleans up supermarket sales data to make it easier to analyze and use.
The script processes the raw sales data and:
- Removes incomplete records - Deletes rows with missing information
- Removes duplicates - Keeps only unique sales records
- Standardizes text formatting - Makes all text consistent:
  - Removes extra spaces from text fields
  - Capitalizes city names properly (e.g., "Yangon")
  - Formats customer types ("Member", "Normal")
  - Formats gender ("Male", "Female")
  - Formats product lines ("Sports And Travel", "Fashion Accessories")
  - Formats payment methods ("Credit Card", "Cash", "Ewallet")
  - Makes branch codes uppercase ("A", "B", "C")
- Fixes date formats - Converts dates to a standard format
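The cleaning steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the actual contents of cleanup.py, which may be implemented differently (for example with pandas). The column names, the list of accepted input date formats, the YYYY-MM-DD output format, and the helper names `normalize_date` and `clean_rows` are all assumptions made for the example.

```python
from datetime import datetime

# Assumed column names; adjust to match the real CSV header.
TITLE_CASE_COLUMNS = ["City", "Customer type", "Gender", "Product line", "Payment"]
# Assumed input date formats, tried in order.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%m-%Y"]


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def clean_rows(rows):
    """Clean a list of dict rows (as produced by csv.DictReader)."""
    cleaned, seen = [], set()
    for row in rows:
        # Remove extra spaces: trim the ends and collapse internal runs.
        row = {key: " ".join((value or "").split()) for key, value in row.items()}
        # Drop incomplete records (any empty field).
        if any(value == "" for value in row.values()):
            continue
        # Standardize text formatting.
        for column in TITLE_CASE_COLUMNS:
            row[column] = row[column].title()
        row["Branch"] = row["Branch"].upper()
        # Convert dates to one standard format.
        date = normalize_date(row["Date"])
        if date is None:
            continue  # Assumption: an unparseable date counts as incomplete.
        row["Date"] = date
        # Drop duplicates, compared after standardization.
        fingerprint = tuple(row.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(row)
    return cleaned
```

Comparing rows for duplicates after standardization is a design choice in this sketch: it treats " yangon" and "Yangon" as the same record, which the real script may or may not do.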
- Place your raw sales data in supermarket_sales_dirty.csv
- Run the cleanup script using uv: uv run python cleanup.py
- The cleaned data will be saved as supermarket_sales_cleaned.csv
This script uses uv for Python package management. Make sure you have uv installed:
curl -LsSf https://astral.sh/uv/install.sh | sh
or
# Fedora based
sudo dnf install uv
Supermarket sales by alexhuitron and supermarket_sales_dirty by Claude
Input: supermarket_sales_dirty.csv - Raw sales data with possible errors
Output: supermarket_sales_cleaned.csv - Clean, standardized data ready for analysis
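To compare the input and output files, a quick sanity check along these lines may help. It is only a sketch using the standard library; the helper name `summarize_csv` is hypothetical and not part of cleanup.py. It counts data rows, exact duplicate rows, and rows containing blank cells, so the cleaned file would be expected to report zero for the last two.

```python
import csv
import os


def summarize_csv(path):
    """Count rows, exact duplicate rows, and rows with blank cells in a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = [tuple(row) for row in csv.reader(handle)]
    data = rows[1:]  # skip the header row
    return {
        "rows": len(data),
        "duplicates": len(data) - len(set(data)),
        "rows_with_blanks": sum(
            1 for row in data if any(cell.strip() == "" for cell in row)
        ),
    }


if __name__ == "__main__":
    # File names are taken from this README; missing files are skipped.
    for name in ("supermarket_sales_dirty.csv", "supermarket_sales_cleaned.csv"):
        if os.path.exists(name):
            print(name, summarize_csv(name))
```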