unknownsteve7/Air-Quality-Tracker

Global Air Quality & Weather Tracker (AQT)

A data engineering pipeline that extracts, transforms, and visualizes weather and air pollution data for cities worldwide, refreshed hourly by a background scheduler.

Overview

This project implements a complete ETL (Extract, Transform, Load) pipeline, built as a portfolio-ready data engineering project. It pulls data from two OpenWeatherMap API endpoints (current weather and air pollution), processes it with pandas, checks data quality through custom validation, and serves the results in an interactive Streamlit dashboard.


Key Features

  • Automated ETL Pipeline: Full lifecycle from API ingestion to local storage.
  • SQL & Columnar Storage: Stores the same data in both a SQLite database (.db) and a Parquet file (.parquet), an industry-standard columnar format for analytics.
  • Data Quality Assurance: Integrated validation layer that checks for missing values, out-of-range temperatures, and corrupted data.
  • Interactive Dashboard: A Streamlit UI with interactive maps, air quality health insights, and charts comparing cities.
  • Autonomous Scheduling: A background scheduler that triggers the pipeline every hour to keep data fresh.
  • Professional Logging: Dual-stream logging (Console + File) to track pipeline health and performance.
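Dual-stream logging of this kind can be built with Python's standard logging module by attaching two handlers to one logger. The sketch below is an assumption about how run_pipeline.py might set it up, not the repository's code; the function name setup_pipeline_logger, the logger name aqt, the file path pipeline.log, and the format string are all hypothetical.

```python
import logging


def setup_pipeline_logger(name: str = "aqt", log_file: str = "pipeline.log") -> logging.Logger:
    """Return a logger that sends each record to both the console and a log file.

    setup_pipeline_logger, "aqt" and "pipeline.log" are hypothetical names.
    """
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already configured: reuse it instead of stacking duplicate handlers
        return logger

    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid double output if the root logger is also configured

    formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")

    console_handler = logging.StreamHandler()  # stream 1: console (stderr by default)
    console_handler.setFormatter(formatter)

    file_handler = logging.FileHandler(log_file, encoding="utf-8")  # stream 2: appends to the file
    file_handler.setFormatter(formatter)

    logger.addHandler(console_handler)
    logger.addHandler(file_handler)
    return logger
```

Calling log = setup_pipeline_logger() once at the start of a run and then log.info("Extract step finished") writes the same line to the terminal and to the log file, so a scheduled run that nobody is watching still leaves a record.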

Architecture

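  1. Extract: HTTP calls made with the Python requests library to the OpenWeatherMap Current Weather and Air Pollution APIs.
  2. Transform: Merges and normalizes the two datasets with pandas, converts temperatures from Kelvin to Celsius, and maps AQI values to categories.
  3. Validate: Rule-based checks (missing values, out-of-range temperatures) that stop bad records before they are loaded, preventing "garbage in, garbage out."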
  4. Load:
    • SQLite: For relational queries and dashboard serving.
    • Parquet: For high-performance analytical storage (Data Lake style).
  5. Visualize: Streamlit & Plotly interactive interface.
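The Transform and Validate steps can be sketched in pandas as follows. This is a minimal sketch, assuming the extract step has already produced one weather DataFrame and one pollution DataFrame that share a city column; the column names (city, temp_k, humidity, aqi) and the temperature bounds are hypothetical choices, not the repository's actual schema.

```python
import pandas as pd

# Hypothetical plausibility bounds for air temperature, in degrees Celsius
MIN_TEMP_C = -90.0
MAX_TEMP_C = 60.0


def transform(weather: pd.DataFrame, pollution: pd.DataFrame) -> pd.DataFrame:
    """Merge weather and pollution rows per city and convert Kelvin to Celsius."""
    # Inner join keeps only cities present in both responses, in weather's order
    merged = weather.merge(pollution, on="city", how="inner")
    merged["temp_c"] = (merged["temp_k"] - 273.15).round(2)
    return merged.drop(columns=["temp_k"])


def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows that pass basic quality checks."""
    required = ["city", "temp_c", "humidity", "aqi"]  # hypothetical column names
    complete = df.dropna(subset=required)  # check 1: no missing values
    in_range = complete["temp_c"].between(MIN_TEMP_C, MAX_TEMP_C)  # check 2: plausible temperature
    valid_aqi = complete["aqi"].isin([1, 2, 3, 4, 5])  # check 3: OpenWeatherMap AQI is 1..5
    return complete[in_range & valid_aqi].reset_index(drop=True)
```

Dropping bad rows (rather than raising) lets one broken city response fail quietly while the other cities still load; a stricter pipeline could log or raise on any rejected row instead.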

Project Structure

aqt/
├── fetch_data.py        # API Extraction logic
├── transform_data.py    # Merging, Data Cleaning & Validation
├── database_manager.py  # Loading logic (SQL & Parquet)
├── run_pipeline.py      # Main ETL orchestrator with Logging
├── auto_scheduler.py    # Background automation script
├── dashboard.py         # Streamlit visualization app
└── requirements.txt     # Project dependencies
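The loading logic described for database_manager.py can be approximated as below: the same DataFrame is appended to a SQLite table and, optionally, written to a Parquet file. This is a sketch under assumptions: the function name load_batch, the table name air_quality, and the file paths are hypothetical, and DataFrame.to_parquet needs a Parquet engine such as pyarrow or fastparquet installed, which requirements.txt may or may not include.

```python
import sqlite3
from typing import Optional

import pandas as pd


def load_batch(df: pd.DataFrame, db_path: str, parquet_path: Optional[str] = None,
               table: str = "air_quality") -> int:
    """Append df to SQLite, optionally write a Parquet copy, and return the table's row count.

    load_batch and the default table name "air_quality" are hypothetical.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Append, so every pipeline run adds rows to the historical record
        df.to_sql(table, conn, if_exists="append", index=False)
        conn.commit()
        (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    finally:
        conn.close()

    if parquet_path is not None:
        # Columnar copy for analytics; this sketch overwrites the file with the current batch.
        # Requires a Parquet engine (pyarrow or fastparquet).
        df.to_parquet(parquet_path, index=False)

    return row_count
```

Appending to SQLite keeps the history that the dashboard queries, while the Parquet copy here only holds the latest batch; the repository may instead append to or partition its Parquet output.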

Getting Started

1. Prerequisites

  • Python 3
  • An OpenWeatherMap API key (both the Weather and Air Pollution APIs require one)

2. Installation

pip install -r requirements.txt

3. Usage

  • Run a single data fetch:
    python run_pipeline.py
  • Start the automated scheduler:
    python auto_scheduler.py
  • Launch the Dashboard:
    streamlit run dashboard.py
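The hourly automation in auto_scheduler.py could be as simple as a loop around the pipeline entry point. The sketch below uses only the standard library and is an assumption, not the repository's implementation (which may use a scheduling library instead); run_every is a hypothetical name, run_once stands in for whatever function run_pipeline.py exposes, and max_runs exists only so the loop can be stopped in tests.

```python
import time
from typing import Callable, Optional

ONE_HOUR_SECONDS = 60 * 60


def run_every(run_once: Callable[[], None], interval_s: float = ONE_HOUR_SECONDS,
              max_runs: Optional[int] = None) -> int:
    """Call run_once every interval_s seconds, forever or up to max_runs times.

    run_every and its parameters are hypothetical; returns the number of runs made.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            run_once()
        except Exception as exc:  # keep the scheduler alive if a single run fails
            print(f"Pipeline run failed: {exc}")
        runs += 1
        if max_runs is None or runs < max_runs:
            # Subtract the run's own duration so runs start roughly interval_s apart
            elapsed = time.monotonic() - started
            time.sleep(max(0.0, interval_s - elapsed))
    return runs
```

With the defaults, run_every(main) never returns, which matches running python auto_scheduler.py as a long-lived process; catching exceptions per run means a temporary API outage skips one hour instead of stopping the schedule.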

Sample Insights

The dashboard provides a health-centric view of global cities:

  • AQI Level 1 (Good): "Air quality is satisfactory."
  • AQI Level 5 (Very Poor): "Health alert: stay indoors!"
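OpenWeatherMap's Air Pollution API reports AQI as an integer from 1 to 5 (1 = Good, 2 = Fair, 3 = Moderate, 4 = Poor, 5 = Very Poor). A lookup like the one below turns that number into the labels shown above. The function name health_insight is hypothetical, and only the level 1 and level 5 messages come from this README; the sketch returns just the category name for levels 2 to 4 rather than inventing advice for them.

```python
from typing import Optional

# OpenWeatherMap Air Pollution API qualitative scale (AQI 1..5)
AQI_CATEGORIES = {1: "Good", 2: "Fair", 3: "Moderate", 4: "Poor", 5: "Very Poor"}

# Messages quoted in this README; levels 2-4 have none here
AQI_MESSAGES = {
    1: "Air quality is satisfactory.",
    5: "Health alert: stay indoors!",
}


def health_insight(aqi: int) -> str:
    """Return a label such as 'AQI Level 1 (Good): Air quality is satisfactory.' (hypothetical helper)."""
    if aqi not in AQI_CATEGORIES:
        raise ValueError(f"AQI must be an integer from 1 to 5, got {aqi!r}")
    label = f"AQI Level {aqi} ({AQI_CATEGORIES[aqi]})"
    message: Optional[str] = AQI_MESSAGES.get(aqi)
    return f"{label}: {message}" if message else label
```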

Author

Naga Mohan Madicharla
A beginner data engineering project exploring APIs, pandas, and automation.

About

A pipeline that fetches current weather and air quality for 5 cities, combines each city's weather and air quality readings into a single record, and stores the records in a database for historical tracking.
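Combining the two API responses into one record mostly means picking fields out of two JSON documents. The sketch below assumes the response shapes documented for OpenWeatherMap's Current Weather endpoint (coordinates under coord, temperature under main.temp in Kelvin by default) and Air Pollution endpoint (AQI under list[0].main.aqi, pollutant concentrations under list[0].components). The function name combine_record and the output field names are hypothetical, and the HTTP requests are left out so the example runs offline.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def combine_record(city: str, weather: Dict[str, Any], pollution: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one weather response and one air pollution response into a single row.

    combine_record and the output keys are hypothetical names.
    """
    reading = pollution["list"][0]  # current air pollution data arrives as a one-item list
    return {
        "city": city,
        "lat": weather["coord"]["lat"],
        "lon": weather["coord"]["lon"],
        "temp_k": weather["main"]["temp"],  # Kelvin unless a units parameter is sent
        "humidity": weather["main"]["humidity"],
        "aqi": reading["main"]["aqi"],  # 1 (Good) .. 5 (Very Poor)
        "pm2_5": reading["components"]["pm2_5"],
        "pm10": reading["components"]["pm10"],
        "fetched_at": datetime.now(timezone.utc).isoformat(),  # UTC timestamp for history
    }
```

Stamping each record with a UTC fetch time is what makes the stored rows usable for historical tracking, since the same city appears once per run.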
