DennisHao1211/Python-Data-Analysis

Python Data Analysis

This repository is an introductory NumPy + Matplotlib data analysis exercise.

To run this project, you need Python 3.9 or later with NumPy (used to load and process the CO₂ dataset) and Matplotlib (used to plot and save the figures). Install both packages with either conda install numpy matplotlib or pip install numpy matplotlib, then run the script to reproduce every analysis step and figure.

Its core task is to load, clean, visualize, and export results from country-level CO2 emissions intensity data (1990-2014).

Environment Setup

Requirements:

  • Python 3.9+
  • numpy
  • matplotlib

Option A: venv + pip (recommended)

cd /path/to/Python-Data-Analysis
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install numpy matplotlib

Option B: conda

cd /path/to/Python-Data-Analysis
conda create -n py-data-analysis python=3.11 -y
conda activate py-data-analysis
conda install numpy matplotlib -y

Optional check:

python -c "import numpy, matplotlib; print('OK')"
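For a slightly more informative check than the one-liner above, the following sketch reports the installed version of each required package. The helper name check_environment is hypothetical and is not part of this repository; it relies only on the standard library.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)  # minimum version listed under Requirements


def check_environment(packages=("numpy", "matplotlib")):
    """Map each package name to its installed version, or None if it is missing."""
    if sys.version_info < MIN_PYTHON:
        raise RuntimeError(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A value of None for either package means the install step did not take effect in the currently active environment.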

Run the script (from the repository root):

python3 python101.py

File-by-File Description (Purpose and Relationships)

| File | Purpose | Relationship to Other Files |
| --- | --- | --- |
| python101.py | Main project script. Handles data loading, cleaning, country-to-series mapping, plotting, classification, and file output. | Reads co2py.csv; generates/overwrites 3countries.pdf, below_average.out, and reducers.dat. |
| co2py.csv | Raw input dataset. The first column is country name; the remaining columns are yearly CO2 intensity values from 1990 to 2014. | Loaded by python101.py and transformed into numeric arrays plus a country-name array; it is the single data source for all analysis steps. |
| 3countries.pdf | Visualization output file showing time-series plots for the United States, Afghanistan, and China. | Generated by python101.py via plt.savefig("3countries.pdf"). |
| below_average.out | Text output file listing countries with 2014 emissions intensity below or equal to the global mean in the dataset. | Produced by python101.py after computing mean2014. |
| reducers.dat | Text output file listing countries whose 2014 value is lower than their 2013 value. | Produced by python101.py after comparing the last two yearly columns (2014 vs 2013). |
| README.md | Project documentation. Explains how all files work together. | |
| .DS_Store | macOS Finder metadata file. Not part of the analysis logic; safe to ignore. | |
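The two text outputs described above come down to two boolean-mask selections. The sketch below illustrates that idea on a tiny invented array; the country names, the numbers, and the write_names helper are all hypothetical, and python101.py may implement these steps differently.

```python
import numpy as np

# Invented sample: one row per country, last two columns stand in for 2013 and 2014.
countries = np.array(["Aland", "Borduria", "Cascadia"])
values = np.array([
    [0.45, 0.40],
    [0.32, 0.35],
    [0.70, 0.60],
])

last_year = values[:, -1]   # stands in for the 2014 column
prev_year = values[:, -2]   # stands in for the 2013 column

# Mean of the final year across all countries (mean2014 in the description above).
mean_last = last_year.mean()

# Countries at or below the mean, and countries whose value dropped year over year.
below_average = countries[last_year <= mean_last]
reducers = countries[last_year < prev_year]


def write_names(path, names):
    """Write one country name per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(str(name) for name in names) + "\n")
```

Calling write_names("below_average.out", below_average) and write_names("reducers.dat", reducers) would then produce files of the kind listed in the table.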

File Relationships and Execution Flow

  1. python101.py reads co2py.csv.
  2. The script cleans the raw data (removes header row and country-name column) and builds a country -> yearly series mapping.
  3. It plots selected countries and exports 3countries.pdf.
  4. It computes the 2014 mean and writes below_average.out.
  5. It compares 2014 vs 2013 values and writes reducers.dat.
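Steps 1 and 2 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in python101.py: it parses an invented three-country sample with the standard csv module, assumes no missing values, and uses a hypothetical function name, load_series.

```python
import csv
import io

import numpy as np

# Invented sample in the layout described for co2py.csv:
# a header row, then one row per country with yearly values.
SAMPLE_CSV = """country,2012,2013,2014
Aland,0.50,0.45,0.40
Borduria,0.30,0.32,0.35
Cascadia,0.80,0.70,0.60
"""


def load_series(source):
    """Split CSV text into years, country names, a numeric array, and a mapping."""
    rows = list(csv.reader(source))
    header, body = rows[0], rows[1:]

    years = np.array(header[1:], dtype=int)            # drop the name column
    names = [row[0] for row in body]                   # first column: country
    values = np.array([row[1:] for row in body], dtype=float)

    # country -> yearly series, each series being one row of the numeric array
    series = {name: values[i] for i, name in enumerate(names)}
    return years, names, values, series


years, names, values, series = load_series(io.StringIO(SAMPLE_CSV))
```

Using csv.reader rather than a plain split on commas keeps quoted country names that contain commas intact, which matters for some real-world country lists.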

In short, the data flow is:

co2py.csv -> python101.py -> (3countries.pdf, below_average.out, reducers.dat)
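The plotting branch of this flow might look like the sketch below. The series values are placeholders rather than real emissions data, the function name plot_countries is hypothetical, and the figure is written to a temporary directory so the example does not overwrite the repository's 3countries.pdf.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(1990, 2015)  # 1990 through 2014 inclusive

# Placeholder series for illustration only; not real data.
series = {
    "Country A": np.linspace(1.0, 0.6, years.size),
    "Country B": np.linspace(0.4, 0.5, years.size),
    "Country C": np.linspace(2.0, 1.2, years.size),
}


def plot_countries(series, years, names, out_path):
    """Plot one line per selected country and save the figure to out_path."""
    fig, ax = plt.subplots()
    for name in names:
        ax.plot(years, series[name], label=name)
    ax.set_xlabel("Year")
    ax.set_ylabel("CO2 emissions intensity")
    ax.legend()
    fig.savefig(out_path)  # output format is inferred from the .pdf extension
    plt.close(fig)
    return out_path


out_file = plot_countries(
    series,
    years,
    ["Country A", "Country B", "Country C"],
    os.path.join(tempfile.mkdtemp(), "3countries.pdf"),
)
```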

Notes

  • The script contains multiple tutorial-style sections (including repeated load/print steps) to demonstrate basic NumPy and Matplotlib operations.
  • It uses relative paths for reading and writing files, so run it from the repository root directory.
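If running from another directory is ever needed, one possible approach is to resolve file names against a fixed base directory instead of the current working directory. The sketch below is an assumption about how that could be done; the helper names are hypothetical and the repository's script does not necessarily work this way.

```python
from pathlib import Path


def resolve_from(base_dir, filename):
    """Return an absolute path to filename inside base_dir."""
    return Path(base_dir).resolve() / filename


def require_input(path):
    """Fail early with a clear message if an expected input file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected input file not found: {path}")
    return path


# Inside a script, the base directory would typically be the script's own folder:
#     base_dir = Path(__file__).resolve().parent
#     data_path = require_input(resolve_from(base_dir, "co2py.csv"))
```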

About

Data analysis project using NumPy for CSV processing and numerical operations, and Matplotlib for plotting CO₂ emission trends (1990–2014) from World Bank data. The Python script demonstrates data loading, cleaning, numerical analysis, and visualization.
