SDSC 2001 Python for Data Science
Final Course Project
Taught by Prof. LI Xinyue, City University of Hong Kong
Final Course Grade: A+
The goal of this project is to classify transactions as fraudulent or non-fraudulent, using a dataset taken from Kaggle and modified by the professor.
The following files have been posted:
- Project instructions: project-instructions.ipynb
- Notebook: fraud-detection.ipynb
- Dataset: creditcard_test.csv, creditcard_train.csv
Language: Python
Technology: Pandas, Matplotlib, Seaborn, scikit-learn
There are 5 main modules:
- Data Exploration
We start by exploring the dataset and handling missing values and outliers.
- Data Visualization
We continue exploring the dataset with visualizations and use them to explain our findings.
- Dimension Reduction
We apply an unsupervised learning method, Principal Component Analysis (PCA), to reduce the dimensionality of the data.
- Classification
We train three models on the dataset: Gaussian Naive Bayes, a Decision Tree Classifier, and Logistic Regression, after preparing separate training and test data. For each model, we compute the accuracy score, plot a confusion matrix, and run 5-fold cross-validation.
- Summary
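The Data Exploration module handles missing values and outliers but does not state a strategy. The sketch below is one possible approach. It assumes median imputation and clipping each feature to the 1.5 * IQR fences. The helper name `clean_features` is hypothetical, and so is the `Class` label column, which is the name used in the original Kaggle credit-card dataset but may differ in the professor's modified version. A tiny made-up frame stands in for creditcard_train.csv.

```python
import numpy as np
import pandas as pd


def clean_features(df: pd.DataFrame, label_col: str = "Class") -> pd.DataFrame:
    """Hypothetical helper: impute missing values with column medians and
    clip outliers to the 1.5 * IQR fences, leaving the label column untouched."""
    out = df.copy()
    features = [c for c in out.columns if c != label_col]

    # Median imputation is less sensitive than the mean to extreme amounts.
    out[features] = out[features].fillna(out[features].median())

    # Clip (rather than drop) outliers so rare rows, which may be fraud, are kept.
    q1 = out[features].quantile(0.25)
    q3 = out[features].quantile(0.75)
    iqr = q3 - q1
    # axis=1 aligns the per-column bounds (indexed by column name) with the columns.
    out[features] = out[features].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)
    return out


# Made-up rows standing in for the course data (assumption, not real transactions).
df = pd.DataFrame({
    "Amount": [10.0, 12.0, np.nan, 11.0, 1000.0],
    "V1": [0.1, -0.2, 0.0, 0.3, 0.2],
    "Class": [0, 0, 0, 0, 1],
})
cleaned = clean_features(df)
print(cleaned)
```

With the real files, `df` would come from `pd.read_csv("creditcard_train.csv")`. Whether to clip, drop, or keep outliers is a judgment call in fraud detection, because the extreme rows are sometimes the interesting ones.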
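For the Dimension Reduction module, a common way to apply PCA is to standardize the features first, because PCA is sensitive to feature scale. The next step is to keep the fewest components that explain a chosen share of the variance. The sketch below assumes a 95% variance threshold. It uses synthetic data (3 hidden factors mixed into 10 columns) in place of the course dataset. The helper name `reduce_dimensions` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def reduce_dimensions(X: np.ndarray, variance: float = 0.95):
    """Hypothetical helper: standardize X, then keep the fewest principal
    components whose cumulative explained variance reaches `variance`."""
    # A float in (0, 1) for n_components lets PCA choose the component count.
    pipeline = make_pipeline(StandardScaler(), PCA(n_components=variance, svd_solver="full"))
    X_reduced = pipeline.fit_transform(X)
    return pipeline, X_reduced


rng = np.random.default_rng(0)
# Synthetic stand-in: 3 latent factors linearly mixed into 10 features plus small noise.
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.01 * rng.normal(size=(500, 10))

pipeline, X_reduced = reduce_dimensions(X)
pca = pipeline.named_steps["pca"]  # make_pipeline names steps after their lowercased class
print(X_reduced.shape, pca.explained_variance_ratio_.round(3))
```

Because the synthetic data has only 3 underlying factors, PCA should keep at most 3 components here. On real transaction features, `pca.explained_variance_ratio_` is the thing to inspect when choosing the threshold.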
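The Classification module can be sketched as a single loop over the three models. The sketch uses the following assumptions, none of which come from the project's own settings:

- A synthetic, imbalanced dataset from `make_classification` (about 5% positives) instead of the course CSVs.
- A stratified 75/25 train/test split.
- Default hyperparameters, except a higher `max_iter` for logistic regression.

With the course files, the training and test arrays would instead be loaded from creditcard_train.csv and creditcard_test.csv.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the transactions (assumption: ~5% fraud).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, weights=[0.95, 0.05], random_state=42
)
# Stratify so the training and test sets keep roughly the same fraud rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Gaussian NB": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # For classifiers, cv=5 means stratified 5-fold splits; each fold refits a clone.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Rows are true labels, columns are predictions (0 = legitimate, 1 = fraud).
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
        "cv_scores": cv_scores,
    }
    print(f"{name}: test accuracy = {results[name]['accuracy']:.3f}, "
          f"5-fold CV accuracy = {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
    print(results[name]["confusion_matrix"])
```

Because the positive class is rare, accuracy alone can look high even for a model that misses most frauds. The false-negative cell of the confusion matrix (row 1, column 0) makes that visible. To plot the matrix, `ConfusionMatrixDisplay.from_predictions(y_test, y_pred)` from `sklearn.metrics` draws it with Matplotlib.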
