Add anomaly detection project#18

Open
rashidrao-pk wants to merge 1 commit into Prodigy-InfoTech:main from rashidrao-pk:add-anomaly-detection-project

rashidrao-pk commented Jun 10, 2026

Credit Card Fraud Detection using Anomaly Detection

Issue

Fixes #18

Overview

This project demonstrates how anomaly detection techniques can be used to identify fraudulent credit card transactions. Because fraudulent transactions are extremely rare compared with legitimate ones, the models treat fraud as an outlier in the transaction data instead of a second class that must be learned from balanced, labeled examples.

Dataset

The project uses the Credit Card Fraud Detection dataset, which contains transactions made by European cardholders in September 2013.

Dataset Features:

  • 284,807 transactions
  • 492 fraudulent transactions
  • Highly imbalanced dataset (fraud is about 0.17% of all transactions)
  • PCA-transformed features (V1–V28)
  • Additional features:
    • Time
    • Amount
    • Class (0 = Normal, 1 = Fraud)

Dataset Source:
https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
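
To make the imbalance concrete, the short calculation below uses only the two counts listed above. It is a worked arithmetic sketch, not code taken from the notebook; the variable names are illustrative.

```python
# Counts taken from the dataset description above.
n_total = 284_807
n_fraud = 492
n_normal = n_total - n_fraud

# Share of transactions that are fraudulent.
fraud_rate = n_fraud / n_total

# A trivial model that labels every transaction "Normal" is correct on
# every legitimate row and wrong on every fraudulent row.
baseline_accuracy = n_normal / n_total

print(f"Fraud rate:        {fraud_rate:.4%}")
print(f"Baseline accuracy: {baseline_accuracy:.4%}")
```

A model that never flags anything would therefore score above 99.8% accuracy while catching no fraud, which is why accuracy alone says little on this dataset.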

Objectives

  • Explore and understand class imbalance
  • Apply preprocessing and feature scaling
  • Train anomaly detection models
  • Compare different anomaly detection approaches
  • Evaluate performance using classification metrics

Models Implemented

1. Isolation Forest

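Isolation Forest isolates observations by repeatedly selecting a random feature and a random split value. Anomalies tend to be separated from the rest of the data in fewer splits, so a short average path length across the trees marks a sample as anomalous.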
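
A minimal sketch of this idea with scikit-learn's `IsolationForest` is shown below. It runs on synthetic data, not the Kaggle dataset, and the `contamination` value, sample sizes, and variable names are assumptions chosen for illustration; the notebook's actual settings may differ.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in for scaled transaction features (NOT the real dataset):
# 1,000 "normal" points near the origin and 10 far-away "fraud" points.
X_normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))
X_fraud = rng.normal(loc=8.0, scale=1.0, size=(10, 4))
X = np.vstack([X_normal, X_fraud])
y_true = np.r_[np.zeros(1000, dtype=int), np.ones(10, dtype=int)]

# contamination is the assumed share of anomalies. Here it roughly matches
# the synthetic data; on real data it is a tuning choice.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(X)

# predict() returns +1 for inliers and -1 for outliers. Map that to the
# dataset's convention (0 = Normal, 1 = Fraud).
y_pred = (model.predict(X) == -1).astype(int)

# score_samples() is higher for normal points, so negate it to obtain an
# anomaly score where larger means more suspicious.
anomaly_score = -model.score_samples(X)

print("Flagged as fraud:", int(y_pred.sum()))
print("True frauds flagged:", int((y_pred & y_true).sum()))
```

The negated score is kept alongside the hard labels because threshold-free metrics such as ROC-AUC need a continuous ranking.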

2. Local Outlier Factor (LOF)

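LOF identifies anomalies by comparing the local density of a sample with the local densities of its nearest neighbors. A sample that lies in a much sparser region than its neighbors is flagged as an outlier.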
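
The density comparison can be sketched with scikit-learn's `LocalOutlierFactor`. The data below is synthetic, and `n_neighbors=20` and `contamination=0.01` are illustrative assumptions, not values confirmed by the notebook.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic data (NOT the real dataset): a dense cloud of 500 points plus
# 5 scattered points far away from it.
X_normal = rng.normal(0.0, 1.0, size=(500, 2))
X_outliers = rng.uniform(low=6.0, high=10.0, size=(5, 2))
X = np.vstack([X_normal, X_outliers])

# In its default (novelty=False) mode, LOF scores the same data it is
# fitted on, so fit_predict() is used instead of separate fit/predict.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
labels = lof.fit_predict(X)          # +1 = inlier, -1 = outlier
y_pred = (labels == -1).astype(int)  # 0 = Normal, 1 = Fraud

# negative_outlier_factor_ is close to -1 for inliers and more negative
# for outliers; negate it so larger means more anomalous.
lof_score = -lof.negative_outlier_factor_

print("Flagged as outliers:", int(y_pred.sum()))
```

Scoring unseen transactions with LOF would require `novelty=True` and a model fitted on data assumed to be clean.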

3. One-Class SVM

One-Class SVM learns the boundary of normal transactions and identifies samples outside this boundary as anomalies.
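
One way to realize "learning the boundary of normal transactions" is to fit scikit-learn's `OneClassSVM` on normal samples only and then score a mixed test set. The sketch below does that on synthetic data; the train/test split, `nu=0.05`, and the RBF kernel settings are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)

# Synthetic data (NOT the real dataset).
X_train_normal = rng.normal(0.0, 1.0, size=(500, 3))  # normal only
X_test_normal = rng.normal(0.0, 1.0, size=(100, 3))
X_test_fraud = rng.normal(6.0, 1.0, size=(10, 3))
X_test = np.vstack([X_test_normal, X_test_fraud])

# The RBF kernel is distance-based, so features are standardized using
# statistics from the training data only.
scaler = StandardScaler().fit(X_train_normal)

# nu upper-bounds the fraction of training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
ocsvm.fit(scaler.transform(X_train_normal))

# predict() returns +1 inside the learned boundary and -1 outside it.
y_pred = (ocsvm.predict(scaler.transform(X_test)) == -1).astype(int)

print("Normal test rows flagged:", int(y_pred[:100].sum()))
print("Fraud test rows flagged: ", int(y_pred[100:].sum()))
```

Training time for kernel SVMs grows quickly with sample count, so on a dataset of this size a subsample of the normal transactions is a common compromise.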

Evaluation Metrics

The following metrics are used for evaluation:

  • Accuracy
  • Precision
  • Recall
  • F1-Score
  • ROC-AUC
  • Precision-Recall Curve
  • Confusion Matrix
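
The sketch below shows how these metrics could be computed with `sklearn.metrics` from a detector's output. The ten labels, predictions, and scores are made up for illustration. Note the conversion step: scikit-learn detectors return `-1` for outliers, while the dataset uses `1` for fraud.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up example: 7 normal and 3 fraudulent transactions.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

# Raw detector output (+1 = inlier, -1 = outlier) and a continuous
# anomaly score where larger means more suspicious.
raw_pred = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, 1])
anomaly_score = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.7, 0.6, 0.9, 0.8, 0.4])

# Convert to the dataset's convention: 0 = Normal, 1 = Fraud.
y_pred = np.where(raw_pred == -1, 1, 0)

# Threshold-based metrics use the hard labels.
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows = true class, cols = predicted

# Ranking-based metrics use the continuous score.
roc_auc = roc_auc_score(y_true, anomaly_score)
pr_auc = average_precision_score(y_true, anomaly_score)  # summarizes the PR curve

print(cm)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
print(f"roc_auc={roc_auc:.3f} average_precision={pr_auc:.3f}")
```

On heavily imbalanced data, precision, recall, and the precision-recall summary are usually more informative than accuracy or ROC-AUC.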

Project Structure

Anomaly Detection/
│
├── anomaly_detection.ipynb
├── README.md
└── dataset/

Workflow

  1. Load Dataset
  2. Perform Exploratory Data Analysis (EDA)
  3. Analyze Class Distribution
  4. Scale Features
  5. Train Anomaly Detection Models
  6. Generate Predictions
  7. Evaluate Results
  8. Compare Models
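
The steps above can be sketched end to end as follows. This is an outline under stated assumptions, not the notebook's code: a small synthetic frame stands in for the Kaggle CSV (only `V1` and `V2` are included instead of `V1`–`V28`), the detector settings are illustrative, and F1 is used as the single comparison metric for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

# Step 1: load data. The notebook would read the Kaggle CSV here; this
# synthetic frame only mimics its column layout.
rng = np.random.default_rng(7)
n_normal, n_fraud = 2000, 20
df = pd.DataFrame({
    "V1": np.r_[rng.normal(0, 1, n_normal), rng.normal(6, 1, n_fraud)],
    "V2": np.r_[rng.normal(0, 1, n_normal), rng.normal(-6, 1, n_fraud)],
    "Time": rng.uniform(0, 172_800, n_normal + n_fraud),
    "Amount": np.r_[rng.exponential(50, n_normal), rng.exponential(300, n_fraud)],
    "Class": np.r_[np.zeros(n_normal, dtype=int), np.ones(n_fraud, dtype=int)],
})

# Steps 2-3: inspect the class distribution.
class_share = df["Class"].value_counts(normalize=True)

# Step 4: scale all features; the label column is kept out of the inputs.
X = StandardScaler().fit_transform(df.drop(columns="Class"))
y = df["Class"].to_numpy()

# Steps 5-6: fit each detector and convert its -1/+1 output to 1/0.
contamination = n_fraud / len(df)  # assumed known here; a tuning choice in practice
detectors = {
    "IsolationForest": IsolationForest(contamination=contamination, random_state=0),
    "LOF": LocalOutlierFactor(n_neighbors=35, contamination=contamination),
}
results = {}
for name, detector in detectors.items():
    raw = detector.fit_predict(X)
    y_pred = (raw == -1).astype(int)
    # Step 7: evaluate against the held-back labels.
    results[name] = f1_score(y, y_pred)

# Step 8: compare models side by side.
comparison = pd.Series(results, name="F1").sort_values(ascending=False)
print(class_share)
print(comparison)
```

The labels are used only for evaluation, never for fitting, which keeps the detectors unsupervised.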

Results

The notebook provides a side-by-side comparison of all implemented anomaly detection methods, highlighting their strengths and limitations when dealing with highly imbalanced fraud detection datasets.

Requirements

pip install pandas numpy matplotlib seaborn scikit-learn jupyter

Running the Notebook

jupyter notebook anomaly_detection.ipynb

Learning Outcomes

After completing this project, users will understand:

  • The challenges of imbalanced datasets
  • Fundamentals of anomaly detection
  • Differences between Isolation Forest, LOF, and One-Class SVM
  • Evaluation techniques for anomaly detection systems

Author

Muhammad Rashid

GitHub:
https://github.com/rashidrao-pk
