1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
/Anomaly Detection/dataset
79 changes: 79 additions & 0 deletions Anomaly Detection/README.md
@@ -0,0 +1,79 @@
# Anomaly Detection in Credit Card Fraud Transactions

This project demonstrates a simple anomaly detection workflow on the Credit Card Fraud Detection dataset, comparing three unsupervised detectors with metrics suited to imbalanced data.

## Overview

Credit card fraud detection is a highly imbalanced anomaly detection problem. Most transactions are normal, while fraudulent transactions are rare. This notebook compares three popular unsupervised anomaly detection algorithms:

- Isolation Forest
- Local Outlier Factor
- One-Class SVM

The notebook includes:

- Data loading
- Class distribution analysis
- Feature scaling
- Model training
- Prediction conversion from anomaly labels to binary labels
- Evaluation using classification report, confusion matrix, ROC-AUC, and PR-AUC
- A summary table comparing all models
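
The prediction conversion step can be sketched as follows. scikit-learn outlier detectors return `-1` for outliers and `1` for inliers, while the dataset labels fraud as `1` and normal as `0`. The helper name `anomaly_to_binary` is hypothetical and only illustrates the mapping; the notebook may implement it differently.

```python
import numpy as np


def anomaly_to_binary(pred):
    """Map anomaly labels (-1 = outlier, 1 = inlier) to binary labels
    (1 = anomaly/fraud, 0 = normal). Hypothetical helper for illustration."""
    pred = np.asarray(pred)
    return np.where(pred == -1, 1, 0)


raw = np.array([1, 1, -1, 1, -1])   # what a detector's predict() might return
binary = anomaly_to_binary(raw)
print(binary)  # [0 0 1 0 1]
```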

## Dataset

Use the Credit Card Fraud Detection dataset from Kaggle:

https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

Download `creditcard.csv` and place it in a `dataset/` folder inside this project folder before running the notebook.

Expected structure:

```text
Anomaly Detection/
├── README.md
├── anomaly_detection.ipynb
└── dataset/
    └── creditcard.csv
```
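
Given that layout, loading the data and checking the class distribution might look like the sketch below. It assumes the code runs from the project folder and that the label column is named `Class`, as in the Kaggle file; the function names are hypothetical.

```python
from pathlib import Path

import pandas as pd

# Assumed location, relative to the project folder shown above.
DATA_PATH = Path("dataset") / "creditcard.csv"


def load_transactions(path=DATA_PATH):
    """Read the CSV, failing with a clear message if it is missing."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Expected the Kaggle CSV at {path.resolve()}")
    return pd.read_csv(path)


def class_distribution(df, label_col="Class"):
    """Return per-class counts and fractions, sorted by class value."""
    counts = df[label_col].value_counts().sort_index()
    return counts, counts / len(df)


if DATA_PATH.exists():
    df = load_transactions()
    counts, fractions = class_distribution(df)
    print(counts)
    print(fractions)
```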

## How to Run

Install the required packages:

```bash
pip install pandas numpy matplotlib scikit-learn notebook
```

Then open the notebook:

```bash
jupyter notebook anomaly_detection.ipynb
```

## Models Used

### Isolation Forest
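Isolation Forest detects anomalies by recursively partitioning the data with randomly chosen features and split values. Anomalies tend to be isolated after fewer splits than normal points, so they have shorter average path lengths across the trees.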
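
A minimal sketch on synthetic data, not the notebook's actual code: the data, `contamination=0.01`, and the other parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the real data: one dense cluster plus three far points.
rng = np.random.RandomState(0)
normal = rng.normal(0, 1, size=(500, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([normal, outliers])

model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
model.fit(X)

pred = model.predict(X)            # -1 = outlier, 1 = inlier
scores = -model.score_samples(X)   # negated so that higher = more anomalous

print("flagged:", int((pred == -1).sum()))
print("scores of the three far points:", scores[-3:])
```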

### Local Outlier Factor
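Local Outlier Factor detects anomalies by comparing the local density of a sample with the local densities of its nearest neighbors. Samples whose density is substantially lower than that of their neighbors are treated as outliers.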
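
A minimal sketch on synthetic data, with illustrative parameter values that are assumptions rather than the notebook's settings. In its default mode, scikit-learn's `LocalOutlierFactor` scores the training data itself through `fit_predict`.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for the real data: one dense cluster plus three far points.
rng = np.random.RandomState(0)
normal = rng.normal(0, 1, size=(500, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([normal, outliers])

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
pred = lof.fit_predict(X)                 # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_    # LOF values; around 1 for inliers

print("flagged:", int((pred == -1).sum()))
print("LOF of the three far points:", scores[-3:])
```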

### One-Class SVM
One-Class SVM learns a boundary around normal samples and treats points outside the boundary as anomalies.
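
A minimal sketch on synthetic data. The RBF kernel, `nu=0.01`, and `gamma="scale"` are illustrative assumptions, not the notebook's confirmed settings. Features are standardized first because the RBF kernel is sensitive to feature scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the real data: one dense cluster plus three far points.
rng = np.random.RandomState(0)
normal = rng.normal(0, 1, size=(500, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = StandardScaler().fit_transform(np.vstack([normal, outliers]))

# nu is an upper bound on the fraction of training points left outside the boundary.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.01)
ocsvm.fit(X)

pred = ocsvm.predict(X)                # -1 = outside the boundary, 1 = inside
scores = -ocsvm.decision_function(X)   # negated so that higher = more anomalous

print("flagged:", int((pred == -1).sum()))
print("scores of the three far points:", scores[-3:])
```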

## Evaluation Metrics

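Because the dataset is highly imbalanced, accuracy alone is not reliable: a model that labels every transaction as normal would still score very high accuracy while detecting no fraud. This project reports: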

- Precision
- Recall
- F1-score
- ROC-AUC
- PR-AUC
- Confusion matrix
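
The metrics above can be computed with scikit-learn as in the sketch below. The labels and scores are made-up toy values, and using `average_precision_score` for PR-AUC is an assumption about how the notebook computes it.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)

# Toy data: 8 normal transactions (0) and 2 frauds (1).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
# Anomaly scores, higher = more suspicious (made-up values).
y_score = np.array([0.1, 0.2, 0.15, 0.3, 0.05, 0.25, 0.7, 0.2, 0.9, 0.4])
y_pred = (y_score >= 0.5).astype(int)

cm = confusion_matrix(y_true, y_pred)            # rows = true, cols = predicted
roc = roc_auc_score(y_true, y_score)             # 0.9375
pr = average_precision_score(y_true, y_score)    # about 0.8333

# Predicting "normal" for everything still scores 0.8 accuracy with zero recall.
baseline_accuracy = float((np.zeros_like(y_true) == y_true).mean())

print(cm)  # [[7 1]
           #  [1 1]]
print(classification_report(y_true, y_pred, digits=3))
print(f"ROC-AUC={roc:.4f}  PR-AUC={pr:.4f}  all-normal accuracy={baseline_accuracy:.1f}")
```

ROC-AUC and PR-AUC are computed from the continuous scores, while the confusion matrix and classification report use the thresholded predictions.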

## Notes

The models are trained without labels: the label column is dropped before fitting, and the labels are used only for evaluation.
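
A sketch of that separation, assuming the label column is named `Class` as in the Kaggle file. The tiny DataFrame is a made-up stand-in for the real data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up stand-in for creditcard.csv with the same kind of columns.
df = pd.DataFrame(
    {
        "Time": [0.0, 1.0, 2.0, 3.0],
        "Amount": [10.0, 250.0, 3.5, 9000.0],
        "Class": [0, 0, 0, 1],
    }
)

X = df.drop(columns=["Class"])   # features only; this is what the models see
y = df["Class"]                  # kept aside and used only for evaluation

# Standardize features to zero mean and unit variance before fitting.
X_scaled = StandardScaler().fit_transform(X)

print(list(X.columns))   # ['Time', 'Amount']
print(X_scaled.shape)    # (4, 2)
```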