Prodigy-InfoTech · rashidrao-pk · Jun 10, 2026
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+/Anomaly Detection/dataset
diff --git a/Anomaly Detection/README.md b/Anomaly Detection/README.md
@@ -0,0 +1,79 @@
+# Anomaly Detection in Credit Card Fraud Transactions
+
+This project demonstrates a simple, end-to-end anomaly detection workflow on the Credit Card Fraud Detection dataset.
+
+## Overview
+
+Credit card fraud detection is a highly imbalanced anomaly detection problem: in this dataset, only 492 of 284,807 transactions (about 0.17%) are fraudulent. This notebook compares three popular unsupervised anomaly detection algorithms:
+
+- Isolation Forest
+- Local Outlier Factor
+- One-Class SVM
+
+The notebook includes:
+
+- Data loading
+- Class distribution analysis
+- Feature scaling
+- Model training
+- Conversion of model predictions from anomaly labels (`-1` = outlier, `1` = inlier) to binary labels (`1` = fraud, `0` = normal)
+- Evaluation using classification report, confusion matrix, ROC-AUC, and PR-AUC
+- A summary table comparing all models
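The prediction-conversion step can be sketched as below. This assumes the notebook follows scikit-learn's convention, where `fit_predict` returns `-1` for outliers and `1` for inliers; `to_binary_labels` is a hypothetical helper name, not taken from the notebook.

```python
import numpy as np


def to_binary_labels(anomaly_labels):
    """Hypothetical helper: map -1 (outlier) -> 1 (fraud) and 1 (inlier) -> 0 (normal)."""
    anomaly_labels = np.asarray(anomaly_labels)
    return (anomaly_labels == -1).astype(int)


print(to_binary_labels([1, -1, 1, -1]))  # [0 1 0 1]
```

Once predictions are in this form, they can be compared directly with the dataset's fraud labels.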
+
+## Dataset
+
+Use the Credit Card Fraud Detection dataset from Kaggle:
+
+https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
+
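+Download `creditcard.csv` and place it in the `dataset/` folder inside this project before running the notebook. The `dataset/` folder is listed in `.gitignore`, so the CSV is not committed to the repository.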
+
+Expected structure:
+
+```text
+Anomaly Detection/
+├── README.md
+├── anomaly_detection.ipynb
+└── dataset/
+    └── creditcard.csv
+```
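A minimal sketch of the loading and class-distribution steps, assuming the layout above (`dataset/creditcard.csv`) and the Kaggle file's `Class` column (`1` = fraud, `0` = normal). The helper names `load_creditcard` and `class_distribution` are hypothetical; the usage lines feed a tiny made-up in-memory CSV instead of the real file so the snippet runs on its own.

```python
import io

import pandas as pd


def load_creditcard(path_or_buffer="dataset/creditcard.csv"):
    """Hypothetical helper: read the CSV and split it into features X and label y."""
    df = pd.read_csv(path_or_buffer)
    X = df.drop(columns=["Class"])  # features only; the label is kept aside
    y = df["Class"].astype(int)
    return X, y


def class_distribution(y):
    """Count and percentage of each class (0 = normal, 1 = fraud)."""
    counts = y.value_counts().sort_index()
    return pd.DataFrame({"count": counts, "percent": 100 * counts / counts.sum()})


# Usage with a tiny made-up CSV in place of the real file.
sample = io.StringIO(
    "Time,V1,Amount,Class\n"
    "0,1.2,10.0,0\n"
    "1,-0.3,5.5,0\n"
    "2,4.1,999.0,1\n"
    "3,0.1,2.0,0\n"
)
X, y = load_creditcard(sample)
print(class_distribution(y))
```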
+
+## How to Run
+
+Install the required packages:
+
+```bash
+pip install pandas numpy matplotlib scikit-learn jupyter
+```
+
+Then open the notebook from inside the `Anomaly Detection` folder:
+
+```bash
+jupyter notebook anomaly_detection.ipynb
+```
+
+## Models Used
+
+### Isolation Forest
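+Isolation Forest detects anomalies by isolating observations with random splits: each tree repeatedly picks a random feature and a random split value between that feature's minimum and maximum. Anomalies are few and different, so they tend to be isolated after fewer splits, i.e. they have shorter average path lengths across the trees.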
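The idea can be illustrated with scikit-learn's `IsolationForest`. This is a sketch on synthetic 2-D data (an assumption, standing in for the scaled transaction features), and the `contamination` value is an assumed expected anomaly rate, not a setting taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in data (assumption): 500 normal points around the origin
# plus 10 anomalies spread on a ring of radius 7.
rng = np.random.default_rng(0)
angles = np.linspace(0, 2 * np.pi, 10, endpoint=False)
X_normal = rng.normal(0.0, 1.0, size=(500, 2))
X_outliers = 7.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([X_normal, X_outliers])

# contamination (assumed) is the expected share of anomalies; it only sets the
# threshold that turns scores into -1/1 labels.
iso = IsolationForest(n_estimators=100, contamination=10 / 510, random_state=0)
labels = iso.fit_predict(X)     # -1 = outlier, 1 = inlier
scores = -iso.score_samples(X)  # higher = shorter average path = more anomalous

print("flagged:", int((labels == -1).sum()))
```

Negating `score_samples` gives a score where larger means more anomalous, which is the orientation ROC-AUC and PR-AUC expect when fraud is the positive class.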
+
+### Local Outlier Factor
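+Local Outlier Factor detects anomalies by comparing the local density of a sample with the local densities of its k nearest neighbors. A sample whose density is substantially lower than that of its neighbors receives a high LOF score and is flagged as an outlier.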
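A similar sketch with scikit-learn's `LocalOutlierFactor` on the same kind of synthetic data (an assumption). `n_neighbors=20` is the library default and `contamination` is again an assumed value. In this default outlier-detection mode, LOF only offers `fit_predict`, so the scores come from the fitted `negative_outlier_factor_` attribute.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in data (assumption): dense normal cloud plus 10 isolated points.
rng = np.random.default_rng(0)
angles = np.linspace(0, 2 * np.pi, 10, endpoint=False)
X_normal = rng.normal(0.0, 1.0, size=(500, 2))
X_outliers = 7.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([X_normal, X_outliers])

lof = LocalOutlierFactor(n_neighbors=20, contamination=10 / 510)
labels = lof.fit_predict(X)             # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_  # LOF: close to 1 in dense regions, larger for outliers

print("flagged:", int((labels == -1).sum()))
```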
+
+### One-Class SVM
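+One-Class SVM learns a boundary that encloses most of the training samples and treats points outside the boundary as anomalies. Because the labels are not used during training, the boundary is fit to all transactions, which is reasonable here because normal transactions make up almost all of the data.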
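A sketch with scikit-learn's `OneClassSVM`. Features are standardized first because the RBF kernel is distance-based. The synthetic data and `nu=0.05` (an upper bound on the fraction of training points treated as outliers) are assumptions for illustration, not the notebook's settings.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in data (assumption): dense normal cloud plus 10 isolated points.
rng = np.random.default_rng(0)
angles = np.linspace(0, 2 * np.pi, 10, endpoint=False)
X_normal = rng.normal(0.0, 1.0, size=(500, 2))
X_outliers = 7.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([X_normal, X_outliers])

# Scale features so no single feature dominates the RBF distances.
X_scaled = StandardScaler().fit_transform(X)

ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
labels = ocsvm.fit_predict(X_scaled)         # -1 = outside the boundary, 1 = inside
scores = -ocsvm.decision_function(X_scaled)  # higher = further outside = more anomalous

print("flagged:", int((labels == -1).sum()))
```

Kernel One-Class SVM training time grows faster than quadratically with the number of samples, so fitting it on all ~285k transactions can be slow; subsampling the training data, or scikit-learn's linear `SGDOneClassSVM`, are possible workarounds.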
+
+## Evaluation Metrics
+
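+Because the dataset is highly imbalanced, accuracy alone is not reliable: with only about 0.17% of transactions fraudulent, a model that labels every transaction as normal would still reach about 99.8% accuracy while catching no fraud. This project reports: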
+
+- Precision
+- Recall
+- F1-score
+- ROC-AUC
+- PR-AUC
+- Confusion matrix
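These metrics can be computed with `sklearn.metrics`. This sketch assumes binary predictions (`1` = fraud) plus a continuous anomaly score where larger means more anomalous. PR-AUC is summarized here with `average_precision_score`, a standard summary of the precision-recall curve; the notebook may compute it differently. `evaluate` is a hypothetical helper, and the toy arrays are made up.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)


def evaluate(y_true, y_pred, scores):
    """y_pred: binary labels (1 = fraud); scores: larger = more anomalous."""
    return {
        "confusion_matrix": confusion_matrix(y_true, y_pred),  # rows = true, cols = predicted
        "report": classification_report(y_true, y_pred, digits=4, zero_division=0),
        "roc_auc": roc_auc_score(y_true, scores),
        "pr_auc": average_precision_score(y_true, scores),  # average precision
    }


# Toy example: 4 normal and 2 fraud transactions.
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 0])
scores = np.array([0.1, 0.2, 0.7, 0.3, 0.9, 0.4])

results = evaluate(y_true, y_pred, scores)
print(results["report"])
print(results["confusion_matrix"])
print(results["roc_auc"])           # 0.875
print(round(results["pr_auc"], 4))  # 0.8333
```

ROC-AUC and PR-AUC take the continuous scores rather than the thresholded labels; passing the `0`/`1` predictions instead would reduce each curve to a single operating point.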
+
+## Notes
+
+The models are trained in an unsupervised way: the `Class` label column is removed before training, and the labels are used only for evaluation.
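Putting the pieces together, this unsupervised setup can be sketched as a loop that scales the features, fits each model without ever passing `y`, and collects one row per model for the summary table. The synthetic data, the hyperparameters, and the helper name `compare_models` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def compare_models(X, y, contamination):
    """Fit each detector on X only; y is used solely for the evaluation columns."""
    X_scaled = StandardScaler().fit_transform(X)
    models = {
        "Isolation Forest": IsolationForest(contamination=contamination, random_state=0),
        "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
        "One-Class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=contamination),
    }
    rows = []
    for name, model in models.items():
        labels = model.fit_predict(X_scaled)  # -1 = outlier, 1 = inlier
        if isinstance(model, LocalOutlierFactor):
            scores = -model.negative_outlier_factor_
        else:
            scores = -model.decision_function(X_scaled)  # higher = more anomalous
        y_pred = (labels == -1).astype(int)
        rows.append({
            "model": name,
            "flagged": int(y_pred.sum()),
            "recall": float(y_pred[y == 1].mean()),
            "roc_auc": roc_auc_score(y, scores),
            "pr_auc": average_precision_score(y, scores),
        })
    return pd.DataFrame(rows).set_index("model")


# Synthetic stand-in data (assumption): 500 normal points and 10 far-away anomalies.
rng = np.random.default_rng(0)
angles = np.linspace(0, 2 * np.pi, 10, endpoint=False)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 2)),
               7.0 * np.column_stack([np.cos(angles), np.sin(angles)])])
y = np.r_[np.zeros(500, dtype=int), np.ones(10, dtype=int)]

summary = compare_models(X, y, contamination=10 / 510)
print(summary.round(3))
```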