Varshini659/fraud-detection-api

Credit Card Fraud Detection API

XGBoost · SMOTE · SHAP · FastAPI deployment


The Problem

I started this project thinking fraud detection was a straightforward classification problem. It isn't.

The dataset has 284,807 transactions. 492 are fraud. That's 0.17%. A model that predicts "legit" for literally every transaction scores 99.83% accuracy - and catches zero frauds. Accuracy is useless here. The whole project is really about dealing with that one uncomfortable fact.
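The arithmetic behind that claim is easy to check. A minimal sketch using only the two counts quoted above; the function name is just for illustration:

```python
# Counts quoted in the text above.
total_transactions = 284_807
fraud_transactions = 492

def always_legit_metrics(total, fraud):
    """Accuracy and fraud recall of a model that labels every row 'legit'."""
    true_negatives = total - fraud   # every legit transaction counts as correct
    true_positives = 0               # no fraud is ever flagged
    accuracy = (true_negatives + true_positives) / total
    recall = true_positives / fraud
    return accuracy, recall

accuracy, recall = always_legit_metrics(total_transactions, fraud_transactions)
fraud_rate = fraud_transactions / total_transactions

print(f"fraud rate: {fraud_rate:.2%}")
print(f"accuracy:   {accuracy:.2%}")
print(f"recall:     {recall:.0%}")
```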

What I Built

A full pipeline from raw imbalanced data to a deployed REST API:

  • Handled class imbalance using SMOTE (not just duplicating fraud rows - it generates synthetic fraud examples by interpolating between real ones)
  • Trained XGBoost on 454K balanced training rows
  • Used SHAP to understand why the model flags specific transactions, not just that it does
  • Wrapped everything in a FastAPI app so the model is actually callable, not just a notebook

Results

Metric             Score
ROC-AUC            0.9776
PR-AUC             0.8663
Fraud Recall       90%
Fraud Precision    51%

PR-AUC is the right metric here, not accuracy. A random classifier's PR-AUC is roughly equal to the fraud prevalence, which is ~0.0017 on this dataset. Getting to 0.8663 on genuinely imbalanced real-world data is the actual benchmark.
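To see why the random baseline sits near the fraud rate, here is a rough sketch. It uses average precision as the PR-AUC estimate (one common choice, not necessarily the one used in the notebook) and a made-up sample of 100,000 rows with about 0.17% positives, not the real data:

```python
import random

def average_precision(labels, scores):
    """Mean of precision@k taken at every rank k that holds a positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives_seen = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            positives_seen += 1
            precision_sum += positives_seen / rank
    return precision_sum / positives_seen if positives_seen else 0.0

rng = random.Random(0)
n_rows, n_fraud = 100_000, 170           # illustrative sample, ~0.17% fraud
labels = [1] * n_fraud + [0] * (n_rows - n_fraud)

random_scores = [rng.random() for _ in labels]      # scores carry no signal
perfect_scores = [float(label) for label in labels]  # every fraud ranked first

prevalence = n_fraud / n_rows
ap_random = average_precision(labels, random_scores)
ap_perfect = average_precision(labels, perfect_scores)

print(f"prevalence:        {prevalence:.4f}")
print(f"random classifier: {ap_random:.4f}")
print(f"perfect ranking:   {ap_perfect:.4f}")
```

The random score lands in the neighbourhood of the prevalence, while a perfect ranking scores 1.0. That gap is the range a model on this kind of data has to climb.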

The 51% precision means roughly half the fraud alerts are false alarms - which sounds bad until you realize the alternative (missing a fraud) costs ~₹122 vs ₹5 for a false alarm. I ran an explicit business cost optimization to find the threshold that minimizes total expected loss. It converged at 0.50, which suggests something useful: the model's probability scores are well separated, so moving the threshold away from the default does not lower the total cost.
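A cost-based threshold search of this kind can be sketched in a few lines. The two costs come from the paragraph above; the labels, probabilities and threshold grid below are made up for illustration and are not the project's scores, so the threshold this toy picks says nothing about the real model:

```python
COST_MISSED_FRAUD = 122  # rupees per missed fraud, from the text
COST_FALSE_ALARM = 5     # rupees per false alarm, from the text

def total_cost(labels, probabilities, threshold):
    """Sum the business cost of every mistake made at a given threshold."""
    cost = 0
    for label, probability in zip(labels, probabilities):
        flagged = probability >= threshold
        if label == 1 and not flagged:
            cost += COST_MISSED_FRAUD   # fraud slipped through
        elif label == 0 and flagged:
            cost += COST_FALSE_ALARM    # legit transaction was flagged
    return cost

# Toy data (hypothetical): six legit rows followed by four fraud rows.
labels = [0] * 6 + [1] * 4
probabilities = [0.02, 0.05, 0.10, 0.20, 0.35, 0.60, 0.30, 0.65, 0.90, 0.97]

# Sweep thresholds 0.05, 0.10, ..., 0.95 and keep the cheapest one.
thresholds = [i / 20 for i in range(1, 20)]
best_threshold = min(thresholds, key=lambda t: total_cost(labels, probabilities, t))
best_cost = total_cost(labels, probabilities, best_threshold)

print(best_threshold, best_cost)
print(total_cost(labels, probabilities, 0.5))
```

On this toy data the classes overlap, so the asymmetric costs pull the cheapest threshold below 0.50. When the scores are cleanly separated, a wide band of thresholds gives the same cost and the default is as good as any.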

Live API


Three tiers: HIGH (probability ≥ 0.80) blocks immediately, MEDIUM (0.40 up to but not including 0.80) triggers an OTP check, LOW (below 0.40) approves.
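The tier logic is simple enough to sketch directly. The cut-offs are the ones stated above; the function name and the action strings are hypothetical and may not match what `main.py` returns:

```python
def risk_tier(fraud_probability):
    """Map a fraud probability to a (tier, action) pair."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability >= 0.80:
        return "HIGH", "BLOCK"            # block immediately
    if fraud_probability >= 0.40:
        return "MEDIUM", "REQUIRE_OTP"    # step-up verification
    return "LOW", "APPROVE"

for probability in (0.95, 0.80, 0.795, 0.40, 0.10):
    print(probability, risk_tier(probability))
```

Comparing against the lower bound of each tier, rather than checking closed ranges like 0.40-0.79, avoids leaving a gap for scores such as 0.795.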

Handling the Imbalance


The naive fix is to just duplicate fraud rows. SMOTE is better - it creates new synthetic fraud examples by picking two real fraud transactions and generating a point somewhere between them in feature space. Less memorization, more generalization.
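The interpolation idea can be shown without any libraries. This is a simplified sketch of the SMOTE step, not the imbalanced-learn implementation the project uses; the function name, the `k` default and the toy fraud rows are all made up for illustration:

```python
import math
import random

def smote_like_samples(minority_rows, n_new, k=2, seed=0):
    """Create n_new synthetic rows, each on the line segment between a
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        # k nearest other minority rows by Euclidean distance
        others = [row for row in minority_rows if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical fraud rows with two features each.
fraud_rows = [[-5.0, -4.0], [-6.0, -3.5], [-4.5, -5.0], [-7.0, -4.2]]
new_rows = smote_like_samples(fraud_rows, n_new=5)
for row in new_rows:
    print([round(value, 3) for value in row])
```

Every synthetic row stays inside the region spanned by real fraud rows, which is why this generalizes a bit better than plain duplication but cannot invent fraud patterns the data has never seen.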

One thing I was careful about: SMOTE only on training data. Never touch the test set. If you apply SMOTE before splitting, synthetic examples built from the same real frauds end up on both sides of the split, so the test set contains near-copies of training rows - that's data leakage and your evaluation numbers are lies.

Split                 Rows      Fraud %
Train before SMOTE    227,845   0.17%
Train after SMOTE     454,902   50.0%
Test (real world)     56,962    0.17%
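The row counts in the table are internally consistent, and a quick check shows what they imply. A small sketch, assuming SMOTE balanced the training set to exactly 1:1 (the table reports 50.0%); the per-class counts below are derived from that assumption, not read from the notebook:

```python
# Figures taken from the table and the dataset description.
total_rows = 284_807
total_fraud = 492
train_rows = 227_845
test_rows = 56_962
train_rows_after_smote = 454_902

# The split covers the whole dataset.
assert train_rows + test_rows == total_rows

# Assumption: an exact 50/50 balance, so half the resampled rows are legit.
train_legit = train_rows_after_smote // 2
train_fraud = train_rows - train_legit     # real frauds in train before SMOTE
test_fraud = total_fraud - train_fraud     # real frauds left for the test set
synthetic_fraud = train_legit - train_fraud

print(f"train fraud (real):      {train_fraud}")
print(f"train fraud (synthetic): {synthetic_fraud}")
print(f"test fraud (real):       {test_fraud}")
print(f"train fraud rate:        {train_fraud / train_rows:.2%}")
print(f"test fraud rate:         {test_fraud / test_rows:.2%}")
```

Under that assumption the test set keeps its real-world fraud rate and contains only real transactions, which is what splitting before oversampling is meant to guarantee.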

Model Performance

Figures: precision-recall curve and confusion matrix for the XGBoost predictions

90% recall means the model catches 9 out of 10 real frauds. The 10% it misses are the expensive ones - that's where an ensemble approach (adding an Isolation Forest for anomaly detection) could help in a production setup.
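To put rough numbers on those percentages, the sketch below backs out approximate test-set counts from the rounded metrics. It assumes the test split is stratified, so it holds about the same fraud share as the full dataset. The results are estimates from rounded figures, not values read off the actual confusion matrix:

```python
# Reported figures (rounded in the README).
total_rows, total_fraud = 284_807, 492
test_rows = 56_962
recall = 0.90
precision = 0.51

# Assumption: stratified split, so test frauds scale with the test share.
test_fraud = round(test_rows * total_fraud / total_rows)

caught = round(recall * test_fraud)           # approximate true positives
missed = test_fraud - caught                  # approximate false negatives
alerts = caught / precision                   # total flagged transactions
false_alarms = round(alerts - caught)         # approximate false positives

print(f"frauds in test set: ~{test_fraud}")
print(f"caught:             ~{caught}")
print(f"missed:             ~{missed}")
print(f"false alarms:       ~{false_alarms}")
```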

SHAP - Understanding What the Model Actually Learned

This was the most interesting part. You can train a model and report metrics, but if you can't explain a specific decision you can't deploy it anywhere that matters. Regulation in India (RBI guidelines) and Europe (GDPR) both push banks toward explainability for automated financial decisions.

Figure: SHAP bar plot

V4 and V14 completely dominate. Their mean absolute SHAP values are nearly 3× that of the next feature. Everything else is noise by comparison.

Figure: SHAP beeswarm plot

The beeswarm shows the direction. For V14: low values (blue dots) push hard toward fraud, high values push toward legit. It's an inverted relationship - which in the real world probably corresponds to something like "low transaction approval history" or "unusual merchant type," but the bank anonymized the raw features so we can't know for sure.

V17 has the same pattern. The model's primary fraud fingerprint is: V14 very low + V17 very low + V10 very low, all at the same time. Any one of them alone isn't enough. The combination is what triggers it.

For transaction 77348 specifically (model confidence 99.99%):

  • Baseline: −0.039 in log-odds, i.e. just under a 50/50 prior - most likely because the model was trained on the SMOTE-balanced set, not on the real 0.17% fraud rate
  • V14 alone: +4.91 push toward fraud
  • V17: +1.64
  • V10: +1.27
  • Final score: 8.836 in log-odds (the remaining features add the rest), which is ≈99.99% after the sigmoid → fraud

That's the kind of breakdown a risk officer can actually act on.
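The numbers in that breakdown fit together if the SHAP values are read as log-odds, which is the default output space for a tree explainer on an XGBoost binary classifier. A minimal check using only the figures listed above; the variable names are illustrative:

```python
import math

# Values quoted for transaction 77348.
baseline = -0.039
top_contributions = {"V14": 4.91, "V17": 1.64, "V10": 1.27}
final_log_odds = 8.836

def sigmoid(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# SHAP is additive: baseline + sum of all contributions = final score,
# so whatever the top three do not explain comes from the other features.
remaining = final_log_odds - baseline - sum(top_contributions.values())

print(f"other features contribute: {remaining:+.3f}")
print(f"baseline probability:      {sigmoid(baseline):.4f}")
print(f"final fraud probability:   {sigmoid(final_log_odds):.4%}")
```

The top three features account for most of the push, and the final log-odds converts to the 99.99% confidence quoted above.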

The API Internals

JSON transaction arrives
    → Pydantic validates all 30 fields (auto-rejects malformed requests)
    → StandardScaler normalizes Amount + Time (same scaler fitted on training data)
    → Features aligned to exact training column order (wrong order = garbage predictions)
    → XGBoost.predict_proba() → fraud probability
    → Business rule layer maps probability to verdict + action
    → JSON response

The scaler and feature order are both saved as artifacts alongside the model. Without them, inference is broken even if the model weights are correct.
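The column-alignment step is the easiest one to get wrong, so here is a small sketch of it. It assumes `feature_names.json` holds a plain JSON list of column names (the real file may be shaped differently) and uses a shortened, hypothetical feature list. Scaling and the model call are left out:

```python
import json

# Stand-in for the contents of feature_names.json (assumed: a JSON list).
FEATURE_NAMES = json.loads('["Time", "V1", "V2", "Amount"]')

def align_features(payload, feature_names):
    """Return the payload values as a list in the exact training column order."""
    missing = [name for name in feature_names if name not in payload]
    if missing:
        raise ValueError(f"missing features: {missing}")
    # Index by name rather than relying on the order keys arrived in.
    return [float(payload[name]) for name in feature_names]

# A request whose keys arrive in a different order than the training columns.
request_payload = {"Amount": 149.62, "V2": -0.07, "Time": 0.0, "V1": -1.36}
row = align_features(request_payload, FEATURE_NAMES)
print(row)
```

Looking values up by name means a client can send fields in any order and the model still sees them where it expects them.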

Tech Stack

  • Data: Credit Card Fraud Detection dataset (ULB Machine Learning Group, distributed via Kaggle), 284,807 transactions, 30 features
  • Imbalance handling: SMOTE (imbalanced-learn)
  • Model: XGBoost Classifier
  • Explainability: SHAP (TreeExplainer)
  • API: FastAPI + uvicorn
  • Deployment (demo): ngrok tunnel, Google Colab

Project Structure

fraud-detection-api/
├── fraud_detection.ipynb    # Full pipeline notebook
├── main.py                  # FastAPI app
├── fraud_model.pkl          # Trained model
├── feature_names.json       # Column order for inference
├── scaler.pkl               # Fitted scaler
└── requirements.txt

Limitations & Next Steps

  • Dataset is from 2013 European cardholders - fraud patterns shift over time (concept drift), so a production model would need monthly, not annual, retraining
  • V1-V28 are PCA-anonymized - impossible to give feature names real business meaning without access to the original variables
  • ngrok URL is ephemeral - production deployment would use Docker + GCP Cloud Run or AWS Lambda
  • Logical next steps: walk-forward retraining pipeline, Isolation Forest ensemble for unseen fraud patterns, MLflow for experiment tracking, Docker for reproducibility

Data Source

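Credit Card Fraud Detection dataset, released by the Machine Learning Group at ULB (Université Libre de Bruxelles) and distributed via Kaggle. Associated publication: Dal Pozzolo et al., IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 2015. Features V1-V28 are PCA-transformed; raw features are confidential.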

About

An end-to-end credit card fraud detection microservice using XGBoost on an imbalanced 284K dataset, deployed as a live, low-latency REST API with FastAPI.
