Fraud Detection System using Machine Learning and FastAPI

Live Deployment

API URL

https://fraud-detection-api-t7f4.onrender.com

Interactive API Documentation

https://fraud-detection-api-t7f4.onrender.com/docs

Business Problem

Financial fraud causes significant monetary losses for digital payment platforms. The objective of this project is to build a machine learning system capable of identifying potentially fraudulent transactions in real time while minimizing false positives.

Dataset

The dataset consists of synthetic mobile payment transactions generated by the PaySim simulator, which models financial activities of a mobile money service over a 30-day period.

The dataset contains transaction information such as:

Transaction type
Transaction amount
Origin account balances
Destination account balances
Fraud labels

Data Challenge: Class Imbalance

Fraud detection is a highly imbalanced classification problem.

Fraudulent transactions account for approximately 0.13% of all transactions; the remaining ~99.87% are legitimate.

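This imbalance makes plain accuracy misleading. A model that labels every transaction as legitimate would be roughly 99.87% accurate while detecting no fraud at all. Model evaluation and threshold selection therefore have to focus on how well the minority (fraud) class is detected.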
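The arithmetic behind the accuracy problem can be checked with a short Python sketch. The 100,000-transaction volume is a hypothetical round number chosen to match the ~0.13% fraud rate. It is not taken from the project's data.

```python
# Hypothetical volume chosen to match the ~0.13% fraud rate described above.
n_total = 100_000
n_fraud = 130
n_legit = n_total - n_fraud

# Baseline "model" that predicts non-fraud for every transaction.
true_negatives = n_legit   # every legitimate transaction is correctly passed
true_positives = 0         # no fraudulent transaction is ever flagged

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / n_fraud  # share of fraud actually caught

print(f"accuracy = {accuracy:.2%}")  # accuracy = 99.87%
print(f"recall   = {recall:.2%}")    # recall   = 0.00%
```

The baseline looks excellent on accuracy and useless on recall. This gap is why the metrics reported later in this README are precision, recall, and F1-score.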

Feature Engineering

To capture suspicious transaction patterns that the raw columns do not express directly, several domain-inspired features were engineered:

balancediff_Dest_including_amount

Measures whether the change in the destination account's balance is consistent with the transferred amount.

balancediff_Org_including_amount

Measures the expected balance change of the origin account after the transaction.

amount_to_balance_ratio

Represents the proportion of the account balance being transferred. Large ratios may indicate suspicious behavior.

is_zero_balance

Flags transactions where the origin account balance is zero, which may provide useful fraud signals.
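The README describes these features but not their exact formulas. The following pandas sketch shows one plausible way to compute them from the standard PaySim columns (amount, oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest). Several details are assumptions for illustration: the formulas themselves, the function name add_engineered_features, and the +1 guard against division by zero. This is not the project's confirmed implementation.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the four engineered columns added (assumed formulas)."""
    out = df.copy()
    # Assumed: origin balance residual after removing the amount (0 = consistent).
    out["balancediff_Org_including_amount"] = (
        out["oldbalanceOrg"] - out["amount"] - out["newbalanceOrig"]
    )
    # Assumed: destination balance residual after adding the amount (0 = consistent).
    out["balancediff_Dest_including_amount"] = (
        out["oldbalanceDest"] + out["amount"] - out["newbalanceDest"]
    )
    # Share of the origin balance being moved; the +1 avoids division by zero (assumption).
    out["amount_to_balance_ratio"] = out["amount"] / (out["oldbalanceOrg"] + 1.0)
    # 1 when the origin account had a zero balance before the transaction (assumption).
    out["is_zero_balance"] = (out["oldbalanceOrg"] == 0).astype(int)
    return out


transactions = pd.DataFrame({
    "amount":         [100.0, 50.0],
    "oldbalanceOrg":  [100.0, 0.0],
    "newbalanceOrig": [0.0, 0.0],
    "oldbalanceDest": [0.0, 200.0],
    "newbalanceDest": [100.0, 200.0],
})
print(add_engineered_features(transactions))
```

Under these assumed definitions, the first row is fully consistent: both balance differences are 0. The second row shows the kind of pattern the features are meant to expose. Money leaves an origin account that had no balance, and the destination balance does not change, so both differences are non-zero and is_zero_balance is 1.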

Model Development

Multiple machine learning algorithms were evaluated:

Logistic Regression
Random Forest
XGBoost

Model performance was compared using metrics suited to imbalanced data, such as precision, recall, and F1-score, rather than plain accuracy.

XGBoost achieved the best overall performance and was selected as the final production model.

Threshold Optimization

Binary classifiers typically convert predicted probabilities into labels using a default threshold of 0.5.

To improve fraud detection performance, threshold analysis was performed to evaluate the trade-off between precision and recall.

The final threshold of 0.7 was selected based on validation performance and business requirements. It identified fraud better than the default threshold of 0.5.
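A threshold sweep of this kind can be sketched in plain Python. The sketch scores each candidate threshold on validation labels and predicted probabilities, then keeps the threshold with the highest F1-score. Maximizing F1 is an assumed selection rule. The project also weighed business requirements, which could instead mean something like requiring a minimum recall. The function names and the toy validation data are hypothetical.

```python
def precision_recall_f1(y_true, probs, threshold):
    """Precision, recall and F1 when predicting fraud for probs >= threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def select_threshold(y_true, probs, candidates):
    """Return (threshold, f1) for the candidate with the highest F1 (assumed rule)."""
    scored = [(t, precision_recall_f1(y_true, probs, t)[2]) for t in candidates]
    return max(scored, key=lambda item: item[1])


# Toy validation set (hypothetical): 1 = fraud, 0 = legitimate.
y_val = [0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.10, 0.20, 0.30, 0.60, 0.65, 0.75, 0.80, 0.90]

best_t, best_f1 = select_threshold(y_val, p_val, [0.5, 0.6, 0.7, 0.8])
print(f"threshold={best_t}, F1={best_f1:.2f}")  # threshold=0.8, F1=0.80
```

On this toy data, raising the threshold from 0.5 to 0.8 removes both false positives while losing one true fraud case, which improves F1. The threshold should be chosen on a validation set rather than the test set, so that the reported test metrics remain an unbiased estimate.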

Deployment Architecture

The trained model was deployed as a REST API using FastAPI.

Workflow

User submits transaction data.
FastAPI validates the request using Pydantic schemas.
Feature engineering transformations are applied.
Features are aligned with training-time feature columns.
The XGBoost model generates fraud probabilities.
The optimized threshold converts probabilities into final predictions.
The API returns both fraud probability and fraud prediction.
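The workflow above can be sketched as a framework-agnostic Python function that a FastAPI endpoint would call after Pydantic has validated the request body. This minimal sketch rests on several assumptions:

- The transaction type is one-hot encoded with pd.get_dummies.
- The training-time column list FEATURE_COLUMNS is hypothetical.
- Only is_zero_balance is computed, as an example engineered feature.
- StubModel is a placeholder standing in for the trained XGBoost classifier. It exposes the same predict_proba interface.

The 0.7 threshold is the one reported under Recommended Production Model.

```python
import numpy as np
import pandas as pd

# Hypothetical training-time feature order; the real list would be saved with the model.
FEATURE_COLUMNS = [
    "amount", "oldbalanceOrg", "newbalanceOrig",
    "oldbalanceDest", "newbalanceDest",
    "is_zero_balance", "type_CASH_OUT", "type_TRANSFER",
]
THRESHOLD = 0.7  # selected threshold reported in this README


class StubModel:
    """Placeholder for the trained XGBoost model (hypothetical scoring rule)."""

    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
        # Toy rule for illustration only: TRANSFERs score high, everything else low.
        p = np.where(X["type_TRANSFER"] == 1, 0.87, 0.05)
        return np.column_stack([1 - p, p])


def score_transaction(payload: dict, model, feature_columns, threshold=THRESHOLD) -> dict:
    # 1. One validated request becomes a single-row DataFrame.
    df = pd.DataFrame([payload])
    # 2. Example engineered feature (the others would be added the same way).
    df["is_zero_balance"] = (df["oldbalanceOrg"] == 0).astype(int)
    # 3. Assumed encoding of the categorical transaction type.
    df = pd.get_dummies(df, columns=["type"])
    # 4. Align with training columns: missing ones are added as 0, extras are dropped.
    X = df.reindex(columns=feature_columns, fill_value=0)
    # 5. Probability of the positive (fraud) class.
    prob = float(model.predict_proba(X)[0, 1])
    # 6. Apply the decision threshold.
    return {"fraud_probability": round(prob, 4), "prediction": int(prob >= threshold)}


transfer = {
    "type": "TRANSFER", "amount": 181.0,
    "oldbalanceOrg": 181.0, "newbalanceOrig": 0.0,
    "oldbalanceDest": 0.0, "newbalanceDest": 0.0,
}
print(score_transaction(transfer, StubModel(), FEATURE_COLUMNS))
# {'fraud_probability': 0.87, 'prediction': 1}
```

Column alignment with DataFrame.reindex is what makes single-request inference safe. A one-row request produces only the dummy column for its own transaction type. Reindexing fills the missing training columns with 0 and drops unexpected columns, such as account identifiers.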

Example API Response

{
    "fraud_probability": 0.87,
    "prediction": 1
}

Where:

prediction = 1 indicates fraud
prediction = 0 indicates non-fraud

System Architecture

flowchart LR

A[Client Request] --> B[FastAPI]
B --> C[Feature Engineering]
C --> D[XGBoost Model]
D --> E[Apply Optimized Threshold]
E --> F[Prediction Response]

Technologies Used

Python
Pandas
NumPy
Scikit-learn
XGBoost
Joblib
FastAPI
Pydantic

Future Improvements

Model monitoring and drift detection
Automated retraining pipeline
Experiment tracking using MLflow
Real-time streaming predictions
CI/CD integration using GitHub Actions
Kubernetes deployment for scalability

Business Recommendations

The dataset is highly imbalanced, with fraudulent transactions representing approximately 0.13% of all observations.

This means traditional accuracy metrics are not sufficient for evaluating model performance, and greater emphasis should be placed on precision, recall, and F1-score.

Important Fraud Indicators

Feature importance analysis showed that the following variables contributed significantly to fraud detection:

newbalanceOrig
amount_to_balance_ratio
balancediff_Org_including_amount

These features capture abnormal balance movements and unusual transaction behavior that are commonly associated with fraudulent activity.

Recommended Production Model

The selected production model is XGBoost.

Performance at the selected threshold of 0.7:

Metric	Value
Precision	95%
Recall	94%
F1 Score	94%
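As a consistency check, the reported F1-score follows from the precision and recall in the table, because F1 is their harmonic mean:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2(0.95)(0.94)}{0.95 + 0.94} = \frac{1.786}{1.89} \approx 0.945
```

This value rounds down to the reported 94%.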

Business Impact

The model's recall of 94% means it correctly identifies approximately 94% of fraudulent transactions.
A precision of 95% indicates that most transactions flagged as fraud are truly fraudulent, reducing unnecessary investigations.
The selected threshold balances fraud detection capability with operational efficiency.
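To make these rates concrete, the following Python arithmetic applies them to a hypothetical volume of 10,000 fraudulent transactions. This is an illustrative number, not a figure from the project's data. Recall determines how many fraud cases are caught. Precision determines how many transactions are flagged in total, and therefore how many legitimate transactions end up in review.

```python
precision = 0.95
recall = 0.94
actual_fraud = 10_000  # hypothetical number of fraudulent transactions

true_positives = recall * actual_fraud          # fraud correctly flagged
false_negatives = actual_fraud - true_positives  # fraud that slips through
flagged = true_positives / precision            # all transactions sent for review
false_positives = flagged - true_positives      # legitimate transactions flagged

print(f"fraud caught:       {true_positives:,.0f}")   # fraud caught:       9,400
print(f"fraud missed:       {false_negatives:,.0f}")  # fraud missed:       600
print(f"total flagged:      {flagged:,.0f}")          # total flagged:      9,895
print(f"false alarms:       {false_positives:,.0f}")  # false alarms:       495
```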

Recommendation

Deploy the XGBoost model as an initial fraud screening layer and route flagged transactions for additional verification before approval.

Model Limitations

As fraud patterns evolve, model performance may degrade over time if retraining is not performed regularly.
Customer transaction behavior may change due to:

New payment methods
Economic conditions
Seasonal effects
Product changes

The PaySim dataset is a simulated representation of financial transactions. Although it captures many realistic fraud patterns, real-world transaction data may contain additional complexities not represented in the dataset.
Despite strong model performance, some fraudulent transactions still go undetected. At 94% recall, roughly 6% of fraud is missed, partly because of the class imbalance.
Model behavior depends on the chosen classification threshold. Different business objectives may require adjusting the threshold to prioritize either fraud detection (higher recall) or fewer false alarms (higher precision).
