Yadnyikee99/Fraud_detection_API

Fraud Detection System using Machine Learning and FastAPI

Live Deployment

API URL

https://fraud-detection-api-t7f4.onrender.com

Interactive API Documentation

https://fraud-detection-api-t7f4.onrender.com/docs

Business Problem

Financial fraud causes significant monetary losses for digital payment platforms. The objective of this project is to build a machine learning system capable of identifying potentially fraudulent transactions in real time while minimizing false positives.

Dataset

The dataset consists of synthetic mobile payment transactions generated by the PaySim simulator, which models financial activities of a mobile money service over a 30-day period.

The dataset contains transaction information such as:

  • Transaction type
  • Transaction amount
  • Origin account balances
  • Destination account balances
  • Fraud labels

Data Challenge: Class Imbalance

Fraud detection is a highly imbalanced classification problem.

Fraudulent transactions account for approximately 0.13% of all transactions; the remaining ~99.87% are legitimate.

This imbalance makes traditional accuracy metrics misleading and requires careful model evaluation and threshold selection.
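A quick illustration of why accuracy misleads at this imbalance. The label vector below is synthetic: it has the same 0.13% fraud rate, but the 100,000-row size is an arbitrary assumption. A "model" that never predicts fraud still reaches about 99.87% accuracy while catching no fraud at all.

```python
# Synthetic labels with a 0.13% fraud rate (illustrative, not PaySim data).
n_total = 100_000
n_fraud = 130  # 0.13% of 100,000

y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# A trivial classifier that labels every transaction as legitimate.
y_pred = [0] * n_total

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_total
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / n_fraud

print(f"accuracy={accuracy:.4f}, recall={recall:.2f}")  # accuracy=0.9987, recall=0.00
```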

Feature Engineering

To capture suspicious transaction patterns beyond the raw dataset, several domain-inspired features were engineered:

balancediff_Dest_including_amount

Measures whether the destination account's balance change is consistent with the transferred amount.

balancediff_Org_including_amount

Measures how the origin account's recorded balance change compares with the change expected from the transaction amount.

amount_to_balance_ratio

Represents the proportion of the account balance being transferred. Large ratios may indicate suspicious behavior.

is_zero_balance

Flags transactions where the origin account balance is zero, which may provide useful fraud signals.
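The exact formulas behind these features are not documented here. The sketch below is one plausible reading of the descriptions above, applied to a single transaction using the PaySim column names (`amount`, `oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, `newbalanceDest`). The helper name `engineer_features`, the sign conventions, and the `+ 1` smoothing in the ratio are all assumptions.

```python
def engineer_features(txn: dict) -> dict:
    """Return a copy of a PaySim-style transaction with four engineered features.

    Assumed formulas (not confirmed by the project): each balance difference is
    zero when the recorded balances are consistent with the transferred amount.
    """
    amount = txn["amount"]
    old_org, new_org = txn["oldbalanceOrg"], txn["newbalanceOrig"]
    old_dest, new_dest = txn["oldbalanceDest"], txn["newbalanceDest"]

    features = dict(txn)
    # Origin should lose `amount`: old - amount - new == 0 when consistent.
    features["balancediff_Org_including_amount"] = old_org - amount - new_org
    # Destination should gain `amount`: old + amount - new == 0 when consistent.
    features["balancediff_Dest_including_amount"] = old_dest + amount - new_dest
    # Fraction of the origin balance being moved; +1 avoids division by zero (assumed).
    features["amount_to_balance_ratio"] = amount / (old_org + 1.0)
    # 1 when the origin account starts at a zero balance, else 0.
    features["is_zero_balance"] = int(old_org == 0)
    return features


# Example: a transfer that empties the origin account while the
# destination balance is recorded as unchanged.
example = engineer_features({
    "type": "TRANSFER",
    "amount": 1000.0,
    "oldbalanceOrg": 1000.0,
    "newbalanceOrig": 0.0,
    "oldbalanceDest": 0.0,
    "newbalanceDest": 0.0,
})
print(example["balancediff_Org_including_amount"])   # 0.0 (origin side consistent)
print(example["balancediff_Dest_including_amount"])  # 1000.0 (destination never credited)
```

In this reading, a nonzero balance difference signals that the recorded balances do not add up, which is the kind of anomaly these features are meant to surface.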

Model Development

Multiple machine learning algorithms were evaluated:

  • Logistic Regression
  • Random Forest
  • XGBoost

Model performance was compared using metrics suited to imbalanced data, such as precision, recall, and F1-score, rather than accuracy alone.

XGBoost achieved the best overall performance and was selected as the final production model.

Threshold Optimization

Binary classifiers typically convert predicted probabilities into class labels using a default threshold of 0.5.

To improve fraud detection performance, threshold analysis was performed to evaluate the trade-off between precision and recall.

The final threshold of 0.7 was selected based on validation performance and business requirements, and it improved fraud identification compared with the default threshold.
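A threshold search of this kind can be sketched as a sweep over candidate thresholds on validation scores. The helpers below, the toy data, and the two selection rules (maximum F1, or the highest threshold that still meets a recall target) are illustrative assumptions, not the project's exact procedure. In practice, `sklearn.metrics.precision_recall_curve` computes the same precision/recall trade-off over every distinct score.

```python
def precision_recall_f1(y_true, y_prob, threshold):
    """Precision, recall and F1 when scores >= threshold are labelled fraud (1)."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_f1_threshold(y_true, y_prob, candidates):
    """Candidate threshold with the highest F1 (the first one wins on ties)."""
    return max(candidates, key=lambda t: precision_recall_f1(y_true, y_prob, t)[2])


def threshold_for_recall(y_true, y_prob, candidates, min_recall):
    """Highest candidate threshold whose recall is still >= min_recall, or None."""
    ok = [t for t in candidates if precision_recall_f1(y_true, y_prob, t)[1] >= min_recall]
    return max(ok) if ok else None


# Toy validation labels and scores (illustrative, not project data).
y_val = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
p_val = [0.10, 0.30, 0.55, 0.60, 0.65, 0.80, 0.90, 0.20, 0.75, 0.40]
candidates = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

print(best_f1_threshold(y_val, p_val, candidates))           # 0.6
print(threshold_for_recall(y_val, p_val, candidates, 0.75))  # 0.7
```

Raising the threshold trades recall for precision; which rule to use depends on whether missed fraud or false alarms are more costly for the business.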

Deployment Architecture

The trained model was deployed as a REST API using FastAPI.

Workflow

  1. User submits transaction data.
  2. FastAPI validates the request using Pydantic schemas.
  3. Feature engineering transformations are applied.
  4. Features are aligned with training-time feature columns.
  5. The XGBoost model generates fraud probabilities.
  6. The optimized threshold converts probabilities into final predictions.
  7. The API returns both fraud probability and fraud prediction.
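Steps 4 to 7 can be sketched without any web framework. Everything named below is hypothetical: `FEATURE_COLUMNS` (including the one-hot `type_*` columns), `align_features`, `score`, and `StubModel`, which stands in for the trained XGBoost model. In the deployed service the model would be loaded with Joblib, and this logic would run inside a FastAPI route after Pydantic validation and feature engineering.

```python
# Hypothetical training-time column order; in practice this list would be
# saved alongside the model (for example with joblib) and loaded at startup.
FEATURE_COLUMNS = [
    "amount",
    "oldbalanceOrg",
    "newbalanceOrig",
    "oldbalanceDest",
    "newbalanceDest",
    "type_CASH_OUT",  # assumed one-hot encoding of `type`
    "type_TRANSFER",
    "balancediff_Org_including_amount",
    "balancediff_Dest_including_amount",
    "amount_to_balance_ratio",
    "is_zero_balance",
]
THRESHOLD = 0.7  # selected threshold reported in this README


def align_features(row: dict, columns: list) -> list:
    """Order features as at training time; missing columns become 0, extras are dropped."""
    return [float(row.get(col, 0)) for col in columns]


def score(row: dict, model) -> dict:
    """Steps 4-7: align features, predict a fraud probability, apply the threshold."""
    x = align_features(row, FEATURE_COLUMNS)
    # Assumes an sklearn-style classifier (XGBClassifier provides predict_proba);
    # column 1 holds the probability of the fraud class.
    prob = float(model.predict_proba([x])[0][1])
    return {"fraud_probability": prob, "prediction": int(prob >= THRESHOLD)}


class StubModel:
    """Hypothetical stand-in for the trained XGBoost model, for illustration only."""

    def predict_proba(self, X):
        return [[0.13, 0.87] for _ in X]


row = {"amount": 1000.0, "oldbalanceOrg": 1000.0, "type_TRANSFER": 1}
print(score(row, StubModel()))  # {'fraud_probability': 0.87, 'prediction': 1}
```

Aligning to a stored column list guards against a request producing columns in a different order, or missing a one-hot category that was present at training time.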

Example API Response

{
    "fraud_probability": 0.87,
    "prediction": 1
}

Where:

  • prediction = 1 indicates fraud
  • prediction = 0 indicates non-fraud

System Architecture

flowchart LR

A[Client Request] --> B[FastAPI]
B --> C[Feature Engineering]
C --> D[XGBoost Model]
D --> E[Apply Optimized Threshold]
E --> F[Prediction Response]

Technologies Used

  • Python
  • Pandas
  • NumPy
  • Scikit-learn
  • XGBoost
  • Joblib
  • FastAPI
  • Pydantic

Future Improvements

  • Model monitoring and drift detection
  • Automated retraining pipeline
  • Experiment tracking using MLflow
  • Real-time streaming predictions
  • CI/CD integration using GitHub Actions
  • Kubernetes deployment for scalability
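Drift detection could start with a simple statistic such as the Population Stability Index (PSI), which compares a feature's distribution in recent traffic with its training distribution. PSI is a commonly used monitoring technique, not something this project implements; the function name, equal-width binning, and `eps` floor below are illustrative assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI of `actual` (e.g. recent production values) against `expected` (training values).

    Uses equal-width bins over the training range; values outside it are
    clipped into the edge bins, and empty bins get a small floor `eps`
    so the logarithm stays finite.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    # Each term (a - e) * ln(a / e) is non-negative, so PSI >= 0.
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


train_amounts = [i / 100 for i in range(100)]         # illustrative values
recent_amounts = [0.5 + i / 200 for i in range(100)]  # shifted upward

print(population_stability_index(train_amounts, train_amounts))          # 0.0
print(population_stability_index(train_amounts, recent_amounts) > 0.25)  # True
```

A commonly cited rule of thumb reads PSI below 0.1 as stable and above 0.25 as a significant shift worth investigating, though cut-offs should be tuned per feature.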

Business Recommendations

The dataset is highly imbalanced, with fraudulent transactions representing approximately 0.13% of all observations.

This means traditional accuracy metrics are not sufficient for evaluating model performance, and greater emphasis should be placed on precision, recall, and F1-score.

Important Fraud Indicators

Feature importance analysis showed that the following variables contributed significantly to fraud detection:

  • newbalanceOrig
  • amount_to_balance_ratio
  • balancediff_Org_including_amount

These features capture abnormal balance movements and unusual transaction behavior that are commonly associated with fraudulent activity.

Recommended Production Model

The selected production model is XGBoost.

Performance at the selected threshold of 0.7:

| Metric | Value |
| --- | --- |
| Precision | 95% |
| Recall | 94% |
| F1 Score | 94% |
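As a consistency check on the table: F1 is the harmonic mean of precision $P$ and recall $R$, so the reported values imply

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.95 \times 0.94}{0.95 + 0.94} = \frac{1.786}{1.89} \approx 0.94497
$$

which rounds to the reported 94%.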

Business Impact

  • The model correctly identifies approximately 94% of fraudulent transactions.
  • A precision of 95% indicates that most transactions flagged as fraud are truly fraudulent, reducing unnecessary investigations.
  • The selected threshold balances fraud detection capability with operational efficiency.
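To make these percentages concrete, the arithmetic below converts them into expected alert counts for a hypothetical batch containing 1,000 fraudulent transactions (the batch size and the helper `screening_outcomes` are illustrative assumptions). Recall fixes how many fraud cases are caught; precision then fixes how many total alerts that implies.

```python
def screening_outcomes(n_fraud: int, precision: float, recall: float) -> dict:
    """Expected alert counts for a given number of fraudulent transactions."""
    caught = recall * n_fraud     # true positives
    flagged = caught / precision  # all alerts: TP / precision
    return {
        "caught": round(caught),
        "missed": round(n_fraud - caught),
        "flagged": round(flagged),
        "false_alarms": round(flagged - caught),
    }


# Hypothetical volume of 1,000 fraudulent transactions at the reported metrics.
print(screening_outcomes(1000, precision=0.95, recall=0.94))
# {'caught': 940, 'missed': 60, 'flagged': 989, 'false_alarms': 49}
```

At these rates, roughly 49 legitimate transactions would be routed for review for every 940 fraud cases caught, while about 60 fraud cases would pass undetected.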

Recommendation

Deploy the XGBoost model as an initial fraud screening layer and route flagged transactions for additional verification before approval.

Model Limitations

  • As fraud patterns evolve, model performance may degrade over time if retraining is not performed regularly.
  • Customer transaction behavior may change due to:
      ◦ New payment methods
      ◦ Economic conditions
      ◦ Seasonal effects
      ◦ Product changes
  • The PaySim dataset is a simulated representation of financial transactions. Although it captures many realistic fraud patterns, real-world transaction data may contain additional complexities not represented in the dataset.
  • Despite strong model performance, some fraudulent transactions may still go undetected due to class imbalance; at the reported recall of 94%, roughly 6% of fraud cases are missed.
  • Model behavior depends on the chosen classification threshold. Different business objectives may require adjusting the threshold to prioritize either fraud detection (higher recall) or fewer false alarms (higher precision).
