Syntecxhub ML Internship
An end-to-end machine learning system for detecting fraudulent credit card transactions. Includes a full ML training pipeline (EDA → SMOTE → Random Forest + XGBoost) and a dark-themed interactive Flask web application for real-time predictions.
Run locally → open http://127.0.0.1:5000 after following the setup steps below.
Credit card fraud costs the global economy billions of dollars annually. This project tackles the challenge of detecting fraud from a highly imbalanced dataset (only 0.172% fraud) using proven ML techniques and presents results through an intuitive web interface.
Key challenge: Standard accuracy is misleading — a model predicting "Normal" for everything achieves 99.83% accuracy but catches zero frauds. This project uses Precision, Recall, ROC-AUC, and F1 as meaningful metrics, with SMOTE to handle class imbalance.
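The 99.83% figure can be checked directly from the dataset counts quoted in this README (284,807 transactions, 492 frauds). This is a minimal sketch of that arithmetic; the variable names are illustrative and do not come from the project code.

```python
# Counts quoted in this README for creditcard.csv.
total = 284_807
fraud = 492
normal = total - fraud

# A baseline that labels every transaction "Normal":
tp, fn = 0, fraud      # no fraud is ever flagged
tn, fp = normal, 0     # every normal transaction is counted as correct

accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2%}")
print(f"recall   = {recall:.2%}")
```

The baseline scores well on accuracy only because 99.83% of the rows are normal; its recall on the fraud class is zero.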
Two-layer system:
Layer 1 — ML Pipeline (run once)
creditcard.csv → EDA → Scale → SMOTE → Train RF + XGBoost → Save .pkl
Layer 2 — Web App (run anytime after Layer 1)
Browser (index.html) → POST /predict → Flask (app.py) → .pkl model → JSON verdict
Syntecxhub_Project_CreditCardFraudDetection/
│
├── data/
│ └── creditcard.csv ← Download from Kaggle (not in Git)
│
├── src/
│ ├── __init__.py
│ ├── data_loader.py ← CSV loading and feature extraction
│ ├── eda.py ← 5 EDA plots with statistical summary
│ ├── preprocessing.py ← StandardScaler + train/test split + SMOTE
│ ├── model.py ← Random Forest and XGBoost training
│ └── evaluate.py ← Metrics, ROC, PR curves, threshold analysis
│
├── templates/
│ └── index.html ← Flask web app frontend (dark fintech UI)
│
├── outputs/
│ ├── plots/ ← 10 PNG plots auto-generated
│ ├── models/ ← Trained model .pkl files
│ └── reports/ ← model_comparison.csv
│
├── main.py ← ML pipeline entry point
├── app.py ← Flask web server
├── requirements.txt
├── .gitignore
├── LICENSE
└── README.md
| Property | Value |
|---|---|
| Source | Kaggle — Credit Card Fraud Detection |
| Transactions | 284,807 |
| Features | 30 (Time, V1–V28 via PCA, Amount) |
| Target | Class — 0 = Normal, 1 = Fraud |
| Fraud rate | 0.172% — 492 fraud cases |
| Missing values | None |
The dataset is not included due to size. Download `creditcard.csv` from Kaggle and place it in `data/`.
- Class imbalance visualization (bar + pie)
- Transaction amount distribution by class
- Temporal distribution across 48-hour window
- Full feature correlation heatmap (30×30)
- V1–V14 boxplots: Normal vs Fraud comparison
- `StandardScaler` applied to `Time` and `Amount` (V1–V28 are already PCA-scaled)
- 80/20 stratified train/test split
- SMOTE applied only to training set → balances from 227,451 vs 394 to 1:1
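To illustrate what SMOTE does to the training set, here is a simplified pure-Python sketch of its core idea: each synthetic fraud sample is placed on the line segment between a real minority sample and one of its nearest minority neighbours. This is not the imbalanced-learn implementation the project uses; `smote_sketch` and the toy points are hypothetical and exist only for this example.

```python
import math
import random

def smote_sketch(minority, n_new, k=5, seed=42):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Toy 2-D minority class (hypothetical values, not from creditcard.csv).
fraud_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(fraud_points, n_new=6, k=2)

# Training-set counts quoted above: 227,451 normal vs 394 fraud.
needed_for_balance = 227_451 - 394
print(len(new_points), needed_for_balance)
```

Reaching a 1:1 ratio from the counts above means generating 227,057 synthetic fraud rows. Because they are derived only from training samples, the test set keeps its original class distribution.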
Random Forest
- 200 estimators, max_depth=12, class_weight='balanced'
- Ensemble of decision trees via majority vote
XGBoost
- 200 estimators, learning_rate=0.05, scale_pos_weight for imbalance
- Sequential gradient boosting — each tree corrects previous errors
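This README does not say how `scale_pos_weight` is set. A common heuristic, suggested in the XGBoost documentation, is the ratio of negative to positive samples. The sketch below assumes the ratio is taken from the training counts before SMOTE; that is an assumption, not a description of `model.py`.

```python
# Training-set class counts quoted in this README (before SMOTE).
n_normal = 227_451
n_fraud = 394

# Heuristic: weight the positive (fraud) class by negatives / positives.
scale_pos_weight = n_normal / n_fraud

# Hyperparameters listed above; keyword names follow xgboost.XGBClassifier.
xgb_params = {
    "n_estimators": 200,
    "learning_rate": 0.05,
    "scale_pos_weight": scale_pos_weight,
}
print(f"scale_pos_weight = {scale_pos_weight:.1f}")
```

If the ratio were computed on the SMOTE-balanced training set instead, it would be 1.0 and the parameter would have no effect.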
- Confusion matrix (TN, FP, FN, TP)
- ROC Curve with AUC score
- Precision-Recall Curve with Average Precision
- Threshold Analysis — F1/Precision/Recall across all decision thresholds
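The threshold analysis listed above can be sketched in plain Python: for each candidate threshold, turn fraud probabilities into predictions, count the confusion-matrix cells, and derive Precision, Recall, and F1. The helper `metrics_at` and the sample labels and scores are hypothetical, and `evaluate.py` may compute these differently.

```python
def metrics_at(y_true, y_score, threshold):
    """Confusion-matrix counts and derived metrics at one decision threshold."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"tn": tn, "fp": fp, "fn": fn, "tp": tp,
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels and fraud probabilities, for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.45, 0.40, 0.60, 0.70, 0.95]

for threshold in (0.3, 0.5, 0.8):
    m = metrics_at(y_true, y_score, threshold)
    print(f"t={threshold:.1f}  P={m['precision']:.2f}  R={m['recall']:.2f}  F1={m['f1']:.2f}")
```

On this toy data, lowering the threshold to 0.3 catches every fraud at the cost of more false alarms, while raising it to 0.8 removes the false alarms but misses two of the three frauds.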
| Route | Method | Description |
|---|---|---|
| `/` | GET | Serves the main prediction UI |
| `/predict` | POST | Accepts feature JSON, returns verdict from both models |
| `/model_status` | GET | Returns whether RF and XGBoost are loaded |
The frontend sends transaction features as JSON. Flask loads the saved .pkl models, runs predict_proba(), and returns fraud probability and verdict for both models. The browser renders the result with confidence bars and per-model breakdown.
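For testing the endpoint without the browser, a request to `/predict` could be built with the standard library as sketched below. The payload shape (one key per dataset feature: `Time`, `V1` to `V28`, `Amount`) is an assumption, since the exact field names that `app.py` expects are not documented here. The `send` helper is hypothetical.

```python
import json
import urllib.request

# Assumed payload shape: one key per dataset feature (Time, V1..V28, Amount).
# The field names app.py actually expects are not documented in this README.
features = {"Time": 0.0, "Amount": 150.0}
features.update({f"V{i}": 0.0 for i in range(1, 29)})

request = urllib.request.Request(
    "http://127.0.0.1:5000/predict",
    data=json.dumps(features).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

def send(req):
    """Send the request and decode the JSON reply (requires app.py running)."""
    with urllib.request.urlopen(req, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))

# With the server running: print(send(request))
```

Building the request does not contact the server; only `send(request)` does, so the sketch can be adapted and inspected before the app is started.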
| Metric | Why it matters for fraud |
|---|---|
| Recall | Are we catching actual frauds? Missing a fraud = financial loss |
| Precision | Are our fraud flags real? Too many false alarms = bad UX |
| ROC-AUC | Overall discrimination ability across all thresholds |
| F1-Score | Harmonic mean — balances Precision and Recall |
| Threshold Analysis | Business teams can tune the operating point |
A model that only predicts "Normal" gets 99.83% accuracy — but 0% Recall. That is why accuracy is never used here.
git clone https://github.com/<your-username>/Syntecxhub_Project_CreditCardFraudDetection.git
cd Syntecxhub_Project_CreditCardFraudDetection
python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate
pip install -r requirements.txt
Download creditcard.csv from Kaggle and place it at data/creditcard.csv.
python main.py
This will:
- Generate 10 EDA and evaluation plots in `outputs/plots/`
- Save `random_forest.pkl` and `xgboost.pkl` in `outputs/models/`
- Save `model_comparison.csv` in `outputs/reports/`
- Print full evaluation metrics to terminal
Estimated runtime: 3–7 minutes depending on hardware.
python app.py
Open your browser and go to: http://127.0.0.1:5000
`main.py` must be run at least once before `app.py`, because the web app needs the saved `.pkl` model files.
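A quick way to confirm that the pipeline has produced the model files before starting the server is sketched below. The paths and file names are the ones listed in this README; the `missing_models` helper is hypothetical and not part of the project.

```python
from pathlib import Path

# Directory and file names as listed in this README.
MODEL_DIR = Path("outputs/models")
MODEL_FILES = ("random_forest.pkl", "xgboost.pkl")

def missing_models(model_dir=MODEL_DIR):
    """Return the expected model files that are not present in model_dir."""
    return [name for name in MODEL_FILES if not (Path(model_dir) / name).is_file()]

if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Run `python main.py` first; missing:", ", ".join(missing))
    else:
        print("Models found; `python app.py` can be started.")
```

Run it from the project root so the relative `outputs/models` path resolves correctly.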
- Amount: `150`, all V sliders at center → Expected: `LEGITIMATE`
- Click "Suspicious Pattern" preset
- V1=−4.77, V3=−5.03, V14=−6.1, Amount=$1.00
- Expected: `FRAUD` with high probability
- Amount: `0.01–5.00` (very small)
- V1, V3, V14 sliders: drag to far left (−10)
- Watch fraud probability increase
| File | Description |
|---|---|
| `01_class_distribution.png` | Class imbalance bar and pie chart |
| `02_amount_distribution.png` | Amount histogram by class |
| `03_time_distribution.png` | Temporal distribution |
| `04_correlation_heatmap.png` | 30-feature correlation matrix |
| `05_feature_boxplots.png` | V1–V14 Normal vs Fraud |
| `06_confusion_matrix_*.png` | Per-model confusion matrix |
| `07_roc_curve_*.png` | ROC curve with AUC |
| `08_precision_recall_*.png` | Precision-Recall curve |
| `09_threshold_analysis_*.png` | Business threshold analysis |
| `10_model_comparison.png` | Side-by-side metric comparison |
| `model_comparison.csv` | Final metrics table |
| Category | Tools |
|---|---|
| Language | Python 3.8+ |
| Web Framework | Flask 3.0 |
| ML Models | scikit-learn (Random Forest), XGBoost |
| Imbalance Handling | imbalanced-learn (SMOTE) |
| Data Processing | pandas, numpy |
| Visualization | matplotlib, seaborn |
| Model Persistence | joblib |
Rafiul Islam
IoT & Robotics Engineering
University of Frontier Technology Bangladesh (UFTB)
Syntecxhub ML Internship