FraudGraph

Graph-based fraud analytics — modeling fraud as a network problem, not just a transaction problem.

The Problem

Traditional fraud detection looks at individual transactions in isolation. But fraud rarely happens in isolation.

A compromised account shares a device with three others. Two flagged payment methods resolve to the same IP address. A cluster of new users all registered within minutes of each other, from the same subnet.

These patterns are invisible at the transaction level. They become visible as a graph.

FraudGraph builds a connected representation of users, devices, payment methods, IP addresses, and transactions — then mines that network for the kind of structural signals that isolated transaction analysis can't see.
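As a rough illustration of that connected representation, the sketch below links each user to the devices, IPs, payment methods, and transactions it touched. It uses plain dictionaries to stay dependency-free; the record fields and the `build_entity_graph` helper are assumptions made for this example, not the project's actual schema or code.

```python
from collections import defaultdict

# Hypothetical transaction records; field names are illustrative assumptions.
transactions = [
    {"txn": "t1", "user": "u1", "device": "d1", "ip": "10.0.0.5", "payment": "p1"},
    {"txn": "t2", "user": "u2", "device": "d1", "ip": "10.0.0.5", "payment": "p2"},
    {"txn": "t3", "user": "u3", "device": "d2", "ip": "10.0.0.9", "payment": "p2"},
]

def build_entity_graph(records):
    """Return an undirected adjacency map linking each user to the entities it used."""
    graph = defaultdict(set)
    for rec in records:
        user = ("user", rec["user"])
        for kind in ("txn", "device", "ip", "payment"):
            entity = (kind, rec[kind])
            graph[user].add(entity)
            graph[entity].add(user)
    return graph

graph = build_entity_graph(transactions)

# Users connected through device d1 and through payment method p2.
print(sorted(graph[("device", "d1")]))
print(sorted(graph[("payment", "p2")]))
```

No single record here looks unusual, yet the graph shows u1 and u2 sharing a device and an IP, and u2 and u3 sharing a payment method.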

What This Project Explores

Entity relationship modeling — representing transactional data as a graph of interconnected entities
Graph feature engineering — deriving risk signals from network structure (centrality, clustering, community detection)
Risk scoring — building interpretable, prioritized risk scores from graph-derived features
Investigation tooling — designing analyst-facing dashboards and interactive network visualizations
Realistic synthetic data — generating and validating data with credible relationship complexity, noise, and fraud pattern representation
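One way to keep risk scores interpretable, as the risk scoring item above aims for, is a weighted sum that also reports each feature's contribution. The sketch below is only an assumption about what such a scorer could look like: the feature names, the weights, and the `score_entity` and `rank_entities` helpers are hypothetical, and the risk scoring methodology is still listed as planned.

```python
# Hypothetical weights; real weights would need calibration against labeled cases.
WEIGHTS = {"degree": 0.4, "clustering": 0.2, "shared_flagged": 0.4}

def score_entity(features, weights=WEIGHTS):
    """Return (score, contributions); features are assumed to be scaled to [0, 1]."""
    contributions = {name: weights[name] * features.get(name, 0.0) for name in weights}
    return sum(contributions.values()), contributions

def rank_entities(feature_table):
    """Return (entity, score, contributions) rows sorted by descending score."""
    scored = [(entity, *score_entity(feats)) for entity, feats in feature_table.items()]
    return sorted(scored, key=lambda row: row[1], reverse=True)

# Illustrative, made-up feature values for two entities.
feature_table = {
    "u1": {"degree": 0.9, "clustering": 0.5, "shared_flagged": 1.0},
    "u2": {"degree": 0.2, "clustering": 0.1, "shared_flagged": 0.0},
}

ranking = rank_entities(feature_table)
for entity, score, contributions in ranking:
    print(entity, round(score, 2), contributions)
```

Because the contributions are returned alongside the score, an analyst can see which signal pushed an entity up the queue.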

Architecture

Transactional Data
        │
        ▼
Entity Resolution
(User · Device · IP · Payment Method)
        │
        ▼
Graph Construction
        │
        ├──────────────────────────────────┐
        ▼                                  ▼
Graph Feature Engineering           Community Detection
(Degree, Centrality, Clustering)    (Louvain, Label Propagation)
        │                                  │
        └──────────────┬───────────────────┘
                       ▼
               Risk Scoring Layer
                       │
                       ▼
           Investigation Dashboard
                       │
                       ▼
      Interactive Network Visualization
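To make the community detection branch of the diagram more concrete, here is a simplified, deterministic label propagation pass in plain Python. It is a sketch under stated assumptions, not the project's implementation: ties are broken by keeping the current label or else taking the smallest one, and the six-node graph is invented. NetworkX ships production versions of both algorithms named in the diagram (`label_propagation_communities` and `louvain_communities` in `networkx.algorithms.community`).

```python
from collections import Counter

def label_propagation(adjacency, max_rounds=20):
    """Give each node the most common label among its neighbors until labels stop changing."""
    labels = {node: node for node in adjacency}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            neighbors = adjacency[node]
            if not neighbors:
                continue
            counts = Counter(labels[nbr] for nbr in neighbors)
            top = max(counts.values())
            candidates = {label for label, n in counts.items() if n == top}
            # Keep the current label if it is among the most common ones.
            if labels[node] in candidates:
                continue
            # Otherwise break ties deterministically with the smallest label.
            labels[node] = min(candidates)
            changed = True
        if not changed:
            break
    return labels

# Hypothetical graph: two separate triangles of accounts.
adjacency = {
    "u1": {"u2", "u3"}, "u2": {"u1", "u3"}, "u3": {"u1", "u2"},
    "u4": {"u5", "u6"}, "u5": {"u4", "u6"}, "u6": {"u4", "u5"},
}

labels = label_propagation(adjacency)
communities = {}
for node, label in labels.items():
    communities.setdefault(label, set()).add(node)
print(communities)
```

Label propagation is sensitive to update order and tie-breaking, and on sparsely bridged graphs it can merge groups that a modularity-based method such as Louvain would keep apart.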

Why Graph Analytics for Fraud?

Fraudsters exploit the gaps between siloed data systems. A device shared across ten accounts is a transaction-level blind spot — but in a graph, it's a high-degree node connecting otherwise unrelated entities. The same logic applies to shared IPs, shared payment instruments, and coordinated account creation.

Graph analytics surfaces these structural patterns by treating the relationships between entities as first-class signals, alongside the raw transaction attributes.
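The shared-device case can be sketched in a few lines: count the distinct accounts attached to each device and flag the devices above a threshold. The login pairs, the helper names, and the threshold of 5 are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical (account, device) observations: ten accounts on one device, two on their own.
logins = [(f"acct_{i}", "dev_shared") for i in range(10)]
logins += [("acct_10", "dev_a"), ("acct_11", "dev_b")]

def accounts_per_device(pairs):
    """Map each device to the set of accounts seen on it."""
    devices = defaultdict(set)
    for account, device in pairs:
        devices[device].add(account)
    return devices

def high_degree_devices(pairs, threshold=5):
    """Return devices whose degree (distinct accounts) meets the threshold."""
    return {
        device: len(accounts)
        for device, accounts in accounts_per_device(pairs).items()
        if len(accounts) >= threshold
    }

flagged = high_degree_devices(logins)
print(flagged)
```

The same counting works for shared IPs or payment instruments by swapping the second element of each pair.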

Key graph-derived signals this project explores:

| Signal | What it captures |
| --- | --- |
| Node degree | How many other entities is this one connected to? |
| Betweenness centrality | Is this entity a bridge between otherwise separate clusters? |
| Clustering coefficient | Are an entity's neighbors tightly interconnected? |
| Community membership | Does this entity belong to a suspiciously dense subgraph? |
| Shared resource risk | Is a device, IP, or payment method shared across flagged entities? |
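Two of these signals are simple enough to sketch directly. The code below computes a local clustering coefficient and one possible reading of shared resource risk (the share of a resource's users that are already flagged). That definition, the helper names, and the toy inputs are assumptions for illustration; NetworkX provides `clustering` and `betweenness_centrality` for real graphs.

```python
from itertools import combinations

def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2 * links / (k * (k - 1))

def shared_resource_risk(resource_users, flagged_users):
    """Fraction of each resource's users that are already flagged (assumed definition)."""
    return {
        resource: len(users & flagged_users) / len(users)
        for resource, users in resource_users.items()
        if users
    }

# Hypothetical undirected graph: a, b, c form a triangle; d hangs off a.
adjacency = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering_coefficient(adjacency, "a"))  # 1 of 3 neighbor pairs is linked
print(clustering_coefficient(adjacency, "b"))  # both neighbors are linked to each other

# Hypothetical resource usage and flagged accounts.
resource_users = {"dev1": {"u1", "u2", "u3", "u4"}, "ip1": {"u5"}}
risk = shared_resource_risk(resource_users, flagged_users={"u1", "u2"})
print(risk)
```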

Tech Stack

Analytics & Modeling

Python · Pandas · NumPy · Scikit-learn

Graph Analytics

NetworkX · Neo4j AuraDB

Visualization & Dashboards

Streamlit · Plotly · Pyvis

Development

Git · GitHub · Jupyter Notebook

Project Status

🚧 Active Development

| Phase | Status |
| --- | --- |
| Synthetic data generation & validation | 🔄 In progress |
| Entity resolution & graph construction | 🔄 In progress |
| Graph feature engineering | 📋 Planned |
| Risk scoring methodology | 📋 Planned |
| Investigation dashboard | 📋 Planned |
| Interactive network visualization | 📋 Planned |

A Note on Synthetic Data

This project uses synthetic data generated for experimentation and learning.

A known failure mode in synthetic fraud datasets is being too clean — perfectly balanced classes, no noise, unrealistically tidy relationship structures. A model trained on such data may perform well on paper while learning patterns that don't exist in real environments.
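A generator that avoids the "too clean" trap might deliberately keep fraud rare and inject some label noise. The sketch below shows one way to do that with the standard library. Every parameter (fraud rate, noise rate, device pool sizes, seed) is a placeholder assumption, not a value taken from this project or from real-world data.

```python
import random

def generate_records(n, fraud_rate=0.02, label_noise=0.01, n_devices=500, seed=7):
    """Generate synthetic records with rare fraud, noisy labels, and some device sharing."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        is_fraud = rng.random() < fraud_rate
        # Assumption: fraud accounts draw from a small device pool, so they share devices more.
        device = rng.randrange(10) if is_fraud else rng.randrange(10, n_devices)
        # Label noise: a small share of labels is flipped to mimic mislabeled cases.
        label = (not is_fraud) if rng.random() < label_noise else is_fraud
        records.append(
            {"user": f"u{i}", "device": f"d{device}", "is_fraud": is_fraud, "label": label}
        )
    return records

records = generate_records(10_000)
true_rate = sum(r["is_fraud"] for r in records) / len(records)
observed_rate = sum(r["label"] for r in records) / len(records)
print(f"true fraud rate: {true_rate:.3f}, observed label rate: {observed_rate:.3f}")
```

Keeping the ground-truth flag separate from the noisy label makes it possible to measure how much the noise alone degrades a model.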

Development here includes explicit validation of:

Relationship realism (do shared devices/IPs occur at plausible rates?)
Class distribution (is the fraud rate representative of real-world baselines?)
Network complexity (are graph structures sufficiently varied and non-trivial?)
Noise and variability (are there false positives, ambiguous cases, behavioral drift?)
Fraud pattern representation (do synthetic fraud clusters resemble known real-world patterns?)
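The first two checks in this list could be automated along the following lines. This is a minimal sketch: the `validate_dataset` helper, the record fields, and both thresholds are hypothetical placeholders that would need to be replaced with baselines appropriate to the domain being simulated.

```python
from collections import defaultdict

def validate_dataset(records, max_fraud_rate=0.05, max_shared_share=0.5):
    """Return simple realism checks; both thresholds are placeholder assumptions."""
    fraud_rate = sum(r["is_fraud"] for r in records) / len(records)

    users_per_device = defaultdict(set)
    for r in records:
        users_per_device[r["device"]].add(r["user"])
    shared = sum(1 for users in users_per_device.values() if len(users) > 1)
    shared_share = shared / len(users_per_device)

    return {
        "fraud_rate": fraud_rate,
        "fraud_rate_ok": 0 < fraud_rate <= max_fraud_rate,
        "shared_device_share": shared_share,
        "shared_device_ok": shared_share <= max_shared_share,
    }

# Tiny hand-written sample: one fraud record in four, which is far too balanced.
sample = [
    {"user": "u1", "device": "d1", "is_fraud": True},
    {"user": "u2", "device": "d1", "is_fraud": False},
    {"user": "u3", "device": "d2", "is_fraud": False},
    {"user": "u4", "device": "d3", "is_fraud": False},
]

report = validate_dataset(sample)
print(report)
```

Here the class distribution check fails (25% fraud against a 5% ceiling) while the device sharing check passes, which is the kind of result that should trigger a regeneration of the data.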

The goal is not to build a model that performs well on synthetic data. The goal is to build an approach that could hold up under more realistic conditions.

Repository Structure

fraudgraph/
├── data/
│   ├── raw/                  # Synthetic transaction records
│   └── processed/            # Entity-resolved, graph-ready datasets
├── notebooks/
│   ├── 01_data_generation.ipynb
│   ├── 02_graph_construction.ipynb
│   ├── 03_feature_engineering.ipynb
│   └── 04_risk_scoring.ipynb
├── src/
│   ├── data/                 # Data generation and entity resolution
│   ├── graph/                # Graph construction and feature extraction
│   ├── scoring/              # Risk scoring logic
│   └── dashboard/            # Streamlit app
├── tests/
├── requirements.txt
└── README.md

Future Directions

Near real-time risk scoring pipelines
Graph embeddings and representation learning (Node2Vec, GraphSAGE)
Unsupervised anomaly detection on graph structure
Explainability features for investigation workflows
Evaluation on larger, more complex graph topologies

About

FraudGraph is a portfolio project exploring the intersection of graph analytics and fraud detection. It is motivated by the observation that fraud often leaves structural footprints in relationship data that transaction-level analysis alone can't capture.

Questions, feedback, or collaboration — feel free to open an issue or reach out.
