A Full-Stack ML System for Real-Time Transaction Risk Scoring, Explainability, and Operational Fraud Analysis
Modern financial systems must process millions of transactions per hour while simultaneously identifying suspicious behavior in milliseconds. Fraud evolves continuously, testing the boundaries of rule-based systems and exploiting predictable model weaknesses.
This repository demonstrates a complete, realistic, modular fraud detection engine built using:
- Python for data preparation and experimentation
- scikit-learn for model training and inference pipelines
- SHAP for model transparency and governance
- Streamlit for interactive analyst-facing tools
- Cost-sensitive thresholding reflecting business trade-offs
- A clean, production-style project layout
The engine works on the Synthetic Financial Fraud Dataset, but the architecture is flexible enough to plug into any real banking pipeline.
This project is not a “model notebook.” It is a small fraud detection platform, designed to reflect the systems used in real fintech companies.
Financial fraud occurs when an unauthorized party initiates a monetary transaction by exploiting weaknesses in user behavior, device security, identity verification, or merchant channels.
The cost is multidimensional:
- Fraudulent transactions must often be reimbursed, with banks absorbing the loss.
- If users feel unsafe, churn increases and regulatory risk escalates.
- False positives drive manual reviews, support tickets, and internal investigation costs.
Modern regulations (PSD2, PCI DSS, AML directives) require:
- Transparency
- Explainability
- Audit trails
- Documented decision logic
This project includes explainability, logs, model artifacts, and visual documentation to reflect these requirements.
Fraud is not a “normal” ML classification problem.
It is:
- Imbalanced (fraud is <1–5% of transactions)
- Adversarial (attackers adapt to detection methods)
- Temporal (fraud strategies evolve)
- Non-stationary (data distribution drifts rapidly)
- High-stakes (errors are expensive)
Therefore, fraud modeling must emphasize:
- Explainability: analysts must be told why a transaction was flagged.
- Robustness: models must handle distribution shifts gracefully.
- Cost asymmetry: missing a fraud is 10×–100× more expensive than a false alert.
- Human-in-the-loop operations: fraud operations teams must investigate and override model decisions.
- Continuous monitoring: thresholds, FPR, TPR, and drift must be tracked continuously.
This repository demonstrates these concepts through:
- Cost-weighted threshold search
- Feature preprocessing pipelines
- SHAP interpretability
- Visual diagnostics
- A human-facing dashboard
Although synthetic, the dataset includes the same structure as many real fintech transaction logs:
| Feature | Why It Matters in Real Fraud |
|---|---|
| `amount` | Fraud often involves unusually large or unusually small amounts. |
| `hour` | Fraud probability increases at unusual times (night-time patterns). |
| `transaction_type` | Online transactions tend to have higher fraud rates than POS/ATM. |
| `merchant_category` | Fraud patterns emerge around electronics, travel, and luxury goods. |
| `country` | Geographical anomalies signal compromised accounts. |
| `device_risk_score` | Device fingerprinting is a core modern fraud defense. |
| `ip_risk_score` | High-risk IPs (VPNs, Tor, known attack ranges) correlate strongly with fraud. |
The fraud label (`is_fraud`) is deterministic in this dataset, making it perfectly separable, but this project treats the dataset as if it were real by adding:
- Reproducible splits
- Proper pipelines
- Explainability
- Cost-based threshold logic
This makes the system behave like a real fraud engine, even if the underlying data is clean.
Below is the project layout, annotated so it reads as an explanation rather than a bare directory tree:
financial-fraud-risk-engine/
│
├── data/
│ ├── raw/ # Input dataset(s) exactly as received
│ ├── processed/ # Cleaned, split datasets ready for ML
│
├── models/
│ ├── fraud_pipeline.joblib # Full sklearn pipeline (preprocess + classifier)
│ └── threshold.json # Business-optimized fraud decision threshold
│
├── reports/
│ ├── metrics/ # Metrics and model evaluation artifacts
│ └── figures/ # ROC, PR, Confusion Matrix, SHAP
│
├── src/
│ ├── data_prep.py # Cleaning, stratified splits, reproducible data handling
│ ├── features.py # Preprocessing + model construction (sklearn pipeline)
│ ├── train_model.py # Model training + baseline evaluation
│ ├── evaluate.py # Threshold tuning + visual diagnostics
│ ├── explain.py # SHAP global explainability
│ └── score_new_transactions.py # Batch scoring for new incoming data
│
├── app.py # Streamlit dashboard (analyst-facing UI)
├── requirements.txt
├── README.md
└── LICENSE
This architecture intentionally mirrors systems used in regulated financial environments.
flowchart LR
%% Data sources
A[Raw CSV<br/>data/raw/synthetic_fraud_dataset.csv]
%% Data prep
A --> B[Data Prep<br/>src/data_prep.py<br/>Stratified train/test split]
B --> B1[Train Set<br/>data/processed/transactions_train.csv]
B --> B2[Test Set<br/>data/processed/transactions_test.csv]
%% Training
B1 --> C[Model Training<br/>src/train_model.py]
C --> C1[Preprocess + RandomForest Pipeline<br/>src/features.py]
C1 --> D[Trained Pipeline<br/>models/fraud_pipeline.joblib]
C1 --> E[Base Metrics<br/>reports/metrics/metrics.json]
%% Threshold tuning & evaluation
B2 --> F[Threshold & Evaluation<br/>src/evaluate.py]
D --> F
F --> G[Best Threshold<br/>models/threshold.json]
F --> H[ROC / PR / CM Plots<br/>reports/figures/*.png]
F --> I[Threshold Grid Metrics<br/>reports/metrics/threshold_search.json]
%% Explainability
B2 --> J[Global Explainability<br/>src/explain.py]
D --> J
J --> K[SHAP Summary Plot<br/>reports/figures/shap_summary.png]
%% Scoring new data
A --> L[Batch Scoring<br/>src/score_new_transactions.py]
D --> L
G --> L
L --> M[Scored CSV<br/>fraud_probability + fraud_flag]
%% Streamlit app
subgraph UI[Interactive Dashboard<br/>app.py Streamlit]
N[Upload CSV<br/>raw or scored] --> O[Apply Pipeline<br/>use models/fraud_pipeline.joblib]
G --> O
O --> P[Risk Scores & Flags<br/>fraud_probability / fraud_flag]
D --> Q[Local SHAP for Selected Txn]
B2 --> Q
end
%% Connections to UI
A --> N
M --> N
D --> O
G --> O
D --> Q
B2 --> Q
Let’s walk through a transaction as it flows through the system.
The engine begins with a raw CSV:
- No assumptions about cleanliness
- No assumptions about schema enforcement
- Data is placed into `data/raw/`
This reflects real ETL pipelines where data arrives from upstream systems like:
- Core banking logs
- Payment gateway events
- App telemetry
- Merchant transaction streams
Raw data is always treated as immutable.
The data preparation script (`src/data_prep.py`):
- Loads the raw CSV
- Validates columns
- Performs quality checks (missing values, types)
- Creates stratified train/test splits
- Saves the results into `data/processed/`
If the test split accidentally contains 0 fraud cases, every downstream metric becomes meaningless.
This is a common mistake in junior data science projects, and this script prevents it.
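A minimal sketch of that stratified split logic, assuming the column and file names shown in the architecture diagram (`is_fraud`, `transactions_train.csv`, `transactions_test.csv`); the 80/20 ratio and seed are illustrative, not the script's actual settings:

```python
# Illustrative sketch of the data-prep step; split ratio and seed are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/raw/synthetic_fraud_dataset.csv")

# Stratifying on the label guarantees fraud cases land in both splits,
# so downstream metrics never evaluate against a test set with zero fraud.
train_df, test_df = train_test_split(
    df,
    test_size=0.2,
    stratify=df["is_fraud"],
    random_state=42,
)

train_df.to_csv("data/processed/transactions_train.csv", index=False)
test_df.to_csv("data/processed/transactions_test.csv", index=False)
```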
A full scikit-learn Pipeline is created (a sketch follows this section), including:
- One-hot encoding, which expands merchant, country, and transaction type into binary indicator features.
- Standard scaling, which ensures numeric features share a similar scale, stabilizing model training.
The classifier is a RandomForestClassifier, chosen for:
- Robustness
- Good performance on tabular data
- Interpretability via SHAP
- Nonlinearity support
- Minimal feature engineering requirements
Everything is wrapped in a single sklearn Pipeline because:
- It prevents leakage
- It ensures identical preprocessing in training and inference
- It serializes cleanly using Joblib
- It mimics real-world model-serving systems
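A sketch of the kind of pipeline `src/features.py` could assemble; the column groupings follow the dataset table above, while the step names and hyperparameters are illustrative assumptions:

```python
# Illustrative pipeline sketch; step names and hyperparameters are assumptions.
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CATEGORICAL = ["transaction_type", "merchant_category", "country"]
NUMERIC = ["amount", "hour", "device_risk_score", "ip_risk_score"]

preprocessor = ColumnTransformer(
    transformers=[
        ("categorical", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("numeric", StandardScaler(), NUMERIC),
    ]
)

pipeline = Pipeline(
    steps=[
        ("preprocess", preprocessor),
        ("classifier", RandomForestClassifier(n_estimators=300, random_state=42)),
    ]
)
```

Because preprocessing lives inside the pipeline, `fit` and `predict_proba` apply identical transformations, which is exactly what prevents train/serve skew.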
The training script (`src/train_model.py`), sketched below:
- Fits the full pipeline on train data
- Generates baseline metrics (ROC AUC, recall, etc.)
- Saves the pipeline for inference
- Produces an auditable metrics JSON
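A sketch of that flow, reusing the `pipeline` object from the previous sketch; the metric set and JSON schema are assumptions, only the output paths mirror the repository layout:

```python
# Illustrative training-and-artifact sketch; metric names are assumptions.
import json

import joblib
import pandas as pd
from sklearn.metrics import precision_score, recall_score, roc_auc_score

train_df = pd.read_csv("data/processed/transactions_train.csv")
test_df = pd.read_csv("data/processed/transactions_test.csv")

X_train, y_train = train_df.drop(columns=["is_fraud"]), train_df["is_fraud"]
X_test, y_test = test_df.drop(columns=["is_fraud"]), test_df["is_fraud"]

pipeline.fit(X_train, y_train)                 # `pipeline` from the sketch above
proba = pipeline.predict_proba(X_test)[:, 1]
preds = (proba >= 0.5).astype(int)             # baseline threshold only

metrics = {
    "roc_auc": float(roc_auc_score(y_test, proba)),
    "precision_at_0.5": float(precision_score(y_test, preds)),
    "recall_at_0.5": float(recall_score(y_test, preds)),
}

joblib.dump(pipeline, "models/fraud_pipeline.joblib")
with open("reports/metrics/metrics.json", "w") as f:
    json.dump(metrics, f, indent=2)
```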
In real fintech operations, metrics logs are required for:
- Validation
- Governance
- Audits
- Model monitoring
Most portfolio projects skip this step and rely on the default 0.5 threshold. That is incorrect for fraud.
We evaluate thresholds from 0.05 → 0.95 and compute:
- Precision
- Recall
- F1
- Confusion matrix
- Business cost with weighting:
Cost = 10 × False Negative + 1 × False Positive
Banks often weight FN at 50× FP, but here 10:1 is used for clarity.
The chosen threshold is saved to `models/threshold.json`.
This is how real fraud engines tune their trade-offs.
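A sketch of the cost-weighted sweep described above, reusing `proba` and `y_test` from the training sketch; the grid step and the JSON schema are assumptions:

```python
# Illustrative threshold search; the 10:1 weighting matches the cost formula above.
import json

import numpy as np
from sklearn.metrics import confusion_matrix

FN_COST, FP_COST = 10, 1

best = {"threshold": 0.5, "cost": float("inf")}
for threshold in np.arange(0.05, 0.96, 0.05):
    preds = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
    cost = FN_COST * fn + FP_COST * fp
    if cost < best["cost"]:
        best = {"threshold": round(float(threshold), 2), "cost": float(cost)}

with open("models/threshold.json", "w") as f:
    json.dump(best, f, indent=2)
```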
Predicted probability distribution (deep explanation):
- Synthetic dataset → sharp separability
- Fraud probability ≈ 1
- Non-fraud probability ≈ 0
- Reflects a deterministic fraud generator
- Useful for educational purposes despite unrealistic perfection
Confusion matrix (deep explanation):
- Zero false positives and false negatives
- This never happens in real fraud data
- Illustrates why synthetic data is dangerous for benchmarking
- But excellent for demonstrating workflow and system design
Precision-Recall curve (deep explanation):
- PR focuses on the minority class
- Shows nearly perfect performance
- Real fraud PR-AUC values typically fall between 0.3 and 0.8
- This curve highlights model dominance over the synthetic structure
ROC curve (deep explanation):
- AUC ≈ 1.00
- Confirms extreme separability
- Demonstrates the upper bound of model performance
SHAP explanation plot (deep explanation; a code sketch follows this list):
- Shows how each feature pushed a single prediction
- Used by human fraud investigators
- Provides transparency for disputing decisions
- Important for appeals in financial regulations
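A sketch of how such SHAP values could be produced from the saved pipeline; slicing the pipeline into preprocessor and classifier is an assumption about its structure, and SHAP's return type varies by version, so the sketch handles both:

```python
# Illustrative SHAP sketch; pipeline structure and paths are assumptions.
import joblib
import pandas as pd
import scipy.sparse as sp
import shap

pipeline = joblib.load("models/fraud_pipeline.joblib")
X_test = pd.read_csv("data/processed/transactions_test.csv").drop(columns=["is_fraud"])

preprocessor = pipeline[:-1]      # every step except the final classifier
classifier = pipeline[-1]

X_encoded = preprocessor.transform(X_test)
if sp.issparse(X_encoded):
    X_encoded = X_encoded.toarray()

explainer = shap.TreeExplainer(classifier)
shap_values = explainer.shap_values(X_encoded)

# Older SHAP returns one array per class; newer versions stack classes on the last axis.
if isinstance(shap_values, list):
    fraud_shap = shap_values[1]
elif getattr(shap_values, "ndim", 2) == 3:
    fraud_shap = shap_values[..., 1]
else:
    fraud_shap = shap_values

feature_names = preprocessor.get_feature_names_out()   # needs a recent scikit-learn
shap.summary_plot(fraud_shap, X_encoded, feature_names=feature_names, show=False)
```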
Interactive dashboard (deep explanation):
The dashboard simulates a real internal risk tool:
- Analysts upload data
- Adjust thresholds
- Inspect high-risk cases
- Drill down into SHAP explanations
- Export flagged transactions
This interaction loop is core to fraud operations.
The dashboard is not just a UI; it is a fraud investigation console (a minimal sketch follows this list) consisting of:
- CSV upload, which accepts arbitrary CSVs and validates columns.
- A threshold slider, which allows analysts to balance FPs and FNs dynamically.
- Summary statistics showing population characteristics and model confidence.
- A ranked table listing the top suspicious transactions by score.
- SHAP-based explanations providing reason codes for decision justification.
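A minimal Streamlit sketch of that console; widget layout, column handling, and file names are illustrative, not the actual `app.py`:

```python
# Illustrative Streamlit console sketch; not the actual app.py.
import joblib
import pandas as pd
import streamlit as st

st.title("Fraud Risk Console")

pipeline = joblib.load("models/fraud_pipeline.joblib")
threshold = st.slider("Decision threshold", 0.05, 0.95, 0.50)

uploaded = st.file_uploader("Upload transactions CSV", type="csv")
if uploaded is not None:
    txns = pd.read_csv(uploaded)
    features = txns.drop(columns=["is_fraud"], errors="ignore")  # label may be absent

    txns["fraud_probability"] = pipeline.predict_proba(features)[:, 1]
    txns["fraud_flag"] = (txns["fraud_probability"] >= threshold).astype(int)

    st.metric("Flagged transactions", int(txns["fraud_flag"].sum()))
    st.dataframe(txns.sort_values("fraud_probability", ascending=False).head(50))
    st.download_button(
        "Export flagged transactions",
        txns[txns["fraud_flag"] == 1].to_csv(index=False),
        file_name="flagged_transactions.csv",
    )
```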
This is similar in spirit to internal risk tools such as:
- Stripe Radar
- PayPal FraudNet
- Revolut Risk Engine
- Monzo Financial Crime Platform
Financial ML is heavily regulated. This project demonstrates best practices:
- SHAP explainability ensures the model is not using prohibited or biased features.
- Reason codes allow customers to appeal flagged decisions.
- Reproducible splits and serialized artifacts provide reproducibility for audits.
- Documented metrics are required for internal risk committees.
- A single end-to-end pipeline ensures no hidden processing steps.
Even though synthetic data is used, the workflow is industry-realistic.
The batch scoring script (`src/score_new_transactions.py`) mimics an ML microservice (sketched after this list):
- Loads the saved pipeline
- Applies preprocessing identically
- Computes risk scores
- Applies the selected threshold
- Outputs flagged results
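A sketch of that flow; the output column names follow the architecture diagram, while the output path and the `threshold.json` schema are assumptions:

```python
# Illustrative batch scoring sketch; output path and JSON schema are assumptions.
import json

import joblib
import pandas as pd

pipeline = joblib.load("models/fraud_pipeline.joblib")
with open("models/threshold.json") as f:
    threshold = json.load(f)["threshold"]

new_txns = pd.read_csv("data/raw/synthetic_fraud_dataset.csv")
features = new_txns.drop(columns=["is_fraud"], errors="ignore")  # label absent in production

new_txns["fraud_probability"] = pipeline.predict_proba(features)[:, 1]
new_txns["fraud_flag"] = (new_txns["fraud_probability"] >= threshold).astype(int)

new_txns.to_csv("scored_transactions.csv", index=False)
```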
This replicates how production fraud systems function:
- Kafka streaming
- Batch ETL scoring
- Real-time scoring APIs
No real dataset behaves like this:
- Fraud evolves; this dataset does not.
- Fraudsters adapt; synthetic datasets do not.
- Real models incorporate behavioral histories.
- Real fraud involves rings of connected accounts.
Despite these limitations, the architecture is still industry-grade.
A serious fraud system evolves continuously. Here are realistic next steps:
- Add Gradient Boosting models (XGBoost / LightGBM)
- Introduce calibration (Platt scaling, isotonic regression)
- Add temporal features (velocity, bursts, user baselines)
- Add geolocation and device fingerprinting enrichments
- Simulate fraud rings
- Add compromised device patterns
- Add coordinated attacks (same IP hitting many accounts)
- Add session-based features
- Convert inference pipeline into FastAPI microservice
- Containerize with Docker
- Add CI/CD for model retraining
- Use MLflow model registry
- Track score distribution drift
- Track recall on drifted populations
- Add alerting for sharp changes
- Add case notes & exporting
- Add reason-code explanations
- Add multi-transaction user investigation view
- Add fraud ring graph visualization
This repository demonstrates:
- End-to-end system design: not just training, but full pipeline architecture.
- Explainability and governance awareness: critical for fintech interviews.
- Cost-sensitive decision logic: a huge differentiator.
- Engineering fundamentals: pipelines, serialization, reproducibility.
- Product thinking: dashboards show usability thinking.
- Operational maturity: thresholds, cost functions, monitoring.