

Real-Time Deepfake Detection and Explanation Using Lightweight ML and LLMs

Deepfake content is an increasing threat to digital media authenticity. This project introduces a real-time, explainable deepfake detection pipeline that processes video in memory, minimizing latency while providing interpretable outputs. The system leverages frame reduction techniques, pretrained visual models (e.g., FakeShield), and vision-capable LLMs for generating human-understandable explanations — all without writing any media to disk.

Objectives

  • Perform real-time deepfake detection without persistent storage.
  • Apply lightweight motion-based heuristics (e.g., optical flow, scene detection) to reduce the number of frames analyzed.
  • Use pretrained models like FakeShield for forgery detection and localization.
  • Generate coherent natural-language explanations via fine-tuned vision LLMs.
  • Serve results instantly through a FastAPI-based backend, optionally containerized with Docker.

System Modules

1. Video Input & Streaming

  • Supports both video file upload and live frame streaming via WebSocket.
  • Entire pipeline runs in-memory, avoiding any disk I/O for performance.

2. Frame Selection Pipeline

  • Optical Flow: Tracks pixel-wise motion (e.g., Farneback or RAFT).
  • Scene Change Detection: Uses histogram delta or tools like PySceneDetect.
  • Optional: Background subtraction to eliminate static regions.

Goal: Reduce the number of frames passed to the detector by roughly 90% while preserving semantic fidelity.

3. Deepfake Detection Engine

Model: FakeShield v1-22b
A multimodal vision-language framework for explainable deepfake detection and localization.

Input:

  • Frame as a tensor (from OpenCV → NumPy → PyTorch)
  • Modules used: DTE-FDM and MFLM

Output (per frame):

  • verdict: real or fake
  • confidence_score: e.g., 0.87
  • forgery_mask: binary/grayscale image mask (H, W) as numpy.ndarray or torch.Tensor
  • attention_map (optional): model attention visualization to highlight decision focus

These outputs are passed to the LLM for final explanation generation.

4. Explanation Engine

Model: saakshigupta/deepfake-explainer-new
A LLaVA-based adapter fine-tuned to generate deepfake analyses across multiple input images.

Inputs:

  • Original frame
  • Forgery mask
  • Optional: Attention map or overlay
  • Prompt: "Explain if this frame shows signs of tampering."

Output:

  • A detailed natural language explanation, highlighting potential manipulation and justifying the verdict.

Example:

“Regions around the mouth and cheek show boundary noise and abnormal motion artifacts, indicating synthetic manipulation.”

5. Response API

Served via FastAPI as JSON:

{
  "frame_index": 45,
  "verdict": "fake",
  "confidence": 0.87,
  "explanation": "Facial boundary irregularities suggest deepfake generation.",
  "forgery_mask": "<base64-image>",
  "attention_map": "<base64-image>"
}


Tech Stack

| Layer | Stack |
| --- | --- |
| API Backend | FastAPI |
| Frame Processing | OpenCV, FFmpeg |
| Motion Detection | Optical Flow, PySceneDetect |
| Deepfake Detection | FakeShield, PyTorch |
| LLM Explanation | Hugging Face, LangChain |
| Deployment | Docker (optional) |