AI-powered audio authenticity verification using CNN-based spectrogram analysis.
This system uses a Convolutional Neural Network (CNN) to detect deepfake audio by analyzing Log-Mel spectrograms. The model is trained on 2-second audio clips and can distinguish between real and AI-generated audio with high accuracy.
- Input: Log-Mel spectrogram (1x128x200)
- Architecture: 4 convolutional blocks with increasing filters (32, 64, 128, 256)
- Output: Fakeness score (0-1)
- Parameters: ~1.5M trainable parameters
- Model Size: ~6MB
- Training: 13,956 samples (6,978 fake + 6,978 real)
- Testing: 1,088 samples (544 fake + 544 real)
- Format: 2-second WAV files, preprocessed at 16kHz
- Expected Accuracy: 90-95%
- Expected F1 Score: 0.88-0.93
- Inference Time: <100ms per clip
- Training Time: 2-3 hours on dual RTX 5060 Ti
```
FAC/
├── backend/
│   ├── model.py            # CNN architecture
│   ├── preprocessing.py    # Spectrogram generation
│   ├── augmentation.py     # Data augmentation (5 techniques)
│   ├── dataset.py          # PyTorch Dataset & DataLoader
│   ├── training.py         # Training loop with validation
│   ├── server.py           # Flask API
│   ├── main.py             # Main orchestrator
│   ├── requirements.txt    # Python dependencies
│   ├── checkpoints/        # Saved models
│   └── results/            # Training graphs & metrics
├── frontend/
│   ├── index.html          # UI structure
│   ├── style.css           # Styling
│   ├── main.js             # Frontend logic
│   └── package.json        # Node dependencies
└── Dataset/
    └── for-2sec/for-2seconds/
        ├── training/       # Training data
        ├── testing/        # Test data
        └── validation/     # Validation data
```
- Create a virtual environment:

  ```shell
  cd backend
  python -m venv .venv
  .venv\Scripts\activate  # Windows
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Install Node.js dependencies:

  ```shell
  cd frontend
  npm install
  ```

Train the model on your dataset:

```shell
cd backend
python training.py
```

This will:

- Load the dataset from `H:\FAC\Dataset`
- Train using dual GPUs (RTX 5060 Ti) with DataParallel
- Apply 5 augmentation techniques during training
- Save the best model to `checkpoints/best_model.pth`
- Generate validation graphs in `results/`
Training takes approximately 2-3 hours.
```shell
cd backend
python server.py
```

The Flask API will start on http://localhost:5000

In a separate terminal:

```shell
cd frontend
npm run dev
```

The frontend will start on http://localhost:5173. Open http://localhost:5173 in your browser.
Upload Mode:
- Drag & drop an audio file or click to browse
- Supported formats: MP3, WAV, OGG
- Get instant analysis results
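Besides the browser UI, the API can be scripted. Below is a minimal client sketch using only the Python standard library; the endpoint path (`/analyze`) is an assumption for illustration, so check `server.py` for the actual route names.

```python
import base64
import json
import urllib.request

API_BASE_URL = "http://localhost:5000"

def build_payload(audio_bytes: bytes) -> dict:
    """Wrap raw audio bytes in the base64 JSON envelope the API expects."""
    return {"audio": base64.b64encode(audio_bytes).decode("ascii")}

def analyze(wav_path: str) -> dict:
    """POST a WAV file to the analysis endpoint and parse the JSON reply."""
    with open(wav_path, "rb") as f:
        body = json.dumps(build_payload(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{API_BASE_URL}/analyze",  # hypothetical route; see server.py
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```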
Check whether the model is loaded and retrieve accuracy metrics.

Response:

```json
{
  "loaded": true,
  "device": "cuda",
  "accuracy": 0.93,
  "version": "1.0"
}
```

Upload an audio file for analysis.
Request: `multipart/form-data` with the audio file.

Response:

```json
{
  "fakeness_score": 0.87,
  "prediction": "fake",
  "confidence": 0.87
}
```

Accepts base64-encoded audio for analysis.
Request:

```json
{
  "audio": "base64_encoded_audio_data"
}
```

Response:

```json
{
  "fakeness_score": 0.23,
  "prediction": "real",
  "confidence": 0.77
}
```

- Time Masking - Masks random time frames
- Frequency Masking - Masks random mel bins
- Gaussian Noise - Adds random noise
- Codec Compression - Simulates MP3/AAC artifacts
- Pitch Shift - Random pitch changes (±2 semitones)
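Three of the five techniques can be sketched directly on the spectrogram array (NumPy shown for brevity; the mask widths and noise level here are illustrative assumptions, and the authoritative versions live in `backend/augmentation.py`):

```python
import numpy as np

def time_mask(spec, max_width=20, rng=None):
    """Zero out a random contiguous block of time frames (SpecAugment-style)."""
    rng = np.random.default_rng() if rng is None else rng
    spec = spec.copy()
    width = int(rng.integers(1, max_width + 1))
    start = int(rng.integers(0, spec.shape[1] - width + 1))
    spec[:, start:start + width] = 0.0
    return spec

def freq_mask(spec, max_width=15, rng=None):
    """Zero out a random contiguous band of mel bins."""
    rng = np.random.default_rng() if rng is None else rng
    spec = spec.copy()
    width = int(rng.integers(1, max_width + 1))
    start = int(rng.integers(0, spec.shape[0] - width + 1))
    spec[start:start + width, :] = 0.0
    return spec

def gaussian_noise(spec, std=0.05, rng=None):
    """Add zero-mean Gaussian noise to every bin."""
    rng = np.random.default_rng() if rng is None else rng
    return spec + rng.normal(0.0, std, size=spec.shape)
```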
- Automatic detection of available GPUs
- DataParallel for batch splitting across GPUs
- Optimized batch size (64 total, 32 per GPU)
- ~2x training speedup with dual GPUs
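The detection-and-wrapping step can be sketched as below. A stand-in module is used so the snippet runs anywhere; on a single-GPU or CPU machine it simply falls back to the unwrapped model.

```python
import torch
import torch.nn as nn

def wrap_for_gpus(model: nn.Module) -> nn.Module:
    """Move the model to CUDA and let DataParallel split each batch
    across all visible GPUs (e.g. a batch of 64 becomes 32 per GPU)."""
    if torch.cuda.is_available():
        model = model.cuda()
        if torch.cuda.device_count() > 1:
            model = nn.DataParallel(model)
    return model

model = wrap_for_gpus(nn.Linear(4, 1))  # stand-in for the real CNN
```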
After training, comprehensive validation generates:
- Training History - Loss and accuracy curves
- ROC Curve - With AUC score
- Confusion Matrix - True/False positives/negatives
- Probability Distribution - Model confidence visualization
- Precision-Recall Curve - Performance across thresholds
All graphs are saved to `backend/results/`.
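For reference, the confusion-matrix counts and F1 score can be derived from the model's fakeness scores like this (a minimal sketch; the project's validation step additionally renders the graphs listed above):

```python
import numpy as np

def confusion_and_f1(scores, labels, threshold=0.5):
    """scores: fakeness in [0, 1]; labels: 1 = fake, 0 = real."""
    preds = (np.asarray(scores) >= threshold).astype(int)
    labels = np.asarray(labels)
    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    fn = int(((preds == 0) & (labels == 1)).sum())
    tn = int(((preds == 0) & (labels == 0)).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "f1": f1}
```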
- Sample rate: 16kHz
- FFT size: 1024
- Hop length: 256
- Mel bins: 128
- Normalization: Per-sample z-score
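The normalization step, plus forcing the time axis to the fixed 200 frames the model expects, can be sketched as below. The pad-or-crop behavior is an assumption about how off-length clips are handled; `backend/preprocessing.py` is authoritative.

```python
import numpy as np

def normalize(spec: np.ndarray) -> np.ndarray:
    """Per-sample z-score: zero mean, unit variance for each spectrogram."""
    return (spec - spec.mean()) / (spec.std() + 1e-8)

def pad_or_crop(spec: np.ndarray, n_frames: int = 200) -> np.ndarray:
    """Force the time axis to exactly n_frames (zero-pad or crop)."""
    _, t = spec.shape
    if t < n_frames:
        spec = np.pad(spec, ((0, 0), (0, n_frames - t)))
    return spec[:, :n_frames]
```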
- Loss: BCELoss
- Optimizer: AdamW (lr=3e-4)
- Scheduler: ReduceLROnPlateau (patience=3)
- Batch size: 64 (32 per GPU)
- Epochs: 30-40
- Early stopping: patience=5
- Checkpointing: Best F1 score
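The configuration above maps onto PyTorch objects roughly as follows (a sketch with a stand-in model; the full loop with early stopping and F1-based checkpointing is in `backend/training.py`):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())  # stand-in for the CNN
criterion = nn.BCELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", patience=3)  # stepped with the validation metric

# one illustrative training step
x, y = torch.randn(4, 8), torch.randint(0, 2, (4, 1)).float()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
scheduler.step(0.5)  # pass the epoch's validation F1
```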
```
Input [1, 128, 200]
        ↓
Conv Block 1: 1→32 filters
        ↓
Conv Block 2: 32→64 filters
        ↓
Conv Block 3: 64→128 filters
        ↓
Conv Block 4: 128→256 filters
        ↓
AdaptiveAvgPool (1x1)
        ↓
FC: 256→128 + Dropout(0.3)
        ↓
FC: 128→1 + Sigmoid
        ↓
Output: Fakeness score [0-1]
```
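The diagram above translates into a PyTorch module along these lines (kernel sizes, pooling, and BatchNorm placement are assumptions; `backend/model.py` is the canonical definition):

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU -> 2x2 max-pool."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class FakeAudioCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32),
            conv_block(32, 64),
            conv_block(64, 128),
            conv_block(128, 256),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(128, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):  # x: (batch, 1, 128, 200)
        return self.classifier(self.features(x))
```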
- Ensure you've trained the model first: `python training.py`
- Check that `checkpoints/best_model.pth` exists
- Reduce the batch size in `training.py` (default: 64)
- Reduce `num_workers` in the DataLoader (default: 8)
- Ensure the Flask server is running on port 5000
- Check the CORS settings in `server.py`
- Verify `API_BASE_URL` in `main.js`
See LICENSE file for details.
- Dataset: Fake-or-Real Audio Dataset
- Framework: PyTorch
- Frontend: Vanilla JavaScript with Vite
- Backend: Flask
Status: ✅ Complete | Training F1: 99.79% (validation) | Test F1: 39.8% | Demonstrates: Generalization challenges in deepfake detection
An honest, AI-assisted research project that:
- ✅ Achieves 99.79% F1 on training distribution
- ✅ Demonstrates complete ML pipeline (data → training → deployment)
- ✅ Reveals real-world generalization challenges
- ✅ Provides comprehensive analysis and visualizations
- ✅ Includes honest assessment of limitations
"I did what I could. This is an AI-assisted project, but it's an honest one because I did the work of learning, structuring, and lecturing myself on its logic"
- 📁 File upload with drag & drop support
- 📊 Complete training analysis with 5 visualization graphs
- 🔬 Technical documentation with architecture details
- ⚡ Fast inference (<100ms per clip on GPU)
- 🎯 Honest disclaimers about limitations
For complete documentation, see PROJECT_SUMMARY.md