Explainable Fake News Detection using Machine Learning and Deep Learning


A comprehensive implementation of fake news detection using both traditional Machine Learning and Deep Learning approaches, with explainability powered by LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations).

🎯 Project Overview

This project addresses the critical problem of fake news detection by:

  • Implementing multiple ML models (Logistic Regression, Random Forest, SVM)
  • Implementing Deep Learning models (LSTM, BERT)
  • Providing explainable AI (XAI) using LIME and SHAP
  • Visualizing model predictions and explanations
  • Comparing model performance and interpretability

🚀 Features

  • Multiple Model Architectures

    • Traditional ML: Logistic Regression, Random Forest, SVM
    • Deep Learning: LSTM, BERT (Transformer-based)
  • Explainability Techniques

    • LIME: Local interpretable model-agnostic explanations
    • SHAP: Unified approach to explain predictions
  • Comprehensive Analysis

    • Performance metrics (Accuracy, Precision, Recall, F1-Score)
    • Confusion matrices
    • ROC curves
    • Feature importance visualization
    • Word-level and phrase-level explanations

๐Ÿ“ Project Structure

AIProject/
├── README.md
├── requirements.txt
├── LICENSE
├── .gitignore
├── data/
│   ├── raw/                    # Raw dataset files
│   ├── processed/              # Preprocessed data
│   └── README.md
├── src/
│   ├── __init__.py
│   ├── data_preprocessing.py   # Data loading and preprocessing
│   ├── ml_models.py            # Traditional ML models
│   ├── dl_models.py            # Deep Learning models
│   ├── explainability.py       # LIME & SHAP implementations
│   ├── visualization.py        # Plotting and visualization
│   └── utils.py                # Utility functions
├── notebooks/
│   └── demo.ipynb              # Interactive demonstration
├── models/                     # Saved model files
├── results/                    # Outputs, plots, reports
│   ├── plots/
│   └── reports/
└── main.py                     # Main training script

๐Ÿ› ๏ธ Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • (Optional) CUDA-capable GPU for deep learning models

Setup

  1. Clone the repository:
git clone git@github.com:KunalBharadwaj/ForgeIt.git
cd ForgeIt
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate   # On Linux/Mac
venv\Scripts\activate      # On Windows
  3. Install dependencies:
pip install -r requirements.txt
  4. Download NLTK data (an optional sanity check follows these steps):
python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet')"
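
To confirm the NLTK resources are in place, an optional one-line check (this is not part of the repository, just a convenience):

python -c "from nltk.corpus import stopwords; print(len(stopwords.words('english')), 'English stopwords available')"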

📊 Dataset

This project uses fake news datasets. You can use:

  • Kaggle Fake News Dataset: Link
  • LIAR Dataset: Political statements with truth ratings
  • FakeNewsNet: Social media fake news dataset

Place your dataset in the data/raw/ directory.

Expected CSV format (a minimal loading sketch follows this list):

  • text or title: The news article text
  • label: Binary label, either numeric (0 = Real, 1 = Fake) or textual (Real/Fake)
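
A minimal loading sketch for this format, assuming pandas; the real preprocessing lives in src/data_preprocessing.py, and news.csv is just a placeholder file name:

import pandas as pd

# Load the raw dataset (placeholder path)
df = pd.read_csv("data/raw/news.csv")

# Normalize textual labels to the 0 = Real / 1 = Fake convention
if df["label"].dtype == object:
    df["label"] = df["label"].str.strip().str.lower().map({"real": 0, "fake": 1})

# Fall back to the title column if there is no full-text column
text_col = "text" if "text" in df.columns else "title"
print(df[text_col].head())
print(df["label"].value_counts())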

🎮 Usage

Training Models

Run the main training script:

python main.py --data data/raw/news.csv --models all --explain

Options (a sketch of the corresponding argparse wiring follows this list):

  • --data: Path to dataset
  • --models: Choose models (lr, rf, svm, lstm, bert, all)
  • --explain: Enable LIME & SHAP explanations
  • --test-size: Fraction of the data held out for testing (default: 0.2)
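
main.py presumably wires these flags with argparse; a minimal sketch of that wiring (the flag names mirror the options above, everything else is an assumption):

import argparse

parser = argparse.ArgumentParser(description="Train and explain fake news detection models")
parser.add_argument("--data", required=True, help="Path to the dataset CSV")
parser.add_argument("--models", default="all",
                    choices=["lr", "rf", "svm", "lstm", "bert", "all"],
                    help="Which model(s) to train")
parser.add_argument("--explain", action="store_true",
                    help="Generate LIME & SHAP explanations")
parser.add_argument("--test-size", type=float, default=0.2,
                    help="Fraction of the data held out for testing")
args = parser.parse_args()
# Dashes become underscores: args.test_size, args.models, ...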

Interactive Demo

Launch the Jupyter notebook for interactive exploration:

jupyter notebook notebooks/demo.ipynb

Explaining Individual Predictions

from src.explainability import explain_prediction

# Explain a prediction using LIME
explanation = explain_prediction(
    model=trained_model,
    text="Your news article text here",
    method="lime"
)

# Visualize the explanation
explanation.show_in_notebook()

📈 Models Implemented

Traditional Machine Learning

  1. Logistic Regression

    • Fast, interpretable baseline
    • TF-IDF feature extraction (a pipeline sketch follows this list)
  2. Random Forest

    • Ensemble method
    • Feature importance analysis
  3. Support Vector Machine (SVM)

    • Effective for high-dimensional text data
    • RBF and linear kernels
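
As noted in the list above, the traditional models pair TF-IDF features with scikit-learn classifiers. A minimal sketch of the Logistic Regression baseline; the hyperparameters and the train_texts / train_labels variables are illustrative, not the repository's exact settings:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# train_texts / test_texts: lists of article strings; *_labels: 0 = Real, 1 = Fake
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=50_000, ngram_range=(1, 2), stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(train_texts, train_labels)
print(classification_report(test_labels, pipeline.predict(test_texts)))

Swapping LogisticRegression for RandomForestClassifier or a linear/RBF SVC gives the other two baselines with the same feature extraction.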

Deep Learning

  1. LSTM (Long Short-Term Memory)

    • Sequential text processing
    • Word embeddings (Word2Vec/GloVe); a minimal Keras sketch follows this list
  2. BERT (Bidirectional Encoder Representations from Transformers)

    • State-of-the-art NLP model
    • Fine-tuned for fake news classification
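
A minimal Keras sketch of the LSTM classifier described above; layer sizes and vocabulary limits are assumptions, and dl_models.py may additionally load pretrained Word2Vec/GloVe weights into the Embedding layer:

import tensorflow as tf

VOCAB_SIZE, MAX_LEN = 20_000, 300  # assumed tokenizer settings

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),              # padded sequences of token ids
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),    # optionally initialized from Word2Vec/GloVe
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(label == Fake)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()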

๐Ÿ” Explainability Methods

LIME (Local Interpretable Model-agnostic Explanations)

LIME explains individual predictions by:

  • Perturbing input text
  • Building local linear approximations
  • Highlighting influential words/phrases

Advantages:

  • Model-agnostic
  • Human-interpretable
  • Works with any classifier (see the usage sketch below)
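
A usage sketch with the lime package, assuming a classifier that exposes predict_proba (the pipeline from the ML sketch above works; explain_prediction in src/explainability.py presumably wraps something similar):

from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["Real", "Fake"])
explanation = explainer.explain_instance(
    "Your news article text here",
    pipeline.predict_proba,   # any function mapping a list of texts to class probabilities
    num_features=10,
)
print(explanation.as_list())    # [(word, weight), ...] sorted by influence
explanation.show_in_notebook()  # highlighted article text, as in the usage example above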

SHAP (SHapley Additive exPlanations)

SHAP provides unified explanations based on game theory:

  • Consistent feature attribution
  • Shapley values for each word
  • Global and local explanations

Advantages:

  • Theoretically grounded
  • Consistent and accurate
  • Multiple visualization types (see the usage sketch below)
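
A usage sketch with the shap package applied to the linear TF-IDF baseline from the earlier pipeline sketch (names are illustrative; the deep models would use SHAP's text or deep explainers instead):

import shap

# Pull the vectorizer and classifier out of the fitted scikit-learn pipeline
vectorizer = pipeline.named_steps["tfidf"]
clf = pipeline.named_steps["clf"]

X_background = vectorizer.transform(train_texts)
X_sample = vectorizer.transform(test_texts[:100])   # small subset keeps the plot manageable

explainer = shap.LinearExplainer(clf, X_background)
shap_values = explainer.shap_values(X_sample)

# Global summary: which words push predictions toward Fake vs Real
shap.summary_plot(shap_values, X_sample.toarray(),
                  feature_names=vectorizer.get_feature_names_out())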

📊 Results

Example performance metrics:

| Model               | Accuracy | Precision | Recall | F1-Score |
|---------------------|----------|-----------|--------|----------|
| Logistic Regression | 92.3%    | 91.8%     | 92.7%  | 92.2%    |
| Random Forest       | 94.1%    | 93.5%     | 94.6%  | 94.0%    |
| SVM                 | 93.7%    | 93.2%     | 94.1%  | 93.6%    |
| LSTM                | 95.2%    | 94.8%     | 95.6%  | 95.2%    |
| BERT                | 97.4%    | 97.1%     | 97.7%  | 97.4%    |

Note: Results will vary with the dataset, preprocessing, and hyperparameters used.

๐Ÿ–ผ๏ธ Visualizations

The project generates a range of visualizations (a minimal plotting sketch follows this list):

  • Confusion matrices
  • ROC curves and AUC scores
  • Word importance heatmaps (LIME)
  • SHAP force plots
  • SHAP summary plots
  • Feature importance charts
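
Confusion matrices and ROC curves, for example, come almost directly from scikit-learn; a minimal sketch reusing the fitted pipeline and held-out data from the earlier ML sketch (visualization.py presumably produces more polished versions):

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

ConfusionMatrixDisplay.from_predictions(
    test_labels, pipeline.predict(test_texts), display_labels=["Real", "Fake"])
RocCurveDisplay.from_predictions(
    test_labels, pipeline.predict_proba(test_texts)[:, 1], name="Logistic Regression")
plt.show()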

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 References

  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. KDD.
  • Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. NIPS.
  • Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
  • Shu, K., et al. (2017). Fake News Detection on Social Media: A Data Mining Perspective.

👥 Authors

๐Ÿ™ Acknowledgments

  • LIME and SHAP libraries for explainability frameworks
  • Hugging Face for transformer models
  • Kaggle for fake news datasets
  • scikit-learn for ML implementations

📧 Contact

For questions or suggestions, please open an issue or contact bharadwajkunal172@gmail.com.


Note: This project is for educational and research purposes. Always verify news from multiple credible sources.
