A comprehensive implementation of fake news detection using both traditional Machine Learning and Deep Learning approaches, with explainability powered by LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations).
This project addresses the critical problem of fake news detection by:
- Implementing multiple ML models (Logistic Regression, Random Forest, SVM)
- Implementing Deep Learning models (LSTM, BERT)
- Providing explainable AI (XAI) using LIME and SHAP
- Visualizing model predictions and explanations
- Comparing model performance and interpretability
- **Multiple Model Architectures**
  - Traditional ML: Logistic Regression, Random Forest, SVM
  - Deep Learning: LSTM, BERT (Transformer-based)
- **Explainability Techniques**
  - LIME: Local interpretable model-agnostic explanations
  - SHAP: Unified approach to explaining predictions
- **Comprehensive Analysis**
  - Performance metrics (Accuracy, Precision, Recall, F1-Score)
  - Confusion matrices
  - ROC curves
  - Feature importance visualization
  - Word-level and phrase-level explanations
```
AIProject/
├── README.md
├── requirements.txt
├── LICENSE
├── .gitignore
├── data/
│   ├── raw/                    # Raw dataset files
│   ├── processed/              # Preprocessed data
│   └── README.md
├── src/
│   ├── __init__.py
│   ├── data_preprocessing.py   # Data loading and preprocessing
│   ├── ml_models.py            # Traditional ML models
│   ├── dl_models.py            # Deep Learning models
│   ├── explainability.py       # LIME & SHAP implementations
│   ├── visualization.py        # Plotting and visualization
│   └── utils.py                # Utility functions
├── notebooks/
│   └── demo.ipynb              # Interactive demonstration
├── models/                     # Saved model files
├── results/                    # Outputs, plots, reports
│   ├── plots/
│   └── reports/
└── main.py                     # Main training script
```
- Python 3.8 or higher
- pip package manager
- (Optional) CUDA-capable GPU for deep learning models
- Clone the repository:

  ```bash
  git clone git@github.com:KunalBharadwaj/ForgeIt.git
  ```

- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Linux/Mac
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download NLTK data:

  ```bash
  python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet')"
  ```

This project uses fake news datasets. You can use:
- Kaggle Fake News Dataset
- LIAR Dataset: Political statements with truth ratings
- FakeNewsNet: Social media fake news dataset
Place your dataset in the data/raw/ directory.
Expected CSV format:
- `text` or `title`: the news article text
- `label`: binary label, either 0 = Real / 1 = Fake or the strings `Real`/`Fake`
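As a minimal sketch of reading data in this format (the column handling and label mapping below are illustrative, not the exact code in `src/data_preprocessing.py`):

```python
import pandas as pd

# Load the raw CSV placed under data/raw/
df = pd.read_csv("data/raw/news.csv")

# Use whichever text column the dataset provides
text_col = "text" if "text" in df.columns else "title"

# Map string labels (Real/Fake) onto the 0 = Real, 1 = Fake convention
if df["label"].dtype == object:
    df["label"] = df["label"].str.strip().str.lower().map({"real": 0, "fake": 1})

print(df[[text_col, "label"]].head())
```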
Run the main training script:

```bash
python main.py --data data/raw/news.csv --models all --explain
```

Options:
- `--data`: Path to the dataset
- `--models`: Models to train (`lr`, `rf`, `svm`, `lstm`, `bert`, `all`)
- `--explain`: Enable LIME & SHAP explanations
- `--test-size`: Test split fraction (default: 0.2)
Launch the Jupyter notebook for interactive exploration:

```bash
jupyter notebook notebooks/demo.ipynb
```

```python
from src.explainability import explain_prediction

# Explain a prediction using LIME
explanation = explain_prediction(
    model=trained_model,
    text="Your news article text here",
    method="lime"
)

# Visualize the explanation
explanation.show_in_notebook()
```
- **Logistic Regression**
  - Fast, interpretable baseline
  - TF-IDF feature extraction (see the baseline sketch after this list)
- **Random Forest**
  - Ensemble method
  - Feature importance analysis
- **Support Vector Machine (SVM)**
  - Effective for high-dimensional text data
  - RBF and linear kernels
- **LSTM (Long Short-Term Memory)**
  - Sequential text processing
  - Word embeddings (Word2Vec/GloVe)
- **BERT (Bidirectional Encoder Representations from Transformers)**
  - State-of-the-art NLP model
  - Fine-tuned for fake news classification (see the sketch after this list)
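As a rough sketch of the traditional ML baseline (not necessarily the exact code in `src/ml_models.py`; the vectorizer settings and the `train_texts`/`train_labels` names are illustrative), a TF-IDF + Logistic Regression pipeline can be wired up like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# train_texts: list of article strings, train_labels: 0 = Real, 1 = Fake
X_train, X_test, y_train, y_test = train_test_split(
    train_texts, train_labels, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, stop_words="english", ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test), target_names=["Real", "Fake"]))
```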
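For BERT, a minimal sketch of how the tokenizer and classification head fit together using Hugging Face `transformers` (fine-tuning is omitted here, and `bert-base-uncased` is simply the standard base checkpoint):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tokenize a single article and run it through the (to-be-fine-tuned) classifier
inputs = tokenizer(
    "Your news article text here", truncation=True, max_length=512, return_tensors="pt"
)
with torch.no_grad():
    logits = model(**inputs).logits

prediction = logits.argmax(dim=-1).item()  # 0 = Real, 1 = Fake after fine-tuning
```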
LIME explains individual predictions by:
- Perturbing input text
- Building local linear approximations
- Highlighting influential words/phrases
Advantages:
- Model-agnostic
- Human-interpretable
- Works with any classifier
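As a hedged sketch of applying LIME here (assuming a scikit-learn `pipeline` that exposes `predict_proba`, such as the TF-IDF baseline above; variable names are illustrative):

```python
from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["Real", "Fake"])

article = "Your news article text here"
explanation = explainer.explain_instance(
    article,
    pipeline.predict_proba,   # any classifier exposing predict_proba works
    num_features=10,
)

# (word, weight) pairs: positive weights push the prediction toward "Fake"
print(explanation.as_list())
explanation.show_in_notebook()
```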
SHAP provides unified explanations based on game theory:
- Consistent feature attribution
- Shapley values for each word
- Global and local explanations
Advantages:
- Theoretically grounded
- Consistent and accurate
- Multiple visualization types
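A minimal sketch of SHAP on the linear baseline (reusing the `pipeline` and splits from the sketch above; `LinearExplainer` is just one of SHAP's explainers, and the plots produced by `src/explainability.py` may differ):

```python
import shap

# Pull the fitted vectorizer and linear model out of the pipeline sketched earlier
vectorizer = pipeline.named_steps["tfidf"]
clf = pipeline.named_steps["clf"]

X_train_vec = vectorizer.transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# LinearExplainer gives exact Shapley values for linear models,
# using the training data as the background distribution
explainer = shap.LinearExplainer(clf, X_train_vec)
shap_values = explainer.shap_values(X_test_vec)

# Global summary: which words push predictions toward "Fake" across the test set
shap.summary_plot(
    shap_values[:100],
    X_test_vec[:100].toarray(),
    feature_names=vectorizer.get_feature_names_out(),
)
```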
Example performance metrics:
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Logistic Regression | 92.3% | 91.8% | 92.7% | 92.2% |
| Random Forest | 94.1% | 93.5% | 94.6% | 94.0% |
| SVM | 93.7% | 93.2% | 94.1% | 93.6% |
| LSTM | 95.2% | 94.8% | 95.6% | 95.2% |
| BERT | 97.4% | 97.1% | 97.7% | 97.4% |
Note: Results may vary based on dataset and hyperparameters
The project generates various visualizations:
- Confusion matrices
- ROC curves and AUC scores
- Word importance heatmaps (LIME)
- SHAP force plots
- SHAP summary plots
- Feature importance charts
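As an illustrative sketch of producing two of these plots with scikit-learn (assuming a fitted classifier `clf`, a held-out `X_test`/`y_test` split, and the `results/plots/` directory from the project layout; the project's actual plotting code lives in `src/visualization.py`):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

# Confusion matrix for the fitted classifier on the held-out split
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test, display_labels=["Real", "Fake"])
plt.savefig("results/plots/confusion_matrix.png", bbox_inches="tight")

# ROC curve with the AUC reported in the legend
RocCurveDisplay.from_estimator(clf, X_test, y_test)
plt.savefig("results/plots/roc_curve.png", bbox_inches="tight")
```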
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. KDD.
- Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. NIPS.
- Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
- Shu, K., et al. (2017). Fake News Detection on Social Media: A Data Mining Perspective.
- Kunal Bharadwaj - GitHub
- LIME and SHAP libraries for explainability frameworks
- Hugging Face for transformer models
- Kaggle for fake news datasets
- scikit-learn for ML implementations
For questions or suggestions, please open an issue or contact bharadwajkunal172@gmail.com.
Note: This project is for educational and research purposes. Always verify news from multiple credible sources.