This repository presents a comparative study of deep learning architectures for phishing URL detection, including:
- Convolutional Neural Network (CNN)
- Bidirectional Recurrent Neural Network (BRNN)
- Attention-Based Neural Network
- Teacher-Student Knowledge Distillation (RoBERTa → DistilRoBERTa)

The models were trained on a curated dataset of phishing and legitimate URLs. Each model was evaluated on accuracy, precision, recall, F1-score, and its confusion matrix.
📄 Read the full academic paper: View Publication
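
This README does not spell out the teacher-student objective. As a minimal sketch, assuming the common formulation in which the student learns from a blend of hard-label cross-entropy and a temperature-softened KL divergence to the teacher's outputs, the loss for a single URL could look like the following. The function names, `temperature`, `alpha`, and the label convention are illustrative assumptions, not values taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a soft-target KL term.

    temperature and alpha are hypothetical defaults for illustration.
    """
    # Hard-label term: cross-entropy against the true class.
    hard = -math.log(softmax(student_logits)[label])

    # Soft-target term: KL(teacher || student) at the given temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))

    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


# Example: two classes, index 1 = phishing (an assumed label convention).
loss = distillation_loss([0.2, 1.5], [-1.0, 3.0], label=1)
```

When the student's logits match the teacher's, the soft-target term is zero and only the hard-label term remains.
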
| Model | Accuracy | F1-Score (Phishing Class) |
|---|---|---|
| CNN | 63.5% | 0.72 |
| BRNN | 54.7% | 0.60 |
| Attention Network | 55.2% | 0.61 |
| Student (DistilRoBERTa) | 75.0% | 0.85 |
| Teacher (RoBERTa) | 75.0% | 0.85 |
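
Both columns in the table follow from a model's confusion matrix. The sketch below shows the standard definitions with phishing as the positive class. The function name is hypothetical, and the counts are made up for illustration; they are not the study's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts, with phishing as the positive class.
metrics = binary_metrics(tp=80, fp=20, fn=10, tn=40)
```

F1 ignores true negatives, so it can exceed accuracy when the positive class makes up most of the evaluation data.
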
- Preprocessing: Tokenization, URL cleaning, padding
- Balancing: Applied SMOTE to address class imbalance
- Evaluation Metrics: Accuracy, Precision, Recall, F1-score, Confusion Matrix
- Visual Analysis: Charts for metric comparison, heatmaps for confusion matrices

Built with:
- Python
- TensorFlow & Keras
- Hugging Face Transformers
- Scikit-learn
- Imbalanced-learn (SMOTE)
- Matplotlib, Seaborn
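
The preprocessing step listed above (URL cleaning, tokenization, padding) is not detailed in this README. The following is a minimal sketch of one way it could work, assuming character-level tokenization, lowercasing plus scheme and `www.` stripping as the cleaning step, and a fixed sequence length of 20. All of these choices, the helper names, and the sample URLs are hypothetical.

```python
import re

PAD_ID = 0  # reserved for padding
UNK_ID = 1  # reserved for characters not seen when building the vocabulary


def clean_url(url):
    """Lowercase, trim whitespace, and drop the scheme and a leading 'www.'."""
    url = url.strip().lower()
    url = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url)
    return re.sub(r"^www\.", "", url)


def build_vocab(urls):
    """Map each character seen in the cleaned URLs to an integer id >= 2."""
    chars = sorted({ch for url in urls for ch in clean_url(url)})
    return {ch: idx for idx, ch in enumerate(chars, start=2)}


def encode(url, vocab, max_len=20):
    """Tokenize by character, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(ch, UNK_ID) for ch in clean_url(url)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


urls = [
    "https://www.example.com/login",
    "http://paypa1-secure.example.net/verify?id=7",
]
vocab = build_vocab(urls)
batch = [encode(u, vocab) for u in urls]  # every row has length max_len
```

In a Keras pipeline, the `pad_sequences` utility with `padding="post"` gives the same right-padding behavior as the manual padding shown here.
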
If you use this repository or reference the research, please cite:
Oluwadamilare Tobiloba. (2025). A Comparative Study of Deep Learning Models for Phishing Detection Using Teacher-Student Learning.
Special thanks to the open-source community for the tools and datasets that enabled this research.
Have questions or suggestions? Feel free to reach out or connect:
- LinkedIn: [Profile](https://linkedin.com/in/your-profile)
- Email: tbiggest4@gmail.com





