Bridging Legacy and Modern Threat Detection

Machine Learning and Deep Learning Models on EMBER2018 & CIC-EvasivePDF2022

🚀 Overview

Malware detection remains a critical challenge because attackers constantly evolve their tactics.
This project benchmarks traditional machine learning (ML), deep learning (DL), and ensemble methods for malware detection across two generations of attacks:

EMBER2018 → legacy malware (2006–2018)
CIC-EvasivePDF2022 → recent evasive PDF malware

The aim is to highlight strengths, weaknesses, and trade-offs between models, focusing on accuracy, adaptability, and robustness to imbalanced datasets.

🔑 Key Features

Datasets:
- EMBER2018 (structured malware features, legacy attacks)
- CIC-EvasivePDF2022 (modern evasive PDF malware samples)
Algorithms Tested:
- Traditional ML: Random Forest, XGBoost, AdaBoost, Logistic Regression, KNN
- Deep Learning: CNN, MLP, RNN–LSTM, Transformer
- Ensembles: Stacking, Voting classifiers (hybrid ML + DL)
Challenges Tackled:
- Class imbalance handling (resampling, weighting)
- Comparative evaluation of detection on structured (legacy) vs. evasive (modern) malware
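Class imbalance can be handled either by reweighting the loss or by resampling the training set. The README does not say which exact scheme the notebook uses, so the following is only a minimal, standard-library sketch of both ideas under that assumption: `balanced_class_weights` mirrors the `n_samples / (n_classes * class_count)` heuristic behind scikit-learn's `class_weight="balanced"`, and `random_oversample` is plain random oversampling of minority classes. Both function names are hypothetical, not taken from the project.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and common classes below 1, so each
    class contributes equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())

    out_x, out_y = [], []
    for cls, rows in by_class.items():
        # Keep every original row, then top up with random duplicates.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y


if __name__ == "__main__":
    # Toy data: 8 benign (0) vs 2 malicious (1) samples, one feature each.
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2
    print(balanced_class_weights(y))   # {0: 0.625, 1: 2.5}
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))              # Counter({0: 8, 1: 8})
```

Resampling should be applied to the training split only; oversampling before the train/test split leaks duplicated rows into the test set and inflates the reported accuracy.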

📊 Results Summary

| Model / Method | EMBER2018 Accuracy | CIC-EvasivePDF2022 Accuracy |
|---|---|---|
| Random Forest | 99.6% | 99.3% |
| XGBoost | 99.7% | 99.1% |
| CNN | 78.4% | 98.1% |
| RNN–LSTM | 50.4% | 96.2% |
| Transformer | 76.7% | 97.4% |
| Voting Ensemble | 99.5% | 99.1% |
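The accuracy gaps between models can be recomputed directly from the table. This small sketch copies the accuracies above into a dictionary (the names `RESULTS` and `gaps_to_best` are illustrative, not taken from the notebook) and reports, for each dataset, the best model and every model's gap to it in percentage points.

```python
# Accuracies (%) copied from the Results Summary table.
RESULTS = {
    "Random Forest":   {"EMBER2018": 99.6, "CIC-EvasivePDF2022": 99.3},
    "XGBoost":         {"EMBER2018": 99.7, "CIC-EvasivePDF2022": 99.1},
    "CNN":             {"EMBER2018": 78.4, "CIC-EvasivePDF2022": 98.1},
    "RNN–LSTM":        {"EMBER2018": 50.4, "CIC-EvasivePDF2022": 96.2},
    "Transformer":     {"EMBER2018": 76.7, "CIC-EvasivePDF2022": 97.4},
    "Voting Ensemble": {"EMBER2018": 99.5, "CIC-EvasivePDF2022": 99.1},
}


def gaps_to_best(results, dataset):
    """Return (best_model, {model: gap in percentage points}) for one dataset."""
    best = max(results, key=lambda m: results[m][dataset])
    top = results[best][dataset]
    # round() removes float noise such as 0.19999999999998863
    return best, {m: round(top - r[dataset], 1) for m, r in results.items()}


if __name__ == "__main__":
    for ds in ("EMBER2018", "CIC-EvasivePDF2022"):
        best, gaps = gaps_to_best(RESULTS, ds)
        print(f"{ds}: best = {best}")
        # List models from closest to furthest behind the best one.
        for model, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
            print(f"  {model:<16} {gap:.1f} pp behind")
```

On the table's numbers this reports XGBoost as best on EMBER2018 and Random Forest as best on CIC-EvasivePDF2022, with the Voting Ensemble 0.2 pp behind the leader on both.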

👉 Key Insight:

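ML methods excel on structured, legacy malware (EMBER2018): Random Forest and XGBoost reach 99.6–99.7% accuracy, while the DL models score between 50.4% and 78.4%.
DL models are far more competitive on evasive PDF malware (CIC-EvasivePDF2022), reaching 96.2–98.1%, although Random Forest still scores highest there (99.3%).
Ensembles perform strongly across both generations: the voting ensemble stays within 0.2 percentage points of the best single model on each dataset.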
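A voting ensemble combines the predictions of several fitted models. The README does not state which base models, weights, or voting rule the Voting Ensemble uses, so the sketch below only illustrates the two standard rules with hypothetical per-model outputs for a single sample: hard voting takes the majority label, while soft voting averages predicted probabilities of the malicious class and thresholds the average at 0.5. In scikit-learn the same choice is exposed as `VotingClassifier(..., voting="hard")` versus `voting="soft"`; the functions here are illustrative stand-ins, not that API.

```python
from collections import Counter


def hard_vote(labels):
    """Majority vote over per-model class labels.

    Counter.most_common orders ties by first occurrence, so ties go to
    the label predicted by the earliest model in the list.
    """
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities, weights=None, threshold=0.5):
    """Weighted average of per-model P(malicious); predict 1 at or above the threshold."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    avg = sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
    return int(avg >= threshold), avg


if __name__ == "__main__":
    # Hypothetical outputs for one sample from three models
    # (e.g. a tree model, a CNN and a Transformer); not real results.
    labels = [1, 0, 0]
    probs = [0.9375, 0.375, 0.375]
    print(hard_vote(labels))   # 0: two of three models say benign
    print(soft_vote(probs))    # (1, 0.5625): one confident model outweighs two unsure ones
```

The example shows why the rule matters: the same three models disagree under hard and soft voting. Soft voting requires every base model to output class probabilities, whereas hard voting only needs predicted labels.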

🛠️ Tech Stack

Python 3.9+
scikit-learn, PyTorch, XGBoost
pandas, NumPy, Matplotlib/Seaborn for analysis and visualization

⚡ Quick Start

Clone the repository:

git clone https://github.com/MAvRK7/Bridging-Legacy-Modern-Threat-Detection.git
cd Bridging-Legacy-Modern-Threat-Detection

Repository contents:
- README.md
- cybersec.ipynb
