This project classifies emails as Spam or Not Spam using structured email campaign data. Machine learning models are trained using numerical and categorical features derived from email campaigns.
Email spam impacts communication efficiency and user trust. The goal of this project is to predict whether an email is spam based on campaign-related attributes.
The dataset contains the following campaign-level features for each email:
- Email_Type
- Subject_Hotness_Score
- Email_Source_Type
- Customer_Location
- Email_Campaign_Type
- Total_Past_Communications
- Time_Email_sent_Category
- Word_Count
- Total_Links
- Total_Images

Target variable: Email_Status (0 → Not Spam, 1 → Spam)
- Data inspection and preprocessing
- Handling categorical and numerical features
- Feature-target separation
- Train-test split
- Classification model training
- Model evaluation
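
The workflow steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes pandas and scikit-learn, uses a small randomly generated DataFrame in place of the real dataset, and guesses which columns are categorical. Only the column names come from the feature list; every value and the categorical/numerical split are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real dataset (values are random, for illustration only)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Email_Type": rng.integers(1, 3, n),
    "Subject_Hotness_Score": rng.uniform(0, 5, n),
    "Customer_Location": rng.choice(list("ABCDE"), n),
    "Word_Count": rng.integers(50, 1500, n),
    "Total_Links": rng.integers(0, 40, n),
    "Total_Images": rng.integers(0, 20, n),
    "Email_Status": rng.integers(0, 2, n),
})

# Feature-target separation
X = df.drop(columns="Email_Status")
y = df["Email_Status"]

# Assumed split into categorical and numerical columns
categorical = ["Email_Type", "Customer_Location"]
numerical = [c for c in X.columns if c not in categorical]

# One-hot encode categorical columns, standardize numerical ones
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numerical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified train-test split keeps the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

Wrapping the preprocessing and the classifier in one `Pipeline` means the encoder and scaler are fitted on the training split only, which avoids leaking test-set statistics into training.
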
- Logistic Regression
- Naive Bayes
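
As a rough sketch of how the two models might be trained side by side, the snippet below fits both on the same split. It assumes scikit-learn, uses `GaussianNB` as the Naive Bayes variant (the source does not say which variant was used), and substitutes synthetic data from `make_classification` for the real campaign features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data standing in for the campaign features
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),  # assumed variant; suits continuous features
}

# Fit each model on the same training split and record test accuracy
scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```
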
- Accuracy
- Precision
- Recall
- F1-score
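
To make the four metrics concrete, here is a small self-contained sketch that computes them from confusion-matrix counts, treating Spam (1) as the positive class. The helper name `classification_metrics` and the example labels are hypothetical; in practice a library such as scikit-learn provides equivalent functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = Spam, 0 = Not Spam
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four metrics come out to 0.75.
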
- Subject hotness score impacts spam classification
- Link and image count influence email status
- Simple classifiers perform well on structured data
- Advanced feature engineering
- Try ensemble classifiers
- Apply explainability (SHAP)
- Deploy as a web application
- Binary classification on structured datasets
- Handling categorical campaign features
- Model evaluation for imbalanced-class scenarios
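
The last point can be illustrated with a tiny example of why accuracy alone may mislead when classes are imbalanced. The class ratio below is hypothetical and is not taken from the project's dataset.

```python
# Hypothetical imbalanced sample: 95 Not Spam (0) and 5 Spam (1)
y_true = [0] * 95 + [1] * 5
# A trivial baseline that labels every email as Not Spam
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the Spam class: the share of actual spam that was caught
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
recall = true_positives / actual_positives

print(f"accuracy = {accuracy:.2f}, spam recall = {recall:.2f}")
# prints: accuracy = 0.95, spam recall = 0.00
```

The baseline scores 95% accuracy while catching no spam at all, which is why precision, recall and F1-score are reported alongside accuracy.
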