
sayendranadh/spam_project


📧 Spam Detection System


An end-to-end Machine Learning web application for intelligent spam detection

🚀 Live Demo • 📖 Documentation • 🐛 Report Bug • ✨ Request Feature



🎯 Overview

The Spam Detection System is a comprehensive machine learning application that classifies text messages and emails as either Spam or Ham (legitimate). Built with a focus on demonstrating the complete ML lifecycle, this project encompasses:

  • 📊 Data Exploration & Analysis
  • 🧹 Text Preprocessing & NLP
  • 🔧 Feature Engineering
  • 🤖 Model Training & Evaluation
  • 🚀 Production Deployment

The application puts several ML algorithms behind an interactive web interface, which makes it a practical reference for an end-to-end machine learning workflow.


✨ Features

Core Functionality

  • 🎯 Real-time Spam Classification - Instant predictions on user-input text
  • 🔄 Multiple ML Models - Compare predictions from four different algorithms
  • 📊 Performance Metrics - Evaluation with accuracy, precision, recall, and F1-score
  • 📈 Visual Analytics - Interactive charts for model comparison and feature distributions
  • ⚡ Optimized Performance - Models are cached after the first load, so later predictions do not reload them from disk

Technical Highlights

  • 🧠 Advanced NLP preprocessing pipeline
  • 💾 Serialized model persistence for efficient deployment
  • 🎨 Clean, intuitive Streamlit UI
  • ☁️ Cloud-ready with version-controlled dependencies
  • 🔒 Production-grade error handling and validation
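The README does not list the individual preprocessing steps. As an illustration only, a common cleaning pass for SMS and email text might look like the sketch below; the regex rules, the tiny stop-word list, and the `clean_text` name are assumptions, not the project's actual pipeline, which the technology stack suggests is built on NLTK and spaCy.

```python
import re

# Hypothetical, deliberately tiny stop-word list; NLTK and spaCy ship full English lists.
STOP_WORDS = {"a", "an", "the", "are", "is", "we", "to", "at", "you", "your", "have"}

URL_RE = re.compile(r"(?:https?://\S+|www\.\S+)")
MONEY_RE = re.compile(r"\$\s?\d[\d,]*(?:\.\d+)?")
NUMBER_RE = re.compile(r"\d+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Lowercase a raw message, replace URLs, amounts and numbers with placeholder
    tokens, strip punctuation, and drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" urltoken ", text)      # keep a marker that a link was present
    text = MONEY_RE.sub(" moneytoken ", text)  # "$1,000,000" -> "moneytoken"
    text = NUMBER_RE.sub(" numtoken ", text)   # remaining digits, e.g. "3pm" -> "numtoken pm"
    text = NON_ALPHA_RE.sub(" ", text)         # punctuation and symbols become spaces
    return " ".join(t for t in text.split() if t not in STOP_WORDS)


print(clean_text("URGENT! You have won $1,000,000! Click here to claim your prize NOW!"))
# urgent won moneytoken click here claim prize now
```

Replacing amounts and links with placeholder tokens, rather than deleting them, keeps signals that often distinguish spam from ham while avoiding a separate feature for every distinct number or URL.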

🚀 Live Demo

Experience the application in action:

🔗 Streamlit Cloud Deployment

Quick Test Examples

Try these sample inputs:

Spam Example:

URGENT! You have won $1,000,000! Click here to claim your prize NOW!

Ham Example:

Hey, are we still meeting for coffee tomorrow at 3pm?

🧠 Machine Learning Models

The system uses four machine learning models for text classification:

| Model | Algorithm | File | Key Strength |
|---|---|---|---|
| Logistic Regression | SAGA Solver | logistic_regression_saga_model.pkl | Fast training, interpretable coefficients |
| Linear SVC | Calibrated Classifier | linearsvc_calibrated_model.pkl | Excellent for high-dimensional text data |
| Random Forest | Ensemble Method | random_forest_model.pkl | Robust to overfitting, feature importance |
| Neural Network | MLP Classifier | neural_network_mlp_model.pkl | Captures complex non-linear patterns |
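The table names the estimators but not the feature representation or hyperparameters. A minimal scikit-learn sketch of how the four models could be set up, assuming TF-IDF features; `build_models`, the dictionary keys, and every hyperparameter value here are hypothetical, not the project's configuration:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_models(random_state: int = 42) -> dict:
    """Return one TF-IDF + classifier pipeline per model in the table (hypothetical settings)."""
    return {
        "logistic_regression_saga": make_pipeline(
            TfidfVectorizer(),
            LogisticRegression(solver="saga", max_iter=5000, random_state=random_state),
        ),
        # LinearSVC has no predict_proba; calibration wraps it to provide probabilities.
        "linearsvc_calibrated": make_pipeline(
            TfidfVectorizer(),
            CalibratedClassifierCV(LinearSVC(random_state=random_state), cv=3),
        ),
        "random_forest": make_pipeline(
            TfidfVectorizer(),
            RandomForestClassifier(n_estimators=200, random_state=random_state),
        ),
        "neural_network_mlp": make_pipeline(
            TfidfVectorizer(),
            MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=random_state),
        ),
    }
```

Each pipeline bundles the vectorizer with its classifier, so one object goes from raw text to a prediction. Whether the project's `.pkl` files are bundled this way or keep the vectorizer separately (for example alongside `preprocessed_data.pkl`) is not stated in this README.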

Additional Artifacts

  • preprocessed_data.pkl - Cleaned and processed training dataset
  • model_results.pkl - Performance metrics for all models
  • feature_distributions.png - Visualization of feature importance
  • model_comparison.png - Comparative analysis charts
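The `.pkl` extension indicates Python's pickle format (written either with `pickle` directly or with a wrapper such as `joblib`); the README does not say which, nor what each file contains. A stdlib sketch of saving and reloading such an artifact, with `save_artifact` and `load_artifact` as hypothetical helper names:

```python
import pickle
from pathlib import Path
from typing import Any, Union

PathLike = Union[str, Path]


def save_artifact(obj: Any, path: PathLike) -> None:
    """Pickle obj to path, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_artifact(path: PathLike) -> Any:
    """Unpickle an artifact. Unpickling can execute arbitrary code, so only load trusted files."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

With these helpers, `load_artifact("model_results.pkl")` would return whatever object the training step saved; its structure is not documented here, so inspect it before relying on specific keys.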

🛠️ Technology Stack

Core Technologies

| Category | Technologies |
|---|---|
| Language | Python 3.10.8 |
| ML Framework | scikit-learn 1.2.2 |
| NLP | NLTK, spaCy |
| Data Processing | NumPy, Pandas, SciPy |
| Visualization | Matplotlib, Seaborn, Plotly |
| Web Framework | Streamlit |
| Deployment | Streamlit Cloud |

💻 Installation

Prerequisites

  • Python 3.10.8
  • Anaconda/Miniconda (recommended) or pip
  • Git

Option 1: Using Conda (Recommended)

# Clone the repository
git clone https://github.com/sayendranadh/spam_project.git
cd spam_project

# Create a new conda environment
conda create -n spam_env python=3.10.8 -y

# Activate the environment
conda activate spam_env

# Install dependencies
pip install -r requirements.txt

Option 2: Using pip & venv

# Clone the repository
git clone https://github.com/sayendranadh/spam_project.git
cd spam_project

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

🎮 Usage

Running Locally

# Ensure your environment is activated
conda activate spam_env  # or: source venv/bin/activate

# Launch the Streamlit application
streamlit run spam_detection_ui.py

The application will open automatically in your default browser at http://localhost:8501

Using the Application

  1. Enter Text: Type or paste the message you want to classify
  2. Select Model: Choose from the available ML models (or compare all)
  3. Get Prediction: Click "Classify" to see results
  4. View Metrics: Explore performance statistics and visualizations

Training Pipeline (Optional)

To retrain models with your own data:

# Step 1: Exploratory Data Analysis
python step1_data_exploration.py

# Step 2: Feature Engineering
python step2_feature_engineering.py

# Step 3: Data Preprocessing
python step3_preprocessing.py

# Step 4: Model Training
python step4_model_training.py

# Run the UI with new models
streamlit run spam_detection_ui.py
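Because each step presumably reads what the previous one wrote, the scripts need to run in the order above. A small stdlib runner that executes them with the current interpreter and stops at the first failure; the script names come from this README, while `run_pipeline` is a hypothetical helper:

```python
import subprocess
import sys
from pathlib import Path

PIPELINE = [
    "step1_data_exploration.py",
    "step2_feature_engineering.py",
    "step3_preprocessing.py",
    "step4_model_training.py",
]


def run_pipeline(scripts=PIPELINE, workdir="."):
    """Run each script (relative to workdir) with this interpreter; stop on the first failure."""
    workdir = Path(workdir)
    for script in scripts:
        print(f"Running {script} ...")
        # The script path is resolved relative to cwd, so pass the bare name.
        result = subprocess.run([sys.executable, script], cwd=workdir)
        if result.returncode != 0:
            raise RuntimeError(f"{script} failed with exit code {result.returncode}")
    print("Pipeline finished.")
```

Calling `run_pipeline()` from the project root is then equivalent to typing the four `python stepN_*.py` commands by hand, except that a failing step halts the sequence instead of letting later steps run on stale data.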

📁 Project Structure

spam_project/
│
├── 📄 spam_detection_ui.py              # Main Streamlit application (ENTRY POINT)
│
├── 🔬 ML Pipeline Scripts
│   ├── step1_data_exploration.py        # EDA and data insights
│   ├── step2_feature_engineering.py     # Text preprocessing & feature extraction
│   ├── step3_preprocessing.py           # Data cleaning pipeline
│   └── step4_model_training.py          # Model training & evaluation
│
├── 🔧 Utilities
│   └── fix_preprocessed_data.py         # Data consistency fixes
│
├── 💾 Model Artifacts (*.pkl)
│   ├── logistic_regression_saga_model.pkl
│   ├── linearsvc_calibrated_model.pkl
│   ├── random_forest_model.pkl
│   ├── neural_network_mlp_model.pkl
│   ├── preprocessed_data.pkl
│   └── model_results.pkl
│
├── 📊 Visualizations
│   ├── model_comparison.png             # Model performance charts
│   └── feature_distributions.png        # Feature importance plots
│
├── ⚙️ Configuration Files
│   ├── requirements.txt                 # Python dependencies
│   └── runtime.txt                      # Python version for deployment
│
└── 📖 README.md                         # Project documentation

📊 Model Performance

Evaluation Metrics

All models are evaluated using standard classification metrics:

  • Accuracy - Overall prediction correctness
  • Precision - Of the messages flagged as spam, the share that really are spam
  • Recall - Of the actual spam messages, the share the model catches
  • F1-Score - Harmonic mean of precision and recall
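For a binary spam/ham task, all four metrics follow from the confusion-matrix counts, with spam as the positive class. A dependency-free sketch; the `classification_metrics` name and the 1 = spam / 0 = ham encoding are assumptions (scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` compute the same quantities):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` (spam) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # ham wrongly flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam missed
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # ham passed through

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, 1 = spam, 0 = ham: tp=2, fp=1, fn=1, tn=4.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))  # accuracy 0.75; precision, recall and F1 all 2/3
```

Accuracy alone can look high on imbalanced data (a model that never predicts spam still scores well when most messages are ham), which is why precision and recall are reported alongside it.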

Comparative Analysis

View the model_comparison.png file for detailed performance visualizations showing:

  • Model accuracy comparison
  • Precision-Recall trade-offs
  • Confusion matrices
  • ROC curves
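ROC curves are usually summarized by the area under them (AUC). AUC equals the probability that a randomly chosen spam message receives a higher spam score than a randomly chosen ham message, with ties counting half. A small sketch of that pairwise definition, for illustration only (it is O(n·m); `sklearn.metrics.roc_auc_score` is the practical choice), where the function name and toy scores are assumptions:

```python
def pairwise_auc(y_true, scores, positive=1):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy spam probabilities, e.g. from predict_proba(...)[:, 1]; 1 = spam.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(pairwise_auc(y_true, scores))  # 0.75
```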

☁️ Deployment

Streamlit Cloud Deployment

Required Files

  • requirements.txt - All Python dependencies
  • runtime.txt - Python version specification

runtime.txt

python-3.10.8

Deployment Steps

  1. Prepare Repository

    git add .
    git commit -m "Deploy to Streamlit Cloud"
    git push origin main
  2. Configure Streamlit Cloud

    • Visit share.streamlit.io
    • Click "New app"
    • Select your repository: spam_project
    • Branch: main
    • Main file: spam_detection_ui.py
  3. Deploy

    • Click "Deploy!"
    • Wait for build completion
    • Your app will be live at: https://[app-name].streamlit.app/

Version Compatibility

⚠️ Important: The models were trained and pickled with specific library versions:

Python: 3.10.8
scikit-learn: 1.2.2

Loading the pickled models with other versions, particularly a different scikit-learn release, may emit warnings or fail outright. The requirements.txt file pins exact versions to keep the environment consistent.
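A quick check of the installed versions before loading the models can turn a confusing unpickling error into a clear message. A stdlib sketch; `PINNED` and `PINNED_PYTHON` mirror the two versions stated above, and `version_mismatches` is a hypothetical helper name:

```python
import sys
from importlib import metadata

# Versions stated in this README; add further pins from requirements.txt as needed.
PINNED = {"scikit-learn": "1.2.2"}
PINNED_PYTHON = (3, 10, 8)


def version_mismatches(pinned=PINNED, python=PINNED_PYTHON, get_version=metadata.version):
    """Return a message for the interpreter and for every pinned package that differs."""
    problems = []
    if tuple(sys.version_info[:3]) != tuple(python):
        current = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, python))
        problems.append(f"Python {current} installed, {wanted} expected")
    for package, wanted in pinned.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed ({wanted} expected)")
            continue
        if installed != wanted:
            problems.append(f"{package} {installed} installed, {wanted} expected")
    return problems


for message in version_mismatches():
    print("WARNING:", message)
```

An empty result means the interpreter and the listed packages match the pins.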


🔮 Future Enhancements

Planned Features

  • Deep Learning Models

    • LSTM networks for sequential text analysis
    • Transformer-based models (BERT, DistilBERT)
  • Model Interpretability

    • SHAP values for feature importance
    • LIME for local interpretability
  • API Development

    • RESTful API with FastAPI
    • Swagger/OpenAPI documentation
  • Data Management

    • Database integration (PostgreSQL/MongoDB)
    • User feedback collection system
  • Advanced Features

    • Real-time email ingestion
    • Batch processing capabilities
    • Multi-language support
    • Custom model training interface

Contributions Welcome!

Have ideas for improvements? Check out the Contributing section below.


🤝 Contributing

Contributions are what make the open-source community amazing! Any contributions you make are greatly appreciated.

How to Contribute

  1. Fork the Project on GitHub, then clone your fork (replace <your-username> with your GitHub account)

    git clone https://github.com/<your-username>/spam_project.git
  2. Create a Feature Branch

    git checkout -b feature/AmazingFeature
  3. Commit Changes

    git commit -m 'Add some AmazingFeature'
  4. Push to Branch

    git push origin feature/AmazingFeature
  5. Open a Pull Request

Contribution Guidelines

  • Write clear, descriptive commit messages
  • Follow PEP 8 style guidelines for Python code
  • Add tests for new features
  • Update documentation as needed
  • Ensure all tests pass before submitting a PR

👤 Author

Sayendranadh

๐Ÿค Contributors

Special thanks to:

  • I.Vishnu Varma - Project Contributor
  • P.Sai Charan - Project Contributor

Connect & Support

If you find this project helpful:

  • โญ Star this repository
  • ๐Ÿฆ Share it with others
  • ๐Ÿค Connect on LinkedIn

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License Summary

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software.

๐Ÿ™ Acknowledgements

Special thanks to:

  • scikit-learn - Comprehensive ML library and documentation
  • Streamlit - Amazing framework for ML web apps
  • Streamlit Cloud - Free hosting for data apps
  • NLTK - Natural Language Toolkit
  • spaCy - Industrial-strength NLP
  • Open-source NLP community for datasets and research



โญ Star this repository if you found it helpful!

Made with ❤️ by Sayendranadh

