GitHub - sameer-at-git/SMS-Spam-Classification-using-Naive-Bayes-Decision-Tree-and-Random-Forest: A complete SMS spam-classification pipeline using classic machine-learning algorithms. It ingests the publicly available SMS spam dataset, applies text preprocessing (cleaning, tokenization, vectorization), then trains and evaluates three classifiers: a Naïve Bayes model, a Decision Tree, and a Random Forest.

Project Overview

• Built a machine learning model that classifies an SMS as SPAM or HAM (normal) from its text, using Natural Language Processing.
• Engineered features such as word_count, contains_currency_symbol, and contains_number from the SMS text.

How will this project help?

• This project helps filter spam SMS messages out of a phone's inbox.

Resources Used

• Packages: pandas, numpy, sklearn, matplotlib, seaborn, nltk.
• Dataset by UCI Machine Learning on Kaggle: https://www.kaggle.com/uciml/sms-spam-collection-dataset

Exploratory Data Analysis (EDA)

• Checking the dataset for NaN values
• Plotting a countplot of SMS labels (Spam vs. Ham)
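
The two checks above can be sketched without any dependencies. With the packages listed under Resources Used, this would typically be a pandas null check followed by a seaborn countplot. The version below only computes the numbers such a plot would show. The sample rows and the helper names `count_missing` and `label_counts` are illustrative assumptions, not taken from the notebook.

```python
from collections import Counter

# Hypothetical rows standing in for the dataset: (label, message) pairs.
rows = [
    ("ham", "Are we still meeting at 5?"),
    ("spam", "WIN a £1000 prize! Call 09061701461 now"),
    ("ham", None),  # a missing message, included to exercise the check
    ("ham", "Ok, see you later"),
]


def count_missing(rows):
    """Count missing (None or blank) values in each column."""
    missing = {"label": 0, "message": 0}
    for label, message in rows:
        if label is None or not str(label).strip():
            missing["label"] += 1
        if message is None or not str(message).strip():
            missing["message"] += 1
    return missing


def label_counts(rows):
    """Count how many rows carry each label (the data behind a countplot)."""
    return Counter(label for label, _ in rows if label is not None)


missing = count_missing(rows)
counts = label_counts(rows)
print(missing)
print(counts)
```

A large gap between the ham and spam counts is what motivates the oversampling step in Feature Engineering.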

Feature Engineering

• Handling the imbalanced dataset using oversampling
• Creating new features from existing features, e.g. word_count, contains_currency_symbol, contains_numbers, etc.
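
A minimal, dependency-free sketch of both steps follows. The exact feature definitions and the oversampling method used in the notebook are not stated in this README, so the currency symbol set, the function names `extract_features` and `random_oversample`, and the choice of random duplication are all assumptions made for illustration.

```python
import random
import re

# Assumed set of currency symbols; the README does not list which ones are checked.
CURRENCY_SYMBOLS = "$£€¥₹"


def extract_features(message):
    """Derive simple numeric features from one SMS text."""
    return {
        "word_count": len(message.split()),
        "contains_currency_symbol": int(any(ch in CURRENCY_SYMBOLS for ch in message)),
        "contains_number": int(bool(re.search(r"\d", message))),
    }


def random_oversample(rows, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    are as large as the largest one. `rows` is a list of (label, message)."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[0], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap (k may be 0 for the largest class).
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


sample = [
    ("ham", "see you soon"),
    ("ham", "ok"),
    ("ham", "call me later"),
    ("spam", "WIN £1000 now"),
]
features = extract_features("WIN £1000 now")
balanced = random_oversample(sample)
print(features)
print(len(balanced))
```

When oversampling is combined with cross-validation, duplicated rows can end up in both the training and validation folds, so it is generally safer to oversample inside each training fold only.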

Data Cleaning

• Removing special characters and numbers using regular expressions
• Converting the entire SMS to lowercase
• Tokenizing the SMS into words
• Removing the stop words
• Lemmatizing the words
• Joining the lemmatized words
• Building a corpus of messages
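
The cleaning steps above can be sketched as one function applied to every message. This is a simplified illustration rather than the notebook's code: the stop-word list is a tiny hand-picked stand-in, and lemmatization is a pluggable callable that defaults to the identity function. With NLTK installed and its WordNet data downloaded, `WordNetLemmatizer().lemmatize` could be passed in instead.

```python
import re

# Tiny illustrative stop-word list (assumption); NLTK provides a much fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "you", "for",
              "of", "and", "in", "on", "at", "we"}


def clean_message(message, lemmatize=lambda word: word):
    """Apply the cleaning steps to a single SMS and return the cleaned string."""
    # 1. Replace every non-letter (special characters, digits) with a space.
    text = re.sub(r"[^a-zA-Z]", " ", message)
    # 2. Convert to lowercase.
    text = text.lower()
    # 3. Tokenize on whitespace.
    tokens = text.split()
    # 4. Drop stop words.
    tokens = [token for token in tokens if token not in STOP_WORDS]
    # 5. Lemmatize each remaining word (identity by default).
    tokens = [lemmatize(token) for token in tokens]
    # 6. Join the words back into one string.
    return " ".join(tokens)


def build_corpus(messages, lemmatize=lambda word: word):
    """Clean every message and collect the results into a corpus (list of strings)."""
    return [clean_message(message, lemmatize) for message in messages]


corpus = build_corpus(["WIN a £1000 prize! Call 0906 now", "Are we meeting at 5?"])
print(corpus)
```

The resulting corpus is a list of plain strings, which is the input shape that text vectorizers expect.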

Model Building and Evaluation

Metric: F1-Score
• Multinomial Naive Bayes: 0.943
• Decision Tree: 0.98
• Random Forest: 0.994
• Voting (Decision Tree + Multinomial Naive Bayes): 0.98

Note: Evaluation scores are obtained using cross-validation.
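
For reference, the F1-score is the harmonic mean of precision and recall, and a cross-validated score is usually reported as the mean over folds. The sketch below shows that arithmetic in plain Python. It assumes "spam" is the positive class and uses made-up labels; it does not reproduce the scores listed above, and the function names are illustrative.

```python
def f1_score(y_true, y_pred, positive="spam"):
    """F1 = 2 * precision * recall / (precision + recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1_over_folds(folds):
    """Average the F1-score over (y_true, y_pred) pairs, one pair per fold."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in folds]
    return sum(scores) / len(scores)


# Two hypothetical validation folds with made-up labels and predictions.
fold_1 = (["spam", "spam", "ham", "ham"], ["spam", "ham", "ham", "spam"])
fold_2 = (["spam", "ham", "ham", "spam"], ["spam", "ham", "ham", "spam"])

score_1 = f1_score(*fold_1)
score_2 = f1_score(*fold_2)
mean_score = mean_f1_over_folds([fold_1, fold_2])
print(score_1, score_2, mean_score)
```

F1 is a common choice for spam detection because accuracy alone can look high on an imbalanced dataset even when few spam messages are caught.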

Model Prediction

Do ⭐ the repository if it helped you in any way.
