🧪 Toxic Comment Classifier

🧪 Toxic Comment Classifier is a simple yet effective machine learning project that detects toxic comments in both English and Russian.
It uses classical NLP techniques (TF-IDF + Logistic Regression) for real-time text classification.

Whether you're building a moderation system or just exploring NLP, this project is a great starting point.

📦 Description

The script:

Downloads and extracts English and Russian toxic comment datasets.
Merges them into training and testing sets.
Uses TfidfVectorizer to convert text into numerical features.
Trains a logistic regression model.
Saves the model and vectorizer to model.pkl.
Allows the user to input a comment and checks if it is toxic.

🗃 Datasets Used

Jigsaw Toxic Comment Classification Challenge:
https://www.kaggle.com/datasets/julian3833/jigsaw-toxic-comment-classification-challenge
Russian Language Toxic Comments:
https://www.kaggle.com/datasets/blackmoon/russian-language-toxic-comments

📁 Project Structure

.
└── Toxic Comment Classifier AI
├── dataset
├── .gitattributes
├── .gitignore
├── LICENSE
├── main.py
├── model.pkl
├── README.md
└── requirements.txt

🛠 Installation

Clone the repository:

git clone https://github.com/pashudzu/ToxicCommentClassificationAI.git  
cd ToxicCommentClassificationAI
python main.py

Install dependencies:
pip install -r requirements.txt

🔍 Example Usage

Comment	Classification
"You're stupid and nobody likes you!"	❌ Toxic
"Have a great day!"	✅ Kindness

📈 Model Performance

The model prints the accuracy score after training.

🧠 Technologies Used

Python 3
scikit-learn
NLTK
pickle
TF-IDF vectorization
Logistic Regression

📌 Notes

✅ Supports both English and Russian comments.
🧪 Uses only the toxic label (binary classification)
💾 The model is saved to avoid retraining on each run.
🚀 Avoids retraining if a saved model exists

📜 License

This project is licensed under the MIT License. Use it freely.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🧪 Toxic Comment Classifier

📦 Description

🗃 Datasets Used

📁 Project Structure

🛠 Installation

🔍 Example Usage

📈 Model Performance

🧠 Technologies Used

📌 Notes

📜 License

Made with ❤️ by pashudzu

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
dataset		dataset
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
model.pkl		model.pkl
requirements.txt		requirements.txt

License

pashudzu/ToxicCommentClassificationAI

Folders and files

Latest commit

History

Repository files navigation

🧪 Toxic Comment Classifier

📦 Description

🗃 Datasets Used

📁 Project Structure

🛠 Installation

🔍 Example Usage

📈 Model Performance

🧠 Technologies Used

📌 Notes

📜 License

Made with ❤️ by pashudzu

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages