🕷️ Scrapy & Clustering Data

Combine web scraping with data clustering to collect, clean, and analyze web data.
Built with Scrapy, pandas, and scikit-learn.

Overview

This project demonstrates how to:

Scrape structured data from websites using Scrapy.
Clean and preprocess the scraped data with pandas.
Apply clustering algorithms (K-Means, DBSCAN, Agglomerative).
Visualize results and export outputs to CSV/JSON.
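
The cleaning and clustering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`clean_records`, `make_clusterer`, `cluster_texts`), the `text` and `author` field names, the TF-IDF features, and the DBSCAN parameters are all assumptions made for the example.

```python
import pandas as pd
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical scraped records; a real run would load the spider's output.
SAMPLE_RECORDS = [
    {"text": "  Scrapy spiders crawl pages  ", "author": "a"},
    {"text": "Scrapy spiders crawl pages", "author": "a"},  # duplicate after strip
    {"text": None, "author": "b"},                          # missing text
    {"text": "Spiders crawl web pages quickly", "author": "c"},
    {"text": "Clustering groups similar documents", "author": "d"},
    {"text": "K-Means clustering groups documents", "author": "e"},
    {"text": "", "author": "f"},                            # empty text
]


def clean_records(records):
    """Strip whitespace, drop rows with empty text, and remove duplicates."""
    df = pd.DataFrame(records)
    df["text"] = df["text"].fillna("").astype(str).str.strip()
    df = df[df["text"] != ""]
    return df.drop_duplicates(subset="text").reset_index(drop=True)


def make_clusterer(name, n_clusters=3):
    """Return a scikit-learn clusterer for the given algorithm name."""
    if name == "kmeans":
        return KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    if name == "agglomerative":
        return AgglomerativeClustering(n_clusters=n_clusters)
    if name == "dbscan":
        # eps and min_samples are illustrative; they need tuning per dataset.
        return DBSCAN(eps=0.9, min_samples=2, metric="cosine")
    raise ValueError(f"unknown algorithm: {name}")


def cluster_texts(df, algorithm="kmeans", n_clusters=3):
    """Vectorize the text column with TF-IDF and attach a cluster label."""
    features = TfidfVectorizer(stop_words="english").fit_transform(df["text"])
    model = make_clusterer(algorithm, n_clusters)
    # AgglomerativeClustering needs dense input, so convert for every algorithm.
    labels = model.fit_predict(features.toarray())
    return df.assign(cluster=labels)  # DBSCAN marks noise points with -1


if __name__ == "__main__":
    clustered = cluster_texts(clean_records(SAMPLE_RECORDS), "kmeans", n_clusters=2)
    print(clustered[["text", "cluster"]])
```

The resulting DataFrame could then be written out with `clustered.to_csv("clusters.csv", index=False)` or `clustered.to_json("clusters.json", orient="records")`; the file names here are placeholders.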

Features

Multi-spider Scrapy project structure.
Configurable item pipelines (cleaning / storage).
Clustering pipeline with selectable algorithms.
Export clusters and sample visualizations.
Example commands to run scraping and clustering.

🛠️ Tech Stack

Python 3.8+
Scrapy
pandas, numpy
scikit-learn
matplotlib / seaborn
joblib

sevenjuneAI/PostClustering
