🔍 Web Scraper Pro

A modern full-stack web scraping application built with Next.js, designed to extract, analyze, and export website data effortlessly.
It offers server-side HTML parsing, intelligent data extraction, and multi-format export capabilities — all within a beautiful and responsive interface.



✨ Features

  • 🌐 Universal Web Scraping – Extract structured data from any publicly accessible website
  • 📊 Structured Data Extraction – Parse headings, paragraphs, links, images, tables, and metadata automatically
  • 💾 Multi-Format Export – Download data as JSON, CSV, or Excel files
  • 🎯 Intelligent URL Resolution – Automatically converts relative URLs into absolute paths (see the sketch after this list)
  • ⚡ Real-Time Processing – Instant feedback with progress indicators and loading states
  • 🎨 Modern UI – Responsive, minimal, and dark-mode ready (built with TailwindCSS)
  • 🛡️ Ethical Scraping – Built-in rate limiting and User-Agent rotation
  • 📱 Mobile Friendly – Works seamlessly on all screen sizes
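
As a quick illustration of the URL resolution feature above, relative links and image paths can be normalized with the built-in URL constructor. This is a minimal sketch under assumed names, not the project's actual implementation:

// Illustrative helper: resolve a possibly-relative href against the page URL.
// The function name is an assumption, not taken from the repository.
function toAbsoluteUrl(href: string, baseUrl: string): string {
  try {
    return new URL(href, baseUrl).toString();
  } catch {
    return href; // leave malformed values untouched
  }
}

// toAbsoluteUrl("/about", "https://example.com/blog") -> "https://example.com/about"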

🛠️ Tech Stack

Frontend

  • Next.js 14 – React framework with App Router
  • TypeScript – Type-safe development
  • TailwindCSS – Utility-first CSS framework
  • Lucide React – Icon system
  • Shadcn/ui – Reusable UI components

Backend

  • Next.js API Routes – Serverless endpoints
  • Cheerio – Fast HTML parser
  • XLSX – Excel file generator
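
To give a feel for how these pieces fit together, below is a rough sketch of a Cheerio-based App Router handler in the spirit of app/api/scrape/route.ts. It is a hedged outline, not the repository's actual code; the User-Agent string is an arbitrary example, and the fields mirror the API response documented further down.

// Hypothetical outline of app/api/scrape/route.ts (not the actual source).
import * as cheerio from "cheerio";

export async function POST(request: Request) {
  const { url } = await request.json();

  const res = await fetch(url, {
    headers: { "User-Agent": "WebScraperPro/1.0 (educational project)" },
  });
  if (!res.ok) {
    return Response.json(
      { error: `Failed to scrape website: HTTP ${res.status}` },
      { status: 500 }
    );
  }

  const $ = cheerio.load(await res.text());

  return Response.json({
    url,
    title: $("title").first().text(),
    description: $('meta[name="description"]').attr("content") ?? "",
    headings: $("h1, h2, h3").map((_, el) => $(el).text().trim()).get(),
    paragraphs: $("p").map((_, el) => $(el).text().trim()).get(),
    links: $("a[href]")
      .map((_, el) => ({ text: $(el).text().trim(), href: $(el).attr("href") }))
      .get(),
    images: $("img[src]").map((_, el) => $(el).attr("src")).get(),
  });
}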

🚀 Getting Started

Prerequisites

  • Node.js 18+
  • npm, pnpm, or yarn

Installation

# Clone repository
git clone https://github.com/yourusername/web-scraper-pro.git
cd web-scraper-pro

# Install dependencies
pnpm install
# or
npm install

# Run development server
pnpm run dev
# or
npm run dev

Then, open http://localhost:3000 in your browser.


📖 Usage

  1. Enter a website URL (e.g., https://example.com)
  2. Click "Scrape Website"
  3. Wait for completion and view organized data
  4. Export results as JSON, CSV, or Excel
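
For the Excel option, the export presumably goes through the xlsx package listed in the tech stack. A minimal sketch of turning one scraped table into a workbook buffer (the helper name and data shape are illustrative):

// Illustrative only: convert one scraped table into an .xlsx workbook buffer.
import * as XLSX from "xlsx";

interface ScrapedTable {
  headers: string[];
  rows: string[][];
}

function tableToXlsx(table: ScrapedTable): Buffer {
  const sheet = XLSX.utils.aoa_to_sheet([table.headers, ...table.rows]);
  const workbook = XLSX.utils.book_new();
  XLSX.utils.book_append_sheet(workbook, sheet, "Scraped Data");
  return XLSX.write(workbook, { type: "buffer", bookType: "xlsx" });
}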

🔧 API Documentation

POST /api/scrape

Scrapes a website and returns structured data.

Request Body

{ "url": "https://example.com" }

Response

{
  "url": "https://example.com",
  "title": "Example Domain",
  "description": "Example website description",
  "headings": ["Heading 1", "Heading 2"],
  "paragraphs": ["Paragraph text..."],
  "links": [{ "text": "Link text", "href": "https://example.com/link" }],
  "images": ["https://example.com/image.jpg"],
  "tables": [{ "headers": ["Col 1", "Col 2"], "rows": [["Data 1", "Data 2"]] }]
}
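
For reference, the response above corresponds to a TypeScript shape roughly like the following (the interface name is illustrative and the exact types in the codebase may differ):

// Illustrative response type derived from the example above.
interface ScrapeResult {
  url: string;
  title: string;
  description: string;
  headings: string[];
  paragraphs: string[];
  links: { text: string; href: string }[];
  images: string[];
  tables: { headers: string[]; rows: string[][] }[];
}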

Error Response

{ "error": "Failed to scrape website: HTTP 404" }

📂 Project Structure

web-scraper-pro/
├── app/
│   ├── api/
│   │   └── scrape/
│   │       └── route.ts        # API endpoint
│   ├── layout.tsx              # Root layout
│   └── page.tsx                # Home page
├── components/
│   ├── ui/                     # UI components
│   ├── data-display.tsx        # Data visualization
│   ├── footer.tsx              # Footer
│   └── url-form.tsx            # URL input form
├── lib/
│   └── utils.ts                # Utility functions
├── public/                     # Static assets
├── package.json
├── tailwind.config.ts
├── tsconfig.json
└── README.md

⚖️ Legal & Ethical Considerations

This project is for educational purposes only. Please follow ethical scraping practices:

✅ Scrape only public data
✅ Respect robots.txt and site Terms of Service
✅ Implement rate limiting
❌ Do not scrape personal/sensitive data
❌ Do not bypass authentication or paywalls
❌ Do not republish copyrighted content

Disclaimer: You are responsible for ensuring compliance with all applicable laws.


🔐 Best Practices Implemented

  • User-Agent headers for scraper requests (see the sketch below)
  • Graceful error handling
  • Configurable rate limiting
  • Content size protection
  • URL validation and normalization
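
The snippet below sketches how a couple of these points (an identifying User-Agent and content size protection) might look in a fetch wrapper. It is an assumption-based illustration rather than the project's actual implementation; the limits and header values are arbitrary.

// Illustrative fetch wrapper: identifying User-Agent plus a response size cap.
const MAX_CONTENT_BYTES = 5 * 1024 * 1024; // 5 MB cap (arbitrary example)

async function fetchHtml(url: string): Promise<string> {
  const res = await fetch(url, {
    headers: { "User-Agent": "WebScraperPro/1.0 (educational project)" },
    signal: AbortSignal.timeout(10_000), // fail fast instead of hanging
  });
  if (!res.ok) throw new Error(`Failed to scrape website: HTTP ${res.status}`);

  const length = Number(res.headers.get("content-length") ?? 0);
  if (length > MAX_CONTENT_BYTES) throw new Error("Response too large");

  return res.text();
}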

📝 License

Licensed under the MIT License. See the LICENSE file for details.


👨‍💻 Author

Sandip Singha


🙏 Acknowledgments


If you found this project helpful, please give it a star!
