Web Crawler

A lightweight and configurable web crawler built with Node.js.

Description

This crawler recursively extracts links from websites using Node.js, Axios, and Cheerio. It respects depth limits and avoids duplicate visits for efficient crawling.

Features

Asynchronous, non-blocking HTTP requests
Recursive link extraction with configurable depth
Deduplication of visited URLs
Targeted crawling (e.g., restricted to a specific domain)
Extensible codebase for easy customization
Error handling and reporting
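
The deduplication and targeted-crawling features can be illustrated with a minimal sketch that uses only Node's built-in URL class. The helper names (normalizeUrl, isInDomain, markVisited) are hypothetical and are not taken from crawler.js; treating subdomains as part of the target domain is also an assumption.

```javascript
// Hypothetical helpers; names and behavior are illustrative assumptions.

// Resolve a raw href against the page it was found on and drop the fragment,
// so that "/about" and "/about#team" count as the same page.
function normalizeUrl(href, baseUrl) {
  try {
    const url = new URL(href, baseUrl);
    // Ignore non-web schemes such as mailto: or javascript:
    if (url.protocol !== 'http:' && url.protocol !== 'https:') return null;
    url.hash = '';
    return url.href;
  } catch {
    return null; // malformed href
  }
}

// True when the URL's host is the target domain or one of its subdomains.
function isInDomain(url, targetDomain) {
  const host = new URL(url).hostname;
  return host === targetDomain || host.endsWith('.' + targetDomain);
}

const visited = new Set();

// Returns true the first time a URL is seen and false on every later call.
function markVisited(url) {
  if (visited.has(url)) return false;
  visited.add(url);
  return true;
}
```

Normalizing before the Set lookup matters: without it, the same page reached through a relative link and an absolute link would be visited twice.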

Technologies

Node.js
Axios (HTTP requests)
Cheerio (HTML parsing)

Implementation

The crawler fetches HTML with Axios and parses it with Cheerio. It maintains a set of visited URLs and recursively follows links that fall within the configured depth limit and target domain. The process continues until no unvisited links remain or the maximum depth is reached.
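
The approach described above might look roughly like the following sketch. It is not the code in crawler.js: the function names (fetchHtml, extractLinks, crawl), the options object, and the 10-second timeout are all assumptions made for illustration. Axios and Cheerio are loaded lazily so that the traversal logic can also be exercised with stand-in functions.

```javascript
// Sketch only; names, options, and the timeout value are illustrative assumptions.

// Download a page and return its HTML body.
async function fetchHtml(url) {
  const axios = require('axios');
  const response = await axios.get(url, { timeout: 10000 });
  return response.data;
}

// Collect absolute URLs from every <a href> on the page.
function extractLinks(html, baseUrl) {
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);
  const links = [];
  $('a[href]').each((_, el) => {
    try {
      const url = new URL($(el).attr('href'), baseUrl);
      url.hash = '';
      links.push(url.href);
    } catch {
      // skip malformed hrefs
    }
  });
  return links;
}

// Depth-first crawl. The start page is depth 0; pages deeper than maxDepth
// or outside targetDomain are never fetched.
async function crawl(url, options, depth = 0, visited = new Set()) {
  const { maxDepth, targetDomain, fetch = fetchHtml, extract = extractLinks } = options;
  if (depth > maxDepth || visited.has(url)) return visited;
  if (new URL(url).hostname !== targetDomain) return visited;
  visited.add(url);

  let html;
  try {
    html = await fetch(url);
  } catch (err) {
    // Report the failure and keep crawling the remaining links.
    console.error(`Failed to fetch ${url}: ${err.message}`);
    return visited;
  }

  for (const link of extract(html, url)) {
    await crawl(link, options, depth + 1, visited);
  }
  return visited;
}
```

A call such as crawl('https://example.com/', { maxDepth: 2, targetDomain: 'example.com' }) would resolve to the set of visited URLs. Because this sketch is depth-first and shares one visited set, a page first reached at the depth limit is not expanded even if a shorter path to it exists; a breadth-first queue is one way to avoid that.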

Usage

Clone the repository
Install dependencies: npm install
Configure MAX_DEPTH and targetDomain in crawler.js
Run: node crawler.js
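
For the configuration step, the two settings might look something like this near the top of crawler.js. The values shown are placeholders, and the exact declarations in the repository may differ.

```javascript
// Placeholder values; adjust to the site you want to crawl.
const MAX_DEPTH = 2;                 // how many link hops to follow from the start page
const targetDomain = 'example.com';  // only URLs on this host are crawled
```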

Contributing

Contributions are welcome! Open issues or submit pull requests.

License

MIT License
