·····························································
: _ ____ _ :
:__ _____| |__ / ___|_ __ __ ___ _| | ___ _ __ :
:\ \ /\ / / _ \ '_ \ | | | '__/ _` \ \ /\ / / |/ _ \ '__|:
: \ V V / __/ |_) | | |___| | | (_| |\ V V /| | __/ | :
: \_/\_/ \___|_.__/ \____|_| \__,_| \_/\_/ |_|\___|_| :
·····························································
A very simple Python script to crawl websites and track internal/external links. It was made during a Python training course.
The script uses 'requests' to fetch web pages and 're' for link extraction. It supports a command-line argument to specify the domain to crawl.
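The core idea might look like the minimal sketch below. Note this is illustrative, not the actual contents of web_crawler.py: the crawl function, the max_pages cap, and the https:// prefix are assumptions made here for the example.

import argparse
import re
from urllib.parse import urljoin, urlparse

import requests

def crawl(domain, max_pages=50):
    # Breadth-first crawl starting from the domain root (assumed https).
    start = f"https://{domain}"
    to_visit = [start]
    crawled, external = set(), set()
    while to_visit and len(crawled) < max_pages:
        url = to_visit.pop(0)
        if url in crawled:
            continue
        try:
            resp = requests.get(url, timeout=5)
        except requests.RequestException:
            continue  # skip pages that time out or fail
        crawled.add(url)
        # Naive href extraction with a regex, matching the README's use of 're'
        for link in re.findall(r'href=["\'](.*?)["\']', resp.text):
            absolute = urljoin(url, link)
            if not absolute.startswith(("http://", "https://")):
                continue  # skip mailto:, javascript:, etc.
            if urlparse(absolute).netloc == urlparse(start).netloc:
                if absolute not in crawled:
                    to_visit.append(absolute)  # internal link: queue it
            else:
                external.add(absolute)  # external link: record only
    return crawled, external

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Crawl a domain and track links")
    parser.add_argument("-d", "--domain", required=True, help="domain to crawl")
    args = parser.parse_args()
    crawled, external = crawl(args.domain)
    print("Crawled URLs:", *sorted(crawled), sep="\n  ")
    print("External URLs:", *sorted(external), sep="\n  ")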
- Clone the repository:
  git clone https://github.com/MxSns/web-crawler.git
- Install dependencies:
  pip3 install -r requirements.txt
Run the script with a domain argument:
python3 web_crawler.py -d example.com
The script reports:
- Crawled URLs: list of visited internal links
- External URLs: list of links pointing outside the domain
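A run might print something like this (illustrative output against a placeholder domain, not captured from a real run):

Crawled URLs:
  https://example.com/
  https://example.com/about
External URLs:
  https://github.com/MxSns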
(real nerds listen to UwU_underground)