·····························································
:              _        ____                    _           :
:__      _____| |__    / ___|_ __ __ ___      _| | ___ _ __ :
:\ \ /\ / / _ \ '_ \  | |   | '__/ _` \ \ /\ / / |/ _ \ '__|:
: \ V  V /  __/ |_) | | |___| | | (_| |\ V  V /| |  __/ |   :
:  \_/\_/ \___|_.__/   \____|_|  \__,_| \_/\_/ |_|\___|_|   :
·····························································

Web Crawler

A very simple Python script to crawl websites and track internal/external links. This was made during a Python training course.

The script uses 'requests' to fetch web pages and 're' for link extraction. It supports a command-line argument to specify the domain to crawl.
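The repository's source isn't reproduced here, but a minimal sketch of the approach the README describes might look like the following. The function name, the regex, and the argument handling are illustrative assumptions, not the actual code:

import re
import sys
import requests

def crawl(domain):
    # Hypothetical sketch of the crawl loop described in this README.
    start = f"https://{domain}/"
    to_visit = [start]
    crawled, external = set(), set()
    while to_visit:
        url = to_visit.pop()
        if url in crawled:
            continue
        crawled.add(url)
        try:
            html = requests.get(url, timeout=5).text
        except requests.RequestException:
            continue
        # Extract href targets with a regex, then split them into
        # internal links (queued for crawling) and external links.
        for link in re.findall(r'href="(https?://[^"]+)"', html):
            if domain in link:
                to_visit.append(link)
            else:
                external.add(link)
    print("Crawled URLs:", *sorted(crawled), sep="\n  ")
    print("External URLs:", *sorted(external), sep="\n  ")

if __name__ == "__main__":
    crawl(sys.argv[sys.argv.index("-d") + 1])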

Installation

  1. Clone the repository

git clone https://github.com/MxSns/web-crawler.git

  2. Install dependencies

pip3 install -r requirements.txt

Usage

Run the script with a domain argument

python3 web_crawler.py -d <domain>
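For example, assuming the -d flag takes the bare domain to crawl:

python3 web_crawler.py -d example.com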

Output

Crawled URLs: list of visited internal links
External URLs: list of links pointing outside the domain

(real nerds listen to UwU_underground)
