This is a recursive web crawler written in Rust that visits websites, extracts links, and stores them in a SQLite database.
- Extracts all links on the site
- Persists data in a SQLite database (table: link)
- Recursively crawls websites up to a configurable depth
- Visualizes data in a static PNG or a dynamic graph
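The depth-limited recursion in the feature list can be sketched as follows. Note this is a Python illustration rather than the project's Rust code, and it walks an in-memory link map instead of fetching pages over HTTP, so the recursion is easy to follow:

```python
# Stand-in site graph: each URL maps to the links found on that page.
SITE = {
    "https://example.com": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/x"],
    "https://example.com/b": [],
    "https://example.com/a/x": [],
}

def crawl(url, depth, max_depth, visited, results):
    """Record (url, depth) pairs, recursing until max_depth is reached."""
    if depth > max_depth or url in visited:
        return
    visited.add(url)
    results.append((url, depth))
    for child in SITE.get(url, []):
        crawl(child, depth + 1, max_depth, visited, results)

results = []
crawl("https://example.com", 0, 1, set(), results)
# With max_depth=1, only the root and its direct children are visited;
# https://example.com/a/x at depth 2 is skipped.
print(results)
```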
Each link is saved with its parent URL and depth level, allowing you to visualize the structured hierarchy of your crawled website.
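A minimal sketch of what storing links with parent and depth might look like. The column names below are an assumption inferred from the description; the actual schema of FerrumWeb's link table may differ:

```python
import sqlite3

# Hypothetical schema for the link table (url, parent_url, depth);
# the real column names in FerrumWeb's database may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (url TEXT, parent_url TEXT, depth INTEGER)")

rows = [
    ("https://example.com", None, 0),
    ("https://example.com/about", "https://example.com", 1),
    ("https://example.com/blog", "https://example.com", 1),
]
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", rows)

# Parent/depth columns let you query one level of the hierarchy directly:
children = conn.execute(
    "SELECT url FROM link WHERE parent_url = ? ORDER BY url",
    ("https://example.com",),
).fetchall()
print(children)
```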
After crawling a website, visualize the link hierarchy using visualize_hierarchy.py:
# To test, run:
python -u ".../link_db_test.py" # Fills the database
python -u ".../visualize_hierarchy.py" --db ./data/links_test.db
# Generate all visualization layouts
python -u ".../visualize_hierarchy.py"

# Generate static layout
python -u ".../visualize_hierarchy.py" -s
python -u ".../visualize_hierarchy.py" --static
# Generate interactive layout
python -u ".../visualize_hierarchy.py" -i
python -u ".../visualize_hierarchy.py" --interactive

Output files:
- [static graph] link_hierarchy_tree.png - Hierarchical tree visualization
- [dynamic graph] link_hierarchy.html - Interactive graph
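The tree both layouts draw can be reconstructed from the stored (url, parent, depth) rows alone. A stdlib-only sketch of that grouping step, with an assumed column layout for illustration:

```python
from collections import defaultdict

# Rows as the crawler might have stored them: (url, parent_url, depth).
# The column layout is an assumption for illustration.
rows = [
    ("https://example.com", None, 0),
    ("https://example.com/a", "https://example.com", 1),
    ("https://example.com/b", "https://example.com", 1),
    ("https://example.com/a/x", "https://example.com/a", 2),
]

# Group child URLs under their parent.
children = defaultdict(list)
for url, parent, _depth in rows:
    children[parent].append(url)

def render(url, indent=0, out=None):
    """Indent each URL by its depth, mirroring the tree layout."""
    out = [] if out is None else out
    out.append("  " * indent + url)
    for child in children[url]:
        render(child, indent + 1, out)
    return out

# The root is the single link whose parent is None.
tree = render(children[None][0])
print("\n".join(tree))
```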
For more information about NetworkX: https://networkx.org/
Install Python dependencies:
pip install networkx
pip install matplotlib
pip install pyvis

PyVis: https://pyvis.readthedocs.io/en/latest/
git clone https://github.com/jakobx0/FerrumWeb
cd FerrumWeb
cargo run

If Rust is not installed: https://www.rust-lang.org/tools/install
Rust help: https://users.rust-lang.org/t/link-exe-not-found-despite-build-tools-already-installed/47080
On Windows, the error "linker 'link.exe' not found" can be solved via:
rustup toolchain install stable-x86_64-pc-windows-gnu
rustup default stable-x86_64-pc-windows-gnu

On Linux, the error "failed to run custom build command for 'openssl-sys v0.9.109'" can be solved via:
sudo apt install libssl-dev

To analyse the DB file, simply open a DBMS of your choice, for example DB Browser for SQLite: https://sqlitebrowser.org/
Example SQL Queries:
Count of distinct URLs:
SELECT COUNT(DISTINCT URL) FROM link;

Grouped links by frequency:
SELECT URL, COUNT(*) AS count FROM link GROUP BY URL ORDER BY count DESC;
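Both queries can be tried end-to-end against a throwaway in-memory database, assuming only that the link table has a URL column (the real table has more columns):

```python
import sqlite3

# Throwaway database with a minimal link table for trying the queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (URL TEXT)")
conn.executemany(
    "INSERT INTO link VALUES (?)",
    [
        ("https://example.com",),   # crawled twice
        ("https://example.com",),
        ("https://example.com/a",),
    ],
)

# Count of distinct URLs: 2 distinct values among 3 rows.
distinct = conn.execute("SELECT COUNT(DISTINCT URL) FROM link").fetchone()[0]
print(distinct)

# Grouped links by frequency, most frequent first.
grouped = conn.execute(
    "SELECT URL, COUNT(*) AS count FROM link GROUP BY URL ORDER BY count DESC"
).fetchall()
print(grouped)
```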


