DeadLink-Crawler 🕷️

DeadLink-Crawler is a fast, concurrent command-line tool written in Go that recursively crawls a website and detects broken (dead) links, i.e. links that return HTTP status codes in the 4xx or 5xx range.


📌 Features

  • ✅ Detects dead and working links on any given website
  • 🔄 Recursively crawls internal links
  • 🔗 Categorizes internal and external links
  • 🔁 Uses concurrency to speed up crawling
  • 🔒 Thread-safe link tracking to prevent duplicate crawls (see the sketch after this list)
  • 📋 Clean CLI output showing status of each link
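
As a rough illustration of the duplicate-crawl protection, the sketch below uses a mutex-guarded set together with a WaitGroup. The type and function names (tracker, firstVisit) are made up for this example and are not the repository's actual identifiers.

package main

import (
	"fmt"
	"sync"
)

// tracker is a mutex-guarded set of URLs that have already been crawled.
type tracker struct {
	mu   sync.Mutex
	seen map[string]bool
}

// firstVisit records url and returns true only for the first caller,
// so concurrent goroutines never crawl the same URL twice.
func (t *tracker) firstVisit(url string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.seen[url] {
		return false
	}
	t.seen[url] = true
	return true
}

func main() {
	t := &tracker{seen: make(map[string]bool)}
	urls := []string{"https://example.com/a", "https://example.com/a", "https://example.com/b"}

	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			if t.firstVisit(u) {
				fmt.Println("crawling", u) // duplicates are skipped
			}
		}(u)
	}
	wg.Wait()
}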

🛠️ Requirements

  • Go 1.20 or higher installed
  • Internet connection (for crawling)

📦 Installation

  1. Clone the repository

git clone https://github.com/your-username/deadlink-crawler.git
cd deadlink-crawler

  2. Install dependencies

This project uses a single external package:

go get golang.org/x/net/html
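
If the dependency is already recorded in go.mod, running the standard module tooling from the project root should resolve it as well:

go mod tidy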

🚀 Running the Project

Simply run:

go run main.go https://scrape-me.dreamsofcode.io 

You will see a list of links printed as either:

  • Ok LINK: <url> -> <status code>
  • DEAD LINK: <url> -> <status code>
  • Error: <url> -> <error message>

🧠 How It Works

  1. Starts with a base URL.
  2. Downloads and parses the HTML to extract all <a href="..."> links (see the sketch after this list).
  3. Converts all links to absolute URLs and classifies them as:
    • Internal: Same domain as the base
    • External: Different domain
  4. Internal links are recursively crawled.
  5. Each link is fetched with an HTTP GET:
    • 200–399: Considered OK
    • 400+: Marked as dead
  6. All crawled URLs are tracked to avoid re-checking or infinite loops.
  7. Concurrency (via goroutines + waitgroups) is used to parallelize the crawling.
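
The sketch below walks through steps 2, 3, and 5 for a single page, using the golang.org/x/net/html parser; the recursive, concurrent part is illustrated by the earlier tracking sketch. All function and variable names here are illustrative assumptions, not the repository's actual code.

package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"

	"golang.org/x/net/html"
)

// extractLinks walks the parsed HTML tree and collects every <a href> value.
func extractLinks(doc *html.Node) []string {
	var links []string
	var walk func(*html.Node)
	walk = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key == "href" {
					links = append(links, a.Val)
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(doc)
	return links
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run sketch.go <url>")
		return
	}
	base, err := url.Parse(os.Args[1])
	if err != nil {
		fmt.Println("invalid URL:", err)
		return
	}

	resp, err := http.Get(base.String())
	if err != nil {
		fmt.Println("Error:", base, "->", err)
		return
	}
	defer resp.Body.Close()

	doc, err := html.Parse(resp.Body)
	if err != nil {
		fmt.Println("could not parse HTML:", err)
		return
	}

	for _, href := range extractLinks(doc) {
		ref, err := url.Parse(href)
		if err != nil {
			continue
		}
		abs := base.ResolveReference(ref) // make relative links absolute

		kind := "external"
		if abs.Host == base.Host {
			kind = "internal" // only internal links would be crawled recursively
		}

		r, err := http.Get(abs.String())
		if err != nil {
			fmt.Println("Error:", abs, "->", err)
			continue
		}
		r.Body.Close()
		if r.StatusCode >= 400 { // 4xx/5xx counts as dead
			fmt.Println("DEAD LINK:", abs, "->", r.StatusCode, "["+kind+"]")
		} else {
			fmt.Println("Ok LINK:", abs, "->", r.StatusCode, "["+kind+"]")
		}
	}
}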

📁 File Structure

.
├── main.go         # Main Go source file
├── go.mod          # Go module file
└── README.md       # Project documentation

🧪 Sample Output

Ok LINK: https://scrape-me.dreamsofcode.io -> 200
Ok LINK: https://youtube.com/@dreamsofcode -> 200
Ok LINK: https://scrape-me.dreamsofcode.io/nirvana -> 200
Ok LINK: https://scrape-me.dreamsofcode.io/about -> 200
DEAD LINK: https://scrape-me.dreamsofcode.io/nevermind -> 404
DEAD LINK: https://scrape-me.dreamsofcode.io/in-utero -> 404
Ok LINK: https://scrape-me.dreamsofcode.io/anime?name=bleach -> 200
Ok LINK: https://scrape-me.dreamsofcode.io/anime?name=Jujutsu%20kaizen -> 200
Ok LINK: https://scrape-me.dreamsofcode.io/naruto -> 200
DEAD LINK: https://scrape-me.dreamsofcode.io/teapot -> 418
DEAD LINK: https://scrape-me.dreamsofcode.io/busted -> 401
DEAD LINK: https://scrape-me.dreamsofcode.io/mars -> 404
DEAD LINK: https://scrape-me.dreamsofcode.io/venus -> 404
Error: http://10.255.255.1 -> Get "http://10.255.255.1": dial tcp 10.255.255.1:80: connect: no route to host

✏️ Customizing the Start URL

To crawl a different website, just replace the value of startURL in main():

func main() {
	startURL := "https://yourwebsite.com"
	...
}

⚠️ Note:

  • Works best with static (non-JavaScript rendered) websites.
  • Respects the current domain only — external links are checked, not crawled.

📖 Technologies Used

  • Go (net/http for fetching pages, goroutines and waitgroups for concurrency)
  • golang.org/x/net/html for HTML parsing

💡 Ideas for Future Improvements

  • Export dead links to a .txt or .csv file
  • Add depth limiting
  • Add support for robots.txt
  • Rate limiting to avoid overwhelming servers
  • CLI flag support (flag or cobra); a rough sketch follows below
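
A minimal sketch of the flag idea, using only Go's standard flag package; the flag names (-url, -depth, -delay) are hypothetical and are not existing options of this tool.

package main

import (
	"flag"
	"fmt"
	"time"
)

func main() {
	// Hypothetical flags: none of these exist in the current tool.
	startURL := flag.String("url", "https://scrape-me.dreamsofcode.io", "base URL to crawl")
	maxDepth := flag.Int("depth", 3, "maximum recursion depth")
	delay := flag.Duration("delay", 200*time.Millisecond, "pause between requests (rate limiting)")
	flag.Parse()

	fmt.Println("would crawl", *startURL, "to depth", *maxDepth, "with delay", *delay)
	// ...the crawler itself would start here...
}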

📄 License

This project is licensed under the MIT License.
Feel free to use, modify, and distribute!


🙌 Acknowledgements


🤝 Contributing

Pull requests and suggestions are welcome.
Please open an issue first to discuss what you would like to change.

