A powerful tool for finding sensitive files from Internet Archive's Wayback Machine (archive.org)
WAFFER is a multi-threaded tool designed to search and retrieve files from the Wayback Machine's extensive archive of over 916 billion web pages. It leverages archive.org's API to discover potentially sensitive files that were archived over time.
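To give a sense of what such a lookup involves, here is a minimal sketch of building a query against the Wayback Machine's public CDX endpoint. It is an illustration only, not WAFFER's actual code: the function name `build_cdx_query` and the chosen parameters are assumptions.

```python
from urllib.parse import urlencode

# Public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str) -> str:
    """Return a CDX API URL that lists archived URLs under `domain`.

    Hypothetical helper for illustration; WAFFER may build its
    requests differently.
    """
    params = {
        "url": f"{domain}/*",   # wildcard: every path under the domain
        "output": "json",       # JSON rows instead of plain text
        "fl": "original",       # return only the originally captured URL
        "collapse": "urlkey",   # one row per unique URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


print(build_cdx_query("example.com"))
```

Fetching the resulting URL (for example with `requests.get`) returns the list of archived URLs, which can then be filtered by extension.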
- 🚀 Powered by Wayback Machine - Searches through archive.org's massive historical database
- 🔄 Multi-threaded scanning for faster archive searching
- 🎯 Smart file filtering with multiple extension sets:
  - Default sensitive files
  - Extended file types
  - Custom extensions
  - No filtering option
- ⏱️ Rate limiting to respect archive.org's servers
- 📝 Detailed logging with verbose mode
- 🎨 Colored output for better readability
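
The combination of multi-threading and rate limiting can be sketched as follows. This is a hedged illustration, not WAFFER's implementation: `RateLimiter`, `scan_all`, and the `fetch` callback are hypothetical names, and the spacing strategy (one request start per `delay` seconds, shared across threads) is an assumption.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Space out calls so that starts are at least `delay` seconds apart."""

    def __init__(self, delay: float):
        self.delay = delay
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> None:
        # Reserve the next free slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            sleep_for = max(0.0, self._next_allowed - now)
            self._next_allowed = max(now, self._next_allowed) + self.delay
        if sleep_for > 0:
            time.sleep(sleep_for)


def scan_all(domains, fetch, threads=10, delay=1.0):
    """Run `fetch(domain)` for each domain on a thread pool, rate limited."""
    limiter = RateLimiter(delay)

    def task(domain):
        limiter.wait()
        return fetch(domain)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in the same order as `domains`.
        return list(pool.map(task, domains))


# Usage with a stand-in fetch function (no network access).
print(scan_all(["a.com", "b.com"], lambda d: f"scanned {d}", threads=2, delay=0.1))
```

Sleeping outside the lock keeps the worker threads from serializing on each other while still keeping request starts evenly spaced.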
- Clone the repository:

      git clone https://github.com/raihanadiarba/waffer.git
      cd waffer

- Install dependencies:

      pip install colorama requests

Basic syntax:

    python3 waffer.py [-u URL | -l FILE] [-e {default,all,custom,none}] [-c EXTENSIONS] [-t THREADS] [-o OUTPUT] [-v] [-ts DELAY]

| Argument | Description |
|---|---|
| `-u`, `--url` | Target domain to search in Wayback Machine |
| `-l`, `--list` | File containing list of URLs to search |
| `-e`, `--extension` | Extension type to search (default/all/custom/none) |
| `-c`, `--custom` | Custom extensions (comma-separated) |
| `-t`, `--threads` | Number of concurrent threads (default: 10) |
| `-o`, `--output` | Save results to file |
| `-v`, `--verbose` | Show detailed progress |
| `-ts`, `--time-sec` | Delay between requests to archive.org |
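
For readers who want to see how an interface like this fits together, the arguments above could be declared with `argparse` roughly as shown here. This is a reconstruction from the table, not WAFFER's source: the `build_parser` name, the argument types, and the default delay of `0` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="waffer.py")

    # Exactly one of -u / -l must be given.
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("-u", "--url", help="Target domain to search")
    target.add_argument("-l", "--list", help="File containing list of URLs")

    parser.add_argument("-e", "--extension", default="default",
                        choices=["default", "all", "custom", "none"],
                        help="Extension type to search")
    parser.add_argument("-c", "--custom",
                        help="Custom extensions (comma-separated)")
    parser.add_argument("-t", "--threads", type=int, default=10,
                        help="Number of concurrent threads")
    parser.add_argument("-o", "--output", help="Save results to file")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show detailed progress")
    # Default of 0 seconds is an assumption; the README does not state one.
    parser.add_argument("-ts", "--time-sec", type=float, default=0.0,
                        help="Delay between requests to archive.org")
    return parser


args = build_parser().parse_args(["-u", "example.com", "-e", "all", "-ts", "1"])
print(args.url, args.extension, args.threads, args.time_sec)
```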
- Search domain in Wayback Machine:

      python3 waffer.py -u example.com

- Search with all extensions and verbose output:

      python3 waffer.py -u example.com -e all -v

- Custom file types with output file:

      python3 waffer.py -u example.com -e custom -c .pdf,.doc,.txt -o results.txt

- Scan multiple URLs with rate limiting:

      python3 waffer.py -l urls.txt -ts 1

The default extension set covers common sensitive files, including:
- Documents (.pdf, .doc, .docx)
- Data files (.xml, .json, .sql)
- Archives (.zip, .tar.gz, .7z)
- Configuration (.yml, .config, .ini)
- Security files (.key, .pem, .crt)
The extended set includes:
- Web configuration (.env, .htaccess)
- Backup files (-BACKUP-, .bak)
- Development (.git, .svn)
- Database files (.sql, .sqlite)
- And many more...
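
Filtering archived URLs against an extension set can be sketched as below. This is an assumption-laden illustration: `DEFAULT_EXTENSIONS` only copies the examples listed above (the tool's real list may be longer), and `parse_custom` and `has_extension` are hypothetical helper names.

```python
from urllib.parse import urlparse

# Only the extensions named in this README; the tool's real list may differ.
DEFAULT_EXTENSIONS = (
    ".pdf", ".doc", ".docx", ".xml", ".json", ".sql", ".zip", ".tar.gz",
    ".7z", ".yml", ".config", ".ini", ".key", ".pem", ".crt",
)


def parse_custom(value: str) -> tuple:
    """Turn a comma-separated string such as '.pdf,doc' into ('.pdf', '.doc')."""
    parts = [p.strip().lower() for p in value.split(",") if p.strip()]
    return tuple(p if p.startswith(".") else "." + p for p in parts)


def has_extension(url: str, extensions=DEFAULT_EXTENSIONS) -> bool:
    """Check whether the URL's path (ignoring the query string) ends with a listed extension."""
    path = urlparse(url).path.lower()
    return path.endswith(tuple(extensions))


urls = [
    "http://example.com/backup/db.SQL?download=1",
    "http://example.com/index.html",
    "http://example.com/site.tar.gz",
]
print([u for u in urls if has_extension(u)])
```

Matching on the parsed path rather than the raw URL keeps query strings such as `?download=1` from hiding a matching extension.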
After the tool finds URLs, you can access the archived files using these methods:
- Direct Wayback Machine Access:
  - Take the found URL: `http://example.com/file.pdf`
  - Visit: `https://web.archive.org/web/*/http://example.com/file.pdf`
  - Select the snapshot date you want to view
- Automated URL Format:
  - Format: `https://web.archive.org/web/<timestamp>/<url>`
  - Example: `https://web.archive.org/web/20230101000000/http://example.com/file.pdf`
- Latest Snapshot:
  - Use: `https://web.archive.org/web/2/http://example.com/file.pdf`
  - This automatically redirects to the most recent archive
- First Snapshot:
  - Use: `https://web.archive.org/web/0/http://example.com/file.pdf`
  - This shows the oldest archived version
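
The URL patterns above can be generated with a small helper. This is a minimal sketch; the function name `snapshot_url` is hypothetical and not part of WAFFER.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(original_url: str, timestamp: str = "2") -> str:
    """Build a Wayback Machine URL for `original_url`.

    `timestamp` follows the patterns described above:
      "*"              -> calendar view of all snapshots
      "20230101000000" -> a specific snapshot (YYYYMMDDhhmmss)
      "2"              -> latest snapshot
      "0"              -> first snapshot
    """
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


found = "http://example.com/file.pdf"
for ts in ("*", "20230101000000", "2", "0"):
    print(snapshot_url(found, ts))
```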
This tool is designed for security research and should be used responsibly. Always:
- Respect archive.org's terms of service
- Use appropriate delays between requests
- Only scan domains you have permission to test
Contributions are welcome! Please feel free to submit issues and pull requests.
This tool was inspired by @coffinxp. Thanks to them for the great idea!