Proteus: A Deterministic Entropy Benchmark for Scraper Testing #3317
copyleftdev started this conversation in Show and tell
Replies: 1 comment
Pretty cool! Couple of questions:
Hi everyone! 👋
I built Proteus, a specialized benchmark target designed to be a "Crash Test Dummy" for scrapers and anti-bot systems.
Unlike standard "scraper test sites", which are static, Proteus implements a Deterministic Entropy Engine. By modifying URL parameters, you can control exactly how chaotic the layout is:
?seed=123: Guarantees 100% reproducible DOM structure (great for regression testing).
?entropy=1.0: Maximizes DOM mutation (shuffled nodes, class injection, text obfuscation).
Why use it with Crawlee?
It answers the question: "Is my scraper breaking because of a real anti-bot update, or just brittle selectors?"
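To make that question concrete, here is a rough sketch of how a fixed seed and an entropy sweep could be used to find the point where a selector stops working. Only the seed and entropy parameter names come from the post. The base URL, the idea that both parameters can be combined in one request, the assumption that entropy starts at 0.0, and the helper names (proteus_url, first_failing_entropy, fetch, extract_ok) are all hypothetical and only there for illustration.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical base URL: the post does not say where Proteus is hosted.
BASE_URL = "http://localhost:8000/"


def proteus_url(base: str, seed: int, entropy: float) -> str:
    """Return `base` with the seed and entropy query parameters set.

    Assumes both parameters may be sent together and that entropy
    lies in [0.0, 1.0]; the post only confirms 1.0 as the maximum.
    """
    if not 0.0 <= entropy <= 1.0:
        raise ValueError("entropy is assumed to lie in [0.0, 1.0]")
    parts = urlsplit(base)
    query = dict(parse_qsl(parts.query))
    query.update({"seed": str(seed), "entropy": str(entropy)})
    return urlunsplit(parts._replace(query=urlencode(query)))


def first_failing_entropy(
    base: str,
    seed: int,
    fetch: Callable[[str], str],
    extract_ok: Callable[[str], bool],
    levels: Iterable[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> Optional[float]:
    """Return the lowest entropy level at which extraction fails, or None.

    `fetch` maps a URL to HTML (for example, a thin wrapper around your
    crawler); `extract_ok` reports whether your selectors still find
    what they need. The seed stays fixed so each level is reproducible.
    """
    for level in sorted(levels):
        html = fetch(proteus_url(base, seed, level))
        if not extract_ok(html):
            return level
    return None
```

If extraction passes at the lowest level but fails somewhere up the sweep, the selectors are probably brittle rather than blocked. Because the seed is fixed, the failing URL can be pinned in a regression test and re-run after each selector change.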
I also generated a test_cases.json suite, available on the repo, if you want to plug it into your CI: Proteus Test Suite (Gist)
Would love feedback from the community on what other mutation patterns break your crawlers! 🕷️