pyweb is a command-line web scraper engineered for one purpose: to be the fastest Python web scraper in existence, achieving sub-millisecond latency. It leverages a hyper-optimized, asynchronous Rust core built on tokio and a fine-tuned reqwest client using rustls to eliminate C FFI overhead and explicitly forcing HTTP/1.1 for minimal connection latency. The code is further micro-optimized to eliminate all unnecessary allocations in the hot path. Performance is further enhanced with the mimalloc high-performance memory allocator, native CPU-specific compiler optimizations, Profile-Guided Optimization (PGO), and the io_uring asynchronous I/O interface on Linux.
pyweb is definitively the fastest Python web scraper. The final benchmark, scraping 100 pages from a local aiohttp server, was conducted after applying advanced OS-level network tuning (tcp_tw_reuse, tcp_fin_timeout) to minimize TCP connection overhead. The results below compare pyweb against the best-in-class pure-Python async solution (httpx + selectolax).
| Metric | pyweb (hyper-tuned async Rust) | httpx+selectolax |
|---|---|---|
| Total Time | 0.0659 seconds | 0.1846 seconds |
| Average Latency | 13.46 ms | 92.36 ms |
| Jitter (Std Dev) | 2.23 ms | 2.48 ms |
| Requests > 50ms Threshold | 0 (0.00%) | 100 (100.00%) |
pyweb is ~2.8x faster in total execution time and achieves ~6.86x lower average latency compared to its closest competitor. This is a direct result of a holistic optimization strategy, spanning the application code, compiler, memory allocator, I/O subsystem, TLS implementation, HTTP protocol, and the underlying operating system.
pip install pyweb-scraperpyweb scrape [OPTIONS] [URLS]...Options:
-s, --selector TEXT: CSS selector to extract specific elements.-o, --output [json|text]: Output format.-c, --concurrency INTEGER: Number of concurrent requests.--help: Show this message and exit.
Example:
pyweb scrape "http://books.toscrape.com" -s "h3 > a" -c 200