📦 Amazon Scraper

With Playwright Async, Captcha Handling & Fingerprint Spoofing

📑 Features

✅ Scrapes up to 6000 products/hour
✅ Asynchronous scraping via Playwright
✅ Multiprocessing for throttling and true parallel processing
✅ Captcha detection and auto-solving via amazon-captcha
✅ Browser fingerprint spoofing (user agent, timezone, geolocation, hardware concurrency, etc.)
✅ Internet connection checks before each scrape
✅ Optional CPU usage limits
✅ Real-time progress tracking via tqdm
✅ Graceful error handling and logging
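
The connection check listed above could look roughly like the sketch below. It is an illustration only, not the code in this repo: the function names `has_internet` and `wait_for_internet`, the probe host `1.1.1.1:53`, and the retry settings are all assumptions.

```python
import socket
import time
from typing import Callable


def has_internet(host: str = "1.1.1.1", port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    The default host/port (a public DNS resolver) is an assumption for this sketch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def wait_for_internet(
    retries: int = 5,
    delay: float = 2.0,
    check: Callable[[], bool] = has_internet,
) -> bool:
    """Call `check` up to `retries` times, sleeping `delay` seconds between failures."""
    for _ in range(retries):
        if check():
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_internet()` before each scrape would pause the worker while the connection is down instead of recording failed page loads.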

📄 Data Fields Scraped

For each product page, this scraper extracts:

ASIN
Brand Name
Status
Product Title
Price
MRP
Rating
Number of Reviews
Browse Node
Availability Status
Product Description
Bullet Point Features
Seller Name
Image URLs
Product URL
Store Link

📌 To add or remove fields, edit your scrape_page() function.
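
One way to keep the field list in a single place is a small record type that scrape_page() fills in. The sketch below is hypothetical: the class name `ProductRecord` and the attribute names are assumptions that mirror the list above, and the real scrape_page() may return a plain dict instead.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ProductRecord:
    """One row of scraped output; attribute names are illustrative only."""

    asin: str
    brand_name: Optional[str] = None
    status: Optional[str] = None
    product_title: Optional[str] = None
    price: Optional[str] = None
    mrp: Optional[str] = None
    rating: Optional[str] = None
    number_of_reviews: Optional[str] = None
    browse_node: Optional[str] = None
    availability_status: Optional[str] = None
    product_description: Optional[str] = None
    bullet_point_features: List[str] = field(default_factory=list)
    seller_name: Optional[str] = None
    image_urls: List[str] = field(default_factory=list)
    product_url: Optional[str] = None
    store_link: Optional[str] = None


# Fields that could not be extracted stay None (or an empty list).
record = ProductRecord(asin="B000000000", product_title="Example product")
row = asdict(record)  # plain dict, ready to append to a results table
```

With this layout, adding or dropping a field means changing one attribute on the class and the matching extraction step in scrape_page().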

⚙️ Performance Tuning (Important)

Configuration values:

browser_size: int = 1       # Max ASINs per browser instance
max_tabs: int = 1           # Max tabs per browser
max_browser: int = 50       # Max concurrent browser processes
headless: bool = True       # Run browsers in headless mode

Notes:

More browsers → higher RAM usage → lower detection risk
More tabs per browser → lower RAM usage → higher throughput → higher detection risk
"Detection" here means Amazon serving captchas that then have to be solved

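To make the trade-off concrete, here is a minimal sketch of how these values might interact. It assumes that `browser_size` splits the ASIN list into one batch per browser and that `max_tabs` caps how many pages a browser loads at once. The helper names `ScraperConfig`, `batch_asins`, and `scrape_batch` are hypothetical, and main.py may wire the settings differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Iterator, List, Sequence


@dataclass(frozen=True)
class ScraperConfig:
    browser_size: int = 1   # Max ASINs per browser instance
    max_tabs: int = 1       # Max tabs per browser
    max_browser: int = 50   # Max concurrent browser processes
    headless: bool = True   # Run browsers in headless mode


def batch_asins(asins: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` ASINs (one batch per browser)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(asins), size):
        yield list(asins[start:start + size])


async def scrape_batch(
    batch: Sequence[str],
    max_tabs: int,
    scrape_one: Callable[[str], Awaitable[str]],
) -> List[str]:
    """Run `scrape_one` for every ASIN with at most `max_tabs` running at once."""
    tabs = asyncio.Semaphore(max_tabs)

    async def guarded(asin: str) -> str:
        async with tabs:  # each held slot stands in for one open tab
            return await scrape_one(asin)

    # gather() returns results in the same order as the input batch.
    return list(await asyncio.gather(*(guarded(a) for a in batch)))
```

Under these assumptions, the defaults (`browser_size=1`, `max_tabs=1`) give every ASIN its own browser, which is the high-RAM, low-detection end of the range. Raising both values packs more pages into fewer browsers.
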
🛠️ Requirements

Python 3.12+

Install dependencies:

pip install -r requirements.txt
playwright install

🚀 Run the Scraper

python main.py --input './your_file.xlsx'
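
For reference, a flag like `--input` is typically parsed with argparse along these lines. This is a sketch under assumptions, not the parser in main.py: the function name `build_parser` and the help text are made up for illustration.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the --input flag shown above."""
    parser = argparse.ArgumentParser(description="Amazon product scraper")
    parser.add_argument(
        "--input",
        required=True,
        type=Path,
        help="Path to the input .xlsx file",
    )
    return parser


# Parse an explicit argument list here; a real script would call parse_args() with no arguments.
args = build_parser().parse_args(["--input", "./your_file.xlsx"])
```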

📜 License

This project is licensed under the Apache-2.0 license.

💬 Feedback

⭐ Star the repo or open an issue if this helped you!
