This repository contains Python-based scrapers for extracting product data from Farfetch. These scrapers use the Crawlbase Crawling API to handle JavaScript rendering and to get past CAPTCHA challenges and anti-bot protections, so product data can be extracted without being blocked.
The Farfetch Search Results Scraper (farfetch_serp_scraper.py) extracts product details from search listings, including:
- Brand Name
- Product Description
- Price
- Discount (if available)
- Product URL
It supports pagination, allowing multiple search results pages to be scraped. The extracted data is saved in a CSV file.
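The pagination and CSV steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in farfetch_serp_scraper.py: the `page` query parameter, the placeholder search URL, the column names, and the helper names `build_page_urls` and `write_listings` are all assumptions made for the example.

```python
import csv
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Column names are assumptions derived from the fields listed above.
FIELDNAMES = ["brand", "description", "price", "discount", "url"]


def build_page_urls(base_url, max_pages):
    """Return one URL per results page by setting a `page` query parameter.

    The `page` parameter name is an assumption; check the real search URL.
    """
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for page in range(1, max_pages + 1):
        query["page"] = str(page)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


def write_listings(rows, out_file):
    """Write listing dicts to an open text file as CSV with a header row."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        # Missing fields (for example, no discount) become empty cells.
        writer.writerow({name: row.get(name, "") for name in FIELDNAMES})
```

To write to disk, open the output file with `newline=""` (as the `csv` module recommends) and pass the file object to `write_listings`.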
The Farfetch Product Page Scraper (farfetch_product_page_scraper.py) extracts product details from individual product pages, including:
- Product Blurb
- Brand Name
- Price
- Full Product Description
This scraper takes product URLs from the search listings scraper and extracts product details, saving the data in a CSV file.
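One way the hand-off between the two scrapers could work is to read the product URLs back from the listings CSV. The sketch below assumes the listings file has a column named `url`; that column name and the helper `read_product_urls` are hypothetical, so adjust them to match the actual CSV header.

```python
import csv


def read_product_urls(listings_file, url_column="url"):
    """Collect unique, non-empty product URLs in their original order.

    `url_column` is an assumed header name, not confirmed by the scraper.
    """
    seen = set()
    urls = []
    for row in csv.DictReader(listings_file):
        url = (row.get(url_column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Example usage (left as a comment because it needs the listings file):
# with open("farfetch_listings.csv", newline="", encoding="utf-8") as f:
#     product_urls = read_product_urls(f)
```

Deduplicating here avoids spending API requests on a product that appears on more than one results page.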
Ensure that Python is installed on your system. Check the version using:

    # Use python3 if you're on Linux/macOS
    python --version

Install the required dependencies:

    pip install crawlbase beautifulsoup4

- Crawlbase – Handles JavaScript rendering and bypasses bot protections.
- BeautifulSoup – Parses and extracts structured data from HTML.
- Sign up for Crawlbase to get an API token.
- Use the JS token for Farfetch scraping, as the site relies on JavaScript-rendered content.
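As a rough sketch of how the JS token might be used, the snippet below creates a client with the `crawlbase` package and fetches a page. It assumes the response is a dict with `headers` (including `pc_status`) and `body`, as returned by the Crawlbase Python client. The environment variable name `CRAWLBASE_JS_TOKEN`, the helper names, and the `ajax_wait` / `page_wait` values are choices made for this example, not settings taken from the scrapers.

```python
import os


def make_client(token=None):
    """Create a Crawlbase client from a token or an environment variable."""
    # Imported here so the rest of the sketch works without the package.
    from crawlbase import CrawlingAPI

    # CRAWLBASE_JS_TOKEN is a hypothetical variable name for this sketch.
    return CrawlingAPI({"token": token or os.environ["CRAWLBASE_JS_TOKEN"]})


def fetch_html(api, url, options=None):
    """Return the page HTML as text, or None if Crawlbase reports a failure."""
    if options is None:
        # Assumed options: wait for AJAX content and give the page 5 seconds.
        options = {"ajax_wait": "true", "page_wait": "5000"}
    response = api.get(url, options)
    if response["headers"].get("pc_status") != "200":
        return None
    body = response["body"]
    # The body may arrive as bytes; decode it so it can be parsed as text.
    return body.decode("utf-8") if isinstance(body, bytes) else body
```

Passing the client in as an argument keeps the token out of the source code and makes `fetch_html` easy to exercise with a stand-in object.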
This scraper extracts product listings and saves them in farfetch_listings.csv:

    # Use python3 if required (for Linux/macOS)
    python farfetch_serp_scraper.py

Once you have the search results, extract detailed product information using:

    python farfetch_product_page_scraper.py

This will fetch and save product details in farfetch_product_details.csv.
Possible improvements:
- Add more product details (e.g., sizes, materials, colors).
- Support JSON output in addition to CSV.
- Improve pagination to handle dynamic page numbers.
- Add better error handling and retries for failed requests.
Key features:
- Bypasses anti-bot protections using Crawlbase.
- Handles JavaScript-rendered content efficiently.
- Extracts structured product data in CSV format for easy analysis.
- Supports pagination to scrape multiple search result pages.
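Two of the improvements listed earlier, retries for failed requests and JSON output, could be approached along these lines. This is only a sketch under stated assumptions: `with_retries` and `to_json` are hypothetical helpers, a failed fetch is assumed to be signalled by `None` or an exception, and the exponential backoff schedule is one choice among several.

```python
import json
import time


def with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a non-None result or attempts run out.

    Waits base_delay, then 2x, 4x, ... between attempts (exponential backoff).
    """
    for attempt in range(attempts):
        try:
            result = fetch(url)
        except Exception:
            # Broad catch is deliberate in this sketch: any failure is retried.
            result = None
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None


def to_json(rows):
    """Serialize a list of product dicts as indented JSON text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` applies.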