
update scraper to reflect website changes march 2022 #6

Open
haydnjm wants to merge 1 commit into khpeek:master from haydnjm:fix-scraper-march-2022

Conversation

@haydnjm

@haydnjm haydnjm commented Mar 12, 2022

I ran the scraper and a few things weren't working anymore because the website has changed (and possibly also because of the Scrapy version).

These changes fix it so that it works for almost all houses. There are a couple of exceptions, but for what I was using it for there wasn't much need to fix all of the exception cases. Hope this helps!

@Raamkonijn

Thank you for fixing it, but you've hardcoded the place argument in funda_spider.py, lines 11-12.
I've replaced lines 11-13 with the old code and everything seems to work again.

self.start_urls = ["http://www.funda.nl/koop/%s/p%s/" % (place, page_number) for page_number in range(1,301)]
self.base_url = "http://www.funda.nl/koop/%s/" % place
self.le1 = LinkExtractor(allow=r'%s+(huis|appartement)-\d{8}' % self.base_url)

Suggested change
start_urls = ["https://www.funda.nl/koop/amsterdam/p%s/" % (page_number) for page_number in range(500)]
def __init__(self, place='amsterdam') -> None:
    self.start_urls = [f"https://www.funda.nl/koop/{place}/p{page_number}/" for page_number in range(500)]
    self.base_url = f"https://www.funda.nl/koop/{place}/"
    self.le1 = LinkExtractor(allow=r'%s+(huis|appartement)-\d{8}' % self.base_url)

Replace the %-formatting with f-strings and make place a variable again.
This should still work.
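The suggestion above boils down to rebuilding the start URLs from a `place` parameter. A minimal sketch of that URL construction, pulled out of the Scrapy spider so it can be checked in isolation (`build_funda_urls` is a hypothetical helper name, not part of this PR):

```python
# Sketch of the suggested f-string URL construction, outside Scrapy.
# `build_funda_urls` is a hypothetical helper, not part of the actual spider.
def build_funda_urls(place: str = "amsterdam", pages: int = 500):
    """Build the base URL and listing-page URLs for a given place."""
    base_url = f"https://www.funda.nl/koop/{place}/"
    # Note: range(500) starts at p0; the pre-change code used range(1, 301).
    start_urls = [f"{base_url}p{page_number}/" for page_number in range(pages)]
    return base_url, start_urls
```

One thing to watch: `range(500)` includes a `p0` page, whereas the old code used `range(1, 301)`; if funda's pagination starts at page 1, keeping `range(1, 301)` (or `range(1, pages + 1)`) may be safer.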

@jellevankerk jellevankerk mentioned this pull request Jun 25, 2022
