jobs_scraper is a simple job-postings scraper for the website Indeed. It is written in Python and is based on the requests and BeautifulSoup libraries.
Run the following to install the package:

```
pip install jobs_scraper
```

To use jobs_scraper you need to create a new JobsScraper object and provide the following attributes to its constructor:

- `country`: country prefix of the Indeed portal to scrape (e.g. `"nl"`).
- `position`: job position.
- `location`: job location.
- `pages`: number of pages to be scraped.
```python
from jobs_scraper import JobsScraper

# Let's create a new JobsScraper object and perform the scraping for a given query.
scraper = JobsScraper(country="nl", position="Data Engineer", location="Amsterdam", pages=3)
df = scraper.scrape()
```

In this way, the first three pages for the example query "Data Engineer" based in "Amsterdam" on the Dutch version of the portal Indeed get scraped.
The `scrape` method returns a Pandas DataFrame, which can then be exported, for instance to a csv file.
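The export itself is plain pandas. In the sketch below the DataFrame is a hand-made stand-in, since the actual column names returned by `scrape` depend on the jobs_scraper version installed:

```python
import pandas as pd

# Stand-in for the DataFrame returned by scraper.scrape(); the column
# names here are an assumption, not guaranteed by the package.
df = pd.DataFrame(
    {
        "title": ["Data Engineer", "Senior Data Engineer"],
        "location": ["Amsterdam", "Amsterdam"],
    }
)

# Export the postings to a csv file in the current directory.
df.to_csv("jobs.csv", index=False)
```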
- `max_delay`: bearing in mind that this package is meant only for educational purposes, a delay between requests can be provided. By setting `max_delay` in the constructor, every job posting will be scraped after a random interval between `0` and `max_delay` seconds.

  ```python
  scraper = JobsScraper(country="...", position="...", location="...", pages=..., max_delay=5)
  ```
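The idea behind `max_delay` can be sketched with the standard library alone; this illustrates the throttling pattern in general, not the package's actual implementation (the helper name `throttled` is made up for the example):

```python
import random
import time

def throttled(max_delay: float) -> float:
    """Sleep for a random interval between 0 and max_delay seconds
    before the next request; return the delay that was used."""
    delay = random.uniform(0, max_delay)
    time.sleep(delay)
    return delay

# A tiny max_delay keeps the demo fast; a real scraper would use seconds.
waited = throttled(0.05)
```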
- `full_urls`: since most of the scraped job urls are quite long, the returned Pandas DataFrame truncates them, making them hard to access. Setting `full_urls` to `True` keeps the scraped urls untruncated.

  ```python
  scraper = JobsScraper(country="...", position="...", location="...", pages=..., full_urls=True)
  ```
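The truncation that `full_urls` works around comes from pandas' default display width. If you prefer, that limit can also be lifted globally with a standard pandas option, independently of jobs_scraper:

```python
import pandas as pd

# Lift pandas' default column-width limit so long urls print in full.
pd.set_option("display.max_colwidth", None)

# A long illustrative url; by default pandas would display it "..."-truncated.
df = pd.DataFrame({"url": ["https://nl.indeed.com/viewjob?jk=" + "x" * 80]})
print(df)
```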
- Add rotating proxies to prevent the scraper from being blocked when too many requests are sent.
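One minimal way to rotate proxies with requests is to cycle through a list and build the per-scheme `proxies=` mapping for each call. Everything below (the proxy addresses, the `next_proxy_mapping` helper) is hypothetical, since jobs_scraper does not support this yet:

```python
import itertools

# Hypothetical proxy addresses, for illustration only.
PROXIES = ["http://proxy1:8080", "http://proxy2:8080", "http://proxy3:8080"]
proxy_pool = itertools.cycle(PROXIES)

def next_proxy_mapping() -> dict:
    """Return the per-scheme mapping for requests' proxies= argument,
    advancing the rotation on every call."""
    proxy = next(proxy_pool)
    return {"http": proxy, "https": proxy}

# Usage sketch: requests.get(url, proxies=next_proxy_mapping(), timeout=10)
first = next_proxy_mapping()
```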