The Job Postings Web Scraping project extracts more than 10,000 job postings from the Instahyre website using Selenium, a browser automation tool commonly used for web scraping. The extracted data is then processed and cleaned with Pandas, producing a structured dataset ready for further analysis.
- Selenium: For browser automation and extraction of job postings.
- Pandas: For data manipulation and cleaning.
- Installed and configured Selenium, including the appropriate WebDriver for the browser.
- Wrote scripts to navigate the Instahyre website, locate job postings, and extract relevant information such as job titles, companies, locations, and descriptions.
- Developed a scraping script to iteratively collect data from multiple pages.
- Ensured efficient and respectful scraping by implementing appropriate delays between requests to avoid overloading the server.
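
The page-by-page loop with delays described above could look roughly like the sketch below. It is not the project's actual script: `scrape_pages`, `make_selenium_fetcher`, the `url_template` and `card_selector` parameters, and the 2 to 5 second delay range are all assumptions made for illustration, and the real Instahyre URLs and selectors are not shown here.

```python
import random
import time


def scrape_pages(fetch_page, max_pages, delay_range=(2.0, 5.0), sleep=time.sleep):
    """Collect rows page by page, pausing a random interval after each page."""
    rows = []
    for page in range(1, max_pages + 1):
        page_rows = fetch_page(page)
        if not page_rows:  # an empty page is treated as the end of the listing
            break
        rows.extend(page_rows)
        sleep(random.uniform(*delay_range))  # randomized pause to avoid hammering the server
    return rows


def make_selenium_fetcher(driver, url_template, card_selector):
    """Build a fetch_page callable around an already-configured Selenium driver.

    url_template and card_selector are hypothetical placeholders, e.g. a URL
    containing "{page}" and a CSS selector matching one job card.
    """
    from selenium.webdriver.common.by import By  # imported lazily; requires selenium

    def fetch_page(page):
        driver.get(url_template.format(page=page))
        cards = driver.find_elements(By.CSS_SELECTOR, card_selector)
        return [{"raw_text": card.text} for card in cards]

    return fetch_page
```

Keeping the loop separate from the Selenium calls means the pagination and delay logic can be exercised with a stand-in `fetch_page` function, without opening a browser.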
- Stored the extracted data in a structured format using a Pandas DataFrame.
- Saved interim data to CSV files to ensure data persistence in case of interruptions.
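
One way to keep interim results safe is to rewrite a checkpoint CSV every few pages. The sketch below assumes the scraper collects each posting as a dictionary; the column names in `COLUMNS` and the `save_checkpoint` helper are illustrative, not taken from the project's code.

```python
import os

import pandas as pd

# Assumed column names, based on the fields listed above.
COLUMNS = ["title", "company", "location", "description"]


def save_checkpoint(rows, path):
    """Write all rows collected so far to a CSV file, replacing the older checkpoint."""
    df = pd.DataFrame(rows, columns=COLUMNS)  # missing keys become NaN
    tmp_path = path + ".tmp"
    df.to_csv(tmp_path, index=False)
    # Swap the finished file into place so an interruption during writing
    # does not leave a half-written checkpoint behind.
    os.replace(tmp_path, path)
    return df
```

Writing to a temporary file first and then renaming it is a design choice, not a requirement: a plain `df.to_csv(path)` also works, but can leave a truncated file if the script is interrupted during the write.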
- Loaded the extracted data into Pandas for cleaning.
- Performed data cleaning tasks including:
  - Removing duplicates.
  - Handling missing values.
  - Standardizing formats (e.g., date formats, text casing).
- Validated the data to ensure consistency and accuracy.
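
The cleaning and validation steps above can be sketched as follows. This is a minimal example under stated assumptions: the column names (`title`, `company`, `location`, and an optional `posted_on` date column), the choice to drop rows without a title or company, and the `"Unknown"` placeholder for missing locations are illustrative, not the project's documented rules.

```python
import pandas as pd

REQUIRED = ["title", "company"]          # assumed: rows without these are unusable
KEY = ["title", "company", "location"]   # assumed: what identifies a duplicate posting


def clean_postings(df):
    """Return a cleaned copy: normalized text, parsed dates, no blanks or duplicates."""
    df = df.copy()
    for col in KEY:
        df[col] = df[col].str.strip()            # trim stray whitespace
        df[col] = df[col].where(df[col] != "")   # treat empty strings as missing
    df["location"] = df["location"].str.title()  # standardize text casing
    if "posted_on" in df.columns:
        # Unparseable dates become NaT instead of raising an error.
        df["posted_on"] = pd.to_datetime(df["posted_on"], errors="coerce")
    df = df.dropna(subset=REQUIRED)
    df["location"] = df["location"].fillna("Unknown")
    # Deduplicate after normalization so whitespace and casing variants collapse.
    df = df.drop_duplicates(subset=KEY)
    return df.reset_index(drop=True)


def validate_postings(df):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if df[REQUIRED].isna().any().any():
        problems.append("missing required values")
    if df.duplicated(subset=KEY).any():
        problems.append("duplicate postings")
    return problems
```

Calling `validate_postings(clean_postings(raw_df))` should return an empty list; a non-empty result points to the check that failed.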
- Conducted preliminary analysis to gain insights into the job postings.
- Explored key metrics such as the number of job postings by company, location, and job title.
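
Counts like these are typically a one-liner per column in Pandas. The helper below is a small sketch; `posting_counts` and the column names passed to it are assumptions about how the cleaned dataset is laid out.

```python
import pandas as pd


def posting_counts(df, column, top_n=10):
    """Return the top_n most frequent values in a column, with their posting counts."""
    return df[column].value_counts().head(top_n)


def summarize_postings(df, columns=("company", "location", "title"), top_n=10):
    """Build one frequency table per column of interest."""
    return {col: posting_counts(df, col, top_n) for col in columns}
```

For example, `summarize_postings(cleaned_df)["location"]` would give the locations with the most postings, sorted from most to least frequent.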
- scraping/: Contains the Selenium scripts for web scraping.
- data/: Stores raw and cleaned data files.
- notebooks/: Includes any Jupyter notebooks used for data cleaning and analysis.
- README.md: Project description and overview.
- Clone the repository to your local machine.
- Run the Selenium scraping Python script located in the Web Scraping Project 1.0/ folder.
- Load the extracted data into a Pandas DataFrame and perform any additional data cleaning as needed.
This project successfully demonstrates the use of Selenium for web scraping and Pandas for data cleaning. By extracting a substantial number of job postings from Instahyre and processing them into a clean dataset, this project provides a valuable resource for job market analysis and insights.
Feel free to contribute to this project by suggesting improvements, reporting issues, or submitting pull requests. Happy scraping!