Issue Collector for the Chromium Project
Welcome to the Chromium issue collection project. It provides scripts that:
- given a query, obtain the list of matching issues;
- given a list of issue IDs, extract each issue's metadata and associated comments from the official Chromium issue tracker.
To collect the issues, their metadata, and their associated comments, we used Selenium, a browser automation tool, for dynamic web scraping.
Setting up the environment with Anaconda consists of the following steps:
- Install Anaconda on your computer, selecting the latest Python version for your operating system. If you already have conda installed, you can skip this step and move on to the next one.
- Create and activate a new conda environment with the provided chromium_issue_collection.yml file.
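After you have created and activated the environment (the detailed commands follow below), a quick way to confirm you are working inside it is to check the active conda environment name and the interpreter version from Python. This is a minimal sketch, assuming conda sets the CONDA_DEFAULT_ENV variable to the environment name when you activate it by name, and accepting either name used in this README (chromium_issue_collection on Linux/Mac, chromiumIssueCollection on Windows). The helper check_environment is hypothetical and not part of the repository.

```python
import os
import sys

# Environment names used in this README: the Linux/Mac commands use the name
# from chromium_issue_collection.yml, the Windows commands pass --name.
EXPECTED_ENVS = {"chromium_issue_collection", "chromiumIssueCollection"}
EXPECTED_PYTHON = (3, 8)  # the README creates the environment with Python 3.8


def check_environment(environ=None, version_info=None):
    """Return a list of setup problems; an empty list means the checks passed.

    Hypothetical helper, not part of the repository. Both arguments default to
    the current process and can be overridden for testing.
    """
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    # conda sets CONDA_DEFAULT_ENV to the environment name on `conda activate <name>`.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active not in EXPECTED_ENVS:
        problems.append(
            f"active conda environment is {active!r}, expected one of {sorted(EXPECTED_ENVS)}"
        )
    if tuple(version_info[:2]) != EXPECTED_PYTHON:
        problems.append(f"running Python {version_info[0]}.{version_info[1]}, expected 3.8")
    return problems


if __name__ == "__main__":
    found = check_environment()
    print("Environment looks fine" if not found else "\n".join(found))
```

Passing the environment and version explicitly keeps the check easy to test without activating anything.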
These instructions also assume you have git installed for working with GitHub from a terminal window. If you do not, you can install it first with:
conda install git
Now, we're ready to create our local environment!
- Clone the repository and navigate to the downloaded folder. Cloning may take a minute or two because of the included image data.
git clone <github project webaddress>.git
cd chromium-issue-collection
- Navigate to the src/ folder and, in the chromium_issue_collection.yml file, replace <username> with your current user name:
prefix: /Users/<username>/opt/anaconda3/envs/chromium_issue_collection
- Create (and activate) a new environment named chromium_issue_collection with Python 3.8 (the Windows commands below name it chromiumIssueCollection instead). If prompted to proceed with the install (Proceed [y]/n), type y.
  - Linux or Mac:
conda env create -f chromium_issue_collection.yml
conda activate chromium_issue_collection
  - Windows:
conda env create --name chromiumIssueCollection --file=chromium_issue_collection.yml
conda activate chromiumIssueCollection
For a conda cheat sheet, see: https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
- Download ChromeDriver (the version matching your installed Chrome) from the official ChromeDriver download page and add the folder containing it to your PATH.
- Customize the scraper.py script if you want to add a new query or change an existing one.
To add a new query, add a new key and its settings to the queries dictionary:
'<key>': {
'explanation' : '<explanation of the query>',
'project' : '<project name>',
'urlbase' : '<base url>',
'headers' : {
'<key>': <list of columns>,
},
'output_filename' : '<output csv file name>'
}
- Customize the run_scraper.py script by changing the function calls in its __main__ section.
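To make the queries template concrete, here is a minimal sketch of adding and checking an entry, assuming queries is a plain Python dict shaped like the template shown in the scraper.py step. The helpers add_query and missing_fields, the key my_new_query, the placeholder URL, the header key issue_list and its columns are all hypothetical; only the five field names come from this README. Replace the placeholders with the values your query actually needs.

```python
# Sketch of adding an entry to a queries dictionary shaped like the README
# template. The helpers and every value passed to add_query are hypothetical
# placeholders, not settings taken from scraper.py.
REQUIRED_FIELDS = {"explanation", "project", "urlbase", "headers", "output_filename"}


def add_query(queries, key, explanation, project, urlbase, headers, output_filename):
    """Insert a new query entry after basic sanity checks and return it."""
    if key in queries:
        raise ValueError(f"query {key!r} already exists; edit it instead of re-adding it")
    if not isinstance(headers, dict) or not all(isinstance(cols, list) for cols in headers.values()):
        raise TypeError("headers must map each key to a list of column names")
    if not output_filename.endswith(".csv"):
        raise ValueError("output_filename should name a CSV file")
    queries[key] = {
        "explanation": explanation,
        "project": project,
        "urlbase": urlbase,
        "headers": headers,
        "output_filename": output_filename,
    }
    return queries[key]


def missing_fields(queries):
    """Map each incomplete entry's key to the sorted list of fields it lacks."""
    return {
        key: sorted(REQUIRED_FIELDS - set(entry))
        for key, entry in queries.items()
        if not REQUIRED_FIELDS.issubset(entry)
    }


queries = {}
add_query(
    queries,
    key="my_new_query",                                   # hypothetical key
    explanation="Example query used only to illustrate the template",
    project="chromium",
    urlbase="https://example.invalid/issues/list?q=",     # placeholder, not the real tracker URL
    headers={"issue_list": ["ID", "Status", "Summary"]},  # hypothetical header key and columns
    output_filename="my_new_query.csv",
)
print(missing_fields(queries))  # {} when every entry has all five fields
```

Rejecting duplicate keys keeps a new query from silently overwriting an existing one; to change an existing query, edit its entry directly.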
To run the scraper, navigate to the src folder and run the script:
cd path/to/src/
python run_scraper.py
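If running the script fails because Chrome cannot be started, one thing worth checking is whether ChromeDriver is actually reachable on your PATH. The sketch below uses the standard-library shutil.which, which searches PATH for an executable (and, on Windows, also tries the extensions listed in PATHEXT, so chromedriver matches chromedriver.exe). The helper find_chromedriver is hypothetical and not part of the repository.

```python
import shutil


def find_chromedriver(path=None):
    """Return the full path of the chromedriver executable, or None if not found.

    Hypothetical helper. `path` defaults to the PATH environment variable;
    pass an os.pathsep-separated string to search other directories instead.
    """
    return shutil.which("chromedriver", path=path)


if __name__ == "__main__":
    location = find_chromedriver()
    if location is None:
        print("chromedriver not found on PATH; add its folder to PATH and open a new terminal")
    else:
        print(f"chromedriver found at {location}")
```

Opening a new terminal after editing PATH matters because already-running shells keep the old value.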