Updating data_acquisition.py to post to job_server #220
frostyshadows wants to merge 4 commits into develop
cowmanjoe left a comment
Looks good! Left a comment. Also, I think it would be good to have a log message indicating how many jobs were inserted.
One concern I have, which you can't really address here, is that the ZipRecruiter jobs seem to come back with a different URL every time for the same job. It appears they are tagging some kind of unique ID onto the URL, maybe because they want to count the number of clicks from that link. This is a concern because it undermines our plan to block duplicate jobs with the unique index on the link. I'm not sure how we get around this; maybe some analysis of the other fields. Anyway, it's out of scope for this PR.
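One way the "analysis of the other fields" idea might look is a fingerprint computed from fields that should stay stable across fetches, which could then back a unique index instead of the link. This is only a sketch: the `name` (job title) and `location` keys are assumptions about the ZipRecruiter response, `job_fingerprint` is a hypothetical helper, and whether these fields are enough to identify a job would need checking against real data.

```python
import hashlib


def job_fingerprint(job):
    """Return a stable hash built from fields other than the URL.

    Assumes the job dict has "name" (title) and "location" keys in
    addition to job["hiring_company"]["name"]; adjust to the real schema.
    """
    parts = [
        job.get("name"),
        (job.get("hiring_company") or {}).get("name"),
        job.get("location"),
    ]
    # Lowercase and collapse whitespace so trivial differences don't matter.
    normalized = "|".join(" ".join((p or "").lower().split()) for p in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two fetches of the same posting that differ only in `url` would then produce the same fingerprint, while a different company or title would not.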
data_acquisition/data_acquisition.py
Outdated
    "longitude": 0.0,
    "company_name": job["hiring_company"]["name"],
    "start_date": None,
    "salary_min": job["salary_min"]
I think we should use salary_min_annual here, because I believe salary_min can be hourly or monthly.
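A minimal sketch of the suggested change, assuming the ZipRecruiter response does carry a `salary_min_annual` field as the comment believes. `to_job_payload` is a hypothetical helper name, and only the four fields visible in the diff are shown.

```python
def to_job_payload(job):
    """Map one ZipRecruiter job dict to the fields shown in the diff."""
    return {
        "longitude": 0.0,
        "company_name": job["hiring_company"]["name"],
        "start_date": None,
        # Assumed field: annualized minimum, so values are comparable
        # even when salary_min is hourly or monthly. None if absent.
        "salary_min": job.get("salary_min_annual"),
    }
```

Using `.get` here means a job without the annual figure yields `None` rather than raising a `KeyError`; whether that is the desired behavior is a design choice for the PR author.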
🔨 Changes
Instead of adding jobs directly to the database, the script now sends a POST request to the job server, which adds them. This way we can run the script on AWS Lambda.
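The flow described above could be sketched roughly as follows. This is not the PR's actual code: the `/jobs` path, the list-of-jobs JSON body, and the function names are assumptions, and the standard-library `urllib` is used here only to keep the example dependency-free. The log line reflects the review request for a message about how many jobs were sent.

```python
import json
import logging
import urllib.request

logger = logging.getLogger(__name__)

# Host and port come from the testing instructions; the "/jobs" path is assumed.
JOB_SERVER_URL = "http://localhost:5000/jobs"


def build_jobs_request(jobs, url=JOB_SERVER_URL):
    """Build (but do not send) a JSON POST request carrying the jobs."""
    body = json.dumps(jobs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_jobs(jobs, url=JOB_SERVER_URL):
    """Send the jobs to the job server and log how many were posted."""
    request = build_jobs_request(jobs, url)
    with urllib.request.urlopen(request, timeout=10) as response:
        logger.info("Posted %d jobs to %s (HTTP %d)", len(jobs), url, response.status)
        return response.status
```

Separating request construction from sending keeps the payload easy to inspect in a test without a running server.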
:squirrel: Testing instructions
Have the job server running on localhost:5000. Run the script (pipenv run python3 data_acquisition.py) and make sure jobs are added to your local database.
📄 Relevant screenshots or documentation links
📋 Checklist