- Objective:
  - Build a PDF extractor that pulls the relevant details from CVs in PDF format and matches them against the job descriptions from the Hugging Face dataset.
- Dataset Used
- PDF Extractor (`01_pdf-data-extraction.ipynb`)
  - PDF extraction was done using the `pdfplumber` library.
  - Python `regex` was used to extract the `Skills` and `Education` sections from the extracted text.
  - Finally, the relevant information was stored in a CSV file, `pdf_extracted_skills_education.csv`.
  - Challenges Faced: Using `regex` I could extract the `Skills` and `Education` parts, but it was hard to generalise this across resumes of different formats, so more research will be needed to extract them efficiently. Extracting `Experience` was really tough: I couldn't think of any `regex` that could extract `Company_Name`, `Start_Date`, `End_Date` under multiple headers (i.e., for people with multiple experiences).
  - Proposed Solution: One approach I know of is training a custom Named Entity Recognition (NER) model, but this needs a custom tagged dataset for Skills, Education, and Experience.
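A minimal sketch of the heading-based regex approach described above, to show where it works and why it is fragile. This is not the notebook's actual code: the function names, the `SECTION_HEADINGS` list, and the sample text are assumptions made for illustration. It treats a line that contains only a known heading as a section boundary, which is exactly the part that fails to generalise when resumes use other headings or layouts.

```python
import re

# Hypothetical heading keywords; real resumes use many more variants.
SECTION_HEADINGS = ("skills", "education", "experience")


def pdf_to_text(path):
    """Read all pages of a PDF into one string using pdfplumber."""
    import pdfplumber  # imported lazily so the rest of the sketch runs without it

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def split_sections(text):
    """Split resume text into {heading: body}, using heading-only lines as boundaries."""
    pattern = re.compile(
        r"^[ \t]*(" + "|".join(SECTION_HEADINGS) + r")[ \t]*:?[ \t]*$",
        flags=re.IGNORECASE | re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        start = match.end()
        # A section runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).lower()] = text[start:end].strip()
    return sections


# Made-up sample text standing in for the output of pdf_to_text().
sample = """John Doe
Skills
Python, SQL, PyTorch
Education:
B.Tech in Computer Science
Experience
Data Analyst at Example Corp
"""
sections = split_sections(sample)
print(sections["skills"])
print(sections["education"])
```

A resume that labels the section "Technical Proficiencies", or puts the heading and its content on the same line, would return nothing here, which is the motivation for the NER idea.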
- CV-JD Matching (`02_cv-jd-matching.ipynb`)
  - Got the JD dataset from the Hugging Face `datasets` library. 15 JDs from the dataset were selected for this project.
  - Basic text cleaning (lowercasing and removing punctuation, emails, and phone numbers) was done on the extracted resumes.
  - Tokenization and embeddings for JDs and CVs were created using `DistilBertTokenizer` and `DistilBertModel` from the `transformers` library.
  - For matching CVs to JDs, `cosine_similarity` from the `sklearn` library was used.
  - Finally, the top 5 candidates were extracted for each job description, according to their similarity scores.
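The cleaning and ranking steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: `clean_text`, `cosine`, and `top_k` are hypothetical names, the regexes are simple approximations, the cosine function is a pure-Python stand-in for `cosine_similarity` from `sklearn`, and the toy 2-D vectors stand in for DistilBERT embeddings.

```python
import math
import re
import string


def clean_text(text):
    """Lowercase the text and strip emails, phone-like numbers, and punctuation."""
    text = text.lower()
    text = re.sub(r"\S+@\S+", " ", text)                # rough email pattern
    text = re.sub(r"\+?\d[\d\s().-]{7,}\d", " ", text)  # rough phone pattern
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())                       # collapse whitespace


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(jd_vec, cv_vecs, k=5):
    """Return the k (cv_name, score) pairs most similar to one JD vector."""
    scored = [(name, cosine(jd_vec, vec)) for name, vec in cv_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


cleaned = clean_text("Email: Jane.Doe@mail.com, Phone: +1 (555) 123-4567. Python!")
print(cleaned)

# Toy vectors in place of real embeddings.
jd = [1.0, 0.0]
cvs = {"cv_a": [1.0, 0.0], "cv_b": [0.0, 1.0], "cv_c": [1.0, 1.0]}
ranked = top_k(jd, cvs, k=2)
print(ranked)
```

With real embeddings, each JD vector would be scored against every CV vector in the same way, and the five highest scores kept per JD.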
- Overall Challenges Faced:
  - This was my first time working with the `PyTorch` and `transformers` libraries.
  - More than the modelling, extracting the necessary data was the tougher part, as mentioned above under PDF Extractor.
Extracting details from resumes (CVs) and matching them with job descriptions (JDs) using a pretrained model such as DistilBERT, then ranking them by cosine similarity.
avr2002/CV-JD-Matching