This project is an AI-driven recruitment tool designed to scrape and analyze GitHub and Google Scholar profiles to identify top candidates for AI and ML roles. The tool classifies input queries, fetches profile details, and ranks candidates based on their GitHub repositories and Google Scholar citations.
```mermaid
graph TD
    subgraph Query Classifier
        A1[query_classifier.py] --> B1[scangit.py]
        A1 --> C1[scangs.py]
        A1 --> D1[filter.py]
    end
    subgraph GitHub Scorer
        B1 --> B2[github.py]
        B2 --> B3[UserGitHubDetails]
    end
    subgraph Google Scholar Scraper
        C1 --> C2[googlescholar.py]
    end
    subgraph Filter Authors
        D1 --> D2[prof.py]
        D2 --> D3[authors.py]
    end
```
Defines the Candidate class and methods to calculate scores based on GitHub activity, university, and other criteria.
Fetches GitHub repositories, calculates repository scores, and aggregates them to provide an overall GitHub score for a user.
Contains a dictionary of predefined university scores and methods to calculate university scores for candidates.
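As an illustration of the lookup described above, here is a minimal sketch. The names `UNIVERSITY_SCORES`, `DEFAULT_SCORE`, and `university_score` are hypothetical, and the scores are placeholders, not the project's actual values.

```python
# Placeholder table for illustration only; the project's real scores are not shown here.
UNIVERSITY_SCORES = {
    "example university a": 10.0,
    "example university b": 7.5,
}

# Assumed fallback for universities that are not in the table.
DEFAULT_SCORE = 1.0


def university_score(name: str) -> float:
    """Return the predefined score for a university, or the default if unknown."""
    key = name.strip().lower()  # normalize so lookups ignore case and padding
    return UNIVERSITY_SCORES.get(key, DEFAULT_SCORE)
```

Normalizing the key before the lookup keeps differences in capitalization or stray whitespace in scraped profile data from causing a miss.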
Stores professor data, including names, universities, and homepage URLs.
Normalizes the scores of candidates and sorts them based on the overall normalized score.
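One way to picture this step is min-max normalization followed by a sort. The sketch below is an assumption about the approach, not the project's actual code: `ScoredCandidate`, `min_max`, and `rank` are hypothetical names, and the overall score is assumed to be the unweighted mean of the normalized components.

```python
from dataclasses import dataclass


@dataclass
class ScoredCandidate:
    # Hypothetical stand-in for the project's Candidate class.
    name: str
    github_score: float
    scholar_score: float


def min_max(values):
    """Scale values to [0, 1]; if all values are equal, return 0.0 for each."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(candidates):
    """Return (candidate, overall) pairs sorted by overall score, best first."""
    if not candidates:
        return []
    gh = min_max([c.github_score for c in candidates])
    gs = min_max([c.scholar_score for c in candidates])
    # Assumption: overall score is the unweighted mean of the normalized parts.
    overall = [(g + s) / 2 for g, s in zip(gh, gs)]
    return sorted(zip(candidates, overall), key=lambda pair: pair[1], reverse=True)
```

Normalizing each component first keeps a raw score with a large range, such as citation counts, from dominating the ranking.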
Fetches data from Google Scholar, calculates relevance scores, and provides citation information for profiles.
Scrapes co-authors from Google Scholar citations and returns a list of potential students and collaborators.
Filters out professors from the list of authors and classifies remaining individuals based on their roles and degree types.
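The professor-filtering part of this step could look roughly like the sketch below. It assumes authors and professors are plain name strings matched case-insensitively. `filter_professors` is a hypothetical name, and the role and degree classification is not shown.

```python
def filter_professors(authors, professor_names):
    """Return the authors whose names match no known professor (case-insensitive)."""
    known = {name.strip().lower() for name in professor_names}
    return [a for a in authors if a.strip().lower() not in known]
```

For example, `filter_professors(["Ada Example", "Bo Sample"], ["ada example"])` keeps only `"Bo Sample"`.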
Fetches detailed information for students, including their GitHub and Google Scholar profiles, and creates Candidate objects.
requirements.txt lists the Python libraries the project depends on, so that all dependencies can be installed in one step.
To set up the project environment, ensure you have Python installed and then install the required libraries using requirements.txt.
- Clone the repository.
- Create a virtual environment and activate it:

  ```sh
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install the dependencies:

  ```sh
  pip install -r requirements.txt
  ```
Extracting Student Details: Run app.py and click the link it prints to launch the web app. (Ctrl+Click on Windows or Command+Click on macOS opens the link in a new tab.)

```sh
python app.py
```
Processing Queries: To classify input queries and fetch top profiles, run app.py. It is the only script you need to run directly: it calls the other scripts and returns a database of suitable candidates.
Example usage query: pick one of Boston, California, Seattle, or Berkeley as the location.

Sample queries:
- "Find top 6 students who have worked on TensorFlow and have a strong GitHub presence in Boston."
- "Recruit top 5 students in California who have worked on computer vision projects."
- "Find top 8 programmers in Seattle who have worked on GPT-3 and have published papers on NLP."
- "Recruit top 3 scholars in Boston."
- "top 8 people who have worked in AI labs in Boston."
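To show what these queries carry, the sketch below pulls the requested count and the location out of a query with regular expressions. This is a simplified illustration under assumed rules, not the logic in query_classifier.py. `LOCATIONS` and `parse_query` are hypothetical names.

```python
import re

# The four supported locations listed above.
LOCATIONS = ("Boston", "California", "Seattle", "Berkeley")


def parse_query(query: str):
    """Extract the requested candidate count and location from a free-text query."""
    # Look for "top N"; other numbers in the query (e.g. "GPT-3") are ignored.
    count_match = re.search(r"\btop\s+(\d+)\b", query, flags=re.IGNORECASE)
    count = int(count_match.group(1)) if count_match else None
    # Pick the first supported location mentioned as a whole word.
    location = next(
        (loc for loc in LOCATIONS
         if re.search(rf"\b{loc}\b", query, flags=re.IGNORECASE)),
        None,
    )
    return count, location
```

Either value is `None` when the query does not mention it, so a caller can fall back to a default or ask the user to rephrase.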