API Documentation

Overview

This platform allows users to store or update their job application data, receive job applicant recommendations based on job descriptions, and delete their stored data. The platform leverages a BERT-based text processing pipeline and Milvus for efficient storage and retrieval of embeddings.

Python Package Requirements:

Flask: To create the API endpoints.
pandas: For handling and manipulating data.
torch: To run the BERT model from HuggingFace (PyTorch is required).
transformers: For loading and using the pre-trained BERT model.
pymilvus: To connect and interact with the Milvus vector database.
scikit-learn: For using the pipeline, base estimators, and transformers.
streamlit: For building the interactive web application.
warnings: For suppressing warnings (part of the Python standard library, so no installation is needed).

Key Components:

1. Database Connection

For the project, I used Zilliz Cloud to host the Milvus vector database. By utilizing this cloud-based solution, the vector database remains accessible to the website during deployment, ensuring seamless integration and real-time access to data. Zilliz Cloud was chosen due to its ability to handle large-scale data, making it ideal for a scalable job-matching platform. Milvus is specifically designed to store and search through millions of vector embeddings efficiently, allowing the platform to accommodate a growing number of users and recommendations without sacrificing performance.

By hosting Milvus on Zilliz Cloud, the project benefits from:

Scalability: As the user base grows, Milvus can manage millions of vector embeddings while maintaining quick response times during searches and queries.
Cloud Accessibility: Hosting on Zilliz Cloud ensures that the vector database is accessible from any location, enabling efficient real-time interaction from the deployed website.
High Availability and Maintenance: Leveraging a managed cloud service like Zilliz ensures that the database is highly available, with automatic updates and maintenance that reduce the operational burden.
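
A connection to the hosted database could be opened along these lines with pymilvus. This is a minimal sketch, not the project's actual code: the environment variable names `ZILLIZ_URI` and `ZILLIZ_TOKEN` and both helper names are assumptions made for illustration.

```python
import os


def zilliz_settings(env=os.environ):
    """Read the cluster endpoint and API token (variable names are hypothetical)."""
    uri = env.get("ZILLIZ_URI")
    token = env.get("ZILLIZ_TOKEN")
    if not uri or not token:
        raise RuntimeError("Set ZILLIZ_URI and ZILLIZ_TOKEN before connecting")
    return {"uri": uri, "token": token}


def connect():
    """Open a client against the Zilliz Cloud cluster described by the environment."""
    # Imported lazily so the settings helper works without pymilvus installed.
    from pymilvus import MilvusClient

    return MilvusClient(**zilliz_settings())
```

Keeping the credentials in environment variables keeps them out of the repository, which matters once the app is deployed.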

2. Text Cleaning Functions

The text cleaning functions perform basic preprocessing to prepare the text for embedding.

lower_transform(text): Converts all text to lowercase to maintain consistency during tokenization.
remove_excess_whitespace(text): Strips leading/trailing whitespace and removes extra spaces between words.
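
The two functions can be sketched as follows. The function names come from the list above; the bodies are an assumption about how they might be written.

```python
import re


def lower_transform(text: str) -> str:
    """Convert the text to lowercase."""
    return text.lower()


def remove_excess_whitespace(text: str) -> str:
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


cleaned = remove_excess_whitespace(lower_transform("  Senior   ML\tEngineer "))
```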

3. BERT Embedding Transformer

A custom transformer is created by subclassing BaseEstimator and TransformerMixin to integrate BERT embeddings into the scikit-learn pipeline.

BertEmbeddingTransformer: This class is responsible for:
- Loading the tokenizer and model using Hugging Face's transformers library.
- Generating token embeddings for input text.
- Applying mean pooling to convert token embeddings into sentence embeddings.
- Normalizing the sentence embeddings using L2 normalization.

Key Points:

Mean Pooling: Converts token-level embeddings into a single vector representing the entire sentence.
Normalization: Ensures embeddings are unit vectors (L2 normalization).
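
The arithmetic behind these two steps can be illustrated without any dependencies. This is only a sketch on plain Python lists: the real transformer would operate on torch tensors, and the use of an attention mask to skip padding tokens is an assumption, since the text above does not say how padding is handled.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens; positions with mask 0 are skipped."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


tokens = [[3.0, 0.0], [0.0, 4.0], [100.0, 100.0]]  # last row stands in for padding
mask = [1, 1, 0]
sentence_vector = l2_normalize(mean_pool(tokens, mask))
```

Because every stored vector has unit length, the inner product of two embeddings equals their cosine similarity.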

4. Pipeline Creation

A scikit-learn pipeline is built using the following components:

Lowercase Transformer: Converts the input text to lowercase.
Whitespace Transformer: Removes excess whitespace from the text.
BERT Embedding Transformer: Generates sentence embeddings from the cleaned text.

Key Steps in the Pipeline:

Lowercase Transformation: Converts all text to lowercase for uniform tokenization.
Whitespace Removal: Strips unnecessary spaces and normalizes spacing within the text.
BERT Embedding Generation: Generates meaningful embeddings by:
- Tokenizing the input text.
- Generating token embeddings using a pre-trained BERT model.
- Applying mean pooling and normalizing the embeddings.

Benefits:

Efficient: All transformations, from text preprocessing to embedding generation, are handled within a single pipeline.
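
The shape of such a pipeline can be sketched as follows. To keep the example runnable without downloading a model, `StubEmbedder` is a hypothetical stand-in for the BERT embedding transformer that returns two toy features per text; the step names and helper functions are likewise assumptions.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def lower_all(texts):
    """Lowercase every text in the batch."""
    return [t.lower() for t in texts]


def squeeze_spaces(texts):
    """Trim each text and collapse repeated whitespace."""
    return [" ".join(t.split()) for t in texts]


class StubEmbedder(BaseEstimator, TransformerMixin):
    """Placeholder for the BERT step: emits [character count, word count]."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[float(len(t)), float(len(t.split()))] for t in X]


pipeline = Pipeline([
    ("lowercase", FunctionTransformer(lower_all)),
    ("whitespace", FunctionTransformer(squeeze_spaces)),
    ("embed", StubEmbedder()),
])

vectors = pipeline.fit_transform(["  Senior   Data Scientist "])
```

Swapping the stub for the real embedding transformer leaves the rest of the pipeline unchanged, which is the main benefit of subclassing BaseEstimator and TransformerMixin.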

API Documentation

This API allows users to store, update, retrieve recommendations, and delete job-related data using email and description fields. All data should be passed via JSON in the request body.

Flask API Endpoints

1. Store Data (`/store`)

URL: /store
Method: POST
Description: Store a user's email and description in the database, along with the description's embedding.

Request Body:

{
  "email": "user@example.com",
  "description": "Experienced data scientist skilled in machine learning."
}

Success Response:

Code: 200 OK
Content:
```
{
"status":"stored"
}
```

Error Responses:

Code: 400 Bad Request

Content:

{
 "error": "Input JSON must contain 'email' and 'description' fields"
}

Code: 501 Internal Server Error
Content:
```
{
"error": "error message" 
}
```
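
A client call to this endpoint could be prepared as shown here, using only the standard library. The base URL is an assumption (Flask's default development address) and `build_post` is a hypothetical helper; replace the URL with the address of the deployed server.

```python
import json
from urllib import request

BASE_URL = "http://localhost:5000"  # assumption: local Flask development server


def build_post(path, payload, base_url=BASE_URL):
    """Build a POST request that carries the payload as JSON in the body."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_post("/store", {
    "email": "user@example.com",
    "description": "Experienced data scientist skilled in machine learning.",
})

# To actually send it against a running server:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```

The same helper works for the other endpoints by changing the path and payload.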

2. Get Recommendations (`/get_recommendation`)

URL: /get_recommendation
Method: POST
Description: Retrieve a list of recommended applicants based on the provided job description.

Request Body:

{
  "description": "Job description"
}

Success Response:

Code: 200 OK

Content:

{
  "email": "user@example.com",
  "description": "Description"
}

Error Responses:

Code: 400 Bad Request

Content:

{
"error": "Input JSON must contain 'description' field"
}

Code: 500 Internal Server Error
Content:
```
{
"error": "error message" 
}
```

3. Update Data (`/update`)

URL: /update
Method: POST
Description: Update a user's description and embedding in the database.

Request Body:

{
  "email": "user@example.com",
  "description": "Updated description"
}

Success Response:

Code: 200 OK
Content:
```
{
  "status": "updated"
}
```

Error Responses:

Code: 400 Bad Request

Content:

{
"error": "Input JSON must contain 'email' and 'description' fields"
}

Code: 500 Internal Server Error
Content:
```
{
"error": "error message" 
}
```

4. Delete Data (`/delete`)

URL: /delete
Method: POST
Description: Delete a user's data based on their email.

Request Body:

{
  "email": "user@example.com"
}

Success Response:

Code: 200 OK
Content:
```
{
  "status": "deleted"
}
```

Error Responses:

Code: 400 Bad Request

Content:

{
"error": "Input JSON must contain 'email' field"
}

Code: 500 Internal Server Error

Content:

{
"error": "No data found for the given email."
}

Code: 501 Internal Server Error
Content:
```
{
"error": "error message"
}
```
Error Handling

400 Bad Request: Indicates that the input data is missing required fields.
500/501 Server Error: Indicates a server-side error, such as a failure to retrieve, insert, or delete data.
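
The 400 check can be sketched as a small helper shared by the endpoints. `missing_fields_error` is a hypothetical name, and treating a field as missing only when its key is absent is an assumption; the messages it builds match the ones documented above.

```python
def missing_fields_error(data, required):
    """Return (error_body, 400) if a required key is absent, otherwise None."""
    if isinstance(data, dict) and all(field in data for field in required):
        return None
    names = " and ".join(f"'{field}'" for field in required)
    noun = "field" if len(required) == 1 else "fields"
    return {"error": f"Input JSON must contain {names} {noun}"}, 400


store_error = missing_fields_error({"email": "user@example.com"}, ["email", "description"])
ok = missing_fields_error({"description": "Job description"}, ["description"])
```

In a Flask view, a non-None result would typically be passed through `jsonify` and returned together with the status code.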

Notes

All data must be sent in the request body in JSON format.
Email is the primary identifier used to manage stored data.
Embeddings are stored with each record and searched to produce recommendations.
Ensure the server URL is correctly specified when calling the API.
I used only the POST request method to ensure that no personal data is sent via the URL.

Streamlit App

1. Store/Update Data

Description: Users can enter their job-related skills or description along with their email. The system checks if the email exists in the database (Milvus), and either updates or inserts the data accordingly.
Function used: get_description
Input Fields:
- Description: The user's job-related description (skills and experience).
- Email: The user's email address.
Function flow: The get_description function checks whether data is already stored for the email address, generates an embedding for the description, and stores it in Milvus. If the email already exists, the existing record is updated instead.
UI Element:
- The input form is in an expandable section with two fields: one for the email and one for the description.
- A button triggers the "Store My Data" action, which calls the get_description function.
Output: A message stating whether the data was stored or updated.
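
The store-or-update decision can be illustrated with an in-memory dictionary standing in for the Milvus collection. This is a sketch under stated assumptions: `store_or_update` and `fake_embed` are hypothetical names, and the real get_description function talks to Milvus and uses the BERT pipeline instead.

```python
def store_or_update(records, email, description, embed):
    """Insert a new record, or overwrite the one already keyed by this email."""
    status = "updated" if email in records else "stored"
    records[email] = {"description": description, "embedding": embed(description)}
    return status


def fake_embed(text):
    """Toy embedding used only for this illustration."""
    return [float(len(text))]


records = {}
first = store_or_update(records, "user@example.com", "Data scientist", fake_embed)
second = store_or_update(records, "user@example.com", "Senior data scientist", fake_embed)
```

Keying the records by email is what makes the second call an update rather than a duplicate insert.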

2. Get Applicant Recommendations

Description: Users enter a job description to find applicants that match the description.
Function used: get_recommendation
Input Fields:
- Job Description: A description of the job for which you are searching for matching applicants.
Function: The function embeds the job description, compares it with the stored applicant embeddings, and returns the top three matches by similarity.
UI Element:
- The input form is in an expandable section with one field for the job description.
- A button triggers the "Get Recommendations" action.
Output: The matching applicants' emails and descriptions are displayed.
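
The ranking step can be sketched in plain Python. In the app the search runs inside Milvus; this in-memory version only illustrates the idea, and it assumes inner-product similarity over unit-length vectors (the metric actually configured in the collection is not stated here). `top_matches` is a hypothetical name.

```python
def top_matches(query_vector, stored, k=3):
    """Return the k emails whose vectors have the largest inner product with the query."""
    scored = [
        (sum(q * v for q, v in zip(query_vector, vector)), email)
        for email, vector in stored.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [email for _, email in scored[:k]]


stored = {
    "a@example.com": [1.0, 0.0],
    "b@example.com": [0.8, 0.6],
    "c@example.com": [0.0, 1.0],
    "d@example.com": [0.6, 0.8],
}
best = top_matches([1.0, 0.0], stored)
```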

3. Delete My Data

Description: Users can delete their stored data from the platform using their email.
Function used: delete_data
Input Fields:
- Email: The email corresponding to the data you wish to delete.
Function: The function searches for the email in the Milvus collection and deletes the corresponding record if it exists.
UI Element:
- The input form is in an expandable section with one field to enter the email address.
- A button triggers the "Delete My Data" action.
Output: A message stating whether the data was deleted or no data was found for the given email.

Note

All features first check that every required input field is filled in. If so, the corresponding function is called; otherwise, the user is shown a message asking them to complete all fields.

Streamlit App link

https://mak5fbaasm3o3axfgmkoss.streamlit.app/
