The main motivation behind building rubrix was to have a visual search engine completely powered by Artificial Intelligence, tying together concepts from Natural Language Processing and Computer Vision, something we like to call "combined similarity search". Currently, rubrix has two main functionalities:
- take in a user input describing an image and retrieve five images that fit that description (image search)
- take in a user-uploaded image and retrieve five similar images (reverse-image search)
Please click here to know more details about the architecture and how rubrix works!
You can check out some of the images retrieved by rubrix for sample queries here.
This section describes the prerequisites and contains instructions to get the project up and running.
Currently, rubrix works flawlessly on Linux, and can be set up easily with all the prerequisite packages by following these instructions:
- Download appropriate version of conda for your machine.
- Install it by running the conda_install.sh file, with the command:
  $ bash conda_install.sh
- Add conda to bash profile:
  $ source ~/.bashrc
- Navigate to rubrix/ (top-level directory) and create a conda virtual environment with the included environment.yml file using the following command:
  $ conda env create -f environment.yml
- Activate the virtual environment with the following command:
  $ conda activate rubrix
- To install the package with setuptools extras, use the following command in rubrix/ (top-level directory) containing the setup.py file:
  $ pip install .
Once the prerequisites have been installed, follow these instructions to build the project:
- Navigate to rubrix/index directory.
- Run the bash script setup.sh with the following command:
  $ bash setup.sh
What does this do?
- Downloads the flickr8k image/captions dataset.
- Builds and sets up darknet/ within rubrix/index to enable object detection with YOLOv4.
- Creates the assets/index.json file, which is essentially an inverted index mapping each object class YOLOv4 was trained on to the images containing it.
- Creates the assets/imageEmbeddingLocations.json file, which essentially maps all the images in the database to the sentence embedding vectors generated for each of the captions in the database.
- Generates feature vectors describing all the images in the database and saves them to the assets/descriptors directory.
NOTE: The above script can take between 1.5 and 2 hours to complete execution.
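To make the idea behind index.json concrete, here is a minimal sketch of an inverted index from object labels to images. It is not the code that setup.sh runs: the function name build_inverted_index, the sample image paths, and the assumed shape of the detector output (image path to list of labels) are all illustrative assumptions.

```python
from collections import defaultdict


def build_inverted_index(detections):
    """Map each object label to the sorted list of images that contain it.

    `detections` maps an image path to the labels detected in that image
    (an assumed shape, not necessarily what rubrix stores).
    """
    index = defaultdict(set)
    for image_path, labels in detections.items():
        for label in labels:
            index[label].add(image_path)
    # Sort the sets so the result is deterministic and JSON-serialisable.
    return {label: sorted(paths) for label, paths in index.items()}


# Hypothetical detector output, for illustration only.
detections = {
    "images/a.jpg": ["dog", "person"],
    "images/b.jpg": ["dog"],
    "images/c.jpg": ["bicycle", "person"],
}
index = build_inverted_index(detections)
print(index["dog"])
```

With an index like this, looking up every image that contains a given object is a single dictionary access instead of a scan over the whole dataset.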
- Download data assets from this link.
- Unzip and save the contents in rubrix/assets.
- All that is left is to change the paths in rubrix/assets/index.json and rubrix/assets/imageEmbeddingLocations.json relative to the local machine. This can be done as follows:
  - Ensure the corresponding virtual environment is active, or activate it with the following command:
    $ conda activate rubrix
  - Launch the Python interpreter in the terminal and run the following code snippet:
    >>> from rubrix.utils import fix_paths_in_index
    >>> path_to_index = "<absolute/path/to/rubrix/assets/index.json>"
    >>> path_to_emb = "<absolute/path/to/rubrix/assets/imageEmbeddingLocations.json>"
    >>> fix_paths_in_index(path_to_index, path_to_emb)
- Navigate to rubrix/rubrix/index directory and run the following bash script:
  $ bash quick_setup.sh
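The path-fixing step above exists because the downloaded JSON files contain paths from the machine they were built on. The sketch below shows what such a rewrite could look like. It is not the implementation of fix_paths_in_index, which is not shown here; the helper name rebase_paths, the assumption that the file is a flat mapping from key to list of paths, and the example roots are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def rebase_paths(index_file, old_root, new_root):
    """Swap the `old_root` prefix for `new_root` in every path of a JSON index.

    Assumes the file holds a flat {key: [path, ...]} mapping; paths that do
    not start with `old_root` are left untouched.
    """
    index_file = Path(index_file)
    index = json.loads(index_file.read_text())
    rebased = {
        key: [
            new_root + path[len(old_root):] if path.startswith(old_root) else path
            for path in paths
        ]
        for key, paths in index.items()
    }
    index_file.write_text(json.dumps(rebased, indent=2))
    return rebased


# Demo on a throwaway file; the roots below are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "index.json"
    demo.write_text(json.dumps({"dog": ["/old/root/images/a.jpg"]}))
    print(rebase_paths(demo, "/old/root", "/home/me/rubrix"))
```

For the real asset files, prefer the fix_paths_in_index call shown above, since it knows the actual layout of both JSON files.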
With the completion of these steps, you should be able to use rubrix.
- For image search, execute the rubrix/query/query_by_text method.
- For reverse image search, execute the rubrix/query/query_by_image_objects method.
You can also follow a working example for this here.
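As a rough picture of what a similarity search does once embeddings exist, the sketch below ranks stored vectors against a query vector and keeps the best matches. It is not the code behind query_by_text or query_by_image_objects: the use of cosine similarity, the function names, and the toy two-dimensional vectors are assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, embeddings, k=5):
    """Return the names of the `k` stored vectors most similar to `query_vec`."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy embeddings, for illustration only; real sentence embeddings have
# hundreds of dimensions.
embeddings = {
    "a.jpg": [1.0, 0.0],
    "b.jpg": [0.0, 1.0],
    "c.jpg": [1.0, 1.0],
}
print(top_k([1.0, 0.1], embeddings, k=2))
```

The default k=5 mirrors the five results rubrix returns per query; everything else about the ranking is a simplification.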
Alternatively, rubrix can be used as an application in a web browser.
- Navigate to rubrix/rubrix/web directory.
- Enter the following command in the terminal to launch web application:
  $ python app.py
Use these steps if you want to deploy rubrix on a server, e.g. an Ubuntu Linux server on AWS.
- Navigate to the top directory.
- Enter the following command to build the docker image:
  $ sudo docker build -t <YOUR-NAME>/rubrix .
- You can then run:
  $ sudo docker run -p 9000:80 <YOUR-NAME>/rubrix
The ideal setup for this would be to have an Apache/Nginx reverse proxy set up on the host system, pointing to port 9000 in this case, with the host system's Apache/Nginx handling SSL. That way you can redeploy the application as often as you like without worrying about remaking SSL certificates.
The Dockerfile does not use the environment.yml file because using conda on any sort of production environment is a nightmare. As a result, changes made to environment.yml will not be reflected in the Dockerized container.
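Before pointing a reverse proxy at the container, it can help to confirm that something is actually listening on the published port. The sketch below is one simple way to do that with the standard library. The helper name port_is_open is hypothetical, and the host 127.0.0.1 and port 9000 assume the docker run command shown above is used unchanged.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 9000 matches the `-p 9000:80` mapping used in the docker run step.
    print(port_is_open("127.0.0.1", 9000))
```

This only checks that the port accepts TCP connections; it does not verify that the web application behind it is responding correctly.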
There are no specific guidelines for contributing, apart from a few general conventions we tried to follow:
- Code should follow PEP8 standards as closely as possible.
- We use Google-style docstrings to document the Python modules in this project.
If you see something that could be improved, send a pull request!
We are always happy to look at improvements, to ensure that rubrix, as a project, is the best version of itself.
If you think something should be done differently (or is just-plain-broken), please create an issue.
See the LICENSE file for more details.
