Handling Irrelevant Results in CLIP-Based Image Retrieval When No Match Exists #39

@spikejones1040


Hi,

I am using CLIP-based embeddings for image retrieval, and irrelevant images are being retrieved when no relevant images exist in the database. I have computed CLIP image embeddings for ~6000 images in my database. For retrieval, I compute a text embedding for the query and perform a cosine similarity search against the image embeddings. This approach generally works well when relevant images are present in the dataset. However, when no relevant images exist (e.g., querying "badminton" when the dataset contains no badminton-related images), CLIP still returns results with fairly high cosine similarity but little actual relevance. I have tried thresholding the similarity scores, but this does not fully resolve the issue.
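The retrieval step described above can be sketched roughly as follows. This is a minimal NumPy sketch, assuming the query text embedding and the image embeddings have already been computed as arrays of the same dimension. The helper names `l2_normalize` and `search`, and the defaults `k=10` and `threshold=0.25`, are hypothetical placeholders rather than values tuned for these models.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against division by zero


def search(query_emb: np.ndarray, image_embs: np.ndarray, k: int = 10, threshold: float = 0.25):
    """Return (indices, scores) of the top-k images whose cosine similarity is >= threshold.

    query_emb:  shape (d,), text embedding of the query
    image_embs: shape (n, d), precomputed image embeddings
    k, threshold: hypothetical placeholder values, not recommended settings
    """
    q = l2_normalize(query_emb[None, :])[0]
    imgs = l2_normalize(image_embs)
    scores = imgs @ q                     # cosine similarity with every image, shape (n,)
    top = np.argsort(-scores)[:k]         # indices of the k highest scores, best first
    keep = top[scores[top] >= threshold]  # drop candidates below the fixed threshold
    return keep, scores[keep]
```

Because every query is compared against one fixed `threshold`, any query whose nearest images happen to score above it will return results, which matches the behaviour described for queries like "badminton".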

Even with a reasonable cosine similarity threshold, I still see false positives: retrieved images that are semantically unrelated to the query. Raising the threshold enough to remove them reduces recall and prevents relevant images from being retrieved when they do exist.
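The threshold trade-off described above can be made measurable by sweeping candidate thresholds over a small labeled set of queries, including some with no matching images (like "badminton"). This is a sketch under assumptions: the `sweep_thresholds` function, the labeled `relevant` sets and the threshold values are all hypothetical, and precision is defined as 1.0 when nothing is retrieved, purely as a convention.

```python
import numpy as np


def sweep_thresholds(sim: np.ndarray, relevant: list[set[int]], thresholds):
    """For each threshold t, return (t, precision, recall, false_alarms).

    sim:      shape (num_queries, num_images), cosine similarities
    relevant: relevant[i] is the set of image indices relevant to query i
              (an empty set for queries with no matching image)
    false_alarms counts no-match queries that still return at least one image.
    """
    total_relevant = sum(len(r) for r in relevant)
    rows = []
    for t in thresholds:
        retrieved = sim >= t  # boolean mask of returned (query, image) pairs
        tp = sum(
            len(relevant[i] & set(np.flatnonzero(retrieved[i]).tolist()))
            for i in range(len(relevant))
        )
        n_ret = int(retrieved.sum())
        precision = tp / n_ret if n_ret else 1.0  # convention: nothing retrieved -> 1.0
        recall = tp / total_relevant if total_relevant else 1.0
        false_alarms = sum(1 for i, r in enumerate(relevant) if not r and retrieved[i].any())
        rows.append((float(t), precision, recall, false_alarms))
    return rows
```

Plotting precision, recall and `false_alarms` against the threshold shows directly how much recall is lost for each no-match query that is suppressed.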

For text embeddings, I am using `M-CLIP/XLM-Roberta-Large-Vit-B-16Plus`.

For image embeddings, following https://huggingface.co/M-CLIP/XLM-Roberta-Large-Vit-B-16Plus, I am using:

```python
import open_clip
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16-plus-240', pretrained="laion400m_e32")
```

- What is the best way to handle false positives in CLIP retrieval when no relevant images exist?
- Are there recommended techniques to detect or filter out such cases?
- Are there research papers, prior discussions, or known best practices addressing this issue in CLIP retrieval?

Any insights, references, or sample implementations would be greatly appreciated!

Thanks in advance for your help.
