# QuARI: Query Adaptive Retrieval Improvement

Current multimodal embedding models are widely used for image-to-image and text-to-image retrieval, but their global embeddings often miss the fine-grained cues needed for challenging retrieval tasks. QuARI tackles this by learning a query-specific linear projection of a frozen backbone embedding space. A transformer hypernetwork maps each query to both an adapted query embedding and a low-rank projection matrix that is applied to all gallery embeddings, making the adaptation cheap enough to run over millions of items. Trained with a symmetric contrastive loss and additional “semi-positive” neighbors, QuARI emphasizes subspaces that are relevant to the current query while down-weighting irrelevant directions. Experiments on ILIAS and INQUIRE show that this simple query-conditioned adaptation consistently outperforms strong baselines, including static task-adapted encoders and heavyweight re-rankers, while remaining highly efficient at inference time.
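Because the adaptation is linear and low-rank, it can be folded into the query side at search time. The sketch below illustrates that idea; the hypernetwork is reduced to an MLP for brevity, and the layer sizes, rank, and `I + UVᵀ` parameterization are illustrative assumptions, not the released architecture.

```python
import torch
import torch.nn as nn

class QueryAdapter(nn.Module):
    """Minimal sketch of query-conditioned low-rank adaptation.

    A hypernetwork (a stand-in MLP here) maps a query embedding to
    (1) an adapted query vector and (2) low-rank factors U, V whose
    product A = I + U V^T linearly reprojects every gallery embedding.
    """
    def __init__(self, dim=512, rank=16):
        super().__init__()
        self.dim, self.rank = dim, rank
        # hypernetwork head: query -> adapted query + low-rank factors
        self.hyper = nn.Sequential(
            nn.Linear(dim, 2 * dim),
            nn.GELU(),
            nn.Linear(2 * dim, dim + 2 * dim * rank),
        )

    def forward(self, q, gallery):
        # q: (d,) single query for clarity; gallery: (N, d) frozen embeddings
        out = self.hyper(q)
        q_adapt = out[: self.dim]
        U, V = out[self.dim:].chunk(2)
        U = U.view(self.dim, self.rank)
        V = V.view(self.dim, self.rank)
        # A = I + U V^T would be applied to every gallery item, but since
        # (A g) . q  ==  g . (A^T q), folding A into the single query
        # vector suffices to score millions of gallery items cheaply.
        q_proj = q_adapt + V @ (U.T @ q_adapt)
        scores = gallery @ q_proj            # (N,) adapted similarities
        return q_proj, scores
```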
## Setup

```bash
conda env create -f env.yml
conda activate vis-lang
```

## Data Download

Set the appropriate download directory in `downloading/setup_download.sh`, then run:

```bash
bash download_cc12m.sh
bash download_coco.sh
```
Pack the COCO images and captions into a tar file:

```bash
python cocototar.py \
  --images-dir /path/to/coco/images \
  --captions-json /path/to/coco/captions \
  --out-tar /path/to/output/tarfile
```

## Precompute Embeddings

```bash
python precompute_embeddings.py \
  --extractor openai/clip-vit-base-patch32 \
  --output_path ./precomputed/train_chunks \
  --image_dir ./data/images \
  --tar_regex '.*\.tar$' \
  --chunk_size 50000
```
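For orientation, this step amounts to embedding every image with the frozen backbone and saving the features in chunks. A minimal sketch, assuming Hugging Face CLIP and a simple `torch.save` chunk format (the real script's I/O may differ):

```python
# Sketch of the precompute step: embed images with a frozen CLIP
# backbone and save chunked tensors. Paths and the saved format are
# assumptions; see precompute_embeddings.py for the actual pipeline.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_id).eval()
processor = CLIPProcessor.from_pretrained(model_id)

@torch.no_grad()
def embed_images(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)        # (B, 512)
    return torch.nn.functional.normalize(feats, dim=-1)

# e.g. save one chunk of image paths:
# torch.save(embed_images(chunk_paths), "./precomputed/train_chunks/chunk_0.pt")
```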
## Mine Semi-positives

```bash
python mine_semipositives.py \
  --embeddings_path ./precomputed/train_embeds.pt \
  --output_path ./semipositives/train_semipos.pt \
  --k 100 \
  --top_n 2
```
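A sketch of what this step plausibly computes, under the assumption that `--k` bounds the neighbor search and `--top_n` picks how many near neighbors per item become semi-positives (the script's exact I/O format is not shown here):

```python
# Mine "semi-positive" neighbors: for each training embedding, find its
# nearest neighbors in the frozen embedding space and keep the closest
# top_n non-identical items. This is one plausible reading of the flags.
import torch

def mine_semipositives(embeds, k=100, top_n=2):
    # embeds: (N, d), assumed L2-normalized
    # NOTE: the dense N x N similarity only fits small N; for large N,
    # compute it in chunks or use an ANN index such as FAISS.
    sims = embeds @ embeds.T                   # (N, N) cosine similarity
    sims.fill_diagonal_(-float("inf"))         # exclude self-matches
    topk = sims.topk(k, dim=1).indices         # k candidate neighbors
    return topk[:, :top_n]                     # (N, top_n) semi-positive ids

embeds = torch.load("./precomputed/train_embeds.pt")
torch.save(mine_semipositives(embeds), "./semipositives/train_semipos.pt")
```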
## Training

```bash
python train.py \
  --json_path ./data/train.json \
  --image_dir ./data/images \
  --extractor openai/clip-vit-base-patch32 \
  --use_precomputed \
  --precomputed_dir ./precomputed \
  --train_semipositives_path ./semipositives/train_semipos.pt \
  --batch_size 512 \
  --max_epochs 10 \
  --freeze_extractors \
  --output_dir ./outputs
```
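The training objective described above (a symmetric contrastive loss with semi-positive neighbors) could look roughly like the following. Masking mined semi-positives out of the negatives is one common variant and is an assumption here, not necessarily QuARI's exact formulation:

```python
# Sketch of a symmetric InfoNCE-style loss with semi-positive handling:
# mined neighbors of the positive are removed from the denominator so
# they are not punished as hard negatives. See train.py for the real loss.
import torch
import torch.nn.functional as F

def symmetric_contrastive(q, g, semipos_mask, tau=0.07):
    # q, g: (B, d) adapted query / gallery embeddings, L2-normalized
    # semipos_mask: (B, B) True where column j is a semi-positive of row i;
    # it must be False on the diagonal (the true positive pairs).
    logits = q @ g.T / tau
    logits = logits.masked_fill(semipos_mask, -float("inf"))
    labels = torch.arange(q.size(0), device=q.device)
    # symmetric: query->gallery and gallery->query directions
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
```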
## Pretrained Weights

Get pretrained weights by running `download_ckpts.py`.

## Evaluation

```bash
python eval_retrieval.py \
  --embeddings_dir ./precomputed/val \
  --checkpoint_path ./outputs/checkpoints/best.ckpt \
  --distractor_dirs ./distractors/yfcc \
  --eval_baseline
```
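At evaluation time the learned gallery projection never needs to be materialized per item: since `(A g) · q′ = g · (Aᵀ q′)`, the query is adapted once and the frozen distractor embeddings are swept chunk by chunk. A chunked search sketch (file layout and function names are assumptions):

```python
# Score millions of gallery/distractor embeddings against one adapted
# query with a single matrix-vector product per chunk, keeping a running
# top-k. q_proj is the adapted + projected query from the hypernetwork.
import torch

@torch.no_grad()
def search(q_proj, chunk_paths, topk=100):
    best_scores, best_ids, offset = [], [], 0
    for path in chunk_paths:
        gallery = torch.load(path)             # (N_chunk, d) frozen embeddings
        scores = gallery @ q_proj              # adapted similarities
        s, i = scores.topk(min(topk, len(scores)))
        best_scores.append(s)
        best_ids.append(i + offset)            # shift to global indices
        offset += len(gallery)
    scores, ids = torch.cat(best_scores), torch.cat(best_ids)
    keep = scores.topk(min(topk, len(scores))).indices
    return ids[keep], scores[keep]
```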
## Citation

```bibtex
@inproceedings{xing2025quari,
  title={QuARI: Query Adaptive Retrieval Improvement},
  author={Xing, Eric and Stylianou, Abby and Pless, Robert and Jacobs, Nathan},
  booktitle={The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2025}
}
```