Frank Fundel · Johannes Schusterbauer · Vincent Tao Hu · Björn Ommer
CompVis @ LMU Munich, MCML
WACV 2025
We present DistillDIFT, a highly efficient approach to semantic correspondence that delivers state-of-the-art performance with significantly reduced computational cost. Unlike traditional methods that combine multiple large generative models, DistillDIFT uses a novel distillation technique to unify the strengths of two vision foundation models into a single, streamlined model. By integrating 3D data without requiring human annotations, DistillDIFT further improves accuracy.
Overall, our empirical results demonstrate that our distilled model with 3D data augmentation outperforms current state-of-the-art methods while significantly reducing computational load, making it more practical for real-world applications such as semantic video correspondence.
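To make the distillation idea concrete, here is a minimal sketch: a student network is trained to reproduce the fused features of two frozen teacher models, so that a single forward pass replaces the expensive teacher combination at test time. The concatenation-based fusion and the cosine objective below are illustrative assumptions, not the exact formulation used in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats_a, teacher_feats_b):
    """Cosine-similarity distillation of two fused teacher feature maps.

    All tensors are (B, C, H, W); the student's channel dimension is assumed
    to already match the fused teachers (illustrative simplification).
    """
    # Fuse the two teachers, e.g. by concatenation along the channel axis.
    target = torch.cat([teacher_feats_a, teacher_feats_b], dim=1)
    student = F.normalize(student_feats, dim=1)
    target = F.normalize(target, dim=1)
    # Maximise per-pixel cosine similarity between student and fused teachers.
    return (1.0 - (student * target).sum(dim=1)).mean()

# Toy example with random tensors (shapes are assumptions).
s = torch.randn(2, 1024, 32, 32)
ta = torch.randn(2, 512, 32, 32)
tb = torch.randn(2, 512, 32, 32)
print(distillation_loss(s, ta, tb))
```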
This setup was tested with Ubuntu 22.04.4 LTS, CUDA 12.2, and Python 3.9.20.
First, clone the GitHub repository:
git clone git@github.com:CompVis/distilldift.git
cd DistillDIFT

Our evaluation pipeline for SPair-71K is based on Telling-Left-From-Right for better comparability.
Follow their environment setup and data preparation; don't forget to first run
cd eval

Then run the evaluation script via
bash eval_distilldift.sh
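Results on SPair-71k are reported as PCK (percentage of correct keypoints): a transferred keypoint counts as correct if it lands within α · max(h, w) of the ground truth, where h and w are the target object's bounding-box height and width (α = 0.1 is the usual setting). A minimal NumPy sketch of the per-image metric, independent of the evaluation script above:

```python
import numpy as np

def pck_bbox(pred_kps, gt_kps, bbox_hw, alpha=0.1):
    """PCK with bounding-box normalisation, as commonly used on SPair-71k.

    pred_kps, gt_kps: (N, 2) arrays of (x, y) keypoint locations.
    bbox_hw: (height, width) of the target object's bounding box.
    """
    threshold = alpha * max(bbox_hw)
    dists = np.linalg.norm(pred_kps - gt_kps, axis=1)
    return float((dists <= threshold).mean())

# Toy example with made-up keypoints and a 200x150 bounding box.
pred = np.array([[10.0, 12.0], [50.0, 80.0]])
gt = np.array([[11.0, 10.0], [90.0, 40.0]])
print(pck_bbox(pred, gt, bbox_hw=(200, 150)))  # one of two points is within threshold -> 0.5
```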
For training, first use

cd train

Then you can either set up a virtual environment and install all required packages with pip via
pip install -r requirements.txt

or, if you prefer conda, create the conda environment via
conda env create -f environment.yaml

Download the COCO dataset and embed the images (for unsupervised training) via
bash datasets/download_coco.sh
python embed.py --dataset_name COCO
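Conceptually, embedding the images just means running a frozen feature extractor once over the dataset and caching the results on disk, so that training with --use_cache does not have to recompute them every epoch. The sketch below is a rough, hypothetical version of that idea; the extractor, dataloader format, and file layout are assumptions, not the actual embed.py implementation.

```python
import os
import torch

@torch.no_grad()
def cache_embeddings(dataloader, feature_extractor, out_dir, device="cpu"):
    """Run a frozen feature extractor once over a dataset and store the
    per-image feature maps on disk (hypothetical caching scheme)."""
    os.makedirs(out_dir, exist_ok=True)
    feature_extractor.eval().to(device)
    for images, image_ids in dataloader:  # assumed (batch, id-list) format
        feats = feature_extractor(images.to(device))
        for feat, image_id in zip(feats, image_ids):
            torch.save(feat.cpu(), os.path.join(out_dir, f"{image_id}.pt"))

# Toy usage with a dummy extractor and an in-memory "dataloader".
extractor = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
batches = [(torch.randn(2, 3, 64, 64), ["img_0", "img_1"])]
cache_embeddings(batches, extractor, out_dir="cache/COCO")
```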
Then run the training via

- Unsupervised Distillation
accelerate launch --multi_gpu --num_processes 4 train.py distilled_us --dataset_name COCO --use_cache
- Weakly Supervised Distillation
accelerate launch --multi_gpu --num_processes 4 train.py distilled_ws --dataset_name SPair-71k --use_cache
- Supervised Training
accelerate launch --multi_gpu --num_processes 4 train.py distilled_s --dataset_name SPair-71k --use_cache
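At inference time, correspondences are read off the feature maps by nearest-neighbour matching: each source keypoint's feature vector is compared against all target locations and the most similar one is selected. The sketch below uses random feature maps as placeholders for the distilled model's output.

```python
import torch
import torch.nn.functional as F

def match_keypoints(src_feats, tgt_feats, src_kps):
    """Nearest-neighbour matching of source keypoints in a target feature map.

    src_feats, tgt_feats: (C, H, W) feature maps.
    src_kps: (N, 2) integer (x, y) keypoints in source feature-map coordinates.
    Returns (N, 2) matched (x, y) locations in the target feature map.
    """
    C, H, W = tgt_feats.shape
    tgt = F.normalize(tgt_feats.reshape(C, -1), dim=0)                        # (C, H*W)
    queries = F.normalize(src_feats[:, src_kps[:, 1], src_kps[:, 0]], dim=0)  # (C, N)
    sim = queries.t() @ tgt                                                   # (N, H*W)
    idx = sim.argmax(dim=1)
    return torch.stack([idx % W, idx // W], dim=1)                            # back to (x, y)

# Toy example with random feature maps (in practice these come from the model).
src = torch.randn(64, 32, 32)
tgt = torch.randn(64, 32, 32)
kps = torch.tensor([[3, 5], [10, 20]])
print(match_keypoints(src, tgt, kps))
```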
For training with 3D data, follow the official instructions to download the CO3D dataset, then prepare it via
python datasets/create_co3d.py

Then run the training via
accelerate launch --multi_gpu --num_processes 4 train.py distilled_s --dataset_name CO3D --use_cache
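CO3D provides multi-view sequences with camera poses and point clouds, so correspondences between two views of the same object can be derived purely from geometry rather than from human keypoint annotations. The snippet below only illustrates that geometric idea with a pinhole projection; it is not the data-preparation code from create_co3d.py, and all camera parameters are made up.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 world points into an image with intrinsics K and pose (R, t)."""
    cam = points_3d @ R.T + t        # world -> camera coordinates
    uv = cam @ K.T                   # camera -> homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]    # perspective divide -> (N, 2) pixel locations

# A 3D point visible in two views yields a ground-truth correspondence pair.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R1, t1 = np.eye(3), np.array([0.0, 0.0, 2.0])
R2, t2 = np.eye(3), np.array([0.5, 0.0, 2.0])   # second camera shifted sideways
point = np.array([[0.1, -0.2, 1.0]])
print(project(point, K, R1, t1), project(point, K, R2, t2))
```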
Please cite our paper:

@article{fundel2025distilldift,
    author  = {Frank Fundel and Johannes Schusterbauer and Vincent Tao Hu and Björn Ommer},
    title   = {Distillation of Diffusion Features for Semantic Correspondence},
    journal = {WACV},
    year    = {2025},
}