Official Repository for "Hyper-CL: Conditioning Sentence Representations with Hypernetworks" ([Paper (arXiv)](https://arxiv.org/abs/2403.09490))
In this section, we describe how to train a Hyper-CL model using our code. This code is based on C-STS.
The requirements are the same as for C-STS. Install them by running:

```bash
pip install -r requirements.txt
```

Download the C-STS dataset and place the files under `data/` (refer to the C-STS repository for more details).

We provide example training scripts for finetuning and evaluating the models in the paper. Go to `C-STS/` and execute the following command:
```bash
bash run_sts.sh
```

In addition to the arguments of C-STS, we use the following arguments:
- `--objective`: Set to `triplet_cl_mse` to train Hyper-CL.
- `--cl_temp`: Temperature for the contrastive loss.
- `--cl_in_batch_neg`: Add an in-batch negative loss to the main loss.
- `--hypernet_scaler`: Sets the value of K for the low-rank variants of Hyper-CL (i.e., hyper64-cl, hyper85-cl) by specifying the divisor of the embedding size. For instance, in the base model, K=64 for hyper64-cl means the embedding size 768 is divided by 12, so `hypernet_scaler` is set to `12` (see the sketch after this list).
- `--hypernet_dual`: Dual encoding that uses two separate encoders, one for sentences 1 and 2 and one for the condition.
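For intuition on `--hypernet_scaler`, below is a minimal sketch of a low-rank hypernetwork in PyTorch. The class `LowRankHyperNet`, its layers, and its forward pass are our own illustration rather than the repository's actual module; only the relation K = embedding size / `hypernet_scaler` is taken from the description above.

```python
import torch
import torch.nn as nn

class LowRankHyperNet(nn.Module):
    """Illustrative sketch (not the repo's module): a hypernetwork that turns a
    condition embedding into a low-rank (rank-K) transformation applied to a
    sentence embedding, where K = hidden_size // hypernet_scaler."""

    def __init__(self, hidden_size: int = 768, hypernet_scaler: int = 12):
        super().__init__()
        self.hidden_size = hidden_size
        self.k = hidden_size // hypernet_scaler  # e.g., 768 // 12 = 64 for hyper64-cl
        self.to_u = nn.Linear(hidden_size, hidden_size * self.k)
        self.to_v = nn.Linear(hidden_size, self.k * hidden_size)

    def forward(self, cond_emb: torch.Tensor, sent_emb: torch.Tensor) -> torch.Tensor:
        # cond_emb, sent_emb: (batch, hidden_size)
        b, d, k = cond_emb.size(0), self.hidden_size, self.k
        u = self.to_u(cond_emb).view(b, d, k)   # (batch, d, K)
        v = self.to_v(cond_emb).view(b, k, d)   # (batch, K, d)
        w = torch.bmm(u, v)                     # low-rank weight W = U V, shape (batch, d, d)
        return torch.bmm(w, sent_emb.unsqueeze(-1)).squeeze(-1)  # conditioned embedding

# Example: with hypernet_scaler=12, the rank drops from 768 (full) to K=64
hyper = LowRankHyperNet(hidden_size=768, hypernet_scaler=12)
cond, sent = torch.randn(4, 768), torch.randn(4, 768)
print(hyper(cond, sent).shape)  # torch.Size([4, 768])
```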
We use the following hyperparameters for training Hyper-CL:
| Embedding model | Learning rate (lr) | Weight decay (wd) | Temperature (temp) |
|---|---|---|---|
| DiffCSE_base+hyper-cl | 3e-5 | 0.1 | 1.5 |
| DiffCSE_base+hyper64-cl | 1e-5 | 0.0 | 1.5 |
| SimCSE_base+hyper-cl | 3e-5 | 0.1 | 1.9 |
| SimCSE_base+hyper64-cl | 2e-5 | 0.1 | 1.7 |
| SimCSE_large+hyper-cl | 2e-5 | 0.1 | 1.5 |
| SimCSE_large+hyper85-cl | 1e-5 | 0.1 | 1.9 |
We also provide example training scripts for finetuning and evaluating the models in the paper. This code is based on SimKCG. Go to `sim-kcg/` and execute the following commands:
```bash
bash scripts/preprocess.sh WN18RR
bash scripts/train_wn.sh
```

We explain the arguments below:
- `--pretrained-model`: Backbone model checkpoint (`bert-base-uncased` or `bert-large-uncased`)
- `--encoding_type`: Encoding type (`bi_encoder` or `tri_encoder`)
- `--triencoder_head`: Tri-encoder head (`concat`, `hadamard`, or `hypernet`); see the sketch after this list
- Refer to `config.py` for other arguments.
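For reference, here is a small, hypothetical sketch of what the three `--triencoder_head` choices might look like. The class `TriEncoderHead` and its layers are our own illustration, not the actual code in `sim-kcg/`; the `hypernet` branch shown here generates a full d×d matrix, unlike the low-rank variant sketched earlier.

```python
import torch
import torch.nn as nn

class TriEncoderHead(nn.Module):
    """Illustrative sketch (not the repo's module) of the --triencoder_head options:
    'concat', 'hadamard', and 'hypernet'."""

    def __init__(self, hidden_size: int = 768, head: str = "hadamard"):
        super().__init__()
        self.head = head
        self.hidden_size = hidden_size
        if head == "concat":
            self.proj = nn.Linear(2 * hidden_size, hidden_size)
        elif head == "hypernet":
            # condition embedding -> full d x d transformation matrix
            self.hyper = nn.Linear(hidden_size, hidden_size * hidden_size)

    def forward(self, emb: torch.Tensor, cond_emb: torch.Tensor) -> torch.Tensor:
        if self.head == "concat":
            return self.proj(torch.cat([emb, cond_emb], dim=-1))
        if self.head == "hadamard":
            return emb * cond_emb  # element-wise product with the condition
        # 'hypernet': apply the condition-generated matrix to the embedding
        w = self.hyper(cond_emb).view(-1, self.hidden_size, self.hidden_size)
        return torch.bmm(w, emb.unsqueeze(-1)).squeeze(-1)
```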
To evaluate a trained checkpoint, run:

```bash
bash scripts/eval.sh ./checkpoint/WN18RR/model_best.mdl WN18RR
```

Please cite our paper if you use Hyper-CL in your work:
```bibtex
@article{yoo2024hyper,
  title={Hyper-CL: Conditioning Sentence Representations with Hypernetworks},
  author={Yoo, Young Hyun and Cha, Jii and Kim, Changhyeon and Kim, Taeuk},
  journal={arXiv preprint arXiv:2403.09490},
  year={2024}
}
```