This is the official PyTorch implementation of LR2PPO. The ECCV 2024 paper is available at arXiv.
Introduction video: YouTube
- Download dataset: HuggingFace Hub
- Optional: original MovieNet dataset (Official Website)
- Pre-processed datasets (datasets_trad) available: Google Drive
- Optional preparation:
  - Follow dataset generation guide: datasets_trad/README.md
  - Access source datasets:
    - MSLR-Web10K: Microsoft Research
    - MQ2008: LETOR 4.0
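For readers who fetch the source datasets themselves: MSLR-Web10K and MQ2008 are distributed as text files with one query-document pair per line, in the form `label qid:<id> <index>:<value> ...`, optionally followed by a `#` comment. The sketch below only illustrates that raw line format. It is not the preprocessing performed by the repository's generation guide, `parse_letor_line` and `LetorRow` are hypothetical names, and the sample line uses synthetic values.

```python
from typing import Dict, NamedTuple


class LetorRow(NamedTuple):
    label: int                  # graded relevance label
    qid: str                    # query identifier
    features: Dict[int, float]  # feature index -> feature value


def parse_letor_line(line: str) -> LetorRow:
    """Parse one 'label qid:<id> <idx>:<val> ...' line; text after '#' is ignored."""
    tokens = line.split("#", 1)[0].split()
    if len(tokens) < 2 or not tokens[1].startswith("qid:"):
        raise ValueError(f"not a LETOR-style line: {line!r}")
    label = int(tokens[0])
    qid = tokens[1][len("qid:"):]
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return LetorRow(label, qid, features)


# Synthetic sample line, for illustration only.
row = parse_letor_line("2 qid:1 1:0.5 2:0.25 #docid = example")
print(row.label, row.qid, row.features)
```

Grouping the parsed rows by `qid` gives the per-query candidate lists that ranking models are usually trained and evaluated on.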
Download required weights for both benchmarks:
- roberta_base_en_model and vit_base_patch16_224_model
  - Source: Google Drive or the models' official repositories
  - Save in: ./pretrained_models/
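Before launching training, it can help to confirm that both weights are where the scripts expect them. The following is a minimal sketch, assuming each weight is stored under `./pretrained_models/` with a file or directory name that starts with the model name given above (any extension is an assumption). `missing_weights` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path
from typing import Iterable, List

# Model names taken from the list above; on-disk extensions are unknown.
EXPECTED_WEIGHTS = ("roberta_base_en_model", "vit_base_patch16_224_model")


def missing_weights(root: str = "./pretrained_models",
                    names: Iterable[str] = EXPECTED_WEIGHTS) -> List[str]:
    """Return the expected names that have no matching entry under root."""
    root_path = Path(root)
    names = list(names)
    if not root_path.is_dir():
        return names
    missing = []
    for name in names:
        # Accept the bare name or the name followed by any suffix.
        if not any(root_path.glob(name + "*")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_weights()
    print("all weights found" if not absent else f"missing: {absent}")
```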
pip3 install -r requirements.txt

Hardware Requirement: 4 GPUs
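A quick way to compare the machine against the stated 4-GPU requirement is to ask PyTorch how many CUDA devices it can see. This is a small sketch rather than part of the repository: `visible_gpu_count` and `meets_requirement` are hypothetical helpers, and the count is reported as 0 when PyTorch is not installed.

```python
REQUIRED_GPUS = 4  # from the hardware requirement stated above


def visible_gpu_count() -> int:
    """Number of CUDA devices PyTorch can see (0 if PyTorch is unavailable)."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def meets_requirement(available: int, required: int = REQUIRED_GPUS) -> bool:
    """True when at least `required` GPUs are available."""
    return available >= required


if __name__ == "__main__":
    count = visible_gpu_count()
    status = "ok" if meets_requirement(count) else "fewer than required"
    print(f"visible GPUs: {count} ({status})")
```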
# Stage 1: Base Model
sh pointwise.sh <your_stage1>
# Stage 2: Reward Model
sh reward_pair_dataloader.sh <your_stage2>
# Stage 3: LR<sup>2</sup>PPO
sh ppo.sh <your_stage3>
# Evaluation
sh ppo_eval.sh <your_eval>

# Stage 1: Base Model
sh pointwise_trad.sh <your_stage1>
# Stage 2: Reward Model
sh reward_trad.sh <your_stage2>
# Stage 3: LR<sup>2</sup>PPO
sh ppo_trad.sh <your_stage3>
# Evaluation
sh ppo_eval_trad.sh <your_eval>

- Download: Google Drive
- Download: Google Drive
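The training commands listed earlier run in a fixed order (base model, reward model, LR2PPO, evaluation), once with the plain scripts and once with the `_trad` scripts. The sketch below chains them from Python, under some assumptions: the group keys `"multimodal"` and `"trad"` are hypothetical labels, the argument values are placeholders (this README does not describe what each script expects), and the commands are only printed unless `dry_run=False` is passed.

```python
import subprocess
from typing import List

# Script names copied from the commands in this README; the keys are made-up labels.
STAGE_SCRIPTS = {
    "multimodal": ["pointwise.sh", "reward_pair_dataloader.sh", "ppo.sh", "ppo_eval.sh"],
    "trad": ["pointwise_trad.sh", "reward_trad.sh", "ppo_trad.sh", "ppo_eval_trad.sh"],
}


def build_commands(benchmark: str, args: List[str]) -> List[List[str]]:
    """Pair each stage script with its argument, in execution order."""
    scripts = STAGE_SCRIPTS[benchmark]
    if len(args) != len(scripts):
        raise ValueError(f"expected {len(scripts)} arguments, got {len(args)}")
    return [["sh", script, arg] for script, arg in zip(scripts, args)]


def run_pipeline(benchmark: str, args: List[str], dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for command in build_commands(benchmark, args):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(command, check=True)


# Placeholder argument values, printed only (dry run).
run_pipeline("trad", ["stage1_run", "stage2_run", "stage3_run", "eval_run"])
```

Stopping at the first failure matters here because each stage presumably depends on the output of the previous one.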
See LICENSE for details.
Code components borrowed from:
- TencentPretrain
- PaLM-rlhf-pytorch
- benchmarks (Transfer Task)
We are grateful for these excellent works and repositories.
If you find our work helpful in your research, please consider citing it.
@inproceedings{guo2024multimodal,
title={Multimodal Label Relevance Ranking via Reinforcement Learning},
author={Guo, Taian and Zhang, Taolin and Wu, Haoqian and Li, Hanjun and Qiao, Ruizhi and Sun, Xing},
booktitle={European Conference on Computer Vision},
pages={391--408},
year={2024},
organization={Springer}
}