
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling (ECCV 2024)

This repository contains the evaluation dataset of image descriptions described in the paper. The dataset consists of paragraph-length descriptions of MS-COCO images, along with human-annotated judgments of each description's correctness relative to the corresponding image and of its creativity. The dataset is provided in standard JSON format.
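Since this README does not document the JSON schema, a reasonable first step is to inspect the file before relying on any field names. The following is a minimal sketch that does this with Python's standard `json` module. The helper name `summarize_json` and the example path `annotations.json` are assumptions made for illustration and are not part of this repository.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and report its top-level shape without assuming a schema."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {"type": type(data).__name__, "size": None, "first_record_keys": None}
    if isinstance(data, (list, dict)):
        summary["size"] = len(data)
        # Peek at the first record (list item or dict value) to discover field names.
        first = None
        if isinstance(data, list) and data:
            first = data[0]
        elif isinstance(data, dict) and data:
            first = next(iter(data.values()))
        if isinstance(first, dict):
            summary["first_record_keys"] = sorted(first.keys())
    return summary


# Hypothetical usage; replace the path with the actual dataset file:
# print(summarize_json("annotations.json"))
```

The returned summary shows whether the top level is a list or a dictionary, how many entries it holds, and which keys the first record carries. That should be enough to locate the description, correctness, and creativity fields.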

Attributions

The underlying images were selected from MS-COCO. The model used to generate the descriptions is LLaVA. The underlying LLM is based on LLaMA v1 by Meta (see the applicable license agreement).

Security

See CONTRIBUTING for more information.

License

This dataset is licensed under the CC-BY-NC-4.0 License. See the LICENSE file.

Citation: Bibtex

@inproceedings{yan-eccv2024-vigor,
    title = "ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling",
    author = "Yan, Siming and
      Bai, Min and
      Chen, Weifeng and
      Zhou, Xiong and
      Huang, Qixing and
      Li, Erran",
    booktitle = "Proceedings of European Conference on Computer Vision 2024",
    year = "2024",
    url = "https://arxiv.org/abs/2402.06118",
}