Will InternVL utilize LCL vision encoder? #3

@dolphin-Dang

Great work on visual pre-training!
However, I've noticed that InternVL has poor multi-image and multi-turn conversation ability (see InternVL issue 223). I see the same in my own practice.
So, is there a possibility that LCL pre-trained models could be integrated into the InternVL series in the future?

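This is great work!
However, I've noticed that your InternVL series models may still have shortcomings in multi-image, multi-turn conversation. This shows up both in my own practice (I tried an in-context learning (ICL) setup so the model could learn from examples how to handle images, but the model confuses the example images with the new images given to it afterwards) and in issue 223.
So, is it possible that LCL models will be used in the InternVL series in the future? Personally, I think models trained on interleaved image-text data may be very well suited to multi-image, multi-turn conversation.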
