How should image features that may not match the text in interleaved image-text data be understood? #4

Dear author,
Hello, I noticed that in the laion400m-random data, each image is randomly placed either before or after its corresponding text, which resembles real interleaved data found on the internet. In the paper, however, the contrastive loss only compares each image with its preceding text. When an image is placed before its own text, the preceding text may not match the image, so wouldn't this introduce noise into the contrastive loss? Even though the generative loss will learn some image-text correspondence, it does not eliminate this noise. How do you view and address this issue?
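
To make the concern concrete, here is a small sketch of my own (it is not the repository's data pipeline). It assumes each image is placed before or after its own text with probability 0.5, and that every image is paired with the nearest text preceding it. The function names `interleave`, `preceding_text_pairs`, and `mismatch_rate` are hypothetical and only serve this illustration.

```python
import random

def interleave(num_pairs, rng):
    """Build an interleaved sequence of (kind, pair_id) items.

    Assumption: each image lands before or after its own text
    with probability 0.5 (the real ratio in the dataset may differ).
    """
    seq = []
    for idx in range(num_pairs):
        items = [("text", idx), ("image", idx)]
        if rng.random() < 0.5:
            items.reverse()  # image first, its own text after
        seq.extend(items)
    return seq

def preceding_text_pairs(seq):
    """Pair every image with the nearest text that precedes it."""
    pairs = []
    last_text = None
    for kind, idx in seq:
        if kind == "text":
            last_text = idx
        elif last_text is not None:  # skip an image with no text before it
            pairs.append((idx, last_text))
    return pairs

def mismatch_rate(num_pairs=10000, seed=0):
    """Fraction of (image, preceding text) pairs that come from different samples."""
    rng = random.Random(seed)
    pairs = preceding_text_pairs(interleave(num_pairs, rng))
    wrong = sum(1 for image_id, text_id in pairs if image_id != text_id)
    return wrong / len(pairs)

if __name__ == "__main__":
    print(f"mismatched contrastive pairs: {mismatch_rate():.2%}")
```

Under these assumptions, roughly half of the image-text pairs fed to the contrastive loss would come from different samples, which is the noise I am asking about.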
