Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Recent Advances in Vision and Language Pre-training (VLP)
A curated list of vision-and-language pre-training (VLP). :-)
Code Implementation of "Simple Image-level Classification Improves Open-vocabulary Object Detection" (AAAI'24)
Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations
Vision-Language Pre-Training for Boosting Scene Text Detectors (CVPR2022)
A list of research papers on knowledge-enhanced multimodal learning
The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Official implementation of "Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval" (NeurIPS 2025)
Korean version of CLIP which achieves Korean cross-modal retrieval and representation generation.
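Several of the repositories above (the Chinese and Korean CLIP variants in particular) implement CLIP-style cross-modal retrieval. The core scoring step can be sketched as follows; this is a minimal illustration using NumPy with made-up embedding values, not output from any of the models listed, and in practice the embeddings would come from a trained image encoder and text encoder:

```python
import numpy as np

# Hypothetical embeddings (in a real system these are produced by a
# CLIP-style image encoder and text encoder); values are illustrative.
image_embeds = np.array([[0.9, 0.1, 0.0],
                         [0.1, 0.8, 0.2]])
text_embeds = np.array([[1.0, 0.0, 0.0],   # e.g. "a red apple"
                        [0.0, 1.0, 0.1]])  # e.g. "a running dog"

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosines."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Cosine similarity between every image and every text.
sim = l2_normalize(image_embeds) @ l2_normalize(text_embeds).T

# Text-to-image retrieval: index of the best-matching image per text query.
best_image = sim.argmax(axis=0)
```

Image-to-text retrieval is the symmetric operation (`sim.argmax(axis=1)`); the multilingual CLIP variants above keep this retrieval step unchanged and swap in a text encoder trained on the target language.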