Contributors: Be2R Lab (ITMO), SBER Robotics Center.
Full Abstract: We present KM-ViPE (Knowledge Mapping Video Pose Engine), a real-time open-vocabulary SLAM framework for uncalibrated monocular cameras in dynamic environments. Unlike systems that require depth sensors and offline calibration, KM-ViPE operates directly on raw RGB streams, making it well suited to ego-centric applications and to harvesting internet-scale video data for training. KM-ViPE tightly couples DINO visual features with geometric constraints through an adaptive robust kernel driven by high-level features, which handles both moving objects and movable static objects (e.g., furniture being moved in ego-centric views). The system performs simultaneous online localization and open-vocabulary semantic mapping by fusing geometric and deep visual features aligned with language embeddings. Our results are competitive with state-of-the-art approaches, whereas existing solutions either operate offline, require depth data and/or odometry estimates, or lack robustness to dynamic scenes. KM-ViPE benefits from internet-scale training and uniquely combines online operation, uncalibrated monocular input, and robust handling of dynamic scenes, making it a good fit for autonomous robotics and AR/VR applications and advancing practical spatial intelligence for embodied AI.
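The adaptive robust kernel is the core coupling mechanism: per-point geometric residuals are down-weighted when high-level features suggest the point belongs to something that moves. A minimal sketch of that idea is below; it assumes a Huber-style kernel and a sigmoid gating of feature similarity, and every name in it (`dynamic_likelihood`, `adaptive_huber_weights`, the prototype set) is illustrative rather than the released implementation.

```python
# Illustrative sketch only: down-weight geometric residuals for points
# whose high-level (e.g., DINO) features look "dynamic". The Huber form
# and the sigmoid gating are assumptions, not KM-ViPE's exact kernel.
import numpy as np

def dynamic_likelihood(feats, dyn_prototypes, tau=0.1):
    """Max cosine similarity to 'dynamic' concept embeddings, squashed to (0, 1).

    feats:          (N, D) L2-normalized per-point features
    dyn_prototypes: (K, D) L2-normalized embeddings of dynamic concepts
    """
    sim = feats @ dyn_prototypes.T                      # (N, K) cosine similarities
    return 1.0 / (1.0 + np.exp(-(sim.max(axis=1) - 0.5) / tau))

def adaptive_huber_weights(residuals, p_dyn, delta0=1.0):
    """Huber-style IRLS weights whose width shrinks with dynamic likelihood.

    A point that looks dynamic gets a smaller delta, so its residual
    saturates earlier and contributes less to the pose update.
    """
    delta = delta0 * (1.0 - p_dyn) + 1e-3               # adaptive kernel width
    r = np.abs(residuals)
    return np.where(r <= delta, 1.0, delta / r)

# Toy usage: 4 tracked points; the last two lie on a "moving" object,
# so their features sit close to the dynamic prototypes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
protos = feats[2:] + 0.01 * rng.normal(size=(2, 16))
protos /= np.linalg.norm(protos, axis=1, keepdims=True)
weights = adaptive_huber_weights(rng.normal(size=4), dynamic_likelihood(feats, protos))
print(weights)  # the two dynamic points typically receive much smaller weights
```

In a tightly coupled setting these weights would multiply the corresponding residual blocks inside the bundle-adjustment objective on every iteration, rather than being applied as a one-off mask.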
## Installation
```bash
# Build a new Docker image
make build

# Run the Docker image
make DATA_DIR={YOUR_DATA_DIR} run

# Inside the container, install the package in editable mode
pip install --no-build-isolation -e .
```
## Usage
Example usages:
```bash
# Running the full pipeline.
python run.py pipeline=default streams=raw_mp4_stream streams.base_path=YOUR_VIDEO_OR_DIR_PATH
# Running the pose-only pipeline without depth estimation.
python run.py pipeline=default streams=raw_mp4_stream streams.base_path=YOUR_VIDEO_OR_DIR_PATH pipeline.post.depth_align_model=null
```
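To process a large collection of clips (e.g., for data harvesting), the commands above can be wrapped in a small driver. The script below is a hypothetical example, not part of the repository; it reuses only the command-line overrides shown above, and the `VIDEO_DIR` path is a placeholder.

```python
# Hypothetical batch driver: run the KM-ViPE pipeline over every .mp4
# in a directory, one process per clip.
import subprocess
from pathlib import Path

VIDEO_DIR = Path("/data/videos")  # placeholder; point this at your clips

for video in sorted(VIDEO_DIR.glob("*.mp4")):
    subprocess.run(
        [
            "python", "run.py",
            "pipeline=default",
            "streams=raw_mp4_stream",
            f"streams.base_path={video}",
        ],
        check=True,  # abort on the first failing clip
    )
```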
## Acknowledgements
KM-ViPE is built on top of many great open-source research projects and codebases.

## Citation
If you find KM-ViPE useful in your research or application, please consider citing the following paper:
```bibtex
@misc{nasser2025kmvipeonlinetightlycoupled,
  title={KM-ViPE: Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAM},
  author={Zaid Nasser and Mikhail Iumanov and Tianhao Li and Maxim Popov and Jaafar Mahmoud and Malik Mohrat and Ilya Obrubov and Ekaterina Derevyanka and Ivan Sosin and Sergey Kolyubin},
  year={2025},
  eprint={2512.01889},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.01889},
}
```
## License
This project downloads and installs additional third-party models and software. Note that these models and software are not distributed by NVIDIA; review their license terms before use. This source code is released under the Apache 2.0 License.
