VIKI comprises VIKI-Bench (a hierarchical multi-agent visual reasoning benchmark) and VIKI-R (a two-stage learning framework).
- VIKI-Bench introduces a three-level evaluation suite—Agent Activation, Task Planning, Trajectory Perception—with 23,737 tasks across 100 scenes, 6 robot morphologies, and over 1,000 asset combinations, offering both global and first-person views.
- VIKI-R builds on Qwen2.5-VL-Instruct (3B/7B) via:
- Supervised Fine-Tuning (SFT) with high-quality Chain-of-Thought (CoT) annotations.
- Reinforcement Fine-Tuning (RFT) using Group Relative Policy Optimization (GRPO) with combined format and correctness rewards.
- An open-source, easy-to-use data generation pipeline for public use.
- 25.10.20 – Released VIKI-R checkpoints, including 3B and 7B models covering all three benchmark levels.
- 25.09.19 – Our paper was accepted to NeurIPS 2025 (Datasets and Benchmarks Track) 🎉
- 25.08.15 – Our work became part of the MARS Challenge (Plan Track) — welcome to participate!
- 25.06.09 – Released the paper, code and dataset for public access.
- Hierarchical Dataset: 23,737 tasks, 100 scenes, 6 robot types, ≥1,000 asset combos.
- GRPO RL: Structured planning with combined format and correctness rewards.
- Robotics-Focused: Home layouts and varied embodied multi-agent tasks.
- Metrics: Activation Accuracy, Planning Correctness & Efficiency, Trajectory RMSE/HD/DFD.
- Level 1: Agent Activation
  Select the appropriate subset of agents given a scene and instruction.
- Level 2: Task Planning
  Generate executable multi-agent action sequences within the reference length.
- Level 3: Trajectory Perception
  Predict spatial trajectories of visible agents from first-person views; evaluated via RMSE, Hausdorff Distance, and Dynamic Fréchet Distance.
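For reference, a minimal NumPy sketch of the three Level 3 metrics is below. The function names and signatures are illustrative, the Dynamic Fréchet Distance is approximated here by the standard discrete Fréchet distance computed with dynamic programming, and the official evaluation code in `eval/VIKI-L3` may align or normalize trajectories differently.

```python
import numpy as np

def rmse(pred, gt):
    # point-wise RMSE between aligned trajectories of shape (T, D)
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return float(np.sqrt(np.mean(np.sum((pred - gt) ** 2, axis=-1))))

def hausdorff(pred, gt):
    # symmetric Hausdorff distance between two point sets
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def discrete_frechet(pred, gt):
    # discrete Fréchet distance via dynamic programming over the pairwise-distance matrix
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    n, m = d.shape
    ca = np.full((n, m), np.inf)
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return float(ca[-1, -1])

# toy example: predicted vs. ground-truth 2D trajectory
pred = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
gt = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
print(rmse(pred, gt), hausdorff(pred, gt), discrete_frechet(pred, gt))
```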
Statistics:
- 23,737 task samples
- 100 diverse 3D scenes
- 6 heterogeneous robot morphologies (e.g., dual-arm, tracked, legged, humanoid)
- >1,000 asset combinations
- Global view + multiple ego-centric perspectives
# Clone repository
git clone https://github.com/MARS-EAI/VIKI-R.git
cd VIKI-R
# Create Conda environment
conda env create -f roboviki.yml
conda activate roboviki

# Install verl framework
cd verl
pip install --no-deps -e .
cd ..
# Install FlashAttention (download wheel from: https://github.com/Dao-AILab/flash-attention)
pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
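The FlashAttention wheel must match your Python, CUDA, and PyTorch builds (the file name above targets Python 3.10, CUDA 12, and PyTorch 2.6). An optional sanity check, assuming PyTorch is already installed in the `roboviki` environment:

```python
# optional: confirm flash-attn imports and that a CUDA-enabled PyTorch is present
import torch
import flash_attn

print("torch", torch.__version__, "| cuda", torch.version.cuda, "| gpu available:", torch.cuda.is_available())
print("flash-attn", flash_attn.__version__)
```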
# Download VIKI-R dataset from Hugging Face
git clone https://huggingface.co/datasets/henggg/VIKI-R

# Prepare LLaMA-Factory environment
# Use https://github.com/hiyouga/LLaMA-Factory and register the CoT data in its dataset_info.json (a registration sketch follows this block)
# Train 3B model with SFT
llamafactory-cli train configs/viki-1-3b.yaml
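Before launching the `llamafactory-cli` run above, the exported CoT annotations must be registered in LLaMA-Factory's `dataset_info.json`. A minimal sketch is below; the dataset key, file name, and column mapping follow LLaMA-Factory's sharegpt-style multimodal convention and are assumptions to adapt to however you export the VIKI CoT data.

```python
# register_viki_cot.py -- add an entry for the VIKI CoT data to dataset_info.json
# (the dataset key, file name, and column names are illustrative, not the repo's exact values)
import json
from pathlib import Path

dataset_info = Path("LLaMA-Factory/data/dataset_info.json")  # adjust to your checkout

info = json.loads(dataset_info.read_text())
info["viki_cot"] = {
    "file_name": "viki_cot.json",   # exported CoT annotations placed under LLaMA-Factory/data/
    "formatting": "sharegpt",       # multi-turn conversation format
    "columns": {
        "messages": "conversations",
        "images": "images",
    },
}
dataset_info.write_text(json.dumps(info, indent=2, ensure_ascii=False))
print(f"registered 'viki_cot' in {dataset_info}")
```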
# Navigate to GRPO training directory
cd train/3BGRPO/VIKI-L1
# Initialize VIKI-R-zero training
bash VIKI-R-zero.sh
# Start VIKI-R
bash VIKI-R.sh
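`VIKI-R-zero.sh` and `VIKI-R.sh` run GRPO-based reinforcement fine-tuning with the combined format and correctness rewards described above. A minimal, self-contained sketch of such a reward is below; the tag names, exact-match rule, and weights are illustrative assumptions, not the repository's actual reward implementation.

```python
import re

def format_reward(response: str) -> float:
    # reward responses that wrap reasoning in <think>...</think> and the final plan in <answer>...</answer>
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>\s*"
    return 1.0 if re.fullmatch(pattern, response, flags=re.DOTALL) else 0.0

def correctness_reward(response: str, reference: str) -> float:
    # reward an exact match between the extracted answer and the reference plan
    m = re.search(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    return 1.0 if m and m.group(1).strip() == reference.strip() else 0.0

def combined_reward(response: str, reference: str,
                    w_format: float = 0.5, w_correct: float = 1.0) -> float:
    # weighted sum of the two terms; GRPO then normalizes rewards within each sampled group
    return w_format * format_reward(response) + w_correct * correctness_reward(response, reference)

resp = "<think>Only the arm robot can reach the shelf.</think><answer>[arm_robot]</answer>"
print(combined_reward(resp, "[arm_robot]"))  # 1.5 with the default weights
```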
# Navigate to evaluation directory
cd VIKI-R/eval
# Evaluate on Level 1: Agent Activation
cd VIKI-L1
python qwen.py
# Evaluate on Level 2: Task Planning
cd ../VIKI-L2
python qwen.py
# Evaluate on Level 3: Trajectory Perception
cd ../VIKI-L3
python qwen.py
# Alternative: Use answer generation script for each level
cd ../VIKI-L1
python qwen_ans.py
cd ../VIKI-L2
python qwen_ans.py
cd ../VIKI-L3
python qwen_ans.py
# Evaluation with feedback (if available)
cd ../eval_with_fb
python gpt4o.py

- Level 1 (Agent Activation): Activation Accuracy
- Level 2 (Task Planning): Planning Correctness & Efficiency
- Level 3 (Trajectory Perception): RMSE, Hausdorff Distance, Dynamic Fréchet Distance
| Model Size | Levels Supported | Training Stages | Download | Status |
|---|---|---|---|---|
| 3B | L1 / L2 / L3 | SFT + RFT (GRPO) | viki-r-3b | Public ✅ |
| 7B | L1 / L2 / L3 | SFT + RFT (GRPO) | viki-r-7b | Public ✅ |
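A minimal sketch of loading a released checkpoint with Hugging Face `transformers` (a recent version with Qwen2.5-VL support is required). The model ID, image path, and prompt are placeholders; substitute the actual Hugging Face repo path of the checkpoint you download.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "viki-r-7b"  # placeholder: replace with the released Hugging Face repo path

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("scene.png")  # a global-view or ego-view rendering of the scene
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Which agents should be activated to complete the task?"},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(answer)
```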
If our work is helpful to you, please consider citing it!
@article{kang2025viki,
title={VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning},
author={Kang, Li and Song, Xiufeng and Zhou, Heng and Qin, Yiran and Yang, Jie and Liu, Xiaohong and Torr, Philip and Bai, Lei and Yin, Zhenfei},
journal={arXiv preprint arXiv:2506.09049},
year={2025}
}

@article{qin2025robofactory,
title={RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints},
author={Qin, Yiran and Kang, Li and Song, Xiufeng and Yin, Zhenfei and Liu, Xiaohong and Liu, Xihui and Zhang, Ruimao and Bai, Lei},
journal={arXiv preprint arXiv:2503.16408},
year={2025}
}

