Commit 99a01b1

RoboMonkey Paper Upload
1 parent da27db6 commit 99a01b1

7 files changed, +52 -0 lines changed

_data/people.yml

Lines changed: 16 additions & 0 deletions
@@ -19,6 +19,11 @@ shayantalaei:
   url: https://www.linkedin.com/in/shayan-talaei-6b65a0229/
   title: PhD Student

+jackykwok:
+  name: Jacky Kwok
+  url: https://www.linkedin.com/in/jackykwok02/
+  title: PhD Student
+
 # Visiting

 bradleybrown:
@@ -100,6 +105,17 @@ percyliang:
   title: Professor
   not_current: True

+marcopavone:
+  name: Marco Pavone
+  url: https://research.nvidia.com/person/marco-pavone
+  title: Professor
+  not_current: True
+
+ionstoica:
+  name: Ion Stoica
+  url: https://people.eecs.berkeley.edu/~istoica/
+  title: Professor
+  not_current: True
 # Alumni

 #example:

_pubs/robomonkey.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+---
+title: "RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models"
+authors:
+  - key: jackykwok
+    affiliation: Stanford University
+  - name: Christopher Agia
+    affiliation: Stanford University
+  - name: Rohan Sinha
+    affiliation: Stanford University
+  - name: Matt Foutter
+    affiliation: Stanford University
+  - name: Shulu Li
+    affiliation: UC Berkeley
+  - key: ionstoica
+    affiliation: UC Berkeley
+  - key: azaliamirhoseini
+    affiliation: Stanford University
+  - key: marcopavone
+    affiliation: Stanford, NVIDIA
+venue: preprint
+year: 2025
+date: 2025-06-21
+has_pdf: true
+doi: 10.48550/arXiv.2506.17811
+tags:
+  - robotics
+  - machine learning
+  - generative ai
+teaser: RoboMonkey is a test-time scaling framework that improves the robustness and generalization of Vision-Language-Action (VLA) models. RoboMonkey achieves significant performance improvements across both in-distribution and out-of-distribution tasks, as well as on new robot setups. Our findings show that scaling test-time compute through a generate-then-verify paradigm provides a practical and effective path towards building general-purpose robotics foundation models.
+materials:
+  - name: Paper
+    url: https://arxiv.org/abs/2506.17811
+    type: file-pdf
+---
+
+Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in visuomotor control, yet ensuring their robustness in unstructured real-world environments remains a persistent challenge. In this paper, we investigate test-time scaling through the lens of sampling and verification as means to enhance the robustness and generalization of VLAs. We first demonstrate that the relationship between action error and the number of generated samples follows an exponentiated power law across a range of VLAs, indicating the existence of inference-time scaling laws. Building on these insights, we introduce RoboMonkey, a test-time scaling framework for VLAs. At deployment, RoboMonkey samples a small set of actions from a VLA, applies Gaussian perturbation and majority voting to construct an action proposal distribution, and then uses a Vision Language Model (VLM)-based verifier to select the optimal action. We propose a synthetic data generation pipeline for training such VLM-based action verifiers, and demonstrate that scaling the synthetic dataset consistently improves verification and downstream accuracy. Through extensive simulated and hardware experiments, we show that pairing existing VLAs with RoboMonkey yields significant performance gains, achieving a 25% absolute improvement on out-of-distribution tasks and 8% on in-distribution tasks. Additionally, when adapting to new robot setups, we show that fine-tuning both VLAs and action verifiers yields a 7% performance increase compared to fine-tuning VLAs alone.
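
Reviewer note (not part of the committed file): the generate-then-verify loop the abstract describes could be sketched roughly as below. This is an illustration under stated assumptions, not the paper's implementation. The action layout (a continuous pose delta plus a binary gripper state), the per-dimension Gaussian fit, and the names `build_proposals`, `select_action`, `toy_policy`, and `toy_verifier` are all hypothetical; the two toy functions stand in for a real VLA and a trained VLM-based verifier.

```python
import random
import statistics
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Assumed action layout: (continuous pose delta, discrete gripper state).
Action = Tuple[Tuple[float, ...], int]


def build_proposals(samples: Sequence[Action], num_proposals: int,
                    rng: random.Random) -> List[Action]:
    """Fit a per-dimension Gaussian to the sampled poses, majority-vote the
    gripper state, then draw perturbed candidate actions."""
    dims = len(samples[0][0])
    means = [statistics.fmean([s[0][d] for s in samples]) for d in range(dims)]
    stds = [statistics.pstdev([s[0][d] for s in samples]) for d in range(dims)]
    gripper = Counter(s[1] for s in samples).most_common(1)[0][0]
    return [
        (tuple(rng.gauss(m, sd) for m, sd in zip(means, stds)), gripper)
        for _ in range(num_proposals)
    ]


def select_action(policy_sample: Callable[[random.Random], Action],
                  verifier_score: Callable[[Action], float],
                  num_samples: int = 4, num_proposals: int = 16,
                  seed: int = 0) -> Action:
    """Sample a few actions, expand them into proposals, keep the best-scored one."""
    rng = random.Random(seed)
    samples = [policy_sample(rng) for _ in range(num_samples)]
    proposals = build_proposals(samples, num_proposals, rng)
    return max(proposals, key=verifier_score)


TARGET = (1.0, 0.0, 0.0)  # hypothetical goal displacement for the toy setup


def toy_policy(rng: random.Random) -> Action:
    # Stand-in for a VLA: noisy displacement toward TARGET, gripper mostly closed.
    pose = tuple(t + rng.gauss(0.0, 0.2) for t in TARGET)
    return pose, 1 if rng.random() < 0.8 else 0


def toy_verifier(action: Action) -> float:
    # Stand-in for a learned verifier: higher score for poses closer to TARGET.
    pose, _ = action
    return -sum((p - t) ** 2 for p, t in zip(pose, TARGET))


if __name__ == "__main__":
    print(select_action(toy_policy, toy_verifier))
```

The point of the split is that only a few (expensive) policy samples are drawn, while the many cheap proposals come from the fitted distribution and are ranked by the verifier.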

imgs/people/ionstoica.jpg

1.31 MB

imgs/people/jackykwok.jpg

264 KB

imgs/people/marcopavone.jpg

240 KB

imgs/teasers/robomonkey.png

640 KB

imgs/thumbs/robomonkey.png

797 KB
