Paper | Project Page | NeurIPS 2025
This repository is the official implementation for the paper BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning (NeurIPS 2025).
BEAST introduces a novel, highly efficient action representation for imitation learning. By encoding action sequences using B-Splines, it creates a compact, continuous, and expressive tokenization of robot trajectories. Our primary model, BEAST-F, models this continuous action-token space using a Rectified Flow model, achieving state-of-the-art performance on challenging long-horizon benchmarks, including CALVIN and LIBERO.
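To make the encoding concrete, here is a minimal NumPy sketch of the underlying idea: fit B-spline control points to an action sequence by least squares, then decode by evaluating the basis. This is an illustration only, not the repo's implementation; `n_ctrl` and `degree` are illustrative parameters.

```python
import numpy as np

def bspline_basis(t, knots, degree, i):
    """Cox-de Boor recursion: i-th B-spline basis function of `degree` at points t."""
    if degree == 0:
        return np.where((knots[i] <= t) & (t < knots[i + 1]), 1.0, 0.0)
    out = np.zeros_like(t, dtype=float)
    d1 = knots[i + degree] - knots[i]
    if d1 > 0:
        out += (t - knots[i]) / d1 * bspline_basis(t, knots, degree - 1, i)
    d2 = knots[i + degree + 1] - knots[i + 1]
    if d2 > 0:
        out += (knots[i + degree + 1] - t) / d2 * bspline_basis(t, knots, degree - 1, i + 1)
    return out

def encode(actions, n_ctrl=8, degree=3):
    """Least-squares fit of `n_ctrl` control points to an action sequence (T, dim)."""
    T = len(actions)
    # clamped uniform knot vector on [0, 1]
    inner = np.linspace(0.0, 1.0, n_ctrl - degree + 1)
    knots = np.concatenate([np.zeros(degree), inner, np.ones(degree)])
    ts = np.linspace(0.0, 1.0, T, endpoint=False)  # half-open basis: avoid t = 1
    B = np.stack([bspline_basis(ts, knots, degree, i) for i in range(n_ctrl)], axis=1)
    ctrl, *_ = np.linalg.lstsq(B, actions, rcond=None)
    return ctrl, B

# toy 7-DoF trajectory: 50 steps of smooth sinusoids
t = np.linspace(0, 1, 50, endpoint=False)
actions = np.stack([np.sin(2 * np.pi * t + p) for p in np.linspace(0, 1, 7)], axis=1)
ctrl, B = encode(actions)   # ctrl: (8, 7) -- the compact representation
recon = B @ ctrl            # decode back to a (50, 7) trajectory
```

The key property is compression: a 50-step, 7-dimensional chunk is summarized by 8 control points per dimension, and the smooth basis keeps the decoded trajectory continuous.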
This repository provides all the code necessary to download the datasets, preprocess the data, and reproduce our training and evaluation results.
To begin, clone this repository locally:

```bash
git clone git@github.com:intuitive-robots/beast_calvin.git
export BEAST_ROOT=$(pwd)/beast_calvin
```

Install requirements:
(Note: we provide a modified version of pyhash, given the numerous problems we encountered when installing it manually on our SLURM cluster.)
You can also try installing setuptools using pip.
```bash
cd $BEAST_ROOT
conda create -n beast_cal python=3.9
conda activate beast_cal
conda install cmake
cd calvin_env/tacto
pip install -e .
cd ..
pip install -e .
cd ..
cd LIBERO
pip install -r requirements.txt
pip install -e .
pip install numpy~=1.23
cd ..
pip install setuptools==57.5.0
conda install conda-forge::pyhash
cd MP_lite_PyTorch
pip install -e .
pip install addict
cd ..
```

Next, install the rest of the missing packages:
```bash
pip install -r requirements.txt
```

If you want to train on the CALVIN dataset, choose a split with:
```bash
cd $BEAST_ROOT/dataset
sh download_data.sh D | ABCD
```

If you want to train on the LIBERO dataset, choose a split with:
```bash
cd $BEAST_ROOT/LIBERO
python benchmark_scripts/download_libero_datasets.py --datasets DATASET_NAME
```

where `DATASET_NAME` is chosen from `[libero_spatial, libero_object, libero_100, libero_goal]`.
To train BEAST-F with 4 GPUs, run:

```bash
python beast/training_calvin.py
```
Note that during training, the full CALVIN evaluation or LIBERO rollouts are run every n*1k training steps.
To replicate the original training results, we recommend using 4 GPUs with a batch size of 8 and training for 40k steps on ABC (ABCD). See the configs for details.
Since BEAST uses action chunking, it needs to load multiple (~10) `episode_{}.npz` files for each sample. Combined with batching, this requires substantial disk bandwidth per iteration (usually ~2000 MB/iteration), which can significantly reduce your GPU utilization during training, depending on your hardware.
Therefore, when using the CALVIN dataset, you can use the script `extract_by_key.py` to extract the data into a single file, avoiding opening many episode files:
```bash
python preprocess/extract_by_key.py -i /YOUR/PATH/TO/CALVIN/ \
    --in_task all
```

Run this command to see more detailed information:

```bash
python preprocess/extract_by_key.py -h
```

Important params:
- `--in_root`: `/YOUR/PATH/TO/CALVIN/`, e.g. `/data3/geyuan/datasets/CALVIN/`
- `--extract_key`: a key of `dict(episode_xxx.npz)`, default is `'rel_actions'`; the saved file name depends on this (i.e. `ep_{extract_key}.npy`)
Optional params:

- `--in_task`: default is `'all'`, meaning all task folders of CALVIN (e.g. `task_ABCD_D/`)
- `--in_split`: default is `'all'`, meaning both `training/` and `validation/`
- `--out_dir`: optional, default is `None`, which is converted to `{in_root}/{in_task}/{in_split}/extracted/`
- `--force`: whether to overwrite existing extracted data
This work is only possible because of the code from the following open-source projects and datasets. We thank all authors for their work:
- CALVIN: https://github.com/mees/calvin (License: MIT)
- LIBERO: https://github.com/Lifelong-Robot-Learning/LIBERO (License: MIT)
- HULC: https://github.com/lukashermann/hulc (License: MIT)
- MP_lite_PyTorch: https://github.com/Andrewllab/MP_lite_PyTorch (License: GPL)
- FLOWER VLA (CALVIN): https://github.com/intuitive-robots/flower_vla_calvin (License: MIT)
If you find this work useful, please cite our paper:
```bibtex
@inproceedings{
zhou2025beast,
title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=rQCl1sf62w}
}
```