RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x8 and 10x256). Proprio Input : 10 vs Actual size 8 #12

@mizuh0n

Description

Hi, thanks for your great work. I trained my own checkpoint, but I now hit an error during evaluation:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x8 and 10x256)
  File "/work/modules/otter/script/eval_libero.py", line 247, in <module>
    eval_libero()
  File "/work/x86_64_24.05-py3/lib/python3.10/site-packages/draccus/argparsing.py", line 228, in wrapper_inner
    response = fn(cfg, *args, **kwargs)
  File "/work/modules/otter/script/eval_libero.py", line 182, in eval_libero
    action = model(
  File "/work/modules/python/otter/policy/otter_interface.py", line 298, in __call__
    action = self.model.forward_inference(
  File "/work/x86_64_24.05-py3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/work/modules/python/otter/policy/otter.py", line 319, in forward_inference
    transformer_input = self.forward_encoder(images, text, proprio, text_mask)
  File "/work/modules/python/otter/policy/otter.py", line 258, in forward_encoder
    proprio_token = self.proprio_encoder(proprio)  # (B*T, proprio_dim)
  File "/work/x86_64_24.05-py3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/work/x86_64_24.05-py3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/modules/python/otter/policy/models.py", line 53, in forward
    return self.mlp(x)
  File "/work/x86_64_24.05-py3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/work/x86_64_24.05-py3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/x86_64_24.05-py3/lib/python3.10/site-packages/torch/nn/modules/container.py", line 250, in forward
    input = module(input)
  File "/work/x86_64_24.05-py3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/work/x86_64_24.05-py3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/x86_64_24.05-py3/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x8 and 10x256)
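The failing call is `F.linear`, which computes `x @ W.T`. A minimal numpy sketch of the same shape mismatch (the dimensions 8 and 10 are taken from the traceback; the `Linear(10, 256)` layer shape is my assumption about the proprio encoder):

```python
import numpy as np

# Assumed first layer of the proprio MLP: Linear(in_features=10,
# out_features=256), i.e. a weight matrix of shape (256, 10).
# F.linear computes x @ W.T, so an 8-dim input cannot be multiplied
# with a weight whose transpose has 10 rows.
x = np.zeros((1, 8))      # proprio vector built in eval_libero.py: (B, 8)
W = np.zeros((256, 10))   # checkpoint weight expecting a 10-dim input

try:
    y = x @ W.T           # (1, 8) @ (10, 256) -> inner dims 8 != 10
except ValueError as e:
    print(f"shape mismatch: {e}")
```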

This happens because the proprio input passed to the model actually has shape (8,) (= 3 + 3 + 2):

otter/script/eval_libero.py

Lines 176 to 178 in 4be49c4

proprio=np.concatenate(
    (obs["robot0_eef_pos"], quat2axisangle(obs["robot0_eef_quat"]), obs["robot0_gripper_qpos"])
)

but the model expects a proprio input of size 10. How can I solve this error?
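One workaround sketch, if the checkpoint cannot be retrained, is to zero-pad the eval-time proprio vector up to the size the checkpoint expects. This is only a stopgap: the proper fix is to reproduce the exact 10-dim proprio composition used at training time (which is not visible here, so check the training config), since zero-padded dimensions carry none of the signal the model was trained on. The `pad_proprio` helper below is hypothetical:

```python
import numpy as np

def pad_proprio(proprio: np.ndarray, expected_dim: int) -> np.ndarray:
    """Zero-pad a 1-D proprio vector up to the checkpoint's expected size.

    Workaround only: padded dimensions are zeros, not the features the
    model saw during training, so behavior may still degrade.
    """
    actual = proprio.shape[-1]
    if actual > expected_dim:
        raise ValueError(f"proprio has {actual} dims, more than expected {expected_dim}")
    pad = expected_dim - actual
    return np.concatenate([proprio, np.zeros(pad, dtype=proprio.dtype)])

# e.g. pad the 8-dim (pos + axis-angle + gripper) vector to 10 dims
padded = pad_proprio(np.zeros(8), 10)
```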

Loading model from checkpoint: data/weights/checkpoint_75000.pt
======================================================================
                             OTTER Model
======================================================================

                          Model Architecture
----------------------------------------------------------------------
CLIP Model     : ViT-L/14
Cameras        : 2 (image_primary, image_wrist)
Seq Length     : 12
Action Horizon : 12

                              Dimensions
----------------------------------------------------------------------
Proprio Input                : 10
Action Dim                   : 10
Num Readouts                 : 4
First K Tokens               : 15
Vision Pool Out (per camera) : 256
Text Pool Out                : 128
Proprio Out                  : 64
FT Dim                       : 704
Transformer In               : 768

                              Parameters
----------------------------------------------------------------------
Total                           : 458.67M
CLIP                            : 427.62M
Policy                          : 28.37M
Trainable (incl. modality enc.) : 0

======================================================================
