Questions about training performance #31

@lingxiao-guo

Hi, @chernyadev! I followed the implementation at [https://github.com/robobase-org/robobase.git] and ran the Diffusion Policy/ACT training code on bigym [https://github.com/chernyadev/bigym/tree/fix_original_demos], but even after extensive debugging I still can't make it work. Here's what I've tried:

I chose the dishwasher_close and drawer_top_close tasks for training; in the original bigym paper both tasks reach nearly 100% success rate with Diffusion Policy and ACT. The robobase repo doesn't explain how to change robobase_config.yaml for the imitation learning pipeline, so I adapted it according to the bigym paper. Here's my robobase_config:

defaults:
  - _self_
  - env: null
  - method: null
  - intrinsic_reward_module: null
  - launch: null
  - override hydra/launcher: joblib

# Universal settings
create_train_env: true
num_train_envs: 1
replay_size_before_train: 3000
num_pretrain_steps: 1100000
num_train_frames: 1100000
eval_every_steps: 10000
num_eval_episodes: 5
update_every_steps: 2
num_explore_steps: 2000
save_snapshot: false
snapshot_every_n: 10000
batch_size: 256

# Demonstration settings
demos: 200
demo_batch_size: 256 # null  # If set to > 0, introduce a separate buffer for demos
use_self_imitation: true # false  # When using a separate buffer for demos, If set to True, save successful (online) trajectories into the separate demo buffer

# Observation settings
pixels: false
visual_observation_shape: [84, 84]
frame_stack: 2
frame_stack_on_channel: true
use_onehot_time_and_no_bootstrap: false

# Action settings
action_repeat: 1
action_sequence: 16 # ActionSequenceWrapper
execution_length: 8  # If execution_length < action_sequence, we use receding horizon control
temporal_ensemble: true  # Temporal ensembling only applicable to action sequence > 1
temporal_ensemble_gain: 0.01
use_standardization: true  # Demo-based standardization for action space
use_min_max_normalization: true  # Demo-based min-max normalization for action space
min_max_margin: 0.0  # If set to > 0, introduce margin for demo-driven min-max normalization
norm_obs: true

# Replay buffer settings
replay:
  prioritization: false
  size: 1000000
  gamma: 0.99
  demo_size: 1000000
  save_dir: null
  nstep: 3
  num_workers: 4
  pin_memory: true
  alpha: 0.7  # prioritization
  beta: 0.5  # prioritization
  sequential: false
  transition_seq_len: 1  # The length of transition sequence returned from sample() call. Only applicable if sequential is True

# logging settings
wandb:  # weight and bias
  use: true
  project: ${oc.env:USER}RoboBase
  name: null

tb:  # TensorBoard
  use: false
  log_dir: /tmp/robobase_tb_logs
  name: null

# Misc
experiment_name: exp
seed: 1
num_gpus: 1
log_every: 1000
log_train_video: false
log_eval_video: true
log_pretrain_every: 100
save_csv: false

hydra:
  run:
    dir: ./exp_local/${now:%Y.%m.%d}/${now:%H%M%S}_${hydra.job.override_dirname}
  sweep:
    dir: ./exp_local/${now:%Y.%m.%d}/${now:%H%M}_${hydra.job.override_dirname}
    subdir: ${hydra.job.num}

Then I used the following command to start training Diffusion Policy:

python train.py method=act env=bigym/dishwasher_close  replay.nstep=1  

But nothing works: the success rate is always zero. Besides, it seems there is a bug in your temporal ensemble code. In robobase/robobase/envs/wrappers/action_sequence.py, in step_sequence(), the temporal ensemble code is:

        self._action_history[
            self._cur_step, self._cur_step : self._cur_step + self._sequence_length
        ] = action
        for i, sub_action in enumerate(action):
            if self._temporal_ensemble and self._sequence_length > 1:
                # Select all predicted actions for self._cur_step. This will cover the
                # actions from [cur_step - sequence_length + 1, cur_step)
                # Note that not all actions in this range will be valid as we might have
                # execution_length > 1, which skips some of the intermediate steps.
                cur_actions = self._action_history[:, self._cur_step]
                indices = np.all(cur_actions != 0, axis=1)
                cur_actions = cur_actions[indices]

                # earlier predicted actions will have smaller weights.
                exp_weights = np.exp(-self._gain * np.arange(len(cur_actions)))
                exp_weights = (exp_weights / exp_weights.sum())[:, None]
                sub_action = (cur_actions * exp_weights).sum(axis=0)

            observation, reward, termination, truncation, info = self.env.step(
                sub_action
            )
            self._cur_step += 1
            if self.is_demo_env:
                demo_actions[i] = info.pop("demo_action")
            total_reward += reward
            action_idx_reached += 1
            if termination or truncation:
                break
            
            if not self.is_demo_env:
                if action_idx_reached == self._execution_length:
                    break

It seems that even with temporal ensemble enabled, the function still runs through the 'for' loop and steps the environment with sub_action multiple times, and each time sub_action is the same. So I changed the code as follows to avoid this:

        self._action_history[
            self._cur_step, self._cur_step : self._cur_step + self._sequence_length
        ] = action
        for i, sub_action in enumerate(action):
            if self._temporal_ensemble and self._sequence_length > 1:
                # Select all predicted actions for self._cur_step. This will cover the
                # actions from [cur_step - sequence_length + 1, cur_step)
                # Note that not all actions in this range will be valid as we might have
                # execution_length > 1, which skips some of the intermediate steps.
                cur_actions = self._action_history[:, self._cur_step]
                indices = np.all(cur_actions != 0, axis=1)
                cur_actions = cur_actions[indices]

                # earlier predicted actions will have smaller weights.
                exp_weights = np.exp(-self._gain * np.arange(len(cur_actions)))
                exp_weights = (exp_weights / exp_weights.sum())[:, None]
                sub_action = (cur_actions * exp_weights).sum(axis=0)

            observation, reward, termination, truncation, info = self.env.step(
                sub_action
            )
            self._cur_step += 1
            if self.is_demo_env:
                demo_actions[i] = info.pop("demo_action")
            total_reward += reward
            action_idx_reached += 1
            if termination or truncation:
                break
            if self._temporal_ensemble and self._sequence_length > 1:
                break
            if not self.is_demo_env:
                if action_idx_reached == self._execution_length:
                    break
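
For reference, here is how I understand ACT-style temporal ensembling is supposed to behave, written as a minimal standalone sketch (the policy and env callables and the NaN-based validity mask are my own placeholders, not robobase code): the policy predicts a whole action chunk at every environment step, all past predictions for the current step are combined with exponential weights, and exactly one ensembled action is executed per environment step.

import numpy as np

def rollout_with_temporal_ensemble(policy, env, max_steps, horizon, action_dim, gain=0.01):
    # history[t, s] stores the action that the chunk predicted at step t proposes
    # for step s; NaN marks "no prediction yet".
    history = np.full((max_steps, max_steps + horizon, action_dim), np.nan)
    obs, _ = env.reset()
    for t in range(max_steps):
        chunk = policy(obs)                     # shape: (horizon, action_dim)
        history[t, t : t + horizon] = chunk

        # All chunks predicted at steps <= t that cover the current step t.
        preds = history[: t + 1, t]
        valid = preds[~np.isnan(preds).any(axis=1)]

        # Same exp(-gain * i) weighting form as in the wrapper above
        # (index 0 is the oldest prediction).
        w = np.exp(-gain * np.arange(len(valid)))[:, None]
        action = (valid * w).sum(axis=0) / w.sum()

        # Exactly one environment step per ensembled action.
        obs, reward, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            break

If that reading is correct, the original loop executes several environment steps per predicted chunk even when temporal ensembling is enabled, which is what my change above tries to avoid.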

Then I launched another training run, but it still doesn't work.
So I took a look at the eval videos:

Drawer_top_close:
eval_rollout_27000_d9dafc292b347b6e5e92

It seems that the robot completes the task to a certain degree, but the overall success rate is still zero and the task reward is zero, too. This is weird.

Dishwasher_close:
eval_rollout_22000_232e8d2b078038481e28

The robot fails to complete the task at all.

Besides, I found that you use DDIM for training. But as far as I know, DDIM can only be used to accelerate sampling from a DDPM; the training itself should still rely on the DDPM objective. I'm not a diffusion expert, so I may be wrong.
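
To illustrate what I mean, here is a minimal sketch assuming the diffusers-style scheduler API that Diffusion Policy implementations commonly use (I haven't checked which schedulers robobase actually instantiates, so treat the classes and the hypothetical model below as assumptions): the training loss is the standard DDPM noise-prediction objective, and DDIM only changes how the reverse process is sampled at inference time, typically with fewer steps.

import torch
from diffusers import DDIMScheduler, DDPMScheduler

# Training: standard DDPM noise-prediction (epsilon) objective.
train_sched = DDPMScheduler(num_train_timesteps=100)

def training_loss(model, clean_actions, obs_cond):
    noise = torch.randn_like(clean_actions)
    t = torch.randint(0, train_sched.config.num_train_timesteps,
                      (clean_actions.shape[0],), device=clean_actions.device)
    noisy = train_sched.add_noise(clean_actions, noise, t)
    pred = model(noisy, t, obs_cond)  # hypothetical noise-prediction network
    return torch.nn.functional.mse_loss(pred, noise)

# Inference: DDIM reuses the same trained network, just with fewer reverse steps
# (its noise schedule should match the one used for training).
@torch.no_grad()
def sample_actions(model, obs_cond, shape, num_inference_steps=10, device="cpu"):
    sampler = DDIMScheduler(num_train_timesteps=100)
    sampler.set_timesteps(num_inference_steps, device=device)
    x = torch.randn(shape, device=device)
    for t in sampler.timesteps:
        x = sampler.step(model(x, t, obs_cond), t, x).prev_sample
    return x

If that matches what robobase does, using DDIM at evaluation time on top of DDPM-style training would be standard practice; I mainly want to confirm that the training objective itself is the DDPM one.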

So I also tried to launch training with the ACT code, and it errors out with:

Error executing job with overrides: ['method=act', 'env=bigym/dishwasher_close', 'replay.nstep=1']
Error in call to target 'robobase.method.act.ActBCAgent':
AttributeError("'NoneType' object has no attribute 'output_shape'")
full_key: method

I figured out that this is because the ACT code only supports image observations, whereas in the experiments above I used state observations. So I changed the pixels entry in robobase_config.yaml:

# Observation settings
pixels: true
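
Since pixels is a top-level key in the config, I believe the same change can also be passed as a Hydra command-line override instead of editing the yaml, e.g.:

python train.py method=act env=bigym/dishwasher_close replay.nstep=1 pixels=true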

And then I ran into the same problem as [Inquiry Regarding Using Repo with BiGym · Issue #3 · robobase-org/robobase (github.com)](robobase-org/robobase#3), which remains unanswered.

Then I wanted to check whether there is anything wrong with the demos, but I can't run your demo replay code since my server has no monitor or GUI.
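
For completeness, the kind of headless workaround I would try is forcing MuJoCo's offscreen rendering and dumping frames to a video instead of opening a viewer. This assumes bigym renders through MuJoCo and exposes a Gymnasium-style rgb_array render mode; the env and the demo action sequence below are placeholders, not your actual replay script.

import os
os.environ.setdefault("MUJOCO_GL", "egl")  # or "osmesa"; must be set before the env is created

import imageio.v2 as imageio

def replay_to_video(env, actions, path="replay.mp4", fps=30):
    # env: placeholder for an env created with render_mode="rgb_array"
    # actions: placeholder for one demo's action sequence
    frames = []
    env.reset()
    for a in actions:
        env.step(a)
        frames.append(env.render())
    imageio.mimsave(path, frames, fps=fps)

If the replay script hard-requires an on-screen viewer, though, this won't help.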

So I'm stuck here and would really appreciate your help. To be honest, I think bigym is an amazing benchmark: it's the only one I know of that supports long-horizon mobile manipulation and comes with plenty of human demos and tasks. I believe that better maintenance of this repository will definitely enlarge the impact of this work in the community, and I hope my feedback helps you develop it further.
