CLeARoboticsLab/MultiFidelityPolicyGradients
MultiFidelityPolicyGradients

This repository contains the official code for the work: “A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation.”

MFPG is a reinforcement learning framework that combines a small amount of data from the target environment with a control variate built from a large volume of low-fidelity simulation data. The result is an unbiased, variance-reduced estimator for on-policy policy gradients, which improves sample efficiency in the target environment.
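As a toy illustration of the control-variate idea (not the repository's actual implementation, and using scalar stand-ins for policy-gradient components — all names here are our own), the estimator pairs a small high-fidelity batch with correlated low-fidelity samples, then corrects with a large low-fidelity batch. Because the two low-fidelity means share the same expectation, the correction has zero mean and the estimator stays unbiased for any fixed coefficient `c`:

```python
import numpy as np

def mfpg_estimate(g_hi, g_lo_paired, g_lo_large, c=1.0):
    """Multi-fidelity control-variate estimate of E[g_hi] (toy sketch).

    g_hi        -- small batch of high-fidelity (target-environment) samples
    g_lo_paired -- low-fidelity samples correlated with g_hi (same seeds/rollouts)
    g_lo_large  -- large, cheap batch of independent low-fidelity samples

    Unbiased for E[g_hi] for any fixed c, since the two low-fidelity
    terms have the same expectation; variance drops when the fidelities
    are strongly correlated.
    """
    return g_hi.mean() - c * (g_lo_paired.mean() - g_lo_large.mean())

rng = np.random.default_rng(0)
n_hi, n_lo = 32, 4800                 # few target-env samples, many sim samples
shared = rng.normal(size=n_hi)        # shared noise -> correlation across fidelities
g_hi = 1.0 + shared + 0.1 * rng.normal(size=n_hi)         # true mean is 1.0
g_lo_paired = 0.7 + shared + 0.1 * rng.normal(size=n_hi)  # biased, but correlated
g_lo_large = 0.7 + rng.normal(size=n_lo)

naive = g_hi.mean()                                     # high-fidelity-only estimate
mf = mfpg_estimate(g_hi, g_lo_paired, g_lo_large)       # multi-fidelity estimate
# Both estimate E[g_hi] = 1.0; mf has much lower variance across seeds.
```

Note that the low-fidelity samples may be arbitrarily biased (here, mean 0.7 instead of 1.0): the bias cancels between the two low-fidelity terms, so only their correlation with the high-fidelity samples matters.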

Teaser Figure

Installation

  1. Create a conda environment: conda create -n mfpg

  2. Activate the environment: conda activate mfpg

  3. Install dependencies: pip install -r requirements.txt

  4. Install MuJoCo:

  • pip3 install -U 'mujoco-py<2.2,>=2.1'

  • Follow the official MuJoCo installation guide (the "Install MuJoCo" section)

  • pip install "cython<3"

  • Add the following to your ~/.bashrc file:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia

Running Experiments

Configuration files are located in config/odrl_mujoco_sweep, covering:

  • MFPG, High-Fidelity Only, More High-Fidelity Data (15×) (config/odrl_mujoco_sweep/baseline_reinforce_mfpg/)

  • DARC

  • PAR

  • Low-Fidelity Only

Example: training MFPG on gravity and friction shift tasks:

python src/train/train_on_odrl_benchmark.py --algorithm "baseline_reinforce_mfpg" --config_file_name "mfpg_gravity_friction"

To reproduce Appendix C results, replace --algorithm with the name of a folder under config/odrl_mujoco_sweep and select the corresponding config file with --config_file_name.

Acknowledgements

We build our project on top of Stable-Baselines3 and conduct our evaluation using settings from ODRLBenchmark (the files in odrl_benchmark/ are copied from ODRLBenchmark).
