This repository contains the official code for the work: “A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation.”
MFPG is a reinforcement learning framework that mixes a small amount of data from the target environment with a control variate formed from a large volume of low-fidelity simulation data. The result is an unbiased, variance-reduced estimator of on-policy policy gradients that improves sample efficiency.
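For intuition, below is a minimal sketch of the control-variate combination under simplified assumptions (per-episode gradient terms already computed as NumPy arrays). It is not the repository's implementation, and the function and variable names (mfpg_estimate, g_hi, g_lo_paired, g_lo_large, c) are illustrative:

```python
import numpy as np

def mfpg_estimate(g_hi, g_lo_paired, g_lo_large, c=1.0):
    """Illustrative multi-fidelity control-variate combination (not the repo's code).

    g_hi        : (n, d) per-episode policy-gradient terms from the target environment
    g_lo_paired : (n, d) low-fidelity terms paired with g_hi (same policy, correlated rollouts)
    g_lo_large  : (N, d) terms from a large batch of additional low-fidelity rollouts, N >> n
    c           : control-variate coefficient; the estimator is unbiased for any fixed c
    """
    # The paired and large-batch low-fidelity means estimate the same expectation,
    # so their difference has zero mean: subtracting it keeps the estimator unbiased
    # and lowers variance when the high- and low-fidelity terms are correlated.
    return g_hi.mean(axis=0) - c * (g_lo_paired.mean(axis=0) - g_lo_large.mean(axis=0))
```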
- Create a conda environment:
  conda create -n mfpg
- Activate the environment:
  conda activate mfpg
- Install dependencies:
  pip install -r requirements.txt
- Install MuJoCo (a quick import check follows this list):
  - pip3 install -U 'mujoco-py<2.2,>=2.1'
  - Follow the official MuJoCo installation guide (the "Install MuJoCo" section)
  - pip install "cython<3"
  - Add the following to your ~/.bashrc file:
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$YOUR_HOME_DIRECTORY/.mujoco/mujoco210/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
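As a quick sanity check (not part of the repository), the following snippet should run without errors once mujoco-py and the LD_LIBRARY_PATH entries above are in place:

```python
# Sanity check: importing mujoco-py compiles its bindings; this fails if the
# MuJoCo binaries or the LD_LIBRARY_PATH entries above are missing.
import mujoco_py

print("mujoco-py imported successfully")
```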
Configuration files are located in config/odrl_mujoco_sweep, covering the following settings (a small listing sketch follows this list):
- MFPG, High-Fidelity Only, More High-Fidelity Data (15×) (config/odrl_mujoco_sweep/baseline_reinforce_mfpg/)
- Low-Fidelity Only
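To see which --algorithm / --config_file_name combinations are available locally, a hypothetical listing helper (not part of the repository) such as the one below can be used; it only assumes the directory layout described above:

```python
# Hypothetical helper: list algorithm folders under config/odrl_mujoco_sweep and
# the config files inside each, i.e. the valid --algorithm / --config_file_name pairs.
import pathlib

root = pathlib.Path("config/odrl_mujoco_sweep")
for algorithm_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    configs = sorted(f.stem for f in algorithm_dir.iterdir() if f.is_file())
    print(f"{algorithm_dir.name}: {', '.join(configs)}")
```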
Example: training MFPG on gravity and friction shift tasks:
python src/train/train_on_odrl_benchmark.py --algorithm "baseline_reinforce_mfpg" --config_file_name "mfpg_gravity_friction"
To reproduce Appendix C results, replace --algorithm with the name of a folder under config/odrl_mujoco_sweep and select the corresponding config file with --config_file_name.
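For example, a hypothetical batch launcher (not part of the repository) could sweep every config file under one algorithm folder by reusing the documented command; it assumes --config_file_name takes the file name without its extension, as in the example above:

```python
# Hypothetical batch launcher: run the documented training command once per
# config file found under the chosen algorithm folder.
import pathlib
import subprocess

ALGORITHM = "baseline_reinforce_mfpg"  # any folder name under config/odrl_mujoco_sweep
config_dir = pathlib.Path("config/odrl_mujoco_sweep") / ALGORITHM

for cfg in sorted(f for f in config_dir.iterdir() if f.is_file()):
    subprocess.run(
        [
            "python", "src/train/train_on_odrl_benchmark.py",
            "--algorithm", ALGORITHM,
            "--config_file_name", cfg.stem,  # assumes the flag takes the name without extension
        ],
        check=True,
    )
```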
We build our project on top of Stable-Baselines3 and conduct our evaluation using settings from ODRLBenchmark (the files in odrl_benchmark/ are copied from ODRLBenchmark).
