This repository contains the official code for the work: “A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation.”
MFPG is a reinforcement learning framework that mixes a small amount of data from the target environment with a control variate formed from a large volume of low-fidelity simulation data. The result is an unbiased, variance-reduced estimator of on-policy policy gradients that improves sample efficiency.
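For intuition, below is a minimal sketch of the control-variate combination under simplified assumptions (per-episode gradient terms already computed as NumPy arrays). It is not the repository's implementation, and the function and variable names (mfpg_estimate, g_hi, g_lo_paired, g_lo_large, c) are illustrative:

```python
import numpy as np

def mfpg_estimate(g_hi, g_lo_paired, g_lo_large, c=1.0):
    """Illustrative multi-fidelity control-variate combination (not the repo's code).

    g_hi        : (n, d) per-episode policy-gradient terms from the target environment
    g_lo_paired : (n, d) low-fidelity terms paired with g_hi (same policy, correlated rollouts)
    g_lo_large  : (N, d) terms from a large batch of additional low-fidelity rollouts, N >> n
    c           : control-variate coefficient; the estimator is unbiased for any fixed c
    """
    # The paired and large-batch low-fidelity means estimate the same expectation,
    # so their difference has zero mean: subtracting it keeps the estimator unbiased
    # and lowers variance when the high- and low-fidelity terms are correlated.
    return g_hi.mean(axis=0) - c * (g_lo_paired.mean(axis=0) - g_lo_large.mean(axis=0))
```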
- Create a conda environment:
  conda create -n mfpg
- Activate the environment:
  conda activate mfpg
- Install dependencies:
  pip install -r requirements.txt
- Install MuJoCo (a quick import check follows this list):
  - pip3 install -U 'mujoco-py<2.2,>=2.1'
  - Follow the official MuJoCo installation guide (the "Install MuJoCo" section)
  - pip install "cython<3"
  - Add the following to your ~/.bashrc file:
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$YOUR_HOME_DIRECTORY/.mujoco/mujoco210/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
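As a quick sanity check (not part of the repository), the following snippet should run without errors once mujoco-py and the LD_LIBRARY_PATH entries above are in place:

```python
# Sanity check: importing mujoco-py compiles its bindings; this fails if the
# MuJoCo binaries or the LD_LIBRARY_PATH entries above are missing.
import mujoco_py

print("mujoco-py imported successfully")
```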
Configuration files are located in config/odrl_mujoco_sweep, covering the following settings (a small listing sketch follows this list):
- MFPG, High-Fidelity Only, More High-Fidelity Data (15×) (config/odrl_mujoco_sweep/baseline_reinforce_mfpg/)
- Low-Fidelity Only
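To see which --algorithm / --config_file_name combinations are available locally, a hypothetical listing helper (not part of the repository) such as the one below can be used; it only assumes the directory layout described above:

```python
# Hypothetical helper: list algorithm folders under config/odrl_mujoco_sweep and
# the config files inside each, i.e. the valid --algorithm / --config_file_name pairs.
import pathlib

root = pathlib.Path("config/odrl_mujoco_sweep")
for algorithm_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    configs = sorted(f.stem for f in algorithm_dir.iterdir() if f.is_file())
    print(f"{algorithm_dir.name}: {', '.join(configs)}")
```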
Example: training MFPG on gravity and friction shift tasks:
python src/train/train_on_odrl_benchmark.py --algorithm "baseline_reinforce_mfpg" --config_file_name "mfpg_gravity_friction"
To reproduce Appendix C results, replace --algorithm with the name of a folder under config/odrl_mujoco_sweep and select the corresponding config file with --config_file_name.
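For example, a hypothetical batch launcher (not part of the repository) could sweep every config file under one algorithm folder by reusing the documented command; it assumes --config_file_name takes the file name without its extension, as in the example above:

```python
# Hypothetical batch launcher: run the documented training command once per
# config file found under the chosen algorithm folder.
import pathlib
import subprocess

ALGORITHM = "baseline_reinforce_mfpg"  # any folder name under config/odrl_mujoco_sweep
config_dir = pathlib.Path("config/odrl_mujoco_sweep") / ALGORITHM

for cfg in sorted(f for f in config_dir.iterdir() if f.is_file()):
    subprocess.run(
        [
            "python", "src/train/train_on_odrl_benchmark.py",
            "--algorithm", ALGORITHM,
            "--config_file_name", cfg.stem,  # assumes the flag takes the name without extension
        ],
        check=True,
    )
```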
We build our project on top of Stable-Baselines3 and conduct our evaluation using settings from ODRLBenchmark (the files in odrl_benchmark/ are copied from ODRLBenchmark).
