Implementation of 'Distributional Reinforcement Learning with Regularized Wasserstein Loss' (NeurIPS 2024)
This repository contains the PyTorch implementation of our Sinkhorn distributional RL paper, "Distributional Reinforcement Learning with Regularized Wasserstein Loss". We include all RL algorithms considered in our paper: DQN, C51, QR-DQN, MMDDRL, and SinkhornDRL (ours).
First install OpenAI Baselines as shown below, then install the dependencies listed in requirement.txt.
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
We take the Breakout environment as an example. Run each of the following commands, then plot the learning curves. (Note: --multi defaults to 0.)
python main.py --game breakout --method DQN --iter 10000000 --seed 1 --gpu 1
python main.py --game breakout --method C51 --iter 10000000 --seed 1 --gpu 1
python main.py --game breakout --method QRDQN --iter 10000000 --seed 1 --gpu 1
python main.py --game breakout --method MMD --iter 10000000 --seed 1 --gpu 1
python main.py --game breakout --method Sinkhorn --iter 10000000 --niter_sink 10 --epsilon 10 --samples 200 --seed 1 --gpu 1
For the multi-dimensional reward setting, set --multi 1. Taking Asteroids as an example:
python main.py --game Asteroids --method MMD --iter 10000000 --multi 1 --seed 1 --gpu 0
python main.py --game Asteroids --method Sinkhorn --iter 10000000 --multi 1 --seed 1 --gpu 1
We consider six Atari games with multi-dimensional rewards: AirRaid, Asteroids, Gopher, MsPacman, UpNDown, and Pong. For the reward decomposition method, please refer to the file reward-compose.
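When sweeping over methods or seeds, it can be convenient to generate the run commands listed above from a script instead of typing them by hand. The following is a minimal sketch, not part of the repository: the helper names (build_command, breakout_commands, SINKHORN_FLAGS) are hypothetical, and only the flags and values that appear in the commands above are used. It prints the commands without executing them.

```python
# Hypothetical helper for generating the README's run commands.
# Only flags that appear in the commands above are used; the function
# and constant names are illustrative and do not exist in the repository.

METHODS = ["DQN", "C51", "QRDQN", "MMD", "Sinkhorn"]

# Extra flags used by the Sinkhorn command in the Breakout example.
SINKHORN_FLAGS = ["--niter_sink", "10", "--epsilon", "10", "--samples", "200"]


def build_command(game, method, extra=(), multi=0, seed=1, gpu=1, iters=10_000_000):
    """Return the argument list for one training run."""
    cmd = ["python", "main.py", "--game", game, "--method", method,
           "--iter", str(iters)]
    cmd += list(extra)
    if multi:
        # --multi is 0 by default, so it is only emitted when enabled.
        cmd += ["--multi", str(multi)]
    cmd += ["--seed", str(seed), "--gpu", str(gpu)]
    return cmd


def breakout_commands(seed=1, gpu=1):
    """Rebuild the five Breakout commands, one per method."""
    commands = []
    for method in METHODS:
        extra = SINKHORN_FLAGS if method == "Sinkhorn" else []
        commands.append(build_command("breakout", method, extra=extra,
                                      seed=seed, gpu=gpu))
    return commands


if __name__ == "__main__":
    for cmd in breakout_commands():
        print(" ".join(cmd))
    # Multi-dimensional reward example (Asteroids with MMD on GPU 0).
    print(" ".join(build_command("Asteroids", "MMD", multi=1, gpu=0)))
```

To launch a run rather than print it, each argument list could be passed to subprocess.run from the Python standard library; changing the seed argument gives one command set per seed.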
This implementation is adapted from ShangtongZhang's Modularized Implementation of Deep RL Algorithms. Our implementation in the multi-dimensional setting is adapted from the original implementation of the paper Distributional Reinforcement Learning for Multi-Dimensional Reward Functions (NeurIPS 2021).
Please contact ksun6@ualberta.ca if you have any questions.
Please cite our paper if you use our implementation in your research:
@inproceedings{sun2024distributional,
title={Distributional Reinforcement Learning with Regularized Wasserstein Loss},
author={Sun, Ke and Zhao, Yingnan and Liu, Wulong and Jiang, Bei and Kong, Linglong},
booktitle={Advances in Neural Information Processing Systems},
year={2024}
}
