This repository is still a work in progress. Pull requests are very welcome!
Soft actor-critic (SAC) is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, presented at ICML 2018.
This implementation uses PyTorch.
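At its core, SAC trains the policy to maximize both the expected Q-value and the policy entropy. The snippet below is a minimal, illustrative sketch of that actor update in PyTorch; the names `policy`, `q1`, `q2`, `alpha`, and the `policy.sample` method are assumptions for illustration, not this repository's actual API.

```python
import torch

def sac_actor_loss(policy, q1, q2, states, alpha):
    # Sample actions with the reparameterization trick so the loss is
    # differentiable with respect to the policy parameters.
    actions, log_probs = policy.sample(states)  # log_probs = log pi(a|s)
    # Take the minimum of the two critics to reduce overestimation.
    q_min = torch.min(q1(states, actions), q2(states, actions))
    # Maximizing Q + entropy is equivalent to minimizing alpha * log pi - Q.
    return (alpha * log_probs - q_min).mean()
```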
It supports combined experience replay, introduced in the paper A Deeper Look at Experience Replay.
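Combined experience replay simply guarantees that the most recent transition is part of every sampled batch. A minimal sketch of the idea (the class and method names are illustrative, not the buffer used in this repository):

```python
import random

class CombinedReplayBuffer:
    """Uniform replay buffer that adds the most recent transition to every
    sampled batch (combined experience replay)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.position = 0
        self.latest = None

    def push(self, transition):
        # Overwrite the oldest transition once the buffer is full.
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.position] = transition
        self.position = (self.position + 1) % self.capacity
        self.latest = transition

    def sample(self, batch_size):
        # Draw batch_size - 1 transitions uniformly and always append the
        # newest one, so recent experience is used at least once per update.
        batch = random.sample(self.storage, min(batch_size - 1, len(self.storage)))
        batch.append(self.latest)
        return batch
```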
It supports the Ranger optimizer, introduced in the blog post New Deep Learning Optimizer, Ranger: Synergistic combination of RAdam + LookAhead for the best of both.
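Ranger is essentially RAdam wrapped with Lookahead. The sketch below combines `torch.optim.RAdam` (available in recent PyTorch releases) with a simplified Lookahead wrapper to illustrate the idea; it is not the actual Ranger implementation used in this repository.

```python
import torch

class Lookahead:
    """Simplified Lookahead: run the inner ('fast') optimizer for k steps, then
    pull the slow weights toward the fast weights and restart from them."""

    def __init__(self, inner, k=6, alpha=0.5):
        self.inner, self.k, self.alpha = inner, k, alpha
        self.counter = 0
        # Snapshot of the slow weights for every parameter the inner optimizer updates.
        self.slow = [p.detach().clone()
                     for group in inner.param_groups for p in group["params"]]

    def zero_grad(self):
        self.inner.zero_grad()

    def step(self):
        self.inner.step()
        self.counter += 1
        if self.counter % self.k == 0:
            fast = [p for group in self.inner.param_groups for p in group["params"]]
            for s, f in zip(self.slow, fast):
                s.add_(f.detach() - s, alpha=self.alpha)  # slow += alpha * (fast - slow)
                f.data.copy_(s)                           # fast restarts from the slow weights

# Ranger = RAdam as the fast optimizer with Lookahead on top (illustrative usage).
model = torch.nn.Linear(8, 2)
optimizer = Lookahead(torch.optim.RAdam(model.parameters(), lr=3e-4))
```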
It supports Munchausen-SAC, introduced in the paper Munchausen Reinforcement Learning.
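Munchausen RL augments the TD target with a scaled, clipped log-probability of the action that was actually taken. A rough sketch of what a Munchausen-SAC target computation can look like (function and argument names are illustrative, and the default values are only typical choices from the paper, not necessarily the ones used here):

```python
import torch

def munchausen_sac_target(reward, done, gamma, tau, m_alpha,
                          log_pi_taken, next_q_min, log_pi_next, l0=-1.0):
    # tau: entropy temperature; m_alpha: Munchausen scaling (e.g. 0.9);
    # l0: lower clipping bound for the log-policy bonus.
    # Scaled, clipped log-policy bonus for the action actually taken in s_t.
    munchausen_bonus = m_alpha * tau * torch.clamp(log_pi_taken, min=l0, max=0.0)
    # Standard soft (entropy-regularized) bootstrap value at s_{t+1}.
    soft_next_value = next_q_min - tau * log_pi_next
    return reward + munchausen_bonus + gamma * (1.0 - done) * soft_next_value
```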
Soft Actor-Critic can be run locally.
Examples:
Train an agent on the Humanoid-v2 MuJoCo environment and save checkpoints and the TensorBoard summary to the directory Humanoid-v2/
python3 main.py --env_name=Humanoid-v2 --log_dir=Humanoid-v2
Continue training the aforementioned agent
python3 main.py --env_name=Humanoid-v2 --log_dir=Humanoid-v2 --continue_training
Test an agent in the Ant-v3 environment with weights loaded from Ant-v3/
python3 main.py --env_name=Ant-v3 --log_dir=Ant-v3 --test --render_testing --num_test_games=10
Evaluation reward for the Humanoid environment (plot)
Evaluation reward for the Ant environment (plot)
Most of the environments require a MuJoCo license.
The soft actor-critic algorithm was developed by Tuomas Haarnoja under the supervision of Prof. Sergey Levine and Prof. Pieter Abbeel at UC Berkeley. Special thanks to Vitchyr Pong, who wrote some parts of the code, and Kristian Hartikainen, who helped with testing, documenting, and polishing the code and with streamlining the installation process. The work was supported by Berkeley Deep Drive.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. ICML, 2018.
Soft Actor-Critic Algorithms and Applications.
Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine. arXiv preprint, 2018.
Lookahead Optimizer: k steps forward, 1 step back.
Michael R. Zhang, James Lucas, Geoffrey Hinton, and Jimmy Ba. NeurIPS, 2019.
On the Variance of the Adaptive Learning Rate and Beyond.
Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. ICLR, 2020.
A Deeper Look at Experience Replay.
Shangtong Zhang and Richard S. Sutton. arXiv preprint, 2017.
Munchausen Reinforcement Learning.
Nino Vieillard, Olivier Pietquin, and Matthieu Geist. NeurIPS, 2020.