This repository is still a work in progress. Pull requests are very welcome!
Soft actor-critic (SAC) is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, presented at ICML 2018.
This implementation uses PyTorch.
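At its core, SAC trains the policy to maximize both the expected Q-value and the policy entropy. The snippet below is a minimal, illustrative sketch of that actor update in PyTorch; the names `policy`, `q1`, `q2`, `alpha`, and the `policy.sample` method are assumptions for illustration, not this repository's actual API.

```python
import torch

def sac_actor_loss(policy, q1, q2, states, alpha):
    # Sample actions with the reparameterization trick so the loss is
    # differentiable with respect to the policy parameters.
    actions, log_probs = policy.sample(states)  # log_probs = log pi(a|s)
    # Take the minimum of the two critics to reduce overestimation.
    q_min = torch.min(q1(states, actions), q2(states, actions))
    # Maximizing Q + entropy is equivalent to minimizing alpha * log pi - Q.
    return (alpha * log_probs - q_min).mean()
```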
It supports combined experience replay, introduced in the paper A Deeper Look at Experience Replay.
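Combined experience replay simply guarantees that the most recent transition is part of every sampled batch. A minimal sketch of the idea (the class and method names are illustrative, not the buffer used in this repository):

```python
import random

class CombinedReplayBuffer:
    """Uniform replay buffer that adds the most recent transition to every
    sampled batch (combined experience replay)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.position = 0
        self.latest = None

    def push(self, transition):
        # Overwrite the oldest transition once the buffer is full.
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.position] = transition
        self.position = (self.position + 1) % self.capacity
        self.latest = transition

    def sample(self, batch_size):
        # Draw batch_size - 1 transitions uniformly and always append the
        # newest one, so recent experience is used at least once per update.
        batch = random.sample(self.storage, min(batch_size - 1, len(self.storage)))
        batch.append(self.latest)
        return batch
```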
It supports the Ranger optimizer, introduced in the blog post New Deep Learning Optimizer, Ranger: Synergistic combination of RAdam + LookAhead for the best of both.
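Ranger is essentially RAdam wrapped with Lookahead. The sketch below combines `torch.optim.RAdam` (available in recent PyTorch releases) with a simplified Lookahead wrapper to illustrate the idea; it is not the actual Ranger implementation used in this repository.

```python
import torch

class Lookahead:
    """Simplified Lookahead: run the inner ('fast') optimizer for k steps, then
    pull the slow weights toward the fast weights and restart from them."""

    def __init__(self, inner, k=6, alpha=0.5):
        self.inner, self.k, self.alpha = inner, k, alpha
        self.counter = 0
        # Snapshot of the slow weights for every parameter the inner optimizer updates.
        self.slow = [p.detach().clone()
                     for group in inner.param_groups for p in group["params"]]

    def zero_grad(self):
        self.inner.zero_grad()

    def step(self):
        self.inner.step()
        self.counter += 1
        if self.counter % self.k == 0:
            fast = [p for group in self.inner.param_groups for p in group["params"]]
            for s, f in zip(self.slow, fast):
                s.add_(f.detach() - s, alpha=self.alpha)  # slow += alpha * (fast - slow)
                f.data.copy_(s)                           # fast restarts from the slow weights

# Ranger = RAdam as the fast optimizer with Lookahead on top (illustrative usage).
model = torch.nn.Linear(8, 2)
optimizer = Lookahead(torch.optim.RAdam(model.parameters(), lr=3e-4))
```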
It supports Munchausen-SAC, introduced in the paper Munchausen Reinforcement Learning.
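Munchausen RL augments the TD target with a scaled, clipped log-probability of the action that was actually taken. A rough sketch of what a Munchausen-SAC target computation can look like (function and argument names are illustrative, and the default values are only typical choices from the paper, not necessarily the ones used here):

```python
import torch

def munchausen_sac_target(reward, done, gamma, tau, m_alpha,
                          log_pi_taken, next_q_min, log_pi_next, l0=-1.0):
    # tau: entropy temperature; m_alpha: Munchausen scaling (e.g. 0.9);
    # l0: lower clipping bound for the log-policy bonus.
    # Scaled, clipped log-policy bonus for the action actually taken in s_t.
    munchausen_bonus = m_alpha * tau * torch.clamp(log_pi_taken, min=l0, max=0.0)
    # Standard soft (entropy-regularized) bootstrap value at s_{t+1}.
    soft_next_value = next_q_min - tau * log_pi_next
    return reward + munchausen_bonus + gamma * (1.0 - done) * soft_next_value
```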
Soft Actor-Critic can be run locally.
Examples:
Train an agent on the Humanoid-v2 MuJoCo environment and save checkpoints and the TensorBoard summary to the directory Humanoid-v2/
python3 main.py --env_name=Humanoid-v2 --log_dir=Humanoid-v2
Continue training the aforementioned agent
python3 main.py --env_name=Humanoid-v2 --log_dir=Humanoid-v2 --continue_training
Test an agent in the Ant-v3 environment with weights loaded from Ant-v3/
python3 main.py --env_name=Ant-v3 --log_dir=Ant-v3 --test --render_testing --num_test_games=10
Evaluation reward for the Humanoid environment (plot)
Evaluation reward for the Ant environment (plot)
Most of the environments require a MuJoCo license.
The soft actor-critic algorithm was developed by Tuomas Haarnoja under the supervision of Prof. Sergey Levine and Prof. Pieter Abbeel at UC Berkeley. Special thanks to Vitchyr Pong, who wrote some parts of the code, and Kristian Hartikainen, who helped with testing, documenting, and polishing the code and with streamlining the installation process. The work was supported by Berkeley Deep Drive.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. ICML, 2018.
Soft Actor-Critic Algorithms and Applications.
Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine. arXiv preprint, 2018.
Lookahead Optimizer: k steps forward, 1 step back.
Michael R. Zhang, James Lucas, Geoffrey Hinton, and Jimmy Ba. NeurIPS, 2019.
On the Variance of the Adaptive Learning Rate and Beyond.
Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. ICLR, 2020.
A Deeper Look at Experience Replay.
Shangtong Zhang and Richard S. Sutton. arXiv preprint, 2017.
Munchausen Reinforcement Learning.
Nino Vieillard, Olivier Pietquin, and Matthieu Geist. NeurIPS, 2020.