This repository is still a work in progress. Pull requests are very welcome!

Soft Actor-Critic

Soft actor-critic is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor presented at ICML 2018.

This implementation uses PyTorch.
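
As a rough, minimal PyTorch sketch of the maximum-entropy actor update described above (illustrative names only, not the code in this repository; it assumes a critic callable as q_net(obs, action)):

# Minimal sketch of the maximum-entropy (SAC-style) actor update.
# Not the code from this repository; all names are illustrative.
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianPolicy(nn.Module):
    """Squashed-Gaussian policy: returns a tanh-bounded action and its log-probability."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = Normal(mu, log_std.exp())
        pre_tanh = dist.rsample()                  # reparameterized sample
        action = torch.tanh(pre_tanh)
        # change-of-variables correction for the tanh squashing
        log_prob = dist.log_prob(pre_tanh) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(-1, keepdim=True)

def actor_loss(policy, q_net, obs, alpha):
    # J_pi = E[ alpha * log pi(a|s) - Q(s, a) ]: maximize Q-value plus entropy.
    action, log_prob = policy(obs)
    return (alpha * log_prob - q_net(obs, action)).mean()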

What is novel about this implementation?

It supports Combined Experience Replay, introduced in the paper A Deeper Look at Experience Replay (a sketch follows this list).
It supports the Ranger optimizer, introduced in the blog post New Deep Learning Optimizer, Ranger: Synergistic combination of RAdam + LookAhead for the best of both.
It supports Munchausen-SAC, introduced in the paper Munchausen Reinforcement Learning (a sketch follows this list).
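
The repository's replay buffer is not reproduced here, but a minimal sketch of Combined Experience Replay, assuming a simple list-based uniform buffer, looks like this: the only change relative to standard uniform replay is that the most recent transition is always appended to the sampled batch.

# Sketch of Combined Experience Replay (CER); the actual buffer in this
# repository may be implemented differently.
import random

class CombinedReplayBuffer:
    """Uniform replay that always includes the newest transition in each batch."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        self.storage.append(transition)
        if len(self.storage) > self.capacity:
            self.storage.pop(0)

    def sample(self, batch_size):
        # Draw batch_size - 1 transitions uniformly, then append the latest
        # transition so new experience is trained on immediately.
        batch = random.sample(self.storage, min(batch_size - 1, len(self.storage) - 1))
        batch.append(self.storage[-1])
        return batch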
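
Likewise, a hedged sketch of the Munchausen modification to the critic target, following the Munchausen Reinforcement Learning paper (variable names are illustrative, not the repository's): a clipped, scaled log-policy bonus for the current action is added to the usual soft Bellman target.

# Sketch of a Munchausen-augmented soft target for the critic update.
# Names and default coefficients are illustrative.
import torch

def munchausen_target(reward, done, next_q_min, next_log_prob, cur_log_prob,
                      gamma=0.99, entropy_temp=0.2, m_alpha=0.9, l0=-1.0):
    # Standard soft target: r + gamma * (1 - done) * (min Q'(s', a') - temp * log pi(a'|s'))
    # Munchausen adds a clipped, scaled log-policy bonus on the *current* action:
    #                      + m_alpha * clip(temp * log pi(a|s), l0, 0)
    bonus = m_alpha * torch.clamp(entropy_temp * cur_log_prob, min=l0, max=0.0)
    soft_next_value = next_q_min - entropy_temp * next_log_prob
    return reward + bonus + gamma * (1.0 - done) * soft_next_value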

Getting Started

Soft Actor-Critic can be run locally.

Examples:

Train an agent on the Humanoid-v2 MuJoCo environment and save checkpoints and the TensorBoard summary to the directory Humanoid-v2/:
python3 main.py --env_name=Humanoid-v2 --log_dir=Humanoid-v2

Continue training the aforementioned agent:
python3 main.py --env_name=Humanoid-v2 --log_dir=Humanoid-v2 --continue_training

Test the agent trained on Ant-v3, rendering the environment and loading weights from Ant-v3/:
python3 main.py --env_name=Ant-v3 --log_dir=Ant-v3 --test --render_testing --num_test_games=10

To change the hyperparameters of the SAC algorithm, take a look at pytorch-SAC/Hyperparameters.py.
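
The file itself is not reproduced here; as a purely hypothetical sketch, such a file typically defines settings along these lines (names and values are illustrative and may not match the repository):

# Hypothetical example of SAC hyperparameters; not the repository's actual settings.
GAMMA = 0.99            # discount factor
TAU = 0.005             # Polyak averaging coefficient for the target critics
LEARNING_RATE = 3e-4    # learning rate for actor, critics, and temperature
BATCH_SIZE = 256        # minibatch size sampled from the replay buffer
REPLAY_SIZE = 1_000_000 # replay buffer capacity
HIDDEN_UNITS = 256      # width of the actor/critic MLPs
TARGET_ENTROPY = None   # defaults to -action_dim when the temperature alpha is learned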

Results

Evaluation Reward for Humanoid Environment

Evaluation Reward for Ant Environment

Prerequisites

Most of the models require a MuJoCo license.

Credits

The soft actor-critic algorithm was developed by Tuomas Haarnoja under the supervision of Prof. Sergey Levine and Prof. Pieter Abbeel at UC Berkeley. Special thanks to Vitchyr Pong, who wrote some parts of the code, and Kristian Hartikainen, who helped test, document, and polish the code and streamline the installation process. The work was supported by Berkeley Deep Drive.

Reference

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. ICML, 2018.

Soft Actor-Critic Algorithms and Applications.
Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine. arXiv preprint, 2018.

Lookahead Optimizer: k steps forward, 1 step back.
Michael R. Zhang, James Lucas, Geoffrey Hinton, and Jimmy Ba. NeurIPS, 2019.

On the Variance of the Adaptive Learning Rate and Beyond.
Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. ICLR, 2020.

A Deeper Look at Experience Replay.
Shangtong Zhang and Richard S. Sutton. arXiv preprint, 2017.

Munchausen Reinforcement Learning.
Nino Vieillard, Olivier Pietquin, and Matthieu Geist. NeurIPS, 2020.
