Single-file implementations focused on clarity rather than proper code standards :)
| Algo | Path | Discrete Actions | Continuous Actions | Multi-CPU | Other |
|---|---|---|---|---|---|
| TRPO | trpo/ | trpo.py | cont.py | ||
| PPO | ppo/ | ppo_disc.py | ppo.py | *_multi.py | |
| MAML | maml/ | SineWave = maml_wave.py | |||
| DQN | dqn.py | ||||
| REINFORCE | reinforce/ | reinforce_jax.py | reinforce_cont.py | | PyTorch = policy_grad.py; Time Comparison = reinforce_torchVSjax.py |
| DDPG | ddpg/ | ddpg_jax.py | TD3_DDPG = ddpg_td3.py | ||
| A2C | a2c/ | a2c.py | | *_multi.py | |

For a better understanding of TRPO optimization, check out "Natural Gradient Descent without the Tears".
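
As a rough illustration of the natural-gradient step that TRPO builds on, here is a minimal sketch in plain Python. It is not the code in `trpo/`. The names `conjugate_gradient`, `natural_gradient_step`, `toy_fvp`, and `max_kl` are hypothetical, and the Fisher matrix is a toy diagonal matrix standing in for a real Fisher-vector product. The idea is to solve `F x = g` with conjugate gradient, using only products `F @ v`, and then scale `x` so that the quadratic KL estimate `0.5 * step^T F step` equals `max_kl`.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g, where fvp(v) returns F @ v.

    F is assumed symmetric positive definite (as a damped Fisher matrix is).
    """
    x = [0.0] * len(g)
    r = list(g)  # residual g - F x, with x = 0 initially
    p = list(g)  # current search direction
    rs_old = dot(r, r)
    for _ in range(iters):
        if rs_old < tol:
            break
        Fp = fvp(p)
        alpha = rs_old / dot(p, Fp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fi for ri, fi in zip(r, Fp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def natural_gradient_step(fvp, g, max_kl=0.01):
    """Scale x ~= F^-1 g so that 0.5 * step^T F step == max_kl."""
    x = conjugate_gradient(fvp, g)
    xFx = dot(x, fvp(x))
    if xFx <= 0.0:
        return [0.0] * len(g)  # zero gradient: no update
    scale = math.sqrt(2.0 * max_kl / xFx)
    return [scale * xi for xi in x]


def toy_fvp(v):
    """Toy Fisher-vector product for F = diag(2.0, 0.5) (assumed values)."""
    return [2.0 * v[0], 0.5 * v[1]]


step = natural_gradient_step(toy_fvp, [1.0, 1.0])
print(step)
```

In an actual JAX or PyTorch implementation the lists would be arrays, and `fvp` would typically be computed by differentiating the KL divergence between the old and new policies rather than from an explicit matrix.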