RL Implementations in JAX

Single-file implementations focused on clarity rather than proper code standards :)

| Algo | Path | Discrete Actions | Continuous Actions | Multi-CPU | Other |
|---|---|---|---|---|---|
| TRPO | trpo/ | trpo.py | cont.py | | |
| PPO | ppo/ | ppo_disc.py | ppo.py | *_multi.py | |
| MAML | maml/ | | | | SineWave = maml_wave.py |
| DQN | | dqn.py | | | |
| REINFORCE | reinforce/ | reinforce_jax.py | reinforce_cont.py | | PyTorch = policy_grad.py; Time Comparison = reinforce_torchVSjax.py |
| DDPG | ddpg/ | | ddpg_jax.py | | TD3_DDPG = ddpg_td3.py |
| A2C | a2c/ | a2c.py | | *_multi.py | |

For a better understanding of the optimization step in TRPO, check out "Natural Gradient Descent without the Tears".
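TRPO's update is a natural-gradient step. It solves F s = g, where g is the gradient of the surrogate objective and F is the Fisher information matrix of the policy (equivalently, the Hessian of the mean KL divergence between the old and new policies, evaluated at the old parameters), then scales s so the step stays inside a KL trust region. Below is a minimal JAX sketch of that step under stated assumptions; it is not the code in trpo/. The linear softmax policy, the helper names (log_probs, mean_kl, fisher_vector_product, conjugate_gradient, natural_gradient_step), the damping of 0.1, max_kl of 0.01 and the random stand-in gradient are all hypothetical, and TRPO's backtracking line search is left out.

```python
import jax
import jax.numpy as jnp

# Toy sizes for a HYPOTHETICAL linear softmax policy (not the repo's network).
OBS_DIM, N_ACTIONS = 3, 2


def log_probs(theta, obs):
    # Hypothetical policy: logits = obs @ W, with W stored as a flat vector theta.
    logits = obs @ theta.reshape(OBS_DIM, N_ACTIONS)
    return jax.nn.log_softmax(logits)


def mean_kl(theta_old, theta, obs):
    # KL(pi_old || pi_theta), averaged over the batch of observations.
    lp_old = log_probs(theta_old, obs)
    lp_new = log_probs(theta, obs)
    return jnp.mean(jnp.sum(jnp.exp(lp_old) * (lp_old - lp_new), axis=-1))


def fisher_vector_product(theta, obs, v, damping=0.1):
    # The Hessian of the KL at theta == theta_old is the Fisher matrix F.
    # Forward-over-reverse autodiff gives F @ v without ever forming F.
    kl = lambda t: mean_kl(theta, t, obs)  # theta (the old params) is held fixed
    _, fv = jax.jvp(jax.grad(kl), (theta,), (v,))
    return fv + damping * v  # damping keeps the system positive definite


def conjugate_gradient(mvp, b, iters=10, tol=1e-10):
    # Solve A x = b for symmetric positive-definite A using only v -> A @ v.
    x = jnp.zeros_like(b)
    r = b  # residual b - A @ x, with x = 0
    p = r  # search direction
    rs_old = jnp.dot(r, r)
    for _ in range(iters):
        Ap = mvp(p)
        alpha = rs_old / jnp.dot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = jnp.dot(r, r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def natural_gradient_step(theta, obs, g, max_kl=0.01):
    # Step along s ~= F^{-1} g, scaled so that 0.5 * s^T F s equals max_kl.
    fvp = lambda v: fisher_vector_product(theta, obs, v)
    step_dir = conjugate_gradient(fvp, g)
    shs = 0.5 * jnp.dot(step_dir, fvp(step_dir))
    return theta + jnp.sqrt(max_kl / shs) * step_dir


# Usage with random data; g stands in for the surrogate-loss gradient.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
theta = jax.random.normal(k1, (OBS_DIM * N_ACTIONS,))
obs = jax.random.normal(k2, (32, OBS_DIM))
g = jax.random.normal(k3, theta.shape)
new_theta = natural_gradient_step(theta, obs, g)
```

Pairing conjugate gradient with Fisher-vector products means F is never materialized, which is what keeps this kind of update affordable for policies with many parameters.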

