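- bandit.py: K-armed bandit
- td.py: temporal-difference algorithms, including one-step Sarsa, multi-step Sarsa, and Q-Learning
- dyna-q.py: Dyna-Q algorithm
- dqn.py: DQN algorithm and two of its extensions, Double DQN and Dueling DQN
- reinforce.py: policy gradient algorithm
- actor_critic.py: Actor-Critic algorithm
- trpo.py: TRPO algorithm
- ppo.py: PPO algorithm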
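
As a rough illustration of how two of the temporal-difference methods listed for td.py differ, here is a minimal sketch of the standard tabular update rules. It is not taken from the repository: the function names, the list-of-lists Q-table, and the default `alpha` and `gamma` values are all assumptions made for this example.

```python
# Hypothetical helpers for illustration only; not the repository's code.
# Q is a table indexed as Q[state][action], stored as a list of lists.

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One-step Sarsa (on-policy): bootstrap from the action actually chosen next."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q-Learning (off-policy): bootstrap from the greedy action in the next state."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])


if __name__ == "__main__":
    # Two states, two actions; state 1 already has some value estimates.
    Q_sarsa = [[0.0, 0.0], [1.0, 3.0]]
    Q_qlearn = [[0.0, 0.0], [1.0, 3.0]]

    # Same transition (s=0, a=0, r=1.0, s_next=1); Sarsa's next action is 0.
    sarsa_update(Q_sarsa, 0, 0, 1.0, 1, 0)
    q_learning_update(Q_qlearn, 0, 0, 1.0, 1)

    print(Q_sarsa[0][0], Q_qlearn[0][0])
```

The only difference between the two is the bootstrap term: Sarsa uses the value of the next action the agent actually takes, while Q-Learning uses the maximum value over the next state's actions.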
haukzero/rl-basic-learn
About
Basic reinforcement learning algorithms [K-armed bandit | Sarsa | Q-Learning | Dyna-Q | DQN | REINFORCE | TRPO | PPO]