DRL university course lecture notes & exercises
| Chapter | Topics covered |
|---|---|
| Hello world | Basic terminology and definitions (based on OpenAI's Spinning Up in Deep RL) |
| RL Basics | MDPs, Policy/Value Iteration, Monte Carlo methods, SARSA & Q-Learning |
| DQN & its derivatives | Deep Q-Network (DQN), Double DQN, Dueling DQN |
| Policy Gradients | REINFORCE, REINFORCE with Baseline, Actor-Critic methods |
| Imitation Learning | Apprenticeship learning, supervised and forward learning, DAgger, DAgger with coaching |
| Multi-Armed Bandit | Bandit algorithms, gradient-based algorithms, contextual bandits, Thompson sampling |
| RL use-case: AlphaGo | Monte Carlo Tree Search, AlphaGo, AlphaZero |
| Meta and Transfer Learning | Concepts in Meta learning and Transfer learning in the context of RL |
| Large action spaces | A survey of papers on handling large action spaces |
| Advanced model learning & exploration | Learning in latent space, next states predictions, exploration schemes |
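The tabular methods in the RL Basics chapter all build on the same temporal-difference update that ex1 implements from scratch. As a taste of that exercise, here is a minimal tabular Q-learning sketch; the 5-state chain environment and all hyperparameters (`alpha`, `gamma`, `eps`) are illustrative assumptions, not course material.

```python
import random

random.seed(0)  # reproducibility of this toy run

# Toy deterministic chain MDP (illustrative assumption): states 0..4,
# actions 0 = left, 1 = right; reaching state 4 ends the episode with reward 1.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL  # next state, reward, done

# Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, eps = 0.1, 0.99, 0.2

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

# Greedy policy after training: move right from every non-terminal state.
policy = [max(range(N_ACTIONS), key=lambda i: Q[s][i]) for s in range(GOAL)]
```

Ex1's DQN part replaces the table `Q` with a neural network trained on the same target, plus a replay buffer and a target network.
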

| Exercise | Description |
|---|---|
| ex1 | Q-Learning and Deep-Q-Learning (DQN) implementations from scratch |
| ex2 | REINFORCE (with and without baseline) and Monte Carlo Actor-Critic implementations from scratch |
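For ex2, the core of REINFORCE is the policy-gradient update theta += alpha * G * grad log pi(a|theta). A minimal sketch on a one-step toy problem with a softmax policy is below; the two-action setup and its rewards are assumptions for illustration, not part of the exercise spec.

```python
import math
import random

random.seed(0)  # reproducibility of this toy run

# One-step toy problem (illustrative assumption): two actions,
# action 1 has a higher reward. Policy is a softmax over preferences theta.
theta = [0.0, 0.0]
alpha = 0.1

def softmax(prefs):
    m = max(prefs)  # subtract max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    r, c = random.random(), 0.0
    for i, p in enumerate(probs):
        c += p
        if r < c:
            return i
    return len(probs) - 1

for episode in range(2000):
    probs = softmax(theta)
    a = sample(probs)
    reward = 1.0 if a == 1 else 0.2  # assumed toy rewards; action 1 is better
    # REINFORCE: theta_i += alpha * G * d/dtheta_i log pi(a)
    # For a softmax policy: d log pi(a) / d theta_i = 1{i == a} - pi(i)
    for i in range(2):
        theta[i] += alpha * reward * ((1.0 if i == a else 0.0) - probs[i])

probs = softmax(theta)  # should now strongly prefer action 1
```

The baseline variant in ex2 reduces gradient variance by replacing the raw return with `reward - b`, where `b` is e.g. a running average of past returns; the actor-critic part learns `b` as a state-value estimate instead.
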