Q-learning is one of the early breakthroughs in reinforcement learning, developed by Chris Watkins in 1989 while he was a graduate student at Cambridge University. The algorithm is simple and efficient, and it remains one of the most popular reinforcement learning algorithms today. This report presents an implementation of Q-learning on a navigation task and studies the impact of its main parameters (e.g. learning rate, discount rate, exploration factor, and exploration decay); a grid search is performed to find the best-performing combination. Furthermore, Double Q-learning, which was introduced to address the overestimation bias of Q-learning, is implemented to examine how it differs from standard Q-learning. In addition, the parallels between Q-learning and psychological learning theories are discussed.
Authors: Harry Li, Xin Li
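To make the parameters named above concrete, the following is a minimal sketch of the tabular Q-learning and Double Q-learning updates, assuming a generic grid-world navigation task; the state/action sizes and parameter values (alpha, gamma, epsilon) are placeholders, not the settings used in this report.

```python
import numpy as np

# Assumed placeholder environment size: e.g. a 5x5 grid with 4 moves.
n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))

alpha   = 0.1   # learning rate
gamma   = 0.9   # discount rate
epsilon = 0.1   # exploration factor (epsilon-greedy)

rng = np.random.default_rng(0)

def epsilon_greedy(state):
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state, done):
    """Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])

# Double Q-learning keeps two tables; one selects the greedy action and the
# other evaluates it, which reduces the overestimation bias of a single table.
QA = np.zeros((n_states, n_actions))
QB = np.zeros((n_states, n_actions))

def double_q_update(state, action, reward, next_state, done):
    """Double Q-learning: randomly update one table using the other's value estimate."""
    if rng.random() < 0.5:
        best = int(np.argmax(QA[next_state]))
        target = reward if done else reward + gamma * QB[next_state, best]
        QA[state, action] += alpha * (target - QA[state, action])
    else:
        best = int(np.argmax(QB[next_state]))
        target = reward if done else reward + gamma * QA[next_state, best]
        QB[state, action] += alpha * (target - QB[state, action])
```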