Chapter 8: Training loop and min_progress

Unless I'm mistaken, there is something odd about the main training loop (Listing 8.13) for the Super Mario game in Chapter 8. The way that the current x-position is checked against the `min_progress` parameter makes no sense to me. 
More precisely: in line 23 of the main training loop, the environment step is taken (6 times) and `last_x_pos` is set to the current x-position:
```
state2, e_reward_, done, info = env.step(action)
last_x_pos = info['x_pos']
```
In the lines of code that follow, neither `last_x_pos` nor `info['x_pos']` is changed. Then, in line 33, the two are compared with each other:
```
if episode_length > params['max_episode_len']:
    if (info['x_pos'] - last_x_pos) < params['min_progress']:
        done = True
    else:
        last_x_pos = info['x_pos']
```
Isn't `info['x_pos'] - last_x_pos` **always** going to be zero here? If so, then for any positive `min_progress` the environment is reset as soon as `episode_length > params['max_episode_len']`, no matter how far Mario has actually moved.
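To show what I mean, here is a minimal sketch of the control flow as I read it. The `fake_step` function and the parameter values are made up by me as a stand-in for the real environment; they are not from the book.

```python
import random

# Made-up values for illustration only.
params = {'max_episode_len': 5, 'min_progress': 15}

def fake_step(x_pos):
    """Hypothetical stand-in for env.step: moves 0 to 10 units to the right."""
    return {'x_pos': x_pos + random.randint(0, 10)}

def run_episode(seed=0):
    random.seed(seed)
    x, episode_length, diffs = 0, 0, []
    done = False
    while not done:
        info = fake_step(x)
        x = info['x_pos']
        last_x_pos = info['x_pos']  # same assignment as in line 23
        episode_length += 1
        if episode_length > params['max_episode_len']:
            # Both operands were set from the same step, so this is 0.
            diff = info['x_pos'] - last_x_pos
            diffs.append(diff)
            if diff < params['min_progress']:
                done = True
            else:
                last_x_pos = info['x_pos']
    return episode_length, diffs

length, diffs = run_episode()
print(length, diffs)
```

In this sketch the episode ends at the first step past `max_episode_len`, and the only difference ever computed is zero, regardless of how far the position has advanced.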
What is the `min_progress` parameter intuitively meant to represent? The progress from the beginning to the end of one episode? The progress from time 0 until `max_episode_len`? Or the progress relative to some checkpoint within a certain amount of time? If the latter, how are these checkpoints chosen?
This is not clear to me, either from the book or from the code.
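For reference, this is the checkpoint interpretation I would have expected, written as a small sketch. It is purely my assumption, not something the book or the repository states: the helper name `should_reset`, its arguments, and the choice to restart the step counter at each checkpoint are all hypothetical.

```python
def should_reset(x_pos, checkpoint_x, steps_since_checkpoint,
                 max_episode_len, min_progress):
    """Hypothetical check: every max_episode_len steps, require at least
    min_progress units of movement since the last checkpoint.

    Returns (done, new_checkpoint_x, new_steps_since_checkpoint).
    """
    steps_since_checkpoint += 1
    if steps_since_checkpoint <= max_episode_len:
        # Window not over yet: keep the old checkpoint.
        return False, checkpoint_x, steps_since_checkpoint
    if x_pos - checkpoint_x < min_progress:
        # Too little movement during the window: end the episode.
        return True, checkpoint_x, steps_since_checkpoint
    # Enough progress: move the checkpoint and start a new window.
    return False, x_pos, 0

# Stuck near the start after a full window, so the episode ends.
print(should_reset(5, 0, 10, max_episode_len=10, min_progress=15))
# Moved 20 units in the window, so the checkpoint advances.
print(should_reset(20, 0, 10, max_episode_len=10, min_progress=15))
```

The difference from the listing is that the checkpoint is only updated when the check passes, not on every environment step, so the subtraction can actually be non-zero.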
