Chapter 4: Longer episode duration leads to worse performance with the policy gradient method (#24)

According to the authors in Chapter 4, a longer episode duration should allow the model to keep the game going longer.

I downloaded the Chapter 4 code and ran it locally with MAX_EPISODES = 250.

Surprisingly, this made the model worse at the task: it exceeded 180s only 22 times, whereas the model with the original setting did so 90 times.

I also reset the model and tried higher values of MAX_EPISODES, but none of them beat the original setting.

What might explain this?
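
For anyone who wants to reproduce the comparison, the counts above can be obtained with something like the following minimal sketch. It assumes the per-episode durations have been collected into a plain list; `count_over` and `example_durations` are hypothetical names, not part of the book's code, and the values shown are made up for illustration only.

```python
def count_over(durations, threshold=180):
    """Return how many episodes lasted strictly longer than `threshold`."""
    return sum(1 for d in durations if d > threshold)


# Made-up durations for illustration; these are NOT results from the runs above.
example_durations = [12, 45, 181, 200, 179, 190]
print(count_over(example_durations))
```

Running the same count over several independent runs (with the model reset each time) would help show whether the gap between 22 and 90 is larger than the run-to-run variation.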

