Chapter 4: Longer episode duration leads to worse performance with the policy gradient method! #24

@DemonsHunter

According to the authors in chapter 4, a longer episode duration should allow the model to keep the game going longer.

I downloaded the chapter 4 code and ran it locally with MAX_EPISODES = 250.

Surprisingly, this made the model worse at the task: only 22 episodes exceeded 180s, whereas the original model managed it 90 times.

I also reset the model and tried higher values of MAX_EPISODES, but none of them beat the original setting.
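
A minimal sketch of how counts like these could be compared across runs, assuming the per-episode durations are collected in a plain list. The names `count_long_episodes`, `fraction_long_episodes`, and `example_durations` are hypothetical and not taken from the book's code, and the sample values are made up for illustration only.

```python
from typing import Sequence


def count_long_episodes(durations: Sequence[float], threshold: float = 180) -> int:
    """Count the episodes whose duration is strictly greater than the threshold."""
    return sum(1 for d in durations if d > threshold)


def fraction_long_episodes(durations: Sequence[float], threshold: float = 180) -> float:
    """Share of episodes above the threshold; 0.0 when no episodes were recorded."""
    if not durations:
        return 0.0
    return count_long_episodes(durations, threshold) / len(durations)


# Made-up durations for illustration, not results from any real run.
example_durations = [12, 45, 181, 200, 179, 190]
print(count_long_episodes(example_durations))
print(fraction_long_episodes(example_durations))
```

Because the runs being compared use different values of MAX_EPISODES, the fraction may be a fairer basis for comparison than the raw count.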

What might explain this?
