Conversation

@LambertWSJ

  • In play_rl(), skip writing to table when the agent returns -1 to avoid writing out of bounds.
  • Do the same in play_game() so negamax/MCTS/RL won't update the board with an invalid move.
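
A minimal sketch of the kind of guard described above, assuming a 3x3 board stored as a `char` array with `' '` for empty cells. `apply_move` and `N_GRIDS = 9` are hypothetical names and values used only for illustration; they are not taken from the PR diff.

```c
#include <stdbool.h>

#define N_GRIDS 9 /* assumption: 3x3 board, for illustration only */

/* Hypothetical helper: write `player` at `move` only when the index is
 * inside the board and the cell is still empty.
 * Returns true if the board was updated, false if the move was skipped. */
bool apply_move(char table[N_GRIDS], int move, char player)
{
    if (move < 0 || move >= N_GRIDS)
        return false; /* e.g. the agent returned -1: skip the write */
    if (table[move] != ' ')
        return false; /* cell already taken */
    table[move] = player;
    return true;
}
```

A caller could treat a `false` return as "no move was made" and leave the board untouched instead of indexing with -1.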

@visitorckw
Collaborator

How did you find this problem?
Was it caught during runtime or purely by code review?

@visitorckw
Collaborator

IIUC, get_action_exploit() only returns -1 when the board is full, which causes the issue you described. However, in that scenario, play_game() should stop calling play_rl() because check_win() would return 'D'. Therefore, calling play_rl() when there are no empty spaces shouldn't happen.

If this error was observed during runtime, I suspect an implementation error elsewhere is the root cause. Adding a check here would likely just hide the real problem.
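
To illustrate the reasoning in this comment, here is a simplified sketch of an exploit-style selection that can only return -1 when no empty cell exists. It is not the project's `get_action_exploit()`: `pick_best_empty`, `board_full`, the per-cell value array `q`, and the 3x3 board size are all assumptions made for the example.

```c
#include <float.h>

#define N_GRIDS 9 /* assumption: 3x3 board, for illustration only */

/* Simplified stand-in for an exploit-style argmax. `q[i]` is an assumed
 * per-cell value, not the state-value lookup a real agent would use. */
int pick_best_empty(const char table[N_GRIDS], const float q[N_GRIDS])
{
    int best = -1;
    float best_q = -FLT_MAX;
    for (int i = 0; i < N_GRIDS; i++) {
        if (table[i] != ' ')
            continue; /* only empty cells are candidates */
        if (best == -1 || q[i] > best_q) {
            best_q = q[i];
            best = i;
        }
    }
    return best; /* -1 only if the loop saw no empty cell */
}

/* Returns 1 when no empty cell is left (a finished board in this sketch). */
int board_full(const char table[N_GRIDS])
{
    for (int i = 0; i < N_GRIDS; i++)
        if (table[i] == ' ')
            return 0;
    return 1;
}
```

Under these assumptions, a game loop that stops once `board_full()` is true never calls `pick_best_empty()` in the one situation where it would return -1.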

@LambertWSJ
Author

LambertWSJ commented Nov 23, 2025

When porting the RL module to kxo[1], I occasionally encountered cases where move = -1. Using dmesg, I noticed that a bit shift had exceeded 31 bits. Because of this, I added last_e to get_action_exploit so that if no best action was found, it would return the last visited position.

Strangely, when I switched back to this commit[1] and removed the max_action == -1 check, the issue disappeared.

[1] kxo - Port reinforcement learning from ttt

@LambertWSJ
Author

LambertWSJ commented Nov 23, 2025

I found that this error occurs in the kxo port of the RL module when the state space isn't fully initialized at startup, which can cause move to become -1.

However, ttt initializes the entire state space from the beginning, ensuring that the agent knows every possible board state and preventing the move = -1 issue.
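
As a rough illustration of initializing the whole state space up front, the sketch below gives every state of a 3x3 board a value before any lookup happens. The base-3 encoding, the names `board_to_hash` and `init_state_values`, and the initial value passed in are assumptions for the example, not the actual ttt or kxo code.

```c
#include <stddef.h>

#define N_GRIDS 9      /* assumption: 3x3 board, for illustration only */
#define N_STATES 19683 /* 3^9: each cell is empty, 'O', or 'X' */

/* Hypothetical base-3 encoding of a board: empty = 0, 'O' = 1, 'X' = 2. */
int board_to_hash(const char table[N_GRIDS])
{
    int h = 0;
    for (int i = 0; i < N_GRIDS; i++) {
        int digit = table[i] == 'O' ? 1 : table[i] == 'X' ? 2 : 0;
        h = h * 3 + digit;
    }
    return h;
}

/* Give every possible state a defined starting value, so that a later
 * lookup by hash never reads an entry that was never written. */
void init_state_values(float values[N_STATES], float initial)
{
    for (size_t s = 0; s < N_STATES; s++)
        values[s] = initial;
}
```

With this layout, any board the agent can reach maps to an index in `[0, N_STATES)` whose entry has already been set.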

As this issue does not occur in the current implementation, this PR can be closed.

@LambertWSJ LambertWSJ closed this Nov 23, 2025