I'm learning about machine learning algorithms by implementing them in Java. The project also includes tests that run the implemented algorithms against small simulations.
- NEAT Algorithm
- AlphaZero
  - Inference implementation
  - Gradient descent learning implementation
  - References:
    - Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
    - A Simple Alpha(Go) Zero Tutorial
- Gradient Descent Neural Network
- Q-Learning
- Deep Q-Learning
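The Q-Learning entry above refers to the standard tabular update rule, Q(s,a) ← Q(s,a) + α(r + γ·max Q(s′,·) − Q(s,a)). A minimal sketch of that rule (class and method names here are illustrative, not this repository's API):

```java
// Hypothetical sketch of the tabular Q-learning update for one transition.
// The class/method names are mine, not taken from this project.
public final class QLearningSketch {
    /** Applies one Q-learning update in place and returns the new Q(s,a). */
    static double update(double[][] q, int state, int action, double reward,
                         int nextState, double alpha, double gamma) {
        // max over next actions: the greedy one-step lookahead target
        double best = Double.NEGATIVE_INFINITY;
        for (double v : q[nextState]) best = Math.max(best, v);
        q[state][action] += alpha * (reward + gamma * best - q[state][action]);
        return q[state][action];
    }
}
```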
- XOR test 👍

In the tables below, a blank cell shares the value shown for experiment #1.

| | experiment #1 | experiment #2 |
|---|---|---|
| **NEAT algorithm** | | |
| population size | 150 | |
| input topology | 1 for X, 1 for Y | |
| output topology | 1 sigmoid | 2 sigmoid |
| bias topology | 1 with bias of 1 | |
| initial hidden layer topology | 0 layers | |
| sample results | iteration: 1, generation: 37, species: 45, hidden nodes: 1, expressed connections: 6, total connections: 8, maximum fitness: 3.403556 | iteration: 1, generation: 4, species: 1, hidden nodes: 0, expressed connections: 6, total connections: 6, maximum fitness: 3.578723 |
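The maximum fitness values just under 4.0 are consistent with a common XOR fitness for NEAT: 4 minus the summed absolute error over the four input patterns, so a perfect network scores 4.0. A hedged sketch of that scheme (the exact formula this project uses may differ, and the `Network` interface is assumed):

```java
// Sketch of a common NEAT fitness function for XOR: 4 minus the summed
// absolute error over the four patterns. The Network interface is a
// stand-in for this project's evolved-network type.
public final class XorFitnessSketch {
    static final double[][] INPUTS = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    static final double[] EXPECTED = {0, 1, 1, 0};

    interface Network { double activate(double x, double y); }

    /** Returns a fitness in [0, 4]; 4.0 means all four patterns are exact. */
    static double fitness(Network net) {
        double error = 0.0;
        for (int i = 0; i < INPUTS.length; i++) {
            error += Math.abs(EXPECTED[i] - net.activate(INPUTS[i][0], INPUTS[i][1]));
        }
        return 4.0 - error;
    }
}
```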
- Cart pole test

| | experiment #1 | experiment #2 |
|---|---|---|
| **NEAT algorithm** | | |
| population size | 150 | |
| input topology | 1 for cart position, 1 for cart velocity, 1 for pole angle, 1 for pole velocity at tip | |
| output topology | 1 sigmoid | 2 sigmoid |
| bias topology | 1 with bias of 1 | |
| initial hidden layer topology | 0 layers | |
| sample results | iteration: 1, generation: 11, species: 28, hidden nodes: 1, expressed connections: 6, total connections: 6, maximum fitness: 60.009998 | iteration: 1, generation: 3, species: 1, hidden nodes: 0, expressed connections: 10, total connections: 10, maximum fitness: 60.009998 |
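The four inputs above describe the classic cart-pole balancing task, which is conventionally simulated with the Barto/Sutton/Anderson dynamics. A sketch of one Euler step of those dynamics; the constants are the conventional Gym-style values, not necessarily the ones this project uses:

```java
// Hedged sketch of the classic cart-pole dynamics (Barto/Sutton/Anderson
// formulation). Constants are the conventional values, assumed here.
public final class CartPoleSketch {
    static final double GRAVITY = 9.8, CART_MASS = 1.0, POLE_MASS = 0.1;
    static final double POLE_HALF_LENGTH = 0.5, FORCE = 10.0, TAU = 0.02;

    /** state = {cartPosition, cartVelocity, poleAngle, poleAngularVelocity};
     *  pushRight picks the sign of the applied force; one Euler step in place. */
    static void step(double[] state, boolean pushRight) {
        double force = pushRight ? FORCE : -FORCE;
        double cos = Math.cos(state[2]), sin = Math.sin(state[2]);
        double totalMass = CART_MASS + POLE_MASS;
        double temp = (force + POLE_MASS * POLE_HALF_LENGTH
                * state[3] * state[3] * sin) / totalMass;
        double angularAcc = (GRAVITY * sin - cos * temp)
                / (POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos * cos / totalMass));
        double linearAcc = temp - POLE_MASS * POLE_HALF_LENGTH * angularAcc * cos / totalMass;
        state[0] += TAU * state[1];
        state[1] += TAU * linearAcc;
        state[2] += TAU * state[3];
        state[3] += TAU * angularAcc;
    }
}
```

Fitness in such a test is typically the time the pole stays balanced, which would explain both experiments plateauing at the same maximum fitness.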
- Tic-tac-toe test

| | experiment #1 | experiment #2 | experiment #3 | experiment #4 |
|---|---|---|---|---|
| **NEAT algorithm** | | | | |
| population size | 150 | | | |
| input topology | 1 for player 1, 1 for player 2 | | | |
| output topology | 1 tanh (value network), 9 sigmoid (policy network) | 2 tanh (value network), 18 sigmoid (policy network) | | |
| bias topology | 0 | | | |
| initial hidden layer topology | 0 layers | 2 layers of 5, 5 | | |
| **classic Monte Carlo tree search duels** | | | | |
| training | 12 matches (6 as X player and 6 as O player) | | | |
| acceptance | 55% win rate vs 30 cached classic Monte Carlo simulations | | | |
| **AlphaZero** | | | | |
| maximum expansions | 15 | | | |
| value reversed on player 2 | | | | |
| state heuristic as value network | disabled | | | |
| policy reversed on player 2 | | | | |
| Dirichlet noise on root node | disabled | enabled | | |
| Dirichlet noise parameters | shape: 0.03, epsilon: 0.25 | | | |
| cpuct | 1 | | | |
| back propagation | BackPropagationType.REVERSED_ON_OPPONENT | | | |
| temperature threshold | 3rd depth | | | |
| sample results | iteration: 1, generation: 195, species: 69, hidden nodes: 1, expressed connections: 20, total connections: 23, maximum fitness: 2.208129 | iteration: 1, generation: 138, species: 74, hidden nodes: 1, expressed connections: 41, total connections: 42, maximum fitness: 1.960799 | iteration: 1, generation: 130, species: 77, hidden nodes: 3, expressed connections: 42, total connections: 49, maximum fitness: 1.958826 | iteration: 1, generation: 61, species: 89, hidden nodes: 12, expressed connections: 138, total connections: 141, maximum fitness: 2.234746 |
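The "cpuct" row refers to the AlphaZero PUCT child-selection score, U(s,a) = Q(s,a) + cpuct · P(s,a) · √N(s) / (1 + N(s,a)), where P is the network prior and N the visit counts; at the root, the prior is conventionally mixed with Dirichlet noise as (1 − ε)·P + ε·Dir(α), matching the shape 0.03 / epsilon 0.25 row. A sketch of the score itself (names are illustrative, not this repository's API):

```java
// Hedged sketch of the AlphaZero PUCT child score that the "cpuct" setting
// controls. Method and parameter names are mine, not this project's.
public final class PuctSketch {
    /**
     * meanValue    — Q(s,a), the running mean value of the child
     * prior        — P(s,a), the policy network's prior for this move
     * parentVisits — N(s), visit count of the parent node
     * childVisits  — N(s,a), visit count of this child
     */
    static double puct(double meanValue, double prior,
                       int parentVisits, int childVisits, double cpuct) {
        return meanValue + cpuct * prior * Math.sqrt(parentVisits) / (1 + childVisits);
    }
}
```

With few visits the exploration term dominates (favoring high-prior moves); as a child's visit count grows the score converges toward its mean value.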
- 2048 test

| | experiment #1 |
|---|---|
| **Monte Carlo tree search** | |
| heuristics | snake shape board: 25%, higher free tile count: 25%, monotonicity: 50% |
| maximum selections | 200 |
| maximum simulation rollout depth | 8 |
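The heuristics row above describes a fixed weighted blend of three board evaluations. A sketch of how such a mix could be combined, assuming each individual heuristic returns a normalized score in [0, 1] (the heuristic implementations themselves are not shown in this README and are stand-ins here):

```java
import java.util.function.ToDoubleFunction;

// Hedged sketch of the 25% / 25% / 50% heuristic blend from the table above.
// The three heuristic functions are assumed to return normalized scores.
public final class HeuristicMixSketch {
    /** Blends the three board heuristics by the fixed weights in the table. */
    static double score(int[][] board,
                        ToDoubleFunction<int[][]> snakeShape,
                        ToDoubleFunction<int[][]> freeTiles,
                        ToDoubleFunction<int[][]> monotonicity) {
        return 0.25 * snakeShape.applyAsDouble(board)
             + 0.25 * freeTiles.applyAsDouble(board)
             + 0.50 * monotonicity.applyAsDouble(board);
    }
}
```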
