This project implements the classic 2048 puzzle game with a Deep Q-Network (DQN) AI agent that learns to play the game efficiently. The AI uses reinforcement learning to develop strategies for achieving high scores by making optimal moves based on the game state.
- Classic 2048 game implementation with Pygame
- Deep Q-Learning agent that improves with training
- GPU acceleration for faster training
- Visualization of training progress
- Ability to play the game manually or watch the AI play
- Configurable training parameters
- Model saving/loading for continued training or evaluation
- Python 3.10+
- PyTorch
- NumPy
- Pygame
- Matplotlib
- tqdm
Deep Q-Network is a reinforcement learning algorithm that combines Q-Learning with deep neural networks. It enables an agent to learn optimal strategies in complex environments by approximating the Q-value function, which represents the expected future rewards for taking actions in different states.
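In Q-learning terms, the network is trained so that its estimates satisfy the Bellman relation, where $r$ is the immediate reward, $\gamma$ the discount factor, $s'$ the next state, and $a'$ a candidate next action:

$$Q(s, a) \approx r + \gamma \max_{a'} Q(s', a')$$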
Our DQN uses a simple yet effective fully connected neural network:
- Input layer: 16 neurons (one for each cell in the 4x4 grid)
- Hidden layers: 2 layers with 256 neurons each, using ReLU activation
- Output layer: 4 neurons (representing up, down, left, right actions)
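A minimal sketch of this architecture in PyTorch; the class name and exact layout in `agent.py` may differ:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Fully connected Q-network: 16 board cells in, 4 action values out."""

    def __init__(self, state_size: int = 16, hidden_size: int = 256, num_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden_size),   # input layer -> first hidden layer
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),  # second hidden layer
            nn.ReLU(),
            nn.Linear(hidden_size, num_actions),  # one Q-value per move: up, down, left, right
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```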
The agent stores experiences (state, action, reward, next_state, done) in a replay buffer and randomly samples from this buffer during training. This breaks the correlation between consecutive training samples and improves learning stability.
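An illustrative replay buffer with uniform sampling; the buffer in `agent.py` may be implemented differently:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done) -> None:
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform random sampling breaks the correlation between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```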
We use a separate target network that is periodically updated from the main network. This stabilizes training by providing consistent targets for Q-value updates.
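A hard update copies all weights from the online network at a fixed interval; the interval in the comment is an assumed hyperparameter, not necessarily the project's value:

```python
import torch

def sync_target(policy_net: torch.nn.Module, target_net: torch.nn.Module) -> None:
    # Hard update: copy every weight from the online (policy) network into the target network.
    target_net.load_state_dict(policy_net.state_dict())

# Typical use inside the training loop (update interval is illustrative):
# if step % 1000 == 0:
#     sync_target(policy_net, target_net)
```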
Our reward function includes:
- Score increases from merging tiles
- Bonuses for creating higher value tiles
- Penalties for invalid moves
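A rough sketch of how these components could combine into a single reward; the penalty value and bonus scale are illustrative assumptions, not the project's exact constants:

```python
import math

def compute_reward(score_gain: int, prev_max_tile: int, new_max_tile: int, moved: bool) -> float:
    if not moved:
        return -10.0                       # penalty for an invalid move (illustrative value)
    reward = float(score_gain)             # points gained from merging tiles this step
    if new_max_tile > prev_max_tile:
        reward += math.log2(new_max_tile)  # bonus that grows with the new highest tile
    return reward
```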
The agent uses an epsilon-greedy policy:
- Initially explores randomly (high epsilon)
- Gradually shifts toward exploitation of learned values (decaying epsilon)
- Eventually settles on a minimal exploration rate (`epsilon_min`)
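A minimal sketch of epsilon-greedy action selection; the network call follows the architecture sketched above, and the decay schedule in the comment is illustrative:

```python
import random
import torch

def select_action(policy_net: torch.nn.Module, state, epsilon: float) -> int:
    if random.random() < epsilon:
        return random.randrange(4)  # explore: pick a random move
    with torch.no_grad():
        q_values = policy_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())  # exploit: pick the best-known move

# After each episode, decay epsilon toward its floor, e.g.:
# epsilon = max(epsilon_min, epsilon * epsilon_decay)
```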
- The agent observes the current board state
- It selects an action (move direction) based on its policy
- The game executes the action, providing a new state and reward
- The agent stores this experience and learns by updating its Q-values
- Over many episodes, the agent improves its strategy
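Put together, the interaction loop looks roughly like the sketch below; the `env`/`agent` method names are illustrative placeholders, not the project's exact API:

```python
def run_training(env, agent, num_episodes: int) -> None:
    """Interaction loop described above; env/agent interfaces are assumed placeholders."""
    for episode in range(num_episodes):
        state = env.reset()                                           # observe the initial board
        done = False
        while not done:
            action = agent.act(state)                                 # choose a move from the current policy
            next_state, reward, done = env.step(action)               # game applies the move, returns feedback
            agent.remember(state, action, reward, next_state, done)   # store the experience
            agent.learn()                                             # update Q-values from a replay batch
            state = next_state
```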
- `agent.py`: Implements the `DQNAgent` class with neural network models, experience replay, and training logic
- `board.py`: Handles the 2048 board mechanics (moves, merging, game state)
- `train.py`: Manages the training process, including rewards, episode tracking, and model saving
- `play_ai.py`: Provides an interface to watch the trained AI play
The DQN agent typically shows significant improvement over time:
- Early episodes (~100): Random moves, rarely reaches tiles above 64
- Mid training (~1000 episodes): Develops basic strategies, regularly reaches 512 or 1024
- Well-trained (~5000+ episodes): Consistently achieves 2048 and beyond
Training visualizations are saved as `training_history.png`, showing score improvement and maximum tile values over time.
The implementation supports GPU acceleration via PyTorch. When a CUDA-capable GPU is available, the training process automatically uses it for faster computation. Mixed-precision training is also enabled for compatible GPUs, further improving performance.
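A minimal sketch of the device selection and mixed-precision setup using standard PyTorch AMP utilities; the project's actual training step may be structured differently:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

# A training step would then wrap the forward pass in autocast, e.g.:
# with torch.cuda.amp.autocast(enabled=use_amp):
#     loss = loss_fn(policy_net(states), targets)
# scaler.scale(loss).backward()
# scaler.step(optimizer)
# scaler.update()
```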
- More episodes generally lead to better performance
- Slower epsilon decay (e.g., `0.998`) for more thorough exploration
- Larger replay buffer for more diverse experiences
- Larger batch size for more stable updates
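For example, a configuration reflecting these tips might look like the following (all values are illustrative, not the project's defaults):

```python
config = {
    "num_episodes": 10_000,          # more episodes generally improve performance
    "epsilon_decay": 0.998,          # slower decay -> more thorough exploration
    "epsilon_min": 0.01,
    "replay_buffer_size": 200_000,   # larger buffer -> more diverse experiences
    "batch_size": 256,               # larger batches -> more stable updates
}
```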
- Modify the reward function to encourage keeping large tiles in corners
- Add rewards for maintaining empty tiles
- Penalize board configurations that limit future moves
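One way to express the first two ideas as an extra shaping term; the weights and exact formulation are assumptions:

```python
import numpy as np

def shaping_bonus(board: np.ndarray) -> float:
    """Extra reward terms for a 4x4 board: corner bonus and empty-cell bonus."""
    bonus = 0.0
    corners = (board[0, 0], board[0, 3], board[3, 0], board[3, 3])
    if board.max() in corners:
        bonus += 2.0                               # keep the largest tile in a corner
    bonus += 0.1 * np.count_nonzero(board == 0)    # reward every empty cell
    return float(bonus)
```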
This project combines elements of reinforcement learning and game AI development. The 2048 game mechanics are based on the original game by Gabriele Cirulli.