Built an AI agent that uses reinforcement learning to autonomously improve at Pong through trial and error.
- AI Agent: The AI controls one paddle in the game and learns to return the ball effectively through trial and error.
- Reinforcement Learning: The agent learns from its own gameplay: successful returns are rewarded and missed returns are penalized, steadily improving its decision-making over time.
- Neural Network Architecture: The network has two hidden layers, with 200 neurons in the first and 100 in the second. A preprocessed version of the game screen is fed in, and the network predicts the optimal action (see the sketches after this list).
- Adam Optimizer: The Adam optimizer updates the network weights with per-parameter adaptive step sizes, helping training converge faster (sketch below).
- Exploration and Exploitation: The model follows an epsilon-greedy approach, initially exploring random actions and gradually shifting to exploiting its learned policy as epsilon decays (sketch below).
- Performance Tracking: Performance is tracked by plotting per-episode rewards and their running mean, visualizing the learning progress (sketch below).
- Preprocessing: The game screen is cropped, downsampled, and converted to a binary format (paddles and ball vs. background) to reduce input complexity (sketch below).
- Forward Pass: The preprocessed screen is fed through the network, which outputs the probability of each paddle action, move up or move down (sketch below).
- Backward Pass: After each game, the agent learns from its mistakes by updating the network weights with policy gradients; gradients are backpropagated through both hidden layers, whose leaky ReLU activations supply the local derivatives (sketch below).
- Discounted Rewards: Each action is credited with the rewards that follow it, decayed by a discount factor, so immediate rewards weigh more than distant ones; this sharpens credit assignment and speeds up learning (sketch below).
- Checkpointing: Model weights and training data are saved periodically, allowing training to resume from the last checkpoint (sketch below).
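
The Python sketches below walk through these steps in order. They are simplified approximations, not excerpts from Adam_Pong.py, and every name, shape, and hyperparameter value in them is an assumption. First, preprocessing: crop, downsample, and binarize a raw 210x160x3 Pong frame into a flat 80x80 vector (pixel values 144 and 109 are the two background shades of the Gym Pong environment).

```python
import numpy as np

def prepro(frame):
    """Crop, downsample, and binarize a 210x160x3 Pong frame
    into a flat 6400-dimensional (80x80) float vector."""
    frame = frame[35:195]        # crop away the scoreboard and borders
    frame = frame[::2, ::2, 0]   # downsample by 2, keep one color channel
    frame[frame == 144] = 0      # erase background shade 1
    frame[frame == 109] = 0      # erase background shade 2
    frame[frame != 0] = 1        # paddles and ball become 1, background 0
    return frame.astype(np.float64).ravel()
```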
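
Next, a minimal network matching the stated architecture (two hidden layers of 200 and 100 units) and its forward pass; the scaled-Gaussian initialization and the sigmoid over a single "move up" logit are assumptions.

```python
D, H1, H2 = 80 * 80, 200, 100   # input size and the two hidden layer widths

rng = np.random.default_rng(0)
model = {
    "W1": rng.standard_normal((H1, D)) / np.sqrt(D),
    "W2": rng.standard_normal((H2, H1)) / np.sqrt(H1),
    "W3": rng.standard_normal(H2) / np.sqrt(H2),
}

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def forward(x):
    """Map a preprocessed screen to P(move up), returning the hidden
    activations as well since the backward pass reuses them."""
    h1 = leaky_relu(model["W1"] @ x)
    h2 = leaky_relu(model["W2"] @ h1)
    p_up = 1.0 / (1.0 + np.exp(-(model["W3"] @ h2)))  # sigmoid output
    return p_up, h1, h2
```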
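
Epsilon-greedy action selection, continuing the sketch above; the starting epsilon, floor, and decay rate are assumed values, and action IDs 2/3 are Gym's UP/DOWN for Pong.

```python
epsilon, eps_min, eps_decay = 1.0, 0.05, 0.995   # assumed schedule

def choose_action(p_up):
    """Act randomly while epsilon is high; sample from the learned
    policy as epsilon decays toward its floor."""
    global epsilon
    if np.random.random() < epsilon:
        action = np.random.choice([2, 3])        # Gym Pong: 2 = UP, 3 = DOWN
    else:
        action = 2 if np.random.random() < p_up else 3
    epsilon = max(eps_min, epsilon * eps_decay)  # shift toward exploitation
    return action
```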
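
Discounted rewards, computed backward over one episode; gamma = 0.99, the reset at every scored point, and the final standardization are all assumptions.

```python
gamma = 0.99   # discount factor (assumed value)

def discount_rewards(rewards):
    """Walk backward through the episode, decaying the running sum by
    gamma so each action is credited mostly with the rewards that
    follow it soon afterward."""
    discounted = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0        # a point was scored: reset the credit trail
        running = gamma * running + rewards[t]
        discounted[t] = running
    discounted -= discounted.mean()        # standardize to keep the
    discounted /= discounted.std() + 1e-8  # gradient scale stable
    return discounted
```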
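
The backward pass: policy gradients backpropagated through both leaky ReLU layers. Here `xs`, `h1s`, and `h2s` stack one episode's inputs and hidden activations row-wise, and `dlogits[t]` is assumed to be (action_label - p_up) times the discounted reward at step t.

```python
def leaky_relu_grad(h, alpha=0.01):
    # leaky ReLU preserves sign, so the activation reveals which branch fired
    return np.where(h > 0, 1.0, alpha)

def backward(xs, h1s, h2s, dlogits):
    """Compute policy-gradient updates for all three weight matrices."""
    dW3 = dlogits @ h2s                                           # (H2,)
    dh2 = np.outer(dlogits, model["W3"]) * leaky_relu_grad(h2s)
    dW2 = dh2.T @ h1s                                             # (H2, H1)
    dh1 = (dh2 @ model["W2"]) * leaky_relu_grad(h1s)
    dW1 = dh1.T @ xs                                              # (H1, D)
    return {"W1": dW1, "W2": dW2, "W3": dW3}
```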
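
A hand-rolled Adam step over the weight dictionary; the hyperparameters shown are Adam's usual defaults, assumed rather than taken from the script.

```python
beta1, beta2, lr, adam_eps = 0.9, 0.999, 1e-3, 1e-8
m = {k: np.zeros_like(w) for k, w in model.items()}   # 1st-moment estimates
s = {k: np.zeros_like(w) for k, w in model.items()}   # 2nd-moment estimates
step = 0

def adam_update(grads):
    """Bias-corrected moving averages of the gradient and its square
    give every weight its own adaptive step size."""
    global step
    step += 1
    for k in model:
        m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
        s[k] = beta2 * s[k] + (1 - beta2) * grads[k] ** 2
        m_hat = m[k] / (1 - beta1 ** step)
        s_hat = s[k] / (1 - beta2 ** step)
        model[k] += lr * m_hat / (np.sqrt(s_hat) + adam_eps)  # ascent on reward
```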
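
Performance tracking: per-episode reward and its running mean, saved like the repo's rewards_and_running_mean.png; the 100-episode window is an assumption.

```python
import matplotlib.pyplot as plt

def plot_progress(episode_rewards, window=100,
                  out="rewards_and_running_mean.png"):
    """Plot every episode's total reward alongside its running mean."""
    rewards = np.asarray(episode_rewards, dtype=np.float64)
    running = np.convolve(rewards, np.ones(window) / window, mode="valid")
    plt.plot(rewards, alpha=0.4, label="episode reward")
    plt.plot(np.arange(window - 1, len(rewards)), running,
             label=f"running mean ({window} episodes)")
    plt.xlabel("episode")
    plt.ylabel("reward")
    plt.legend()
    plt.savefig(out)
```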
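
Finally, checkpointing with pickle. The actual layout of training_progressV2.p is not documented here, so this state dictionary is illustrative.

```python
import pickle

def save_checkpoint(episode_rewards, path="training_progressV2.p"):
    """Persist weights, optimizer state, and reward history so an
    interrupted run can resume where it left off."""
    state = {"model": model, "m": m, "s": s, "step": step,
             "epsilon": epsilon, "episode_rewards": episode_rewards}
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(path="training_progressV2.p"):
    global model, m, s, step, epsilon
    with open(path, "rb") as f:
        state = pickle.load(f)
    model, m, s = state["model"], state["m"], state["s"]
    step, epsilon = state["step"], state["epsilon"]
    return state["episode_rewards"]
```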
- Adam_Pong.py: The main training script. It contains the full implementation of the neural network, the preprocessing, and the reinforcement learning training loop with the Adam optimizer.
- rewards_and_running_mean.png: A visual representation of the agent’s learning progress, where the running mean of rewards is plotted against the episodes.
- training_progressV2.p: A pickle file storing the model's progress, weights, and rewards, allowing training to be resumed or evaluated later.
- LOGO.png: The project logo representing the blend of neural networks and the Pong game.
- Neural Network: Two hidden layers process the preprocessed game screen and predict the AI agent's next action.
- Reinforcement Learning: Learning is reward-driven: successful actions (the paddle returns the ball) are rewarded, and failed actions (the paddle misses) are penalized.
- Optimization: The Adam optimizer adjusts the network weights for fast, stable convergence.
- Multi-Agent Gameplay: Expanding the game to feature two AI agents playing against each other using competitive reinforcement learning.
- Improved Environment Interaction: Exploring more complex environments and introducing different variations of the Pong game to improve AI versatility.

