
Journey to Mastering Reinforcement Learning

Demo (FetchPickAndPlace-v4-SAC-HER): a trained SAC+HER agent performing the pick-and-place task in the FetchPickAndPlace-v4 environment.

Overview

Goal: an intensive 3-day journey from foundational Reinforcement Learning (RL) concepts to training and evaluating a robotic manipulation agent on a pick-and-place task.

Key Achievements

  • Learned the fundamentals of RL (learning notes here).
  • Trained a SAC+HER (Soft Actor-Critic with Hindsight Experience Replay) agent using Stable Baselines3 and Gymnasium.
  • Achieved a 100% success rate in the FetchPickAndPlace-v4 environment.

Technical Implementation

  • Framework: Stable Baselines3 + Gymnasium
  • Algorithm: SAC (Soft Actor-Critic) with HER (Hindsight Experience Replay); a minimal wiring sketch follows this list
  • Environment: FetchPickAndPlace-v4 from Gymnasium-Robotics (powered by MuJoCo)
  • Python Version: 3.12.3
  • Hardware: Apple M1 Pro (CPU-only training on macOS Sequoia 15.2)
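
For orientation, here is a minimal sketch of how SAC and HER fit together in Stable Baselines3. The hyperparameter values are illustrative placeholders, not the tuned configuration from hyperparams/.

# Minimal SAC+HER wiring in Stable Baselines3 (illustrative values only)
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch environments on import
from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make("FetchPickAndPlace-v4")

model = SAC(
    "MultiInputPolicy",                   # Fetch observations are Dicts (observation + goals)
    env,
    replay_buffer_class=HerReplayBuffer,  # relabels stored transitions with achieved goals
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,                 # virtual goals sampled per real transition
        goal_selection_strategy="future", # relabel with goals achieved later in the episode
    ),
    verbose=1,
    tensorboard_log="logs/FetchPickAndPlace-v4/tensorboard",
)
model.learn(total_timesteps=1_000_000)    # placeholder training budget
model.save("sac_her_fetch_pick_and_place")

HER is what makes sparse-reward goal-conditioned tasks like this tractable: failed episodes are replayed as if the goal had been whatever the gripper actually achieved, so the agent still receives learning signal.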

Quick Start Guide

Setup

# Clone repository
git clone https://github.com/weijieyong/Mastering-RL.git
cd Mastering-RL

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Note

Gymnasium-Robotics is installed from source because the released package has issues with the environment's initial state.
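
For reference, a from-source install of Gymnasium-Robotics typically looks like the command below; requirements.txt should already take care of this for you.

# Illustrative from-source install of Gymnasium-Robotics
pip install git+https://github.com/Farama-Foundation/Gymnasium-Robotics.git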

Training

python train.py  # default config (SAC, FetchPickAndPlace-v4)
# --- or ---
python train.py --model SAC --env FetchPickAndPlace-v4  # specify the algorithm and environment

Important

Before training, ensure you have created a hyperparameter configuration file in the hyperparams/ folder for your specific algorithm and environment combination (e.g., hyperparams/SAC_FetchPickAndPlace-v4.yaml).
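
As a purely hypothetical illustration, such a file might look like the snippet below. The actual keys and values depend on how train.py parses the config, so treat this as a shape, not a schema.

# hyperparams/SAC_FetchPickAndPlace-v4.yaml (hypothetical example)
policy: MultiInputPolicy
learning_rate: 0.001
buffer_size: 1000000
batch_size: 256
gamma: 0.95
replay_buffer_kwargs:
  n_sampled_goal: 4
  goal_selection_strategy: future
total_timesteps: 1000000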

Evaluation

python eval.py  # automatically uses the latest saved model
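
Under the hood, Stable Baselines3 evaluation usually reduces to something like this sketch; the model path is a placeholder, since eval.py resolves the latest checkpoint itself.

# Illustrative evaluation loop; the model path below is a placeholder
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch environments on import
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("FetchPickAndPlace-v4", render_mode="human")
model = SAC.load("sac_her_fetch_pick_and_place", env=env)

# deterministic=True uses the policy mean rather than sampling actions
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20, deterministic=True)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")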

View training metrics

tensorboard --logdir logs/FetchPickAndPlace-v4/tensorboard

Future Roadmap

  • Algorithm Exploration

    • Implement and compare other RL algorithms (TD3, PPO)
    • Experiment with custom reward shaping for faster convergence
    • Explore multi-task learning capabilities
  • Environment Complexity

    • Create custom environments for more challenging manipulation tasks
    • Add obstacles and constraints to the environment
    • Implement real-world domain randomization
  • Performance Optimization

    • Leverage GPU acceleration for faster training
    • Implement parallel environment sampling (see the sketch after this list)
    • Optimize hyperparameters with automated tuning (e.g., Optuna)
  • Real-World Application

    • Bridge sim-to-real gap with domain adaptation
    • Test on physical robotic hardware
    • Develop practical applications in industrial settings
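
On the parallel-sampling item above, a minimal sketch using Stable Baselines3's vectorized environments could look like this; n_envs and the environment id are placeholders.

# Illustrative parallel environment sampling with SubprocVecEnv
import gymnasium_robotics  # noqa: F401  (registers the Fetch environments)
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":  # subprocess spawning requires the main guard on macOS
    vec_env = make_vec_env(
        "FetchPickAndPlace-v4",
        n_envs=4,                  # one worker process per parallel environment
        vec_env_cls=SubprocVecEnv,
    )
    print(vec_env.num_envs)        # environments now step in parallel worker processes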
