This directory contains scripts to run reinforcement learning experiments in parallel on SLURM.
1. Create virtual environment and install packages:

   ```bash
   cd rl_final_proj
   ./setup_venv.sh
   ```

   This creates a `venv/` directory with all required packages using uv.
2. Before submitting all jobs, test with one experiment. Activate the virtual environment and run:

   ```bash
   cd rl_final_proj
   source venv/bin/activate
   python train_rl.py --task "Hopper-v5" --algorithm "SAC" --seed 0
   ```

   This will:

   - Train one agent (Hopper-v5 with SAC, seed 0)
   - Save results to `rl_experiments/runs/` and `rl_experiments/models/`
   - Append to `rl_experiments/final_eval_returns.csv`
   For a quick test with fewer timesteps, you can temporarily modify `config.yaml`:

   ```yaml
   timesteps_per_task:
     "Hopper-v5": 10000  # Reduced from 1000000 for a quick test
   ```
3. Configure experiments (optional): Edit `config.yaml` to customize tasks, algorithms, seeds, and other settings. You can also adjust `slurm.max_concurrent_jobs` to control parallelism.
4. Generate the job list:

   ```bash
   python generate_joblist.py
   ```

   This creates `joblist.txt` with all combinations of tasks, algorithms, and seeds from `config.yaml`.
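Conceptually, the job list is just the Cartesian product of tasks, algorithms, and seeds. A minimal sketch of that idea (the function name and line format here are illustrative, not the repo's actual implementation):

```python
from itertools import product

def make_joblist(tasks, algorithms, seeds):
    # One line per (task, algorithm, seed) combination -- the same
    # "one job per combination" layout joblist.txt uses.
    return [f"{task} {algo} {seed}"
            for task, algo, seed in product(tasks, algorithms, seeds)]
```

With the default 5 tasks, 4 algorithms, and 5 seeds, this yields 100 lines.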
5. Submit the SLURM job array (EASY WAY):

   ```bash
   chmod +x prepare_submit.sh
   ./prepare_submit.sh
   ```

   This automatically calculates the array size and submits with the correct parallelism settings.

   OR manually:

   ```bash
   # Check number of jobs
   wc -l joblist.txt
   # Edit submit_rl.sbatch to update --array=1-N%M, where N = total jobs and M = concurrent jobs
   sbatch submit_rl.sbatch
   ```
YES, this runs in parallel! The SLURM job array (--array=1-N%M) means:
- N = total number of jobs (one per task/algorithm/seed combination)
- M = maximum concurrent jobs (default: 20)
- Jobs run independently and in parallel
For example, with 100 jobs and %20:
- 20 jobs start immediately
- As jobs finish, new ones start automatically
- All 100 jobs will complete (just 20 at a time)
Adjust max_concurrent_jobs in config.yaml to change parallelism.
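The array directive itself is straightforward to construct; a tiny illustrative helper (not part of the repo scripts) that builds it from N jobs and M concurrent:

```python
def array_spec(n_jobs, max_concurrent=20):
    # Build the SLURM --array directive: indices 1..n_jobs, with at most
    # max_concurrent jobs running at once. Illustrative helper only.
    return f"--array=1-{n_jobs}%{max_concurrent}"

print(array_spec(100, 20))  # --array=1-100%20
```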
All experiment settings are in config.yaml:
- Tasks: List of environments to run
- Algorithms: List of RL algorithms
- Seeds: List of random seeds
- Timesteps: Per-task training timesteps
- Directories: Output paths
- Algorithm settings: Algorithm-specific parameters
Default configuration:
- Tasks: Hopper-v5, Walker2d-v5, HalfCheetah-v5, Ant-v5, Humanoid-v5
- Algorithms: SAC, TD3, DDPG, PPO
- Seeds: 0-4 (5 seeds)
- Total jobs: 5 tasks × 4 algorithms × 5 seeds = 100 jobs
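Putting the defaults above together, `config.yaml` plausibly looks something like the following sketch (key names other than `timesteps_per_task` and `slurm.max_concurrent_jobs` are assumptions; check the actual file):

```yaml
tasks: ["Hopper-v5", "Walker2d-v5", "HalfCheetah-v5", "Ant-v5", "Humanoid-v5"]
algorithms: ["SAC", "TD3", "DDPG", "PPO"]
seeds: [0, 1, 2, 3, 4]
timesteps_per_task:
  "Hopper-v5": 1000000
slurm:
  max_concurrent_jobs: 20
```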
All results are saved in rl_experiments/:
- `runs/`: Individual run JSON files (`{task}_{algo}_seed{seed}.json`)
- `models/`: Trained model files (`{task}_{algo}_seed{seed}.zip`)
- `final_eval_returns.csv`: Final evaluation results
- `learning_curves.csv`: Learning curves recorded during training
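Once jobs finish, `final_eval_returns.csv` can be aggregated across seeds; a sketch assuming columns named `task`, `algorithm`, and `final_return` (the actual header may differ):

```python
import pandas as pd

def summarize_returns(df):
    # Mean and std of the final evaluation return per (task, algorithm)
    # pair, aggregated across seeds. Column names are assumptions about
    # the CSV layout, not taken from the repo.
    return df.groupby(["task", "algorithm"])["final_return"].agg(["mean", "std"])
```

For example: `summarize_returns(pd.read_csv("rl_experiments/final_eval_returns.csv"))`.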
- Time limit: 48 hours per job
- Memory: 8GB per job
- Concurrent jobs: 20 (adjust `%20` in submit_rl.sbatch if needed)
- Logs: `logs/rl_{JOB_ID}_{ARRAY_ID}.out` and `.err`
- Jobs will skip if results already exist (cached runs)
- Each job runs one (task, algorithm, seed) combination
- Results are automatically appended to CSV files
- Models are saved after training completes
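The caching check is presumably just a file-existence test on the per-run output; an illustrative sketch (function name and exact logic are assumptions) matching the `runs/` filename pattern above:

```python
from pathlib import Path

def already_done(task, algo, seed, runs_dir="rl_experiments/runs"):
    # A run counts as cached if its JSON result file already exists,
    # following the {task}_{algo}_seed{seed}.json naming shown above.
    return (Path(runs_dir) / f"{task}_{algo}_seed{seed}.json").exists()
```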