Maximizing GPU utilization is usually pretty hard in RL, even with parallel environments. So I'm thinking of running a parallel simulator on a separate CPU-only server with as many CPUs as I can get, and doing the actual training on a GPU node (within the same placement group, plus all possible networking optimizations) in whatever cloud.
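Roughly what I have in mind, as a minimal sketch using Ray (the library choice, env, worker counts, and rollout sizes below are just placeholders, not something I've settled on):

```python
# Sketch of the split: env simulation scheduled onto a big CPU-only node,
# learner pinned to the GPU node. All numbers here are made up.
import ray
import gymnasium as gym

ray.init(address="auto")  # assumes a cluster with a CPU-only node + a GPU node

@ray.remote(num_cpus=1)  # many of these land on the CPU box
class EnvWorker:
    def __init__(self, env_id: str = "CartPole-v1"):
        self.env = gym.make(env_id)
        self.obs, _ = self.env.reset()

    def rollout(self, steps: int = 256):
        """Collect a chunk of experience and ship it back over the network."""
        traj = []
        for _ in range(steps):
            action = self.env.action_space.sample()  # policy lookup omitted
            next_obs, reward, term, trunc, _ = self.env.step(action)
            traj.append((self.obs, action, reward))
            self.obs = next_obs if not (term or trunc) else self.env.reset()[0]
        return traj

@ray.remote(num_gpus=1)  # pinned to the GPU node
class Learner:
    def update(self, batches):
        # gradient step on the concatenated rollouts would go here
        return sum(len(b) for b in batches)

workers = [EnvWorker.remote() for _ in range(64)]
learner = Learner.remote()

for _ in range(10):
    rollouts = [w.rollout.remote() for w in workers]           # simulate on CPU node
    print(ray.get(learner.update.remote(ray.get(rollouts))))   # train on GPU node
```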
Could this work, or would the extra network latency between the two nodes make it infeasible?