A user's network might not be a full mesh, so we should support partially connected topologies such as the following:
Nodes: A, B, C, D
A<->B with RDMA
C<->D with RDMA
Thus A and C should be in the same group (either trainer or inference).
We should allow users to specify the placement explicitly, bypassing our auto placement, with something such as:
- Trainer
  - IP0
  - IP1
- Inference
  - IP2
  - IP3
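
A minimal sketch of how a user-supplied placement could be checked against the RDMA links, for illustration only. It assumes that each RDMA link is meant to connect a trainer node to an inference node, which is what the A/C grouping in the example implies. The `Placement` schema, the role names `"trainer"` and `"inference"`, and the functions `node_roles` and `check_links` are all hypothetical and not part of any existing API; node names stand in for the IPs.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: role name -> list of node addresses (names stand in for IPs).
Placement = Dict[str, List[str]]


def node_roles(placement: Placement) -> Dict[str, str]:
    """Invert the placement into node -> role, rejecting nodes listed twice."""
    roles: Dict[str, str] = {}
    for role, nodes in placement.items():
        for node in nodes:
            if node in roles:
                raise ValueError(f"{node} is listed more than once")
            roles[node] = role
    return roles


def check_links(
    placement: Placement, rdma_links: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the RDMA links that do NOT connect a trainer to an inference node.

    Assumption: every RDMA link should cross the trainer/inference boundary.
    """
    roles = node_roles(placement)
    bad = []
    for a, b in rdma_links:
        # A link is valid only if its two endpoints cover both roles.
        if {roles.get(a), roles.get(b)} != {"trainer", "inference"}:
            bad.append((a, b))
    return bad


# The topology from the example: A<->B and C<->D over RDMA.
rdma_links = [("A", "B"), ("C", "D")]
manual_placement = {"trainer": ["A", "C"], "inference": ["B", "D"]}
bad_links = check_links(manual_placement, rdma_links)
```

With this placement `bad_links` is empty. Putting A and B in the same group instead would report both links, since neither would cross between the two groups.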