For example in single A100 machine. Llama2 13B training speed with TP2 DP 4 + Zero1 is more faster than FSDP.