Your question
How can I enable FP8 dispatch for MoE model training? I've found the FP8 Linear and GroupedLinear implementations, but I don't see any code covering FP8 All-to-All communication. DeepEP is capable of dispatching FP8 tensors, but it seems Megatron does not pass an FP8 tensor to DeepEP.
Megatron code:

DeepEP interface code:
DeepEP requires passing a tuple `(fp8_tensor, fp8_scale)` to dispatch an FP8 tensor, but Megatron only passes a single tensor.
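For illustration, here is a minimal pure-Python sketch (no torch or DeepEP dependency; `quantize_fp8` and `dispatch` are hypothetical stand-ins, not the real APIs) of why the scale must travel together with the quantized tensor: FP8 values are stored relative to a scaling factor, so a receiver holding only the bare tensor cannot dequantize it.

```python
from typing import List, Tuple

FP8_E4M3_MAX = 448.0  # max representable magnitude in FP8 E4M3

def quantize_fp8(values: List[float]) -> Tuple[List[float], float]:
    """Toy per-tensor quantization: scale values into the FP8 range.

    Returns (quantized_values, scale). A real kernel would also round
    to the FP8 grid; that step is omitted here for brevity.
    """
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / FP8_E4M3_MAX
    return [v / scale for v in values], scale

def dispatch(payload: Tuple[List[float], float]) -> List[float]:
    """Hypothetical all-to-all dispatch: the receiver needs BOTH the
    quantized tensor and its scale to reconstruct the original values."""
    quantized, scale = payload  # a bare tensor alone would lose the scale
    return [q * scale for q in quantized]

x = [0.5, -2.0, 3.25]
q, s = quantize_fp8(x)
restored = dispatch((q, s))
assert all(abs(a - b) < 1e-6 for a, b in zip(x, restored))
```

This mirrors the shape of the mismatch described above: the dispatch path must accept a `(tensor, scale)` pair end to end, whereas a caller that forwards only the tensor drops the scale and breaks dequantization on the receiving side.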