Your question
How can I enable FP8 dispatch for MoE model training? I've found the FP8 Linear and GroupedLinear implementations, but I don't see any code covering FP8 All-to-All communication. DeepEP is capable of dispatching FP8 tensors, but it seems Megatron does not pass an FP8 tensor to DeepEP.
Megatron code:

DeepEP interface code:
DeepEP requires passing a tuple `(fp8_tensor, fp8_scale)` to dispatch an FP8 tensor, but Megatron only passes a single tensor.
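For illustration, here is a minimal pure-Python sketch (no torch or DeepEP dependency; `quantize_fp8` and `dispatch` are hypothetical stand-ins, not the real APIs) of why the scale must travel together with the quantized tensor: FP8 values are stored relative to a scaling factor, so a receiver holding only the bare tensor cannot dequantize it.

```python
from typing import List, Tuple

FP8_E4M3_MAX = 448.0  # max representable magnitude in FP8 E4M3

def quantize_fp8(values: List[float]) -> Tuple[List[float], float]:
    """Toy per-tensor quantization: scale values into the FP8 range.

    Returns (quantized_values, scale). A real kernel would also round
    to the FP8 grid; that step is omitted here for brevity.
    """
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / FP8_E4M3_MAX
    return [v / scale for v in values], scale

def dispatch(payload: Tuple[List[float], float]) -> List[float]:
    """Hypothetical all-to-all dispatch: the receiver needs BOTH the
    quantized tensor and its scale to reconstruct the original values."""
    quantized, scale = payload  # a bare tensor alone would lose the scale
    return [q * scale for q in quantized]

x = [0.5, -2.0, 3.25]
q, s = quantize_fp8(x)
restored = dispatch((q, s))
assert all(abs(a - b) < 1e-6 for a, b in zip(x, restored))
```

This mirrors the shape of the mismatch described above: the dispatch path must accept a `(tensor, scale)` pair end to end, whereas a caller that forwards only the tensor drops the scale and breaks dequantization on the receiving side.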