
[QUESTION] How to enable fp8 dispatch while training MOE models? #3578

@new-TonyWang

Description

Your question

How can I enable FP8 dispatch for MoE model training? I've found the FP8 Linear and GroupLinear implementations, but I don't see any code covering FP8 All-to-All communication. DeepEP is capable of dispatching FP8 tensors, but it seems Megatron does not pass an FP8 tensor to DeepEP.

megatron code
deepep interface code

DeepEP requires a tuple (fp8_tensor, fp8_scale) to be passed in order to dispatch an FP8 tensor, but Megatron only passes a single tensor.
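The mismatch described above (a single tensor vs. a (tensor, scale) pair) can be sketched in plain Python. The function names, the per-token scaling granularity, and the E4M3 max value of 448.0 are assumptions for illustration only, not DeepEP's or Megatron's actual code:

```python
# Hypothetical sketch of building the (fp8_tensor, fp8_scale) pair that a
# DeepEP-style dispatch expects, instead of the single bf16 tensor Megatron
# currently passes. All names here are illustrative, not real library APIs.

E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3


def quantize_per_token(row):
    """Return (quantized_row, scale) for one token's hidden vector."""
    amax = max(abs(v) for v in row) or 1.0
    scale = amax / E4M3_MAX
    # A real kernel would cast each value to an FP8 dtype such as
    # torch.float8_e4m3fn; here we keep floats to show the data layout.
    quantized = [v / scale for v in row]
    return quantized, scale


def pack_for_dispatch(hidden_states):
    """Split a batch into the two pieces an FP8 dispatch would take:
    (fp8_payload, per_token_scales)."""
    payload, scales = [], []
    for row in hidden_states:
        q, s = quantize_per_token(row)
        payload.append(q)
        scales.append(s)
    return payload, scales


tokens = [[0.5, -2.0, 1.0], [4.0, 0.25, -8.0]]
fp8_payload, fp8_scales = pack_for_dispatch(tokens)
# Dequantizing (value * scale) recovers the original activations, which is
# why the scale tensor must travel through the All-to-All alongside the data.
```

The point of the sketch: the scales are not recoverable from the quantized payload alone, so a dispatch path that forwards only one tensor cannot support FP8 end to end.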
