[megatron][perf] Integrate megatron dynamic context parallelism #1019

@erictang000

Description

Allow the context parallelism (CP) degree to be increased dynamically based on the longest sequence in a batch, so that CP is not over-provisioned for the maximum context length. The feature is currently in development on a dev branch in Megatron-LM (NVIDIA/Megatron-LM#2000); we should integrate it once it is ready.

See blog for more details: https://developer.nvidia.com/blog/speeding-up-variable-length-training-with-dynamic-context-parallelism-and-nvidia-megatron-core/
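
To make the idea concrete, here is a minimal sketch of picking a CP degree per batch from the longest sequence. It is not the Megatron-LM API: `choose_cp_size`, `tokens_per_rank`, and `max_cp` are hypothetical names, and restricting CP to powers of two is an assumption made for illustration only.

```python
from typing import Sequence


def choose_cp_size(seq_lens: Sequence[int], tokens_per_rank: int, max_cp: int) -> int:
    """Return the smallest power-of-two CP degree that fits the longest sequence.

    Hypothetical helper, not part of Megatron-LM. Assumes each CP rank can
    hold at most `tokens_per_rank` tokens of a single sequence and that
    `max_cp` is a power of two.
    """
    if not seq_lens:
        raise ValueError("batch is empty")
    if tokens_per_rank <= 0:
        raise ValueError("tokens_per_rank must be positive")
    if max_cp < 1 or (max_cp & (max_cp - 1)) != 0:
        raise ValueError("max_cp must be a positive power of two")

    longest = max(seq_lens)
    cp = 1
    # Double CP only while the longest sequence does not fit on `cp` ranks.
    while cp < max_cp and longest > cp * tokens_per_rank:
        cp *= 2
    if longest > cp * tokens_per_rank:
        raise ValueError(
            f"sequence of length {longest} does not fit even with CP={max_cp}"
        )
    return cp


if __name__ == "__main__":
    # Short batches stay at CP=1; only long batches pay for a larger CP group.
    print(choose_cp_size([1000, 3000], tokens_per_rank=4096, max_cp=8))   # 1
    print(choose_cp_size([10000], tokens_per_rank=4096, max_cp=8))        # 4
    print(choose_cp_size([1000, 30000], tokens_per_rank=4096, max_cp=8))  # 8
```

With a static configuration, every batch would run at the CP degree needed for the maximum context length (8 in this example), even when all of its sequences fit on a single rank.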
