Allow context parallelism (CP) to scale dynamically with the longest sequence in each batch, avoiding over-provisioning CP for the maximum context length. The feature is currently in development on a dev branch of Megatron-LM (NVIDIA/Megatron-LM#2000); we should integrate it once it is ready.
See the blog post for more details: https://developer.nvidia.com/blog/speeding-up-variable-length-training-with-dynamic-context-parallelism-and-nvidia-megatron-core/
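The core idea can be sketched roughly as follows: rather than fixing CP at the degree needed for the maximum context length, pick a per-batch CP degree just large enough for the longest sequence in that batch. This is only a minimal illustration of the selection logic, not the Megatron-LM implementation; the function name, `tokens_per_gpu` capacity parameter, and power-of-two search are all assumptions for the sketch.

```python
def choose_cp_size(seq_lens, tokens_per_gpu, max_cp_size):
    """Hypothetical sketch: pick the smallest power-of-two CP degree
    such that the longest sequence in the batch fits within the
    per-GPU token budget, capped at max_cp_size.

    seq_lens: sequence lengths (in tokens) of the current batch
    tokens_per_gpu: assumed per-GPU sequence-chunk capacity
    max_cp_size: CP degree required for the maximum context length
    """
    longest = max(seq_lens)
    cp = 1
    # Double the CP degree until the longest sequence fits,
    # or we hit the configured maximum.
    while cp < max_cp_size and longest > cp * tokens_per_gpu:
        cp *= 2
    return cp


# A short batch needs no CP at all; a long one scales up.
print(choose_cp_size([4096, 1024], tokens_per_gpu=4096, max_cp_size=8))   # 1
print(choose_cp_size([16384, 2048], tokens_per_gpu=4096, max_cp_size=8))  # 4
```

A batch of mostly short sequences thus runs with little or no sequence sharding, reclaiming GPUs for data parallelism instead of splitting already-small sequences.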