Allow context parallelism (CP) to scale dynamically with the longest sequence in each batch, avoiding over-provisioning CP for the maximum context length. The feature is currently in development on a dev branch of Megatron-LM (NVIDIA/Megatron-LM#2000); we should integrate it once it is ready.
See the blog post for more details: https://developer.nvidia.com/blog/speeding-up-variable-length-training-with-dynamic-context-parallelism-and-nvidia-megatron-core/
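The core idea can be sketched roughly as follows: rather than fixing CP at the degree needed for the maximum context length, pick a per-batch CP degree just large enough for the longest sequence in that batch. This is only a minimal illustration of the selection logic, not the Megatron-LM implementation; the function name, `tokens_per_gpu` capacity parameter, and power-of-two search are all assumptions for the sketch.

```python
def choose_cp_size(seq_lens, tokens_per_gpu, max_cp_size):
    """Hypothetical sketch: pick the smallest power-of-two CP degree
    such that the longest sequence in the batch fits within the
    per-GPU token budget, capped at max_cp_size.

    seq_lens: sequence lengths (in tokens) of the current batch
    tokens_per_gpu: assumed per-GPU sequence-chunk capacity
    max_cp_size: CP degree required for the maximum context length
    """
    longest = max(seq_lens)
    cp = 1
    # Double the CP degree until the longest sequence fits,
    # or we hit the configured maximum.
    while cp < max_cp_size and longest > cp * tokens_per_gpu:
        cp *= 2
    return cp


# A short batch needs no CP at all; a long one scales up.
print(choose_cp_size([4096, 1024], tokens_per_gpu=4096, max_cp_size=8))   # 1
print(choose_cp_size([16384, 2048], tokens_per_gpu=4096, max_cp_size=8))  # 4
```

A batch of mostly short sequences thus runs with little or no sequence sharding, reclaiming GPUs for data parallelism instead of splitting already-small sequences.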