Conversation
Code Review
This pull request introduces support for context parallelism (CP) in the GatedDeltaNet module by adjusting sequence-length calculations and implementing all-to-all communication between the CP and head-parallelism domains. The changes also include CP-aware parameter fetching for convolutions and gate calculations. Review feedback identified several critical issues: a missing import for tensor_a2a_hp2cp, a NameError caused by referencing A_log without the self. prefix, and a potentially incorrect groups parameter in the manual F.conv1d call.
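The NameError flagged above is a common attribute-scoping bug. A minimal sketch of the fix, using hypothetical names modeled on the review (this is not the actual GatedDeltaNet code):

```python
import torch
import torch.nn as nn


class GatedDeltaNetSketch(nn.Module):
    """Hypothetical minimal module; only A_log is taken from the review."""

    def __init__(self, num_heads: int = 4):
        super().__init__()
        # Per-head log-decay parameter, as in gated delta-rule layers.
        self.A_log = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Correct: qualify the attribute as self.A_log. A bare `A_log`
        # here would raise NameError at call time, which is the class
        # of bug the review points out.
        decay = -self.A_log.exp()
        return x * decay.mean()
```

With `A_log` initialized to zeros, `decay` is `-1` per head, so the forward pass simply negates the input in this toy setting.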
/gemini review
Code Review
This pull request implements context parallelism (CP) support within the GatedDeltaNet module, including sequence-length adjustments, parameter slicing for depthwise convolutions, and all-to-all communication for projections. The review identifies critical issues in the sequence-unpacking logic used during all-to-all operations, which is both inefficient and incorrect for packed sequences. Additionally, a bug was found in the F.conv1d call, where the groups parameter could be set to None, and removing the _unpack_sequence helper function was suggested to simplify the implementation.
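The groups=None bug noted above can be guarded against explicitly: for a depthwise convolution, F.conv1d needs one group per channel. A hedged sketch (the function name and causal-padding choice are assumptions, not the PR's implementation):

```python
import torch
import torch.nn.functional as F


def causal_depthwise_conv1d(x, weight, groups=None):
    """Depthwise causal convolution sketch.

    x: (batch, channels, seq_len); weight: (channels, 1, kernel_size).
    """
    if groups is None:
        # Guard against the None-groups bug from the review: F.conv1d
        # requires an integer, and depthwise means one group per channel.
        groups = x.shape[1]
    kernel_size = weight.shape[-1]
    # Left-pad by kernel_size - 1 and trim the tail so the conv stays causal.
    out = F.conv1d(x, weight, padding=kernel_size - 1, groups=groups)
    return out[..., : x.shape[-1]]
```

Resolving the group count from the input's channel dimension keeps the call valid even when the caller omits the parameter, which is the failure mode the review describes.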