[common] Remove kvpacked and qkvpacked attention functions for every kernel type. #2287
Description
There are 3 variants of the fused attention functions: separate QKV, KV-packed, and QKV-packed, which differ only in how the Q, K, and V pointers are computed. This results in code duplication for each type of fused attention kernel: arbitrary sequence length, max 512, and FP8. This PR deduplicates the code and moves the pointer computation up one abstraction layer, from kernel-specific functions like fused_attn_max_512_fwd_qkvpacked into the common C++ API functions like nvte_fused_attn_fwd_qkvpacked. These packed variants of the common attention API functions are used by JAX, so running the JAX CI is a good test of these changes. PyTorch uses only the non-packed functions.
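To illustrate the refactoring direction (this is a minimal sketch, not the actual Transformer Engine code; the types, helper names, and the assumed per-token QKV interleaving are hypothetical), a packed entry point at the common-API layer can derive the separate Q/K/V pointers from the packed buffer and forward to a single unpacked implementation, so the kernel-specific *_qkvpacked variants become unnecessary:

#include <cstddef>
#include <iostream>

// Hypothetical stand-in for a device tensor: a base pointer plus the
// element stride between consecutive rows of that tensor.
struct Tensor {
  float *data;
  size_t stride;
};

// Single "unpacked" implementation shared by every kernel type
// (in the real code, the per-kernel functions such as
// fused_attn_max_512_fwd would play this role).
void fused_attn_fwd(const Tensor &q, const Tensor &k, const Tensor &v) {
  std::cout << "Q@" << q.data << " K@" << k.data << " V@" << v.data << '\n';
}

// Packed entry point at the common-API layer (analogous in spirit to
// nvte_fused_attn_fwd_qkvpacked): the only thing it adds is the pointer
// arithmetic that was previously duplicated inside each kernel-specific
// *_qkvpacked function. Assumes Q, K, V are interleaved per token.
void fused_attn_fwd_qkvpacked(float *qkv, size_t per_tensor_stride) {
  Tensor q{qkv, 3 * per_tensor_stride};
  Tensor k{qkv + per_tensor_stride, 3 * per_tensor_stride};
  Tensor v{qkv + 2 * per_tensor_stride, 3 * per_tensor_stride};
  fused_attn_fwd(q, k, v);  // dispatch to the one shared implementation
}

int main() {
  float qkv[3 * 8] = {};           // toy packed buffer for a single token
  fused_attn_fwd_qkvpacked(qkv, 8);
}

A KV-packed wrapper would differ only in the offsets it computes, which is why only these thin common-API wrappers need to know about the packed layouts.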
Type of change
Checklist: