[Blackwell] add non-causal bwd/FA with TMA and atomic_add #603

manman-ren · 2025-10-30T18:12:49Z

Summary: Copied from Hongtao's TLX implementation in third_party/tlx/tutorials/blackwell-fa-ws-pipelined-persistent_test.py

Test Plan:
python run.py --op blackwell_attentions --seq-len 8192 --batch 4 --n-heads 32 --d-head 128 --only triton_tutorial_flash_v2_persistent_blackwell --bwd --force --metrics tflops

Reviewers:

Subscribers:

Tasks:

Tags:

meta-codesync · 2025-10-30T18:30:30Z

@manman-ren has imported this pull request. If you are a Meta employee, you can view this in D85880773.

htyu

LGTM.

njriasan

LGTM!

njriasan · 2025-10-30T18:57:15Z

tritonbench/kernels/blackwell_triton_fused_attention.py

+    for blk_idx in range(num_steps):
+        q = desc_q.load([(off_bh + curr_m).to(tl.int32), 0])
+        qT = tl.trans(q)
+        # Load m before computing qk to reduce pipeline stall.


Is this still relevant/required with WS?

manman-ren · Nov 7, 2025

It depends on whether the 1D load of m is in the load partition. We can load m first, then wait for qk.
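For readers following the thread, here is a minimal pure-Python sketch of why m can be fetched ahead of qk in the backward loop: m (the per-row log-sum-exp saved by the forward pass) does not depend on the recomputed qk tile, so its load can be issued first and only the final `exp(qk - m)` needs both. The helper names (`logsumexp`, `matmul_t`, `recompute_p`) and the tiny inputs are hypothetical. The sketch uses natural `exp` on a single untiled block, which simplifies what the kernel in this PR does.

```python
import math


def logsumexp(row):
    """Numerically stable log(sum(exp(x))) over one row of scores."""
    mx = max(row)
    return mx + math.log(sum(math.exp(x - mx) for x in row))


def matmul_t(a, b):
    """Return a @ b^T for row-major lists a: [M][D], b: [N][D]."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


# Hypothetical tiny inputs: 3 query rows, 2 key rows, head dim 2.
q = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
k = [[0.7, 0.1], [-0.3, 0.2]]

# The forward pass would have stored m, the per-row log-sum-exp of q @ k^T.
m = [logsumexp(row) for row in matmul_t(q, k)]


def recompute_p(q_blk, k_blk, m_blk):
    # Step 1: fetch m. It has no data dependency on qk, so in a kernel
    # this 1D load could be issued before the matmul completes.
    m_vals = list(m_blk)
    # Step 2: recompute the score tile.
    qk = matmul_t(q_blk, k_blk)
    # Step 3: only here are both operands needed.
    return [[math.exp(s - mi) for s in row] for row, mi in zip(qk, m_vals)]


p = recompute_p(q, k, m)
row_sums = [sum(row) for row in p]
```

Because this example covers all keys in one block, each row of `p` sums to 1, which confirms that `exp(qk - m)` reproduces the softmax probabilities without a second normalization pass.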

xuzhao9 · 2025-10-30T20:24:31Z

Can you help run ufmt format . to fix the linting error?

manman-ren temporarily deployed to docker-s3-upload October 30, 2025 18:12 — with GitHub Actions Inactive

meta-cla bot added the cla signed label Oct 30, 2025

manman-ren requested review from htyu, njriasan and xuzhao9 October 30, 2025 18:13

manman-ren changed the title ~~[Blackwell] add non-causal bwd/FA~~ [Blackwell] add non-causal bwd/FA with TMA and atomic_add Oct 30, 2025

htyu approved these changes Oct 30, 2025

njriasan approved these changes Oct 30, 2025

xuzhao9 approved these changes Oct 30, 2025

add non-causal bwd/FA

6974d4d

manman-ren force-pushed the fa-bwd-b200 branch from 2bb886a to 6974d4d Compare November 7, 2025 16:53

manman-ren had a problem deploying to docker-s3-upload November 7, 2025 16:54 — with GitHub Actions Failure

manman-ren temporarily deployed to docker-s3-upload November 7, 2025 16:54 — with GitHub Actions Inactive

ufmt

1dee230

manman-ren had a problem deploying to docker-s3-upload November 7, 2025 18:34 — with GitHub Actions Failure

manman-ren temporarily deployed to docker-s3-upload November 7, 2025 18:34 — with GitHub Actions Inactive

manman-ren merged commit f9168e4 into meta-pytorch:main Nov 7, 2025
8 of 9 checks passed
