Fusion rule for handling transformers exported models #2632

justinchuby · 2025-10-14T23:01:42Z

When torch.onnx exports a transformers model that uses SDPA, it generates Concat
nodes that concatenate past_key/past_value with the new key/value to produce the graph
outputs for the KV cache. This pattern can be fused into the Attention node, which has
present_key and present_value outputs. The fusion is necessary for ONNX Runtime, which
requires the Attention node itself to produce these outputs when the past_key and
past_value inputs are provided.
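
To make the intended rewrite concrete, here is a minimal, library-free sketch of the idea. It is not the onnxscript rewriter API and not the rule added in this PR: the `Node` class and `fuse_present_kv` function are hypothetical names for illustration, and the sketch assumes the Concat outputs feed the Attention node's K and V inputs. It also skips checks a real rule would need, such as the Concat axis and the placement of other consumers of the Concat outputs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a graph node (hypothetical, not the onnxscript IR)."""
    op_type: str
    inputs: list
    outputs: list


def fuse_present_kv(nodes):
    """Fold Concat(past, new) nodes feeding Attention's K/V into the Attention node.

    Assumes `nodes` is topologically sorted and each Concat has exactly two
    inputs ordered (past, new). The fused Attention node reuses the Concat
    output names as its present_key / present_value outputs.
    """
    producers = {out: n for n in nodes for out in n.outputs}
    absorbed = set()
    result = []
    for node in nodes:
        has_qkv = len(node.inputs) >= 3
        k_cat = producers.get(node.inputs[1]) if has_qkv else None
        v_cat = producers.get(node.inputs[2]) if has_qkv else None
        matches = (
            node.op_type == "Attention"
            and k_cat is not None
            and v_cat is not None
            and k_cat.op_type == v_cat.op_type == "Concat"
            and len(k_cat.inputs) == len(v_cat.inputs) == 2
        )
        if not matches:
            result.append(node)
            continue
        past_k, new_k = k_cat.inputs
        past_v, new_v = v_cat.inputs
        # An empty string marks an omitted optional input (here, the mask).
        mask = node.inputs[3] if len(node.inputs) > 3 else ""
        result.append(
            Node(
                "Attention",
                # Input order: Q, K, V, attn_mask, past_key, past_value
                [node.inputs[0], new_k, new_v, mask, past_k, past_v],
                # Output order: Y, present_key, present_value
                [node.outputs[0], k_cat.outputs[0], v_cat.outputs[0]],
            )
        )
        absorbed.update((id(k_cat), id(v_cat)))
    # Drop the Concat nodes that were folded into an Attention node.
    return [n for n in result if id(n) not in absorbed]


if __name__ == "__main__":
    graph = [
        Node("Concat", ["past_key", "key"], ["present_key"]),
        Node("Concat", ["past_value", "value"], ["present_value"]),
        Node("Attention", ["query", "present_key", "present_value"], ["attn_out"]),
    ]
    for n in fuse_present_kv(graph):
        print(n.op_type, n.inputs, n.outputs)
```

Because the fused node keeps the names `present_key` and `present_value` for its extra outputs, the graph outputs for the KV cache do not need to be renamed in this toy model.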

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

codecov · 2025-10-14T23:12:26Z

Codecov Report

❌ Patch coverage is 73.68421% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.39%. Comparing base (811937c) to head (c408516).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ipt/rewriter/rules/fusion/_attention_present_kv.py | 76.47% | 4 Missing ⚠️ |
| onnxscript/rewriter/onnx_fusions/_onnx_fusions.py | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2632   +/-   ##
=======================================
  Coverage   70.38%   70.39%           
=======================================
  Files         222      223    +1     
  Lines       26288    26309   +21     
  Branches     2629     2629           
=======================================
+ Hits        18503    18519   +16     
- Misses       6865     6870    +5     
  Partials      920      920


Output present key/value from the Attention op because past key/value is provided. Previously, the Attention op created by the fusion would consume past key/value but not produce present key/value, which is not correct for ORT.

<img width="1377" height="1225" alt="image" src="https://github.com/user-attachments/assets/118958b4-bc27-4912-b70b-000549887c0f" />

Replaces #2632

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

justinchuby · 2025-10-16T02:42:42Z

This is still useful when enable_gqa=True

justinchuby added 4 commits October 14, 2025 13:42

WIP

86b1687

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

Create fusion rule for attention kv

96ea92b

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

docs

d7d2f5d

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

Add to default fusion

c408516

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

github-project-automation bot added this to ONNX Script Review Board Oct 14, 2025

github-project-automation bot moved this to Todo in ONNX Script Review Board Oct 14, 2025

justinchuby requested review from Copilot and gramalingam and removed request for Copilot October 14, 2025 23:01

justinchuby added this to the 0.5.4 milestone Oct 14, 2025

justinchuby added the module: rewriter label Oct 14, 2025

justinchuby closed this Oct 15, 2025

github-project-automation bot moved this from Todo to Done in ONNX Script Review Board Oct 15, 2025

justinchuby mentioned this pull request Oct 15, 2025

Fix GQA fusion to produce present key/value #2634

Merged

justinchuby reopened this Oct 16, 2025

justinchuby modified the milestones: 0.5.4, 0.5.5 Oct 16, 2025

justinchuby marked this pull request as draft October 17, 2025 20:10

justinchuby removed this from the 0.5.5 milestone Oct 29, 2025
