[ET-VK][ez] Use tree reduction in q8ta_linear_gemv shader by SS-JIA · Pull Request #17792 · pytorch/executorch

SS-JIA · 2026-03-02T21:03:33Z

Stack from ghstack (oldest at bottom):

Replace the serial O(WGS) reduction loop with a tree reduction pattern
(O(log2(WGS))). Previously, only thread 0 summed all 64 partial
accumulators sequentially. Now all threads participate in a classic
halving reduction, matching the pattern already used in
linear_q4gsw_coop.glsl.
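
The difference between the two patterns can be sketched outside the shader. The following Python model is illustrative only and is not the GLSL from this PR: the names `serial_reduce` and `tree_reduce`, the integer inputs, and the power-of-two workgroup size are assumptions made for the sketch. It counts the dependent steps each pattern needs for 64 partial accumulators.

```python
WGS = 64  # assumed workgroup size, matching the 64 partial accumulators described above


def serial_reduce(partials):
    """Old pattern: a single thread (thread 0) sums every partial accumulator."""
    total = 0
    steps = 0
    for value in partials:
        total += value
        steps += 1  # one dependent addition per accumulator: O(WGS)
    return total, steps


def tree_reduce(partials):
    """Halving pattern: in each round, threads [0, stride) add their partner's value."""
    shared = list(partials)  # stands in for workgroup shared memory
    n = len(shared)
    assert n >= 1 and n & (n - 1) == 0, "this sketch assumes a power-of-two size"
    steps = 0
    stride = n // 2
    while stride > 0:
        # In a real shader a workgroup barrier would separate rounds. Within a
        # round each simulated thread touches disjoint slots, so a plain loop
        # gives the same result as parallel execution.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        steps += 1  # one round per halving: O(log2(WGS))
        stride //= 2
    return shared[0], steps


if __name__ == "__main__":
    partials = list(range(WGS))
    print(serial_reduce(partials))  # (sum, number of sequential additions)
    print(tree_reduce(partials))    # (sum, number of halving rounds)
```

Both functions return the same sum; the serial version needs one dependent step per accumulator, while the halving version needs one round per power of two in the workgroup size. On a GPU each round also costs a barrier, so the real speedup depends on the hardware.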

Authored by Claude.

Differential Revision: D94949137


pytorch-bot · 2026-03-02T21:03:38Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17792

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 15 New Failures, 1 Cancelled Job, 2 Unrelated Failures

As of commit a1e2aa2 with merge base ae41854:

NEW FAILURES - The following jobs have failed:

Build Presets / linux (linux, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11-aarch64) / build (gh)
Build Presets / linux (llm, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11-aarch64) / build (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (CBAA:3B993D:107ECC:13F717:69A5FB2F)
Build Presets / linux (pybind, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11-aarch64) / build (gh)
pull / test-coreml-bc-macos (macos-m2-stable) / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (D93A:2DE155:10D0B1:1447A8:69A5FB36)
pull / test-models-linux-basic (vit, xnnpack-quantization-delegation, cmake, linux.arm64.2xlarge, execut... / linux-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (EB86:3266E:108384:13FC02:69A5FB30)
Test Metal Backend / export-model-metal-artifact (mistralai, Voxtral-Mini-3B-2507, non-quantized) / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (D704:269D57:10F95B:1471C4:69A5FB32)
Test Metal Backend / export-model-metal-artifact (mistralai, Voxtral-Mini-3B-2507, quantized-int4-metal) / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (D6BF:3CC34C:1084D7:13FDBF:69A5FB32)
Test Metal Backend / export-model-metal-artifact (mistralai, Voxtral-Mini-4B-Realtime-2602, quantized-int4-metal) / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (D6F7:2E3025:101143:1389E8:69A5FB2E)
Test Metal Backend / export-model-metal-artifact (nvidia, parakeet-tdt, quantized-int4-metal) / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (D923:193E62:FB592:132EF9:69A5FB2E)
Test Metal Backend / export-model-metal-artifact (openai, whisper-large-v3-turbo, non-quantized) / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (D92F:3CC34C:1084D9:13FDC2:69A5FB32)
Test Metal Backend / export-model-metal-artifact (openai, whisper-large-v3-turbo, quantized-int4-metal) / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (EEB2:381148:1002C2:135C71:69A5FB2E)
Test Metal Backend / export-model-metal-artifact (openai, whisper-small, non-quantized) / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (D6B4:1856C:10A671:141D5D:69A5FB2E)
Test Metal Backend / export-model-metal-artifact (openai, whisper-small, quantized-int4-metal) / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (EEBE:2CA45F:1157F5:14D08A:69A5FB32)
Test Metal Backend / test-executorch-metal-build / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (D4A5:1A09F5:FE5D0:13601F:69A5FB32)
Test Metal Backend / test-metal-backend-modules / macos-job (gh)
An action could not be found at the URI 'https://api.github.com/repos/actions/checkout/tarball/11bd71901bbe5b1630ceea73d27597364c9af683' (D499:3CADF7:10F800:147022:69A5FB2E)

CANCELLED JOB - The following job was cancelled. Please retry:

Check Labels (gh)

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Test CUDA Windows Export and E2E / test-model-cuda-windows-e2e (mistralai, Voxtral-Mini-3B-2507, non-quantized) / windows-job (gh) (trunk failure)
Process completed with exit code 1.
Test CUDA Windows Export and E2E / test-model-cuda-windows-e2e (mistralai, Voxtral-Mini-3B-2507, quantized-int4-weight-only) / windows-job (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-03-02T21:04:41Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Replace the serial O(WGS) reduction loop with a tree reduction pattern (O(log2(WGS))). Previously, only thread 0 summed all 64 partial accumulators sequentially. Now all threads participate in a classic halving reduction, matching the pattern already used in linear_q4gsw_coop.glsl.

Authored by Claude.

Differential Revision: [D94949137](https://our.internmc.facebook.com/intern/diff/D94949137/)
ghstack-source-id: 346524552
Pull Request resolved: #17792

meta-cla bot added the CLA Signed label Mar 2, 2026

This was referenced Mar 2, 2026

[ET-VK][qconv] Enable im2col to handle grouped convolution #17793

Merged

[ET-VK][qconv] Add dynamic PACKED_INT8_CONV2D memory layout for device-adaptive conv2d #17794

Merged

[ET-VK][testing] Add GPU device name override for on-device model tests #17795

Merged

meta-codesync bot added the fb-exported and meta-exported labels Mar 2, 2026

manuelcandales approved these changes Mar 2, 2026


meta-codesync bot merged commit ad8ff12 into gh/SS-JIA/453/base Mar 3, 2026
189 of 216 checks passed

meta-codesync bot deleted the gh/SS-JIA/453/head branch March 3, 2026 08:28

meta-codesync bot temporarily deployed to cherry-pick-bot March 3, 2026 08:28 Inactive

pytorchbot mentioned this pull request Mar 3, 2026

[ET-VK][ez] Use tree reduction in q8ta_linear_gemv shader #17808

Merged
