[ET-VK][qconv] Add dynamic PACKED_INT8_CONV2D memory layout for device-adaptive conv2d#17794
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17794
Note: Links to docs will display an error until the docs builds have been completed.

❌ 14 New Failures, 4 Unrelated Failures as of commit 31f1d0b with merge base ae41854:
- NEW FAILURES: the following jobs have failed.
- FLAKY: the following jobs failed, but likely due to flakiness present on trunk.
- BROKEN TRUNK: the following jobs failed, but were already failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Merged 968542d into gh/SS-JIA/455/base
Stack from ghstack (oldest at bottom):
Performance testing of quantized int8 convolutions reveals that different
algorithms perform better on different GPU architectures: im2col is faster on
Mali while direct convolution is faster on Adreno. The optimal memory layout
differs per algorithm (4C for im2col, 4C1W for direct convolution).
This introduces a new "dynamic" memory layout PACKED_INT8_CONV2D that is
serialized at export time and resolved to a concrete layout at runtime based
on the device's GPU architecture. The resolution logic in ResolveLayouts.cpp
mirrors the im2col vs direct convolution decision in Q8taConv2d.cpp.
Differential Revision: D94949134
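The export-time/runtime split described above can be sketched as follows. This is a minimal illustration, not the actual ExecuTorch code: the enum names, `resolve_layout` function, and `GpuArch` type are hypothetical stand-ins for whatever `ResolveLayouts.cpp` and the Vulkan backend actually use. It only captures the decision rule stated in the PR: the dynamic `PACKED_INT8_CONV2D` layout resolves to 4C on devices where im2col wins (e.g. Mali) and to 4C1W where direct convolution wins (e.g. Adreno).

```cpp
#include <cassert>

// Hypothetical stand-ins for the real layout and architecture enums;
// the actual ExecuTorch Vulkan backend uses different types and names.
enum class MemoryLayout { PACKED_INT8_CONV2D, PACKED_4C, PACKED_4C1W };
enum class GpuArch { MALI, ADRENO, OTHER };

// Sketch of the resolution rule described in the PR: the "dynamic"
// PACKED_INT8_CONV2D layout, serialized at export time, is replaced at
// runtime with the concrete layout preferred by the conv2d algorithm
// chosen for the device:
//   Mali   -> im2col             -> 4C
//   Adreno -> direct convolution -> 4C1W
inline MemoryLayout resolve_layout(MemoryLayout layout, GpuArch arch) {
  if (layout != MemoryLayout::PACKED_INT8_CONV2D) {
    return layout;  // concrete layouts pass through unchanged
  }
  switch (arch) {
    case GpuArch::ADRENO:
      return MemoryLayout::PACKED_4C1W;  // direct convolution path
    default:
      return MemoryLayout::PACKED_4C;  // im2col path (Mali and fallback)
  }
}
```

The point of serializing the dynamic layout rather than a concrete one is that the same exported model can take the faster path on both GPU families, with the choice deferred until the device is known.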