Contributor

@arun-thmn arun-thmn commented Jan 6, 2026

This patch shuffles the output of a bf16 vector.contract operation whose operands are not VNNI-packed (flat layout). The result of the contraction is shuffled to match the flat layout before it is stored in the accumulator (acc) matrix.

Following this transform schedule, the vector.contract will be lowered to one of the following operations:

  • x86vector::DotBF16Op, with the B matrix shuffled to compensate for the flat layout, or
  • vector.fma, with loads and broadcasts that use bf16 packed operations (supported as part of this PR).
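
For readers less familiar with why the output needs a shuffle, here is a minimal sketch in plain Python (not MLIR, and not the code in this patch). It only illustrates the underlying identity: if the columns of B are permuted before the contraction, the result columns come out permuted the same way, and applying the inverse permutation restores the flat order. The permutation `perm`, the matrix values, and all helper names are hypothetical and chosen for illustration; the actual shuffle pattern used by the patch is not shown here.

```python
def matmul(a, b):
    """Naive matrix product of two lists-of-rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def permute_columns(m, perm):
    """Return a matrix whose column j is column perm[j] of m."""
    return [[row[p] for p in perm] for row in m]


def invert(perm):
    """Return the inverse permutation, so that perm[inv[p]] == p."""
    inv = [0] * len(perm)
    for j, p in enumerate(perm):
        inv[p] = j
    return inv


# Hypothetical operands in flat (row-major) layout.
A = [[1, 2], [3, 4]]
B = [[1, 2, 3, 4], [5, 6, 7, 8]]

# Hypothetical column shuffle standing in for the lane order produced by
# a packed operation; the real pattern depends on the target instruction.
perm = [0, 2, 1, 3]

# Contracting against the shuffled B yields a shuffled result ...
shuffled_result = matmul(A, permute_columns(B, perm))

# ... and shuffling the result with the inverse permutation recovers the
# flat layout expected by the accumulator.
restored = permute_columns(shuffled_result, invert(perm))

assert restored == matmul(A, B)
```

The same identity explains both directions mentioned above: either the B operand is pre-shuffled to compensate, or the contraction output is shuffled back before the store.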


github-actions bot commented Jan 6, 2026

✅ With the latest revision this PR passed the C/C++ code formatter.
