Closed
Description
I am not sure how to prepare the inputs for an instruction such as __builtin_amdgcn_mfma_i32_16x16x16i8.
The a and b operands are expected to be int32, so I packed four int8 values into one int32 as follows:
    for (int i = 0; i < 4; ++i) {
        const int r_idx = thread_x * K + i + thread_y * 4;
        a |= (int32_t(src[r_idx]) << 8 * (3 - i));
    }
In the code above, src is an array of int8 values and a is an int32.
Calling the instruction with these inputs does not produce the expected results.
Could anyone please advise how to prepare the input data a and b for this instruction?
Ideally, could a test with int8 inputs and int32 output be added?
Apologies if this is not the right place to ask. I was not able to post this in the Discussions section.
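For reference, the packing step can be checked on the host, independently of the MFMA call. The sketch below is only an illustration, not a confirmed description of what the instruction expects. The helper names (pack_i8x4, unpack_i8, pack_i8x4_sign_extended, self_check) are hypothetical. Placing element 0 in the least significant byte is an assumption that should be verified against the ISA documentation. What the sketch does show with certainty is that converting a negative int8 straight to int32 before shifting sign-extends it, so the OR overwrites the neighbouring bytes; casting through uint8_t first avoids that.

```cpp
#include <cassert>
#include <cstdint>

// Pack four int8 values into one 32-bit word.
// ASSUMPTION: element 0 goes into the least significant byte (bits 7:0);
// check the ISA documentation for the order the instruction really uses.
inline int32_t pack_i8x4(const int8_t* v) {
    uint32_t packed = 0;
    for (int i = 0; i < 4; ++i) {
        // Cast through uint8_t so a negative value fills only its own byte.
        packed |= static_cast<uint32_t>(static_cast<uint8_t>(v[i])) << (8 * i);
    }
    return static_cast<int32_t>(packed);
}

// Extract byte i (0 = least significant) as a signed 8-bit value.
inline int8_t unpack_i8(int32_t packed, int i) {
    return static_cast<int8_t>((static_cast<uint32_t>(packed) >> (8 * i)) & 0xFFu);
}

// Same loop without the uint8_t cast, kept only to show the failure mode:
// a negative element is sign-extended and sets all higher bytes to 0xFF.
inline int32_t pack_i8x4_sign_extended(const int8_t* v) {
    uint32_t packed = 0;
    for (int i = 0; i < 4; ++i) {
        packed |= static_cast<uint32_t>(static_cast<int32_t>(v[i])) << (8 * i);
    }
    return static_cast<int32_t>(packed);
}

// Round-trip check on a mix of negative and positive values.
inline void self_check() {
    const int8_t v[4] = {-1, 2, -3, 4};
    const int32_t good = pack_i8x4(v);
    for (int i = 0; i < 4; ++i) {
        assert(unpack_i8(good, i) == v[i]);
    }
    // The sign-extended variant loses element 1 (it no longer reads back as 2).
    const int32_t bad = pack_i8x4_sign_extended(v);
    assert(unpack_i8(bad, 1) != v[1]);
}
```

Calling self_check() on the host confirms that the masked version round-trips every element. If the hardware turns out to expect the opposite byte order, only the shift amount (8 * i) needs to change.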
Thanks,