Skip to content

Conversation

@zhangnju
Copy link

Motivation

when trying GEMM sample on R9700, if I change A_rows,A_cols,B_cols from default value to be 4096, validation will fail

Technical Details

  1. set the input matrix value to be random, which is the same with other GEMM application
  2. develop a GEMM CPU validation function to test the GPU output , in order to speed up GEMM on CPU, I chose the block GEMM on CPU

Test Plan

  1. run the default value: the validation test can pass:
    ./hip_matrix_multiplication
    Matrix multiplication: [2048x1024] * [1024x1024], block size: 16x16
    Validation passed.
  2. input size to be 4096, it still can pass
    ./hip_matrix_multiplication --A_rows 4096 --A_cols 4096 --B_cols 4096
    Matrix multiplication: [4096x4096] * [4096x4096], block size: 16x16
    Validation passed.
  3. run other size, and it still can pass
    ./hip_matrix_multiplication --A_rows 4096 --A_cols 512 --B_cols 2048
    Matrix multiplication: [4096x512] * [512x2048], block size: 16x16
    Validation passed.

Test Result

all the test can pass

Added/Updated documentation?

  • Yes
  • [* ] No, does not apply to this PR.

Included Visual Studio files?

  • Yes
  • [ *] No, does not apply to this PR.

Submission Checklist

  • New examples contain proper CMake and Make files.
  • I have updated relevant CI workflows and the new examples are being built and tested in CI.
  • Check and guard against unsupported ASICs if relevant dependent libraries have limited support.
  • [ *] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

@zhangnju zhangnju requested a review from a team as a code owner December 16, 2025 15:06
@zichguan-amd
Copy link
Collaborator

cc @neon60 @j-stephan @adeljo-amd I wonder if this example overlaps with matrix multiplication from #375? If they are similar enough, we should probably just keep one.

@zhangnju
Copy link
Author

I think these two examples have different kernels ,which should not be redundant

Copy link
Collaborator

@zichguan-amd zichguan-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with the change, the CPU/GPU error should be in the same range, we can definitely use double for more precision. I'll let others weigh in.

constexpr float b_value = 0.02F;
std::fill(B.begin(), B.end(), b_value);
// Set matrix elements to random value on the host.
for (size_t i = 0; i < A.size(); ++i) A[i] = static_cast<float>(rand() / RAND_MAX );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be static_cast<double>(rand()) / RAND_MAX, static_cast<float>(rand() / RAND_MAX ) would result in 0 most of the time (integer division)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your reminding. we can change it to be "static_cast(rand() / (RAND_MAX+1.0f) );", and it will generate [0,1) random float.

std::fill(B.begin(), B.end(), b_value);
// Set matrix elements to random value on the host.
for (size_t i = 0; i < A.size(); ++i) A[i] = static_cast<float>(rand() / RAND_MAX );
for (size_t i = 0; i < B.size(); ++i) B[i] = static_cast<float>(rand() / RAND_MAX );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated it to be "static_cast(rand() / (RAND_MAX+1.0f) )", and verified that it can generate [0,1) random float value

#include <cstdlib>
#include <cassert>
#include <cstddef>
#include <memory>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed them in the new commit

@zichguan-amd zichguan-amd requested a review from neon60 December 22, 2025 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants