SWDEV-558359 - [Topaz][Gigapixel][Performance] The clc 512x512 models are running with overall lower performance for MIGraphX EP than for DirectML EP #188
Conversation
…pMode() function - prevent casting of int64 to int32 in IsUnsupportedOpMode() function for "Slice" operator
This will need to be cherry-picked into wml-main after merging.

What in the slice check is being unsupported? Why are you modifying code that's doing index checks?
The fix is in the toVector call: it should use a larger type. It's currently using int, which may default to 32 bits on some systems. If you're seeing rollover, that's likely the culprit.
Either make this a larger type to avoid casting values down to int32 (since int converts to unsigned), or use size_t/int64_t:
```cpp
std::vector<int> toVector(const ONNX_NAMESPACE::int64s& nums) {
  std::vector<int> result;
  int num = nums.size();
  for (int i = 0; i < num; ++i) {
    result.push_back(static_cast<int>(nums[i]));
  }
  return result;
}
```
Also, don't remove the .at() bounds checks here; these shouldn't be replaced with direct access.
Merged 8dfad21 into ROCm:rocm7.1_internal_testing
@Zhaeong good to cherry-pick this off the tip of rocm7.1_internal_testing. I'll upstream this to OnnxRT.
Upstreamed here - microsoft#26403

Merged to wml-main.
Description
Bug fix: large integers were incorrectly converted to -1 in the IsUnsupportedOpMode() function.
Prevent casting of int64 to int32 in IsUnsupportedOpMode() for the "Slice" operator.
Motivation and Context
Currently, the clc models in Topaz underperform DML by 52x, and the root cause is the "Slice" node being flagged as unsupported.
This change fixes an issue where start and end attributes for the slice node were being converted to -1 instead of their real int64 value. The -1 end or start value was causing migraphx/onnxruntime to incorrectly flag this node as unsupported.
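For illustration (this code is not from the PR), the -1 values can be reproduced directly: narrowing a 64-bit value to 32 bits keeps only the low word, and INT64_MAX, which ONNX Slice commonly uses as an "end of axis" sentinel, has a low word of 0xFFFFFFFF, which reads back as -1 on two's-complement targets:

```cpp
#include <cstdint>

// Illustrative helper (hypothetical name): narrowing keeps the low
// 32 bits, so INT64_MAX (0x7FFFFFFFFFFFFFFF) becomes 0xFFFFFFFF,
// i.e. -1 as a signed 32-bit integer.
int32_t narrowTo32(int64_t v) {
  return static_cast<int32_t>(v);
}
```

This is why the start/end attributes appeared as -1 downstream even though the model stored valid int64 values.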
Local testing on my machine suggests inference time drops from 90 seconds to 3 seconds, a massive performance boost that considerably closes the gap between DML and MIGraphX.