@mangupta
Contributor

@mangupta mangupta commented Mar 4, 2020

  • Fixes SWDEV-219322

@mangupta mangupta changed the base branch from master to clang_tot_upgrade March 4, 2020 05:11
@mangupta mangupta requested review from jeffdaily and scchan March 4, 2020 05:11
@jeffdaily
Collaborator

The error message for the failing tests is curious: "error while loading shared libraries: libhsa-runtime64.so.1: cannot open shared object file: No such file or directory"

@jeffdaily
Collaborator

I kicked off a unit test run on my local system to see if I could reproduce the failures.

Kalmar::HSADevice* device = static_cast<Kalmar::HSADevice*>(getDev());

wait();

A comment here would help. It seems that calling 'wait()' is equivalent to holding qmutex and calling wait_no_lock(), which is what line 4185 below does after locking rocrQueuesMutex. Does this fix imply that rocrQueuesMutex should NOT be held before qmutex, since that ordering may cause a deadlock? If so, shouldn't the locking of qmutex at line 4185 be removed?
Or could moving the acquisition of rocrQueuesMutex to just before the call to removeRocrQueue() help?

Collaborator

wait_no_lock() could potentially call EnqueueMarkerNoLock(). If the HSAQueue does not hold a rocr queue, it will end up calling createOrstealRocrQueue() and lock rocrQueuesMutex.

Moving the lock on rocrQueuesMutex down to just before line 4206 might work, too.
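
To make the hazard described above concrete, here is a minimal sketch. It is not the HCC implementation: `Device`, `Queue`, `removeRocrQueueLocked()`, `pendingWork`, and the counters are hypothetical stand-ins, and only the mutex names and the general call chain are taken from this thread. It assumes that both mutexes are non-recursive and that waiting on a queue with pending work may need to acquire a ROCr queue.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the device that owns the pool of ROCr queues.
struct Device {
    std::mutex rocrQueuesMutex;  // guards the ROCr queue pool
    int liveRocrQueues = 0;      // queues currently handed out
    int acquisitions = 0;        // how often createOrStealRocrQueue() ran

    // Takes rocrQueuesMutex itself, so the caller must NOT already hold it.
    void createOrStealRocrQueue() {
        std::lock_guard<std::mutex> lock(rocrQueuesMutex);
        ++liveRocrQueues;
        ++acquisitions;
    }

    // Caller must already hold rocrQueuesMutex.
    void removeRocrQueueLocked() {
        assert(liveRocrQueues > 0);
        --liveRocrQueues;
    }
};

// Hypothetical stand-in for a queue that lazily acquires a ROCr queue.
struct Queue {
    Device& dev;
    std::mutex qmutex;           // guards this queue's state
    bool hasRocrQueue = false;
    bool pendingWork = true;     // assumption: something is still in flight

    explicit Queue(Device& d) : dev(d) {}

    // Caller must hold qmutex. Draining pending work may need a ROCr queue,
    // which is the path that ends up locking rocrQueuesMutex.
    void waitNoLock() {
        if (!pendingWork) return;
        if (!hasRocrQueue) {
            dev.createOrStealRocrQueue();
            hasRocrQueue = true;
        }
        pendingWork = false;
    }

    void wait() {
        std::lock_guard<std::mutex> lock(qmutex);
        waitNoLock();
    }

    void dispose() {
        // Drain first, while rocrQueuesMutex is NOT held. Without this call,
        // the waitNoLock() below would re-lock rocrQueuesMutex on the same
        // thread that already holds it.
        wait();

        std::lock_guard<std::mutex> poolLock(dev.rocrQueuesMutex);
        std::lock_guard<std::mutex> queueLock(qmutex);
        waitNoLock();  // nothing pending now, so the pool mutex is not re-taken
        if (hasRocrQueue) {
            dev.removeRocrQueueLocked();
            hasRocrQueue = false;
        }
    }
};
```

Constructing a `Device` and a `Queue` and calling `dispose()` completes with one ROCr queue acquired and released. Note that the sketch only covers the single-thread re-lock: `wait()` still takes qmutex before rocrQueuesMutex while `dispose()` takes them in the opposite order, which is the ordering concern raised above and the reason that acquiring rocrQueuesMutex later is suggested as an alternative.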

@scchan
Collaborator

scchan commented Mar 5, 2020

> The error message for the failing tests is curious: "error while loading shared libraries: libhsa-runtime64.so.1: cannot open shared object file: No such file or directory"

That's a known problem that @david-salinas will address.

@scchan
Collaborator

scchan commented Mar 5, 2020

@jeffdaily do we still need an explicit wait if we clear the asyncops vector?

@jeffdaily
Collaborator

> @jeffdaily do we still need an explicit wait if we clear the asyncops vector?

Isn't the wait necessary due to HCC_OPT_FLUSH?

4 participants