@itzmeanjan

Harpocrates SYCL kernels can now process input

  • ~18x faster, when running on Intel Iris Xe MAX Graphics 🔥
  • ~10% faster, when running on Nvidia Tesla V100 GPU
  • ~19% faster, when running on Intel UHD Graphics P630

because the frequently accessed (inverse) look-up table (LUT) is now explicitly cached in work-group local memory first, which makes it cheaper to access for every work-item in that work-group. The work-group leader is responsible for copying the 256-byte look-up table into the faster work-group local memory, and all other work-items in the group wait at a barrier until that copy has finished.

Signed-off-by: Anjan Roy <hello@itzmeanjan.in>
… Iris Max GPU

Code checked out at commit 2271291

Signed-off-by: Anjan Roy <hello@itzmeanjan.in>
…commit 32c9f06

Signed-off-by: Anjan Roy <hello@itzmeanjan.in>
