Skip to content

[adreno] Texture support in more ops#18

Open
lhez wants to merge 15 commits intooctoml:masterfrom
lhez:upstream/more-texture-ops
Open

[adreno] Texture support in more ops#18
lhez wants to merge 15 commits intooctoml:masterfrom
lhez:upstream/more-texture-ops

Conversation

@lhez
Copy link
Copy Markdown

@lhez lhez commented May 2, 2022

This PR contains changes for performance improvement. It also contains minor fixes for NDK 23 and cpp_rpc; these have already been fixed in the mainline TVM. In particular, this PR contains,

  1. A simple schedule for injective ops on Adreno GPU.
  2. Texture support for some additional ops - concat, add, relu, maximum, multiply, pad
  3. Use select in OpenCL codegen for texture access.
  4. A more general CL codegen improvement that should make the resulting code easier for the OpenCL compiler to optimize.
  5. Disable cache_read(texture).

lhez added 15 commits April 25, 2022 21:45
* This is the same as concat schedule and should be unified
* This utility function checks a CallNode outputs NCHW4c by
  1) looking at the associated Attrs node if available (e.g. for Conv2D)
  2) looking at the tensor type of CallNode if the type
    a) has 5 dimensions
    b) with the last dimension (fastest varying) being 4
* This is similar to the one in annotate_texture_storage.cc
* add, multiply, maximum, nn.pad, nn.relu
* This pass rewrites @tir.if_then_else(@tir.texture2d_load()) to
  @tir.texture2d_load(@tir.if_then_else()), i.e. originally the
  conditional is applied to choose between the value of a
  texture2d_load and 0 and after the rewrite the conditional is
  applied to the coordinates passed to texture2d_load. When OOB,
  -1 will be passed to texture2d_load().
* Note that this alone does not improve performance. But it makes
  using OpenCL intrinsic select() easier.
* When lowering function calls to @tir.texture2d_load(), lower
  @tir.if_then_else() to OpenCL intrinsic select() instead of
  the ternary op (? :).
* This is applied only when lowering @tir.texture2d_load() is
  because calling select() in more general cases require more
  verbose syntax for resolve ambiguity and texture coordinate
  is easier to deal with.
* This pass uses max_pool2d as a mechanism for converting
  between buffer and image
* Ideally layout_tansform should support cl image
@TejashShah TejashShah requested a review from csullivan May 2, 2022 17:55
enable_atomics_ = true;
}
CodeGenC::VisitExpr_(op, os);
} else if (op->op.same_as(builtin::if_then_else())) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that using SelectNode visitor is more proper way to do such changes. Please take a look on this PR: https://github.com/apache/tvm/pull/11038/files
Also, on the tvm/main we use select for all data types. Probably, we could remove lowering_texture2d_load_ variable.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advice - that would be cleaner. In that case, do we need to transform if_then_else to SelectNode? In this repo, if_then_else seems preserved until the codegen phase - that is why I am directly handling if_then_else. I haven't checked tvm/main yet, does it transform if_then_else to SelectNode?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check, but I think that we don't need to transform if_then_else to SelectNode. It will be done automatically. You can just copy-paste code from the PR above or from the tvm/main (https://github.com/apache/tvm/blob/main/src/target/source/codegen_opencl.cc#L543-L551) and it should be enough.

Copy link
Copy Markdown
Contributor

@echuraev echuraev May 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhez, probably I wasn't right and overriding SelectNode won't be enough for your task. Please, check it.
Today I just worked on one issue and found that in case which I have fixed in tvm/main the select node was generated from tir.select. But here you have if_then_else. So, probably, you did it in the right way. Sorry for confusing.

else:
return topi.cuda.schedule_pool(outs, attrs.layout)

def is_nchw4c(outs):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, we can use this function in schedule_pool_adreno

}
} else if (auto attrs = call->attrs.as<PadAttrs>()) {
if (isNCHW4c(call)) {
supports_texture_storage = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent here and in several conditions below

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants