* This is the same as concat schedule and should be unified
* This utility function checks whether a CallNode outputs NCHW4c by
1) looking at the associated Attrs node, if available (e.g. for Conv2D)
2) looking at the tensor type of the CallNode and checking that the type
a) has 5 dimensions
b) has a last (fastest-varying) dimension of 4
* This is similar to the one in annotate_texture_storage.cc
* add, multiply, maximum, nn.pad, nn.relu
* This pass rewrites @tir.if_then_else(@tir.texture2d_load()) to @tir.texture2d_load(@tir.if_then_else()). Originally, the conditional chooses between the value of a texture2d_load and 0; after the rewrite, the conditional is applied to the coordinates passed to texture2d_load. When OOB, -1 is passed to texture2d_load().
* Note that this alone does not improve performance, but it makes using the OpenCL intrinsic select() easier.
* When lowering function calls to @tir.texture2d_load(), lower @tir.if_then_else() to the OpenCL intrinsic select() instead of the ternary op (? :).
* This is applied only when lowering @tir.texture2d_load() because calling select() in more general cases requires more verbose syntax to resolve ambiguity, and texture coordinates are easier to deal with.
* This pass uses max_pool2d as a mechanism for converting between buffer and image
* Ideally layout_transform should support cl image
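
The coordinate rewrite described in the notes above can be illustrated with a small toy model. This is only a sketch, not the pass itself: `texture2d_load`, `before_rewrite`, and `after_rewrite` are hypothetical Python stand-ins, the texture is one-dimensional for brevity, and the model assumes that an out-of-bounds read (such as coordinate -1) returns 0.

```python
# Toy model of the rewrite. ASSUMPTION: out-of-bounds reads return 0,
# which is what makes passing -1 equivalent to selecting the constant 0.

def texture2d_load(texture, coord):
    """Hypothetical stand-in for @tir.texture2d_load on a 1-D texture."""
    if 0 <= coord < len(texture):
        return texture[coord]
    return 0  # assumed OOB behaviour


def before_rewrite(texture, coord, cond):
    # if_then_else(cond, texture2d_load(coord), 0)
    return texture2d_load(texture, coord) if cond else 0


def after_rewrite(texture, coord, cond):
    # texture2d_load(if_then_else(cond, coord, -1))
    return texture2d_load(texture, coord if cond else -1)


texture = [10, 20, 30, 40]
# Both forms agree for every in-range coordinate and either condition value.
agree = all(
    before_rewrite(texture, c, cond) == after_rewrite(texture, c, cond)
    for c in range(len(texture))
    for cond in (True, False)
)
```

Under that assumption the two forms are interchangeable, which is why the rewrite by itself is not expected to change performance; its value is that the conditional now operates on an integer coordinate.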
          enable_atomics_ = true;
        }
        CodeGenC::VisitExpr_(op, os);
      } else if (op->op.same_as(builtin::if_then_else())) {
I think that using a SelectNode visitor is a more proper way to make such changes. Please take a look at this PR: https://github.com/apache/tvm/pull/11038/files
Also, on the tvm/main we use select for all data types. Probably, we could remove lowering_texture2d_load_ variable.
Thanks for the advice - that would be cleaner. In that case, do we need to transform if_then_else to SelectNode? In this repo, if_then_else seems preserved until the codegen phase - that is why I am directly handling if_then_else. I haven't checked tvm/main yet, does it transform if_then_else to SelectNode?
You can check, but I think that we don't need to transform if_then_else to SelectNode. It will be done automatically. You can just copy-paste code from the PR above or from the tvm/main (https://github.com/apache/tvm/blob/main/src/target/source/codegen_opencl.cc#L543-L551) and it should be enough.
@lhez, I was probably wrong, and overriding SelectNode won't be enough for your task. Please check it.
I worked on a related issue today and found that, in the case I fixed in tvm/main, the select node was generated from tir.select. Here you have if_then_else, so you probably did it the right way. Sorry for the confusion.
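
To make the two lowering options in this thread concrete, here is a minimal string-emitter sketch. It is not the code from either repository: `emit_conditional` and its `use_select` flag are hypothetical, and the only fact relied on is that the scalar form of OpenCL's `select(a, b, c)` yields `b` when `c` is true and `a` otherwise, so the false value goes first.

```python
def emit_conditional(cond, true_value, false_value, use_select):
    """Emit OpenCL source text for a conditional (hypothetical helper)."""
    if use_select:
        # select(a, b, c) picks b when c is true, so the false value comes first.
        return f"select({false_value}, {true_value}, {cond})"
    # Fallback: the C ternary operator.
    return f"({cond} ? {true_value} : {false_value})"


coord_select = emit_conditional("in_bounds", "x", "-1", use_select=True)
coord_ternary = emit_conditional("in_bounds", "x", "-1", use_select=False)
# coord_select  == "select(-1, x, in_bounds)"
# coord_ternary == "(in_bounds ? x : -1)"
```

A real code generator would also have to make the operand types agree, which this sketch ignores; that is the extra verbosity the commit note mentions for the general case.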
        else:
            return topi.cuda.schedule_pool(outs, attrs.layout)


    def is_nchw4c(outs):
Probably, we can use this function in schedule_pool_adreno
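
For reference, the shape-based part of the NCHW4c test from the commit notes (five dimensions, innermost extent of 4) can be sketched as below. The helper name `looks_like_nchw4c` is hypothetical and is not the PR's `is_nchw4c`, whose body is not shown here; it takes a plain shape tuple rather than TVM tensors.

```python
def looks_like_nchw4c(shape):
    """Shape-only heuristic: 5 dimensions with an innermost extent of 4."""
    return len(shape) == 5 and int(shape[-1]) == 4


packed = looks_like_nchw4c((1, 8, 56, 56, 4))   # 5-D, inner block of 4
plain = looks_like_nchw4c((1, 32, 56, 56))      # ordinary 4-D NCHW
wide = looks_like_nchw4c((1, 4, 56, 56, 8))     # 5-D, but inner block of 8
```

A check like this cannot distinguish NCHW4c from any other 5-D layout whose last dimension happens to be 4, which is presumably why the Attrs node is consulted first when it is available.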
        }
      } else if (auto attrs = call->attrs.as<PadAttrs>()) {
        if (isNCHW4c(call)) {
        supports_texture_storage = true;
nit: indent here and in several conditions below
This PR contains changes for performance improvement. It also contains minor fixes for NDK 23 and cpp_rpc; these have already been fixed in mainline TVM. In particular, this PR contains:
* concat, add, relu, maximum, multiply, pad
* select in OpenCL codegen for texture access
* cache_read(texture)