tflite: add patches for GPU/OpenCL fixes and new recipes #1385
base: master
Conversation
tdarote commented on Jan 13, 2026
- Added patches for GPU work-group sizing, OpenCL loading, CMake fixes, and integration of multi-model tools.
- Introduced pkg-config template and new recipes for flatbuffers and TensorFlow Lite 2.20.0.
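The pkg-config template mentioned above might look roughly like the sketch below (hypothetical; the actual variable names and contents of tensorflow-lite.pc.in in this PR are assumptions):

```bitbake
# tensorflow-lite.pc.in -- hypothetical sketch of the pkg-config template,
# with @...@ placeholders substituted by CMake's configure_file()
prefix=@CMAKE_INSTALL_PREFIX@
libdir=${prefix}/@CMAKE_INSTALL_LIBDIR@
includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@

Name: tensorflow-lite
Description: TensorFlow Lite C API
Version: @PROJECT_VERSION@
Libs: -L${libdir} -ltensorflow-lite
Cflags: -I${includedir}
```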
lumag left a comment
Please provide a sensible commit message. It should be describing the design decisions, the issues that you faced, etc. rather than simply stating the patch contents.
More importantly, please send patches upstream before contributing them here.
> - Improves compatibility across GPUs.
> - Prevents oversized work-groups and incorrect buffer alignment on Adreno devices.
>
> Upstream-Status: Pending
Please submit upstream.
> Date: Thu, 9 Oct 2025 22:42:17 +0200
> Subject: [PATCH 07/11] tensorflow-lite: Major version dlopen for OpenCL libs
>
> Upstream-Status: Inappropriate [OE specific]
Missing commit message; it's impossible to judge whether it's really OE-specific or not.
Thanks for the feedback. I’ll update the commit message to include the rationale. The patch makes TensorFlow Lite use dlopen with major versioned OpenCL libraries (e.g., libOpenCL.so.1) instead of unversioned names. This is required in OE because the unversioned symlink (libOpenCL.so) is often in -dev packages and not present on target images, causing runtime failures. Upstream doesn’t enforce this because they assume full development environments, so this is OE-specific.
On my Debian system I also don't have libOpenCL.so, unless I install the dev package. Please work with upstream in order to implement a generic fix for the issue.
> @@ -0,0 +1,29 @@
> SUMMARY = "Memory Efficient Serialization Library"
Why do we need a separate -native recipe?
We need flatbuffers-native because TFLite’s build requires the flatc tool at build time, and it must match FlatBuffers v24.3.25 to avoid ABI/schema mismatches. This version isn’t available on the build host, and we didn’t find a native provider in our current layers. Providing a native recipe ensures deterministic builds and the exact tool version required by TFLite 2.20.0.
It would be better to carry a full downgraded version of the recipe in meta-oe/recipes-devtools/flatbuffers and select it using PREFERRED_VERSION.
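The suggestion above amounts to keeping a versioned recipe (e.g. flatbuffers_24.3.25.bb) in meta-oe and pinning it from a distro or layer configuration file. A sketch, with the version taken from this thread and the file placement assumed:

```bitbake
# distro/layer conf -- pin flatbuffers to the version tensorflow-lite needs
PREFERRED_VERSION_flatbuffers = "24.3.25"
PREFERRED_VERSION_flatbuffers-native = "24.3.25"
```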
Why do we need a separate -native recipe? Can't we use BBCLASSEXTEND?
I agree with @quaresmajose , the separate version should be provided in meta-oe.
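The BBCLASSEXTEND route mentioned above lets a single recipe provide the target, native, and nativesdk variants, so no separate -native recipe file is needed. A sketch of the relevant lines (recipe filename assumed from the version discussed here):

```bitbake
# flatbuffers_24.3.25.bb -- one recipe, several build variants.
# The target build depends on the native variant so flatc is
# available at build time on the host.
DEPENDS = "flatbuffers-native"
BBCLASSEXTEND = "native nativesdk"
```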
We can handle this using the bbappend file in the flatbuffer layer we shared. I’ll include the update in the next patch.
I think you missed the point. The flatbuffers recipe is a part of meta-oe layer. Please keep it there.
> @@ -0,0 +1,4 @@
> # Tensorflow-lite needs an extremely specific version, so lock it to that
The base path is different
> @@ -0,0 +1,136 @@
> inherit cmake pkgconfig
Why are we packaging it instead of using meta-tensorflow?
The bulk of meta-tensorflow is openjdk and bazel support, but we can't use bazel for 2.20, so we opt for the cmake option. That means we aren't reusing anything from meta-tensorflow anymore.
This should be mentioned in the PR description at the minimum.
We evaluated meta-tensorflow, which currently builds TensorFlow 2.19.0 with full C++ APIs and additional components. Our requirement is to upgrade to TensorFlow Lite 2.20.0 and build only the C APIs for a minimal footprint. The full TensorFlow build in meta-tensorflow introduces unnecessary dependencies and increases image size and build time, which is not acceptable for our target.
Therefore, we introduced a dedicated TFLite recipe that:
- Pins to 2.20.0 (latest stable for our BSP).
- Compiles only the C API (no C++ interpreter, Python bindings, or extra tooling).
- Applies OE-specific adjustments (e.g., OpenCL major-version dlopen for runtime compatibility).

This approach ensures a lean build optimized for embedded targets while meeting version and feature requirements.
So, please work with meta-tensorflow maintainers to update tflite (the layer also provides tflite) to 2.20.
I'd really prefer to avoid fragmentation here: there is an established layer which provides TF and TF Lite.
Before we decide to integrate entire recipes which are known to also be available as part of other common layers, we should at least have a proper discussion with upstream, and only bring these recipes here when really required (something specific to our own BSP).
So please work first with upstream, see if it is possible to update the revision there and if they would also accept making the recipe more flexible in order for us to later decide how to build it (e.g. based on pkgconfig values) as part of meta-qcom-distro.
The latest recipe for tflite is 2.19: https://layers.openembedded.org/layerindex/recipe/396204/
There are 20 patches there which do not affect our case.
On top of that, bazel is used instead of cmake, while the tflite repo owners clearly state that cmake must be used for cross-compilation.
None of the timing/performance optimizations are there.
So would a bbappend be needed here to fix all those challenges, or should we try submitting everything to meta-tensorflow?
> }
>
> FILES:${PN} += "${libdir} ${bindir}"
> INSANE_SKIP:${PN} += "dev-so \
Why? Your git commit message is nonexistent.
Please have a look at the recipe in #1319; it has a much better bb structure and some comments (but not enough!) explaining the weird bits.
> libeigen \
> "
>
> SRCREV = "${AUTOREV}"
Please use a pinned version
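Pinning, as requested above, means replacing AUTOREV with the exact revision of the release tag. A sketch of the relevant lines (the hash is a placeholder, not the real 2.20.0 commit, and the branch name is an assumption):

```bitbake
# Pin to an exact commit instead of AUTOREV for reproducible builds
SRC_URI = "git://github.com/tensorflow/tensorflow.git;protocol=https;branch=r2.20"
SRCREV = "0123456789abcdef0123456789abcdef01234567"  # placeholder hash
PV = "2.20.0"
```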
> TF_TARGET_EXTRA ??= ""
>
> do_configure[network] = "1"
This will break yocto-check-layer and is not allowed.
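Instead of do_configure[network], the usual pattern is to fetch every third-party dependency through SRC_URI so that only do_fetch touches the network. A sketch; the dependency name, branch, and destination path below are assumptions that depend on how the CMake build expects its deps laid out:

```bitbake
# Fetch CMake-time dependencies up front instead of letting
# do_configure download them (network access during configure
# breaks yocto-check-layer and offline builds).
SRC_URI += " \
    git://github.com/google/ruy.git;protocol=https;branch=master;name=ruy;destsuffix=git/third_party/ruy \
"
SRCREV_ruy = "0123456789abcdef0123456789abcdef01234567"  # placeholder
SRCREV_FORMAT = "default_ruy"
```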
> From c7df7a3627ef250bf7a391e3bc9e247753837e07 Mon Sep 17 00:00:00 2001
> From: Koen Kooi <koen.kooi@oss.qualcomm.com>
> Date: Thu, 9 Oct 2025 18:11:16 +0200
> Subject: [PATCH 4/8] cmake: lite/tools/benchmark: require protobug through
protobug -> protobuf
> diff --git a/tensorflow/lite/delegates/gpu/cl/opencl_wrapper.cc b/tensorflow/lite/delegates/gpu/cl/opencl_wrapper.cc
> index 49551fd372a..b8229ec1f96 100644
> --- a/tensorflow/lite/delegates/gpu/cl/opencl_wrapper.cc
> +++ b/tensorflow/lite/delegates/gpu/cl/opencl_wrapper.cc
move changes in opencl_wrapper to recipes-ml/tflite/files/0006-feat-tflite-Add-dynamic-OpenCL-library-loading-suppo.patch
This commit introduces a series of patches that enhance TensorFlow Lite's GPU capabilities and build system:

**GPU Optimizations:**
- Fix GPU work group size adjustments and remove Adreno-specific optimizations
- Improve softmax 1x1 operations to account for reported maximum threads
- Optimize work group picking to ensure max_z_size doesn't exceed max work group size
- Add dynamic OpenCL library loading support for better cross-platform compatibility

**Build System Improvements:**
- Fix protobuf dependencies in benchmark tools and label_image examples
- Enhance shared library linking and build configuration
- Add project versioning with VERSION and SOVERSION settings
- Introduce pkg-config support with tensorflow-lite.pc.in file

**New Recipes:**
- Add complete TensorFlow Lite recipe (v2.20.0) with all patches applied
- Append flatbuffers recipe to ensure proper protobuf dependencies

These changes significantly improve GPU performance, build reliability, and cross-platform compatibility for TensorFlow Lite applications.

Signed-off-by: Tushar Darote <tdarote@qti.qualcomm.com>
Force-pushed from 2d93af3 to 1d23b87