
Conversation


@tdarote tdarote commented Jan 13, 2026

  • Added patches for GPU work-group sizing, OpenCL loading, CMake fixes, and integration of multi-model tools.
  • Introduced pkg-config template and new recipes for flatbuffers and TensorFlow Lite 2.20.0.

Contributor

@lumag lumag left a comment

Please provide a sensible commit message. It should describe the design decisions, the issues you faced, etc., rather than simply restating the patch contents.

More importantly, please send patches upstream before contributing them here.

- Improves compatibility across GPUs.
- Prevents oversized work-groups and incorrect buffer alignment on Adreno devices.

Upstream-Status: Pending
Contributor

Please submit upstream.

Date: Thu, 9 Oct 2025 22:42:17 +0200
Subject: [PATCH 07/11] tensorflow-lite: Major version dlopen for OpenCL libs

Upstream-Status: Inappropriate [OE specific]
Contributor

The commit message is missing, so it's impossible to judge whether this is really OE-specific or not.

Author

Thanks for the feedback. I’ll update the commit message to include the rationale. The patch makes TensorFlow Lite use dlopen with major versioned OpenCL libraries (e.g., libOpenCL.so.1) instead of unversioned names. This is required in OE because the unversioned symlink (libOpenCL.so) is often in -dev packages and not present on target images, causing runtime failures. Upstream doesn’t enforce this because they assume full development environments, so this is OE-specific.
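As a hedged illustration of that loading order (not the actual TFLite wrapper code, which lives in tensorflow/lite/delegates/gpu/cl/opencl_wrapper.cc and differs in detail), the fallback can be sketched in C:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Try candidate SONAMEs in order and return the first handle that loads.
 * Returns NULL if none of the names could be opened. */
static void *load_first(const char *const *names) {
    for (size_t i = 0; names[i] != NULL; ++i) {
        void *handle = dlopen(names[i], RTLD_NOW | RTLD_LOCAL);
        if (handle != NULL)
            return handle;
    }
    return NULL;
}

static void *load_opencl(void) {
    /* Prefer the major-versioned SONAME, which the runtime package ships;
     * the unversioned symlink is typically only in -dev packages. */
    static const char *const candidates[] = {
        "libOpenCL.so.1",
        "libOpenCL.so",
        NULL,
    };
    return load_first(candidates);
}
```

The key point is only the ordering: the major-versioned SONAME is tried before the unversioned development symlink, so the delegate still loads on images without -dev packages.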

Contributor

On my Debian system I also don't have libOpenCL.so unless I install the dev package. Please work with upstream to implement a generic fix for the issue.

@@ -0,0 +1,29 @@
SUMMARY = "Memory Efficient Serialization Library"
Contributor

Why do we need a separate -native recipe?

Author

We need flatbuffers-native because TFLite’s build requires the flatc tool at build time, and it must match FlatBuffers v24.3.25 to avoid ABI/schema mismatches. This version isn’t available on the build host, and we didn’t find a native provider in our current layers. Providing a native recipe ensures deterministic builds and the exact tool version required by TFLite 2.20.0.

Contributor

It would be better to carry a full downgraded version of the recipe in meta-oe/recipes-devtools/flatbuffers and select it with PREFERRED_VERSION.
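For illustration, that selection would be a one-line conf change (a sketch: the 24.3.25 version string is taken from the discussion in this thread, and the conf file location is hypothetical):

```
# conf/distro/include/tflite.inc (hypothetical location)
PREFERRED_VERSION_flatbuffers = "24.3.25"
PREFERRED_VERSION_flatbuffers-native = "24.3.25"
```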

Contributor

Why do we need a separate -native recipe? Can't we use BBCLASSEXTEND?

I agree with @quaresmajose , the separate version should be provided in meta-oe.
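For reference, the usual OE pattern is one recipe extended to native rather than a parallel -native copy; a minimal sketch, assuming a versioned recipe file:

```
# flatbuffers_24.3.25.bb (hypothetical file name)
# One recipe then provides flatbuffers, flatbuffers-native,
# and nativesdk-flatbuffers from the same source.
BBCLASSEXTEND = "native nativesdk"
```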

Author

We can handle this with a bbappend in the flatbuffers layer we shared. I'll include the update in the next patch.

Contributor

I think you missed the point. The flatbuffers recipe is a part of meta-oe layer. Please keep it there.

@@ -0,0 +1,4 @@
# Tensorflow-lite needs an extremely specific version, so lock it to that
Contributor

The base path is different

@@ -0,0 +1,136 @@
inherit cmake pkgconfig
Contributor

Why are we packaging it instead of using meta-tensorflow?

Contributor

The bulk of meta-tensorflow is openjdk and bazel support, but we can't use bazel for 2.20, so we opt for the cmake option. That means we aren't reusing anything from meta-tensorflow anymore.

This should be mentioned in the PR description at the minimum.

Author

We evaluated meta-tensorflow, which currently builds TensorFlow 2.19.0 with full C++ APIs and additional components. Our requirement is to upgrade to TensorFlow Lite 2.20.0 and build only the C APIs for a minimal footprint. The full TensorFlow build in meta-tensorflow introduces unnecessary dependencies, increases image size, and build time, which is not acceptable for our target.
Therefore, we introduced a dedicated TFLite recipe that:

- Pins to 2.20.0 (latest stable for our BSP).
- Compiles only the C API (no C++ interpreter, Python bindings, or extra tooling).
- Applies OE-specific adjustments (e.g., OpenCL major-version dlopen for runtime compatibility).

This approach ensures a lean build optimized for embedded targets while meeting version and feature requirements.
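As a rough sketch of what such a C-API-only configure could look like (assumptions: upstream's tensorflow/lite/c CMake entry point and the TFLITE_ENABLE_GPU option; exact flags vary by release and are set by the recipe, not typed by hand):

```
# Configure only the TFLite C API, with the GPU (OpenCL) delegate enabled
cmake -S tensorflow/lite/c -B build \
      -DCMAKE_BUILD_TYPE=Release \
      -DTFLITE_ENABLE_GPU=ON
cmake --build build -j"$(nproc)"   # yields libtensorflowlite_c.so
```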

Contributor

So, please work with meta-tensorflow maintainers to update tflite (the layer also provides tflite) to 2.20.
I'd really prefer to avoid fragmentation here: there is an established layer which provides TF and TF Lite.

Contributor

Before we decide to integrate entire recipes which are known to also be available as part of other common layers, we should at least have a proper discussion with upstream, and only bring these recipes here when really required (something specific to our own BSP).

So please work first with upstream, see if it is possible to update the revision there and if they would also accept making the recipe more flexible in order for us to later decide how to build it (e.g. based on pkgconfig values) as part of meta-qcom-distro.


The latest tflite recipe is 2.19 (https://layers.openembedded.org/layerindex/recipe/396204/), and it carries 20 patches that do not affect our case.
On top of that, Bazel is used instead of CMake, even though the tflite repo owners clearly state that CMake must be used for cross-compilation.
None of the timing/performance optimizations are there either.
So should a bbappend go here to fix all those challenges, or should we try submitting everything to meta-tensorflow?

}

FILES:${PN} += "${libdir} ${bindir}"
INSANE_SKIP:${PN} += "dev-so \
Contributor

Why? Your git commit message is nonexistent.

@koenkooi
Contributor

Please have a look at the recipe in #1319, that has a much better bb structure and has some comments (but not enough!) explaining the weird bits.

libeigen \
"

SRCREV = "${AUTOREV}"
Contributor

Please use a pinned version
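A sketch of the requested pinning (the hash below is a placeholder, not a real commit; it would be the commit behind the v2.20.0 tag):

```
# Pin to the exact commit of the v2.20.0 tag instead of AUTOREV
SRCREV = "0000000000000000000000000000000000000000"
PV = "2.20.0"
```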


TF_TARGET_EXTRA ??= ""

do_configure[network] = "1"
Contributor

This will break yocto-check-layer and is not allowed.
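One common way to keep do_configure offline is to pre-fetch the CMake FetchContent dependencies in do_fetch and then disconnect FetchContent; a minimal sketch (FETCHCONTENT_FULLY_DISCONNECTED is a standard CMake variable, the rest is assumed recipe context):

```
# Fetch third-party sources via SRC_URI in do_fetch, then forbid
# FetchContent from touching the network during do_configure.
EXTRA_OECMAKE += "-DFETCHCONTENT_FULLY_DISCONNECTED=ON"
```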

@lumag lumag marked this pull request as draft January 14, 2026 02:03
From c7df7a3627ef250bf7a391e3bc9e247753837e07 Mon Sep 17 00:00:00 2001
From: Koen Kooi <koen.kooi@oss.qualcomm.com>
Date: Thu, 9 Oct 2025 18:11:16 +0200
Subject: [PATCH 4/8] cmake: lite/tools/benchmark: require protobug through


protobug -> protobuf

diff --git a/tensorflow/lite/delegates/gpu/cl/opencl_wrapper.cc b/tensorflow/lite/delegates/gpu/cl/opencl_wrapper.cc
index 49551fd372a..b8229ec1f96 100644
--- a/tensorflow/lite/delegates/gpu/cl/opencl_wrapper.cc
+++ b/tensorflow/lite/delegates/gpu/cl/opencl_wrapper.cc


move changes in opencl_wrapper to recipes-ml/tflite/files/0006-feat-tflite-Add-dynamic-OpenCL-library-loading-suppo.patch

This commit introduces a series of patches that enhance TensorFlow Lite's GPU capabilities and build system:

**GPU Optimizations:**
- Fix GPU work group size adjustments and remove Adreno-specific optimizations
- Improve softmax 1x1 operations to account for reported maximum threads
- Optimize work group picking to ensure max_z_size doesn't exceed max work group size
- Add dynamic OpenCL library loading support for better cross-platform compatibility

**Build System Improvements:**
- Fix protobuf dependencies in benchmark tools and label_image examples
- Enhance shared library linking and build configuration
- Add project versioning with VERSION and SOVERSION settings
- Introduce pkg-config support with tensorflow-lite.pc.in file

**New Recipes:**
- Add complete TensorFlow Lite recipe (v2.20.0) with all patches applied
- Append flatbuffers recipe to ensure proper protobuf dependencies

These changes significantly improve GPU performance, build reliability, and cross-platform compatibility for TensorFlow Lite applications.

Signed-off-by: Tushar Darote <tdarote@qti.qualcomm.com>