From ea06714bc8dd94cfa18d271356b305a607b2faf6 Mon Sep 17 00:00:00 2001 From: Peter Harris Date: Wed, 10 Dec 2025 14:34:40 +0000 Subject: [PATCH 1/2] Improve Markdown documentation --- README.md | 65 +++++----- docs/about_layers.md | 111 ++++++++++++++++ docs/building.md | 142 ++++++--------------- docs/creating.md | 149 ++++++++++++++++++++++ docs/faq.md | 25 ++-- docs/running_android.md | 57 +++++---- docs/running_linux.md | 8 +- docs/updating_protobuf_files.md | 19 --- layer_gpu_profile/README_LAYER.md | 25 ++-- layer_gpu_profile/docs/developer-docs.md | 16 +-- layer_gpu_support/README_LAYER.md | 32 ++--- layer_gpu_timeline/README_LAYER.md | 56 ++++---- layer_gpu_timeline/docs/developer-docs.md | 20 +-- 13 files changed, 456 insertions(+), 269 deletions(-) create mode 100644 docs/about_layers.md create mode 100644 docs/creating.md delete mode 100644 docs/updating_protobuf_files.md diff --git a/README.md b/README.md index 4b04c4e..c9196d6 100644 --- a/README.md +++ b/README.md @@ -1,54 +1,56 @@ # About -libGPULayers provides tooling to rapidly create new Vulkan layer drivers, -allowing developers to quickly generate new layers that can be used for -ad hoc experiments during development. +libGPULayers provides tooling to create new Vulkan layer drivers, allowing +you to quickly generate new layers suitable for creation of new developer tools +or for ad hoc experiments during development. In addition, we provide a number of pre-built layers that have been built -using these tools. These layers can be used as standalone tools in their -own right, and some can be used alongside other Arm tools such as Arm -Performance Studio. +using this framework. These layers might be used as standalone tools in their +own right, or might be used alongside other Arm tools such as +[Arm Performance Studio][2]. ## What are layer drivers? -Layers drivers provide a standard mechanism to inject diagnostic functionality +Layer drivers provide a standard mechanism to inject diagnostic functionality between an application and the underlying graphics driver. Layer drivers intercept the graphics API calls from the application, perform their diagnostic -function, and then make any necessary API calls into the underlying graphics -driver to actually perform the rendering operations. The ability to see, and -change, everything that the native driver sees makes layers an exceptionally -powerful tool for debugging functional and performance issues. +function, and then call into the underlying graphics driver to actually perform +the requested operation. The ability to see, and change, the calls into the +native driver makes layers an exceptionally powerful tool for debugging both +functional and performance issues. -The Vulkan API defines a standard layer driver mechanism. The API uses layers -to implement API parameter validation and error checking, but they are also a -general purpose mechanism for all kinds of developer tooling. +Layer drivers are designed in to the Vulkan API, and they are the mechanism +for common workflows such as error checking using the Vulkan Validation Layer +(VVL), but they are also a general purpose mechanism suitable for all kinds of +developer tooling. ## What is the purpose of this project? -We support many application developers during their development cycle. We -rarely get access to application source code, so layer drivers provide us with -an invaluable mechanism to make modifications to application API usage. 
The
-`GPU Support` layer in this project is a a tool we use during technical support
-investigations to quickly triage developers problems.
+We help many application developers to investigate issues during their
+development cycle. We rarely get access to application source code for these
+investigations, and cannot change drivers on production devices. Layer drivers
+provide us with an invaluable mechanism to monitor and make modifications to
+application API usage without needing to modify the application itself. The
+`GPU Support` layer in this project is a tool we use during technical
+support investigations to quickly triage problems.
 
 We also use layer drivers as a way to develop new API-aware debug and
 profiling capabilities. The performance layers in this repository, such as the
-`GPU Timeline` layer, are often early prototypes that we want to share with
-developers to test new ideas and gather feedback. Some are designed to be used
-as standalone development tools, others can also be used alongside other Arm
-tools such as the Arm Streamline profiler in [Arm Performance Studio][2].
+`GPU Profile` and `GPU Timeline` layers, are used to profile performance,
+or to add API-aware annotations to performance captures made using other
+tooling.
 
-As you can tell, we find layers exceptionally useful. However, creating a new
-layer from scratch requires a lot of boilerplate code and is fiddly to get
-right. We therefore also wanted to take this opportunity to share our layer
-generation tools which make it trivial to create a complete bare-bones layer
-that is ready to extend and use.
+As you might be able to tell, we find layers exceptionally useful, and we
+often want to create ad hoc layers to use for one-off experiments. Creating a
+new layer from scratch requires a lot of code and is fiddly to get right, with
+obscure errors when it doesn't work, so we wrote a tool to automate layer
+creation. The final part of this project is this layer generation tooling,
+which you use to quickly create a new layer that is ready to deploy.
 
 ## Supported devices
 
 This library is currently tested on devices running Android or Linux, and using
 Arm® Immortalis™ and Arm Mali™ GPUs. Contributions adding support for other
-platforms is welcome.
+platforms are welcome.
 
 # License
 
@@ -60,14 +62,17 @@ from this repository you acknowledge that you accept terms specified in the
 
 Common documentation
 
-* [Building a new layer](./docs/building.md)
+* [Building a layer](./docs/building.md)
+* [Creating a new layer](./docs/creating.md)
 * [Running using a layer on Android](./docs/running_android.md)
 * [Running using a layer on Linux](./docs/running_linux.md)
+* [About layers design notes](./docs/about_layers.md)
 * [Frequently asked questions](./docs/faq.md)
 
 Layer documentation
 
 * [Layer: GPU Support](./layer_gpu_support/README_LAYER.md)
+* [Layer: GPU Profile](./layer_gpu_profile/README_LAYER.md)
 * [Layer: GPU Timeline](./layer_gpu_timeline/README_LAYER.md)
 
 # Support
diff --git a/docs/about_layers.md b/docs/about_layers.md
new file mode 100644
index 0000000..7b6762d
--- /dev/null
+++ b/docs/about_layers.md
@@ -0,0 +1,111 @@
+# About Vulkan layers
+
+This page captures some interesting points to note about Vulkan layers, and is
+mostly intended for developers of layers and maintainers of this project.
+
+## Android vs Linux differences
+
+Linux and Windows use the Khronos Vulkan loader, which has been extended over
+time to have a richer loader-layer interface to support more use cases.
+The current Khronos loader implements the v2 protocol from the
+[Loader-Layer Interface][LLI] specification.
+
+Android uses a custom Vulkan loader, which supports the basic v0 protocol with
+some Android-specific limitations. This interface is functional, but lacks
+some useful capabilities of the Khronos loader, such as being able to
+intercept pre-instance functions in implicit layers.
+
+The libGPULayers framework has been designed to support both loaders, but
+currently only supports functionality that works with the Android loader. There
+are some areas that could be improved for Linux.
+
+## Layer lifetime
+
+Layer lifetime is managed by the Vulkan loader, and it is possible for a layer
+to get loaded and unloaded multiple times within the lifetime of a single
+application process. When a layer is unloaded, any global state is lost, so
+there is no way to use memory to persist per-process state in a layer, as you
+cannot guarantee that the layer stays loaded.
+
+On Android, layer libraries are loaded when the loader needs them (for a query
+or to create an instance) and will be unloaded when a non-zero `VkInstance`
+refcount is decremented and hits zero. They might subsequently be reloaded
+if the application starts using Vulkan functionality again.
+
+## Querying instance version
+
+It could be useful for a layer to query `vkEnumerateInstanceVersion()` to
+determine the maximum possible Vulkan API version supported on a platform,
+although note that a device might support a lower version, so you need to
+check both.
+
+It is not possible for a layer to hook pre-instance functions on Android, and
+only implicit layers are allowed to do it with the Khronos loader, so we do
+not support doing this in libGPULayers. Layers must defer checking the
+supported API versions until they get a concrete `VkDevice` to query, which
+they would have to do anyway, because the device version might be different to
+the instance version.
+
+Note: querying device version is much easier, because that uses normal
+dispatchable Vulkan API functions, not pre-instance functions.
+
+## Querying instance extensions
+
+Similar to the above, it could be useful for a layer to query the available
+instance extensions using `vkEnumerateInstanceExtensionProperties()`. This
+function is also a pre-instance function, and has the same limitations on
+layer use as `vkEnumerateInstanceVersion()` in the section above, so it's not
+supported in libGPULayers.
+
+Because a layer cannot query the supported Vulkan version, or the available
+instance extensions, layers that require the implementation beneath them to
+support a specific extension simply have to assume that it is available.
+
+This might result in an error on instance creation if the extension is not
+supported. One possibility is that `vkCreateInstance()` will return
+`VK_ERROR_EXTENSION_NOT_PRESENT`, because the extension is known but not
+supported. Alternatively, it might result in undefined behavior, because the
+layer passes in an extension structure on the `pNext` chain which is not known
+by the version of Vulkan implemented by the loader or the driver.
+
+Note: querying device extensions is much easier, as that uses normal
+dispatchable Vulkan API functions, not pre-instance functions.
+
+## Adding new extensions
+
+Layers might expose extensions that the driver does not. Layers advertise
+their new extensions by adding the extension strings and versions to the
+extension properties list returned by
+`vkEnumerateInstanceExtensionProperties()` and
+`vkEnumerateDeviceExtensionProperties()` when the `pLayerName` parameter is
+the current layer's name.
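+
+As an illustrative sketch of the device query, not a definitive
+implementation, a layer built with this framework might look like the
+following. The layer name and extension name are hypothetical, and the
+retrieval and locking pattern is assumed to follow the intercept examples in
+the layer creation documentation:
+
+```C++
+#include <cstring>
+#include <mutex>
+
+// Hypothetical extension advertised by this layer
+static const VkExtensionProperties layerExtension {
+    "VK_EXT_example_extension", 1
+};
+
+template <>
+VKAPI_ATTR VkResult VKAPI_CALL layer_vkEnumerateDeviceExtensionProperties<user_tag>(
+    VkPhysicalDevice physicalDevice,
+    const char* pLayerName,
+    uint32_t* pPropertyCount,
+    VkExtensionProperties* pProperties
+) {
+    // Queries addressed to this layer by name return only our extension
+    if (pLayerName && !std::strcmp(pLayerName, "VkLayerDemo"))
+    {
+        if (!pProperties)
+        {
+            // First call of the enumerate-twice idiom returns the count
+            *pPropertyCount = 1;
+            return VK_SUCCESS;
+        }
+
+        if (*pPropertyCount < 1)
+        {
+            // The caller's array is too small to hold our extension
+            return VK_INCOMPLETE;
+        }
+
+        *pPropertyCount = 1;
+        pProperties[0] = layerExtension;
+        return VK_SUCCESS;
+    }
+
+    // Forward all other queries to the next layer or driver in the stack
+    std::unique_lock lock { g_vulkanLock };
+    auto* layer = Instance::retrieve(physicalDevice);
+    lock.unlock();
+    return layer->driver.vkEnumerateDeviceExtensionProperties(
+        physicalDevice, pLayerName, pPropertyCount, pProperties);
+}
+```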
+
+For device extensions, it is also possible to modify the extension list
+returned by the driver below by adding our extensions to the list returned
+when `pLayerName` is `nullptr`.
+
+The specification requires that implementations do not expose extensions that
+conflict with other extensions but, given that a layer has no way to check
+what other layers might be exposing, we just assume that our list is safe to
+expose.
+
+## Hiding device extensions
+
+Layers might hide device extensions exposed by the layers below by modifying
+the list returned by `vkEnumerateDeviceExtensionProperties()` when calling
+down the stack, removing entries that the layer wants to hide before
+returning it to the caller.
+
+Note: This is only possible for device extensions, because instance extensions
+are discovered per component by the loader, not in a layered manner.
+
+## References
+
+* [Loader-Layer Interface][LLI].
+
+- - -
+
+[LLI]: https://github.com/KhronosGroup/Vulkan-Loader/blob/main/docs
+
+_Copyright © 2025, Arm Limited and contributors._
diff --git a/docs/building.md b/docs/building.md
index debe1c6..6c01d21 100644
--- a/docs/building.md
+++ b/docs/building.md
@@ -1,104 +1,22 @@
-# Creating a new layer
+# Building layers
 
-Layer skeleton code is generated from the Khronos specification XML, allowing
-new layers to be quickly built based on the latest specifications.
+This page gives you instructions for building a layer that already exists.
 
-The code for a layer is split into two parts, which are generated separately.
+## Building for Android
 
-* Common code provides the dispatch framework which intercept all entrypoints,
-  and then forwards these to an appropriate handler. The common code provides a
-  default pass-through handler for each API entrypoint.
-* User code provides layer-specific implementations of function intercepts and
-  can extend the generated `Device` and `Instance` classes with whatever
-  additional stateful persistence is needed to implement the layer.
-
-## Checking out the code
-
-From the directory you want to contain the code, check out the project and all
-third-party dependencies:
-
-```sh
-git clone https://github.com/ARM-software/libGPUlayers ./
-git submodule update --init
-```
-
-## Generate the common code
-
-The common code is checked into the repository, and should not need
-regenerating unless you need to use a newer version of the specification.
-
-Update the version of the Vulkan specification by updating the git version of
-the `khronos/vulkan` submodule.
-
-Once updated, regenerate the common code using the Python script:
-
-```
-python3 ./generator/generate_vulkan_common.py
-```
-
-## Generate the layer skeleton
-
-To create a new layer, use the Python script to generate a layer driver
-skeleton for it. Replace the placeholder "Demo" with your layer name.
-
-```
-python3 ./generator/generate_vulkan_layer.py --project-name VkLayerDemo --output layer_demo
-```
-
-The Vulkan layer name must start with `VkLayer` and have a title-case name,
-e.g. `VkLayerExampleName`.
-
-The output directory name should start with `layer_` and have a snake-case
-name, e.g. `layer_example_name`.
-
-The output directory must be in the root directory of the git checkout, making
-it a sibling of the `source_common` directory. This ensures that autogenerated
-CMake include paths work correctly.
-
-**Note:** The skeleton layer does nothing other than intercept all of the
-Vulkan API entry points and forward them to the next layer/driver in the stack.
-You must edit the skeleton source code to make it do something useful ...
-
-## Adding custom intercepts to your layer
-
-Custom intercept functions are implemented in your layer source tree. We use
-C++ template tag dispatch in the common code to automatically select the
-specialized function implemented in the layer code, falling back to the common
-default version if no specialization is available.
-
-Instance function intercepts must be declared in a header called
-`layer_instance_functions.hpp` in the layer `source` directory.
-
-Device function intercepts must be declared in a header called
-`layer_device_functions.hpp` in the layer `source` directory.
-
-The function prototypes for a layer implementation must be templated versions
-of the normal Vulkan prototype, with the type `<user_tag>` used for the
-template specialization.
-
-```
-template <>
-VKAPI_ATTR void VKAPI_CALL layer_vkDestroyInstance<user_tag>(
-    VkInstance instance,
-    const VkAllocationCallbacks* pAllocator);
-```
-
-## Build an Android layer
-
-These build instructions require the following tools to be installed and added
-to your environment `PATH`:
+You must have the following tools installed and on your `PATH`:
 
 * CMake
 * GNU Make
 
-The Android NDK must also be installed, and the path to the root of the NDK
-installation must be stored in your `ANDROID_NDK_HOME` environment variable.
+You must also have the Android NDK installed, and the `ANDROID_NDK_HOME`
+environment variable must be set to the root of the NDK installation.
 
 ```sh
-cd 
+cd 
 
 cmake \
-    -B \
+    -B \
     -G "Unix Makefiles" \
     -DCMAKE_SYSTEM_NAME=Android \
    -DANDROID_PLATFORM=29 \
@@ -106,40 +24,62 @@ cmake \
     -DANDROID_TOOLCHAIN=clang \
     -DANDROID_STL=c++_static \
     -DCMAKE_BUILD_TYPE=Release \
-    -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake"
+    -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+    ..
 
-cmake --build 
+cmake --build 
 ```
 
 CMake builds are build system agnostic. You can replace GNU Make with another
-build system, such as Ninja, if that is preferred.
+build system, such as Ninja, if you prefer.
 
 An Android build script for Linux hosts, `./android_build.sh`, is provided
-as a convenience wrapper for the above commands. It is not required to use
+as a convenience wrapper for the commands above. It is not required to use
 this; it is primarily included to provide a one-line build for our CI test
 system.
 
-## Build a Linux layer
+## Building for Linux
 
-
-These build instructions require the following tools to be installed and added
-to your environment `PATH`:
+You must have the following tools installed and on your `PATH`:
 
 * CMake
 * GNU Make
 
 ```sh
-cd 
+cd 
 
 cmake \
-    -B \
+    -B \
     -G "Unix Makefiles" \
     -DCMAKE_BUILD_TYPE=Release \
     ..
 
-cmake --build 
+cmake --build 
 ```
 
+## Build options
+
+The standard layer generation provides some optional build options which you
+set during CMake configuration.
+
+`LGL_CONFIG_LOG` controls logging to logcat (Android) or stdout (Linux). It is
+on by default, and is disabled using `-DLGL_CONFIG_LOG=OFF` on the CMake
+configure line.
+
+`LGL_CONFIG_TRACE` controls logging API entrypoint calls to logcat (Android) or
+stdout. It is disabled by default, and is enabled using `-DLGL_CONFIG_TRACE=ON`
+on the CMake configure line. Note that this option has a high performance
+overhead, due to the amount of logging it creates, but it is a quick way to
+discover what entrypoints an application is using.
+
+`LGL_CONFIG_OPTIMIZE_DISPATCH` controls whether the layer optimizes away
+unnecessary `default_tag` intercepts, allowing API calls to bypass the layer
+and call directly into the layer/driver below. It is enabled by default, and is
+disabled using `-DLGL_CONFIG_OPTIMIZE_DISPATCH=OFF` on the CMake configure
+line. Disabling this optimization is useful in conjunction with
+`LGL_CONFIG_TRACE` if you want to trace all entrypoints, and not just the ones
+that the layer normally intercepts.
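+
+For example, to trace every entrypoint you might combine these options on the
+configure line. This is a sketch only; the `build` directory name is just an
+example, and the platform-specific flags shown earlier on this page still
+apply:
+
+```sh
+# Trace everything: log all entrypoints and disable dispatch optimization
+cmake \
+    -B build \
+    -DCMAKE_BUILD_TYPE=Release \
+    -DLGL_CONFIG_TRACE=ON \
+    -DLGL_CONFIG_OPTIMIZE_DISPATCH=OFF \
+    ..
+```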
+
 - - -
 
 _Copyright © 2024-2025, Arm Limited and contributors._
diff --git a/docs/creating.md b/docs/creating.md
new file mode 100644
index 0000000..d13d812
--- /dev/null
+++ b/docs/creating.md
@@ -0,0 +1,149 @@
+# Creating a new layer
+
+Layer creation is automated, allowing you to quickly create a new layer that is
+complete and ready to use. After creating your new layer, all that you need to
+do is add the additional API intercepts your layer needs to implement the
+custom functionality that you want it to provide.
+
+## Checking out the code
+
+From the directory you want to contain the code, check out the project and all
+third-party dependencies:
+
+```sh
+git clone https://github.com/ARM-software/libGPUlayers ./
+git submodule update --init
+```
+
+## Generate the new layer project
+
+Create your new layer using a Python script that generates the project
+directory containing the layer code. Replace the placeholder "Demo" with your
+layer name.
+
+```sh
+python3 ./generator/generate_vulkan_layer.py --project-name VkLayerDemo --output layer_demo
+```
+
+* The Vulkan layer name must start with `VkLayer` and have a title-case name,
+  e.g. `VkLayerExampleName`.
+* The output directory name should start with `layer_` and have a snake-case
+  name, e.g. `layer_example_name`.
+* The output directory must be in the root of the git checkout, making it a
+  sibling of the `source_common` directory.
+
+You now have a complete layer that is ready to use! This layer does nothing
+useful yet, but it is functional and deployable. You must now edit the layer
+code to make it do something useful.
+
+### Generate the common code (optional)
+
+The common code is checked into the repository, and should not need
+regenerating unless you need to use a newer version of the Vulkan
+specification.
+
+Update the version of the Vulkan specification by updating the git version of
+the `khronos/vulkan` submodule.
+
+Once updated, regenerate the common code using a Python script:
+
+```sh
+python3 ./generator/generate_vulkan_common.py
+```
+
+## Adding custom intercepts to your layer
+
+Custom intercept functions are implemented in your layer source tree, using
+C++ template specialization to override the default implementations provided
+in the common code.
+
+### Adding intercept declarations
+
+Instance function intercepts must be declared in a header called
+`layer_instance_functions.hpp` in your layer `source` directory.
+
+Device function intercepts must be declared in a header called
+`layer_device_functions.hpp` in your layer `source` directory.
+
+The function prototypes for a layer implementation must be templated versions
+of the normal Vulkan prototype, with the type `user_tag` used for tag dispatch.
+
+```C++
+template <>
+VKAPI_ATTR void VKAPI_CALL layer_vkCmdSetCullMode<user_tag>(
+    VkCommandBuffer commandBuffer,
+    VkCullModeFlags cullMode);
+```
+
+When you build your layer, the compiler will automatically select your
+`user_tag` specializations over the `default_tag` implementation provided in
+the common code.
+
+### Adding intercept definitions
+
+The example below shows a typical no-op intercept implementation.
+
+```C++
+template <>
+VKAPI_ATTR void VKAPI_CALL layer_vkCmdSetCullMode<user_tag>(
+    VkCommandBuffer commandBuffer,
+    VkCullModeFlags cullMode
+) {
+    LAYER_TRACE(__func__);
+
+    // Hold the lock to access layer-wide global store
+    std::unique_lock lock { g_vulkanLock };
+    auto* layer = Device::retrieve(commandBuffer);
+
+    // Release the lock to call into the driver
+    lock.unlock();
+    layer->driver.vkCmdSetCullMode(commandBuffer, cullMode);
+}
+```
+
+The framework uses a forwarding design for Vulkan handles, returning native
+driver handles to the application and storing its local `Instance` or `Device`
+context as side-band information held in the layer. This allows a layer to only
+intercept a subset of the entry points without having to translate handles
+everywhere.
+
+Use the `Instance::retrieve()` and `Device::retrieve()` functions to retrieve
+the layer context, using the API dispatchable handle as the key. Because
+Vulkan is multi-threaded, any lookups into these shared structures must be
+done with the layer-wide lock held.
+
+You will typically want to release the layer-wide lock before calling in to
+the driver to reduce the performance impact of having the layer installed on
+multi-threaded Vulkan applications. How early you are able to release the lock
+before calling the driver will depend on what your layer does.
+
+### Overriding layer entrypoints
+
+For most Vulkan API functions, the common code does nothing other than
+providing a pass-through no-op, which will not be used at all unless the
+`LGL_CONFIG_OPTIMIZE_DISPATCH` build option is disabled.
+
+There are a number of functions in the common code that provide a manually
+authored implementation because the layer needs to do something specific to
+talk to the loader and to initialize itself. These functions include:
+
+* `layer_vkGetInstanceProcAddr()`
+* `layer_vkGetDeviceProcAddr()`
+* `layer_vkEnumerateInstanceExtensionProperties()`
+* `layer_vkEnumerateDeviceExtensionProperties()`
+* `layer_vkEnumerateInstanceLayerProperties()`
+* `layer_vkEnumerateDeviceLayerProperties()`
+* `layer_vkCreateInstance()`
+* `layer_vkDestroyInstance()`
+* `layer_vkCreateDevice()`
+* `layer_vkDestroyDevice()`
+
+These are all provided as `default_tag` implementations. A user layer might
+still override any of these with a `user_tag` specialization if needed, but
+must reimplement the required functionality taken from the common
+implementation to ensure the layer still works. The common function
+implementations are found in `source_common/framework/manual_functions.cpp`.
+
+- - -
+
+_Copyright © 2024-2025, Arm Limited and contributors._
diff --git a/docs/faq.md b/docs/faq.md
index 024fd52..249c325 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -7,32 +7,29 @@ Some answers to commonly asked questions.
 
 ### Will you accept contributions for other platforms?
 
 We are very willing to accept platform support contributions for the layer
-skeleton generator.
+generator and common code.
 
 ### Will you accept new layer drivers? 
-We are currently not able to accept contributions for new pre-built layers, as -we have no way to test and maintain them. We encourage developers to share -their layer creations as new open-source projects. +We are not able to accept contributions for new pre-built layers, as we have no +way to test and maintain them. We encourage developers to share their layer +creations as new open-source projects. We'd love to hear what you build, so let us know what you get up to! -### Will you add an Android OpenGL ES layer generation? +### Will you add support for Android OpenGL ES layer generation? -Yes, this is on our backlog. - -### Will you add Arm Linux support for Vulkan layer generation? - -Yes, this is on our backlog. +No, this is no longer planned. ## Layer functionality ### Is there a way to select what gets intercepted? -You can do this by modifying the interception table in the generated code to -remove functions you do not want to intercept. There is currently no support -in the generator for specifying a user-provided function list. +By default only essential entrypoints required by the layer framework, or +specific entrypoints implemented as `user_tag` specializations in a layer +project, are intercepted. Unused entrypoints are not intercepted unless the +`LGL_CONFIG_OPTIMIZE_DISPATCH` option is disabled. - - - -_Copyright © 2024, Arm Limited and contributors._ +_Copyright © 2024-2025, Arm Limited and contributors._ diff --git a/docs/running_android.md b/docs/running_android.md index c38647b..4f23c2b 100644 --- a/docs/running_android.md +++ b/docs/running_android.md @@ -1,15 +1,16 @@ # Running using a layer on Android To make it easy to install and configure layers for Android, we provide an -installation script which can automatically configure one or more layers. +installation script which automatically configures a device to use one or more +layers. -These instruction assume that you have already built the layers that you want -to install. See the [Building a new layer](./building.md) page for build -instructions. +These instructions assume that you have already built the layers that you want +to install. See the [Building a layer](./building.md) page, or the per-layer +README files, for build instructions. ## Script configuration -From the root directory of the GitHub project run the Android installation +From the root directory of the project checkout, run the Android installation utility, specifying the directory containing the layer that you want to install: @@ -20,13 +21,17 @@ python3 lgl_android_install.py --layer layer_gpu_example By default the script will automatically search to find connected Android devices, and debuggable packages on the chosen device. If there are multiple options the script will present a menu and prompt you for a selection. You -can avoid this by manually specifying the device (`--device`/`-D`) and package -(`--package`/`-P`) to instrument. +avoid the interactive prompt by manually specifying the device +(`--device`/`-D`) and package (`--package`/`-P`) to instrument. -Wait for the layer to be installed and configured. The script will notify you -when this has been done. You can now perform your development work. When you -are finished, return to the script and press a key to notify it that it can -clean up the device and remove the layers. +Once you have selected a device and package, wait for the layer to be installed +and configured. The script will notify you when this has been done. 
+
+You can now perform your work by running the target application. The layer
+will be loaded automatically.
+
+After you are finished, return to the script and press a key to notify it.
+It will then clean up the device and remove the layers.
 
 ### Package launch and configuration
 
@@ -39,12 +44,12 @@ application after the layers are configured and stop the application when you
 finish profiling.
 
 Auto-start will default to using the main launchable activity for the package,
-but you can override this using `--package-activity` to specify the name of
+but you might override this using `--package-activity` to specify the name of
 another activity to launch.
 
-You can pass in additional activity command line arguments by using the
+You optionally pass in additional activity command line arguments by using the
 `--package-arguments` option to specify the argument string to pass to
-`am start`. This string will often contain spaces, so ensure that is quoted
+`am start`. This string will often contain spaces, so ensure that it is quoted
 correctly on the host shell. For example:
 
 ```sh
@@ -55,28 +60,28 @@ correctly on the host shell. For example:
 
 ### Layer configuration
 
 Some layers require a configuration file to control their behavior. Most
 layers that need a configuration file ship with a default config,
-`layer_config.json`, in their layer directory. Users can override this with
-a custom config by using the `--config`/`-C` option to specify a custom
+`layer_config.json`, in their layer directory. Users override this with a
+custom config by using the `--config`/`-C` option to specify a custom
 config file.
 
 **NOTE:** The layer that each config file applies to is specified in the config
-file itself, and is not implied by command line order.
+file itself, and is not implied by order of the command line options.
 
 ### Multi-layer installation
 
-The script can install multiple layers in a stack. Specify the `--layer`/`-L`
-option multiple times, once per layer. Layers are stacked in command line
-order, with the first layer specified being the top of the stack closest to the
-application.
+The script is able to install multiple layers in a stack. Specify the
+`--layer`/`-L` option multiple times, once per layer. Layers are stacked in
+command line order, with the first layer specified being the top of the stack
+closest to the application.
 
 ### Khronos validation layer installation
 
-The script can install the Khronos validation layer. A dummy layer directory ,
-`layer_khronos_validation`, is provided. Download the the latest binary release
-from the [Vulkan-ValidationLayers/releases][1] GitHub, and place the binaries
-into dummy build tree at the correct location.
+The script supports installing the Khronos validation layer. A dummy layer
+directory, `layer_khronos_validation`, is provided. Download the latest binary
+release from the [Vulkan-ValidationLayers/releases][1] GitHub, and place the
+binaries into the dummy build tree at the correct location.
 
-Once this is done you can install the validation layer like any other.
+Once this is done, you install the validation layer like any other.
 
 **NOTE:** When installing the Khronos validation layer you need to decide where
 to install it in the layer stack. If you install it as the first layer in the
diff --git a/docs/running_linux.md b/docs/running_linux.md
index b4c2024..9dfd18b 100644
--- a/docs/running_linux.md
+++ b/docs/running_linux.md
@@ -2,7 +2,7 @@
 
 There are multiple ways to install layer drivers for Linux. 
For our use case we cannot usually modify the application binary, so we install the layer and -manifest into a user-owned directory on the target device and configure the +manifest in to a user-owned directory on the target device and configure the layer driver using environment options. ## Create a manifest @@ -49,9 +49,9 @@ env VK_LOADER_DEBUG=all \ ``` The `VK_LOADER_DEBUG=all` option enables verbose logging in the loader itself, -and is a useful tool that can help you work out why your layer is not being +and is a useful tool that might help you work out why your layer is not being loaded when you expect. Once your layer is being loaded correctly, this option -can be removed. +should be removed. ## Uninstall @@ -61,4 +61,4 @@ the Fizzbuzz layer installed. - - - -_Copyright © 2024, Arm Limited and contributors._ +_Copyright © 2024-2025, Arm Limited and contributors._ diff --git a/docs/updating_protobuf_files.md b/docs/updating_protobuf_files.md deleted file mode 100644 index be50089..0000000 --- a/docs/updating_protobuf_files.md +++ /dev/null @@ -1,19 +0,0 @@ -# Updating the generated protobuf (de)serialization code - -This project uses protobufs for (de)serialization of certain data: - - * In the raw GPU timeline messages sent from `layer_gpu_timeline` to the host. - * In the Perfetto data collected from the device. - -Python decoders for those protocols are pre-generated and stored in the sources -under `lglpy/timeline/protos`. - -To regenerate or update the timeline protocol files use: - - protoc -I layer_gpu_timeline/ \ - --python_out=lglpy/timeline/protos/layer_driver/ \ - layer_gpu_timeline/timeline.proto - -- - - - -_Copyright © 2025, Arm Limited and contributors._ diff --git a/layer_gpu_profile/README_LAYER.md b/layer_gpu_profile/README_LAYER.md index 90b74b2..760fe3b 100644 --- a/layer_gpu_profile/README_LAYER.md +++ b/layer_gpu_profile/README_LAYER.md @@ -1,14 +1,14 @@ # Layer: GPU Profile -This layer is a frame profiler that can capture per workload performance -counters for selected frames running on an Arm GPU. +This layer is a frame profiler that captures per workload performance counters +for selected frames running on an Arm GPU. ## What devices are supported? This layer requires Vulkan 1.0 and an Arm GPU because it uses an Arm-specific performance counter sampling library. -## What data can be collected? +## What data is collected? The layer serializes workloads for instrumented frames and injects counter samples between them, allowing the layer to measure the hardware metrics for @@ -36,7 +36,7 @@ unaffected by the addition of serialization. Arm GPUs provide a wide range of performance counters covering many different aspects of hardware performance. The layer will collect a standard set of -counters by default but, with source modification, can collect any of the +counters by default but, with source modification, might collect any of the hardware counters and derived expressions supported by the [libGPUCounters][LGC] library that Arm provides on GitHub. @@ -44,15 +44,15 @@ hardware counters and derived expressions supported by the ### GPU clock frequency impact -The GPU idle time waiting for the CPU to take a counter sample can cause the +The GPU idle time waiting for the CPU to take a counter sample might cause the system DVFS power governor to decide that the GPU is not busy. In production devices we commonly see that the GPU will be down-clocked during the -instrumented frame, which may have an impact on a some of the available -performance counters. 
For example, GPU memory latency may appear lower than +instrumented frame, which might have an impact on some of the available +performance counters. For example, GPU memory latency might appear lower than normal if the reduction in GPU clock makes the memory system look faster in comparison. -When running on a pre-production device you can minimize the impacts of these +When running on a pre-production device you minimize the impacts of these effects by pinning CPU, GPU, and bus clock frequencies. This is not usually possible on a production device. @@ -72,7 +72,7 @@ Application setup steps: * Build a debuggable build of your application and install it on the Android device. -Tooling setup steps +Tooling setup steps: * Install the Android platform tools and ensure `adb` is on your `PATH` environment variable. @@ -88,7 +88,7 @@ sections in the [Build documentation](../docs/building.md). ### Running using the layer -You can configure a device to run a profile by using the Android helper utility +You configure a device to run a profile by using the Android helper utility found in the root directory to configure the layer and manage the application. You must enable the profile layer, and provide a configuration file to parameterize it. @@ -98,8 +98,9 @@ python3 lgl_android_install.py --layer layer_gpu_profile --config - ``` The [`layer_config.json`](layer_config.json) file in this directory is a -template configuration file you can start from. It defaults to periodic -sampling every 600 frames, but you can modify this to suit your needs. +template configuration file you can use as a starting point. It defaults to +periodic sampling every 600 frames, but you should modify this to suit your +needs. The `--profile` option specifies an output directory on the host to contain the CSV files written by the tool. One CSV is written for each frame, each CSV diff --git a/layer_gpu_profile/docs/developer-docs.md b/layer_gpu_profile/docs/developer-docs.md index 84bce10..916545a 100644 --- a/layer_gpu_profile/docs/developer-docs.md +++ b/layer_gpu_profile/docs/developer-docs.md @@ -6,14 +6,14 @@ maintaining the layer. ## Measuring performance -Arm GPUs can run multiple workloads in parallel, if the application pipeline +Arm GPUs might run multiple workloads in parallel, if the application pipeline barriers allow it. This is good for overall frame performance, but it makes profiling data messy due to cross-talk between unrelated workloads. For profiling we therefore inject serialization points between workloads to -ensure that data corresponds to a single workload. Note that we can only -serialize within the current application process, so data could still be -perturbed by other processes using the GPU. +ensure that data corresponds to a single workload. Note that we only serialize +within the current application process, so data could still be perturbed by +other processes using the GPU. ### Sampling performance counters @@ -29,7 +29,7 @@ allows the CPU to set/wait on events in a submitted but not complete command buffer. The layer injects a `vkCmdSetEvent(A)` and `vkCmdWaitEvent(B)` pair between each workload in the command buffer, and then has the reverse `vkWaitEvent(A)` and `vkSetEvent(B)` pair on the CPU side. The counter sample -can be inserted in between the two CPU-side operations. Note that there is no +might be inserted in between the two CPU-side operations. Note that there is no blocking CPU-side wait for an event so `vkWaitEvent()` is really a polling loop around `vkGetEventStatus()`. 
@@ -56,7 +56,7 @@ increase compared to a well overlapped scenario.
 In addition, serializing workloads and then trapping back to the CPU to sample
 performance counters will cause the GPU to go idle waiting for the CPU to
 complete the counter sample. This makes the GPU appear underutilized to the
-system DVFS governor, which may subsequently decide to reduce the GPU clock
+system DVFS governor, which might subsequently decide to reduce the GPU clock
 frequency. On pre-production devices we recommend locking CPU, GPU and memory
 clock frequencies to avoid this problem.
 
@@ -84,8 +84,8 @@ define the software operations that the layer needs to perform at submit time.
 Because counter sampling is handled synchronously on the CPU when a frame is
 being profiled, the layer handles each `vkQueueSubmit` and its associated
 counter samples synchronously at submit time before returning to the
-application. When sampling the layer retains the layer lock when calling into
-the driver, ensuring that only one thread at a time can process a submit that
+application. When sampling, the layer retains the layer lock when calling into
+the driver. This ensures that only one thread at a time processes a submit that
 makes counter samples.
 
 ## Event handling
diff --git a/layer_gpu_support/README_LAYER.md b/layer_gpu_support/README_LAYER.md
index 4d60ead..ab740a4 100644
--- a/layer_gpu_support/README_LAYER.md
+++ b/layer_gpu_support/README_LAYER.md
@@ -1,10 +1,10 @@
 # Layer: GPU Support
 
-This layer is a tech support trick box that is designed to help diagnose causes
-of functional and performance issues in applications. It works by letting you
-quickly test your application with a set of API behavior overrides applied,
-which can help to identify likely problem areas in the application if an
-override causes an issue to disappear.
+This layer is a tech support trick box that is designed to help diagnose
+causes of functional and performance issues in applications. It works by
+letting you quickly test your application with a set of API behavior overrides
+applied, which might help to identify likely problem areas in the application
+if an override causes an issue to disappear.
 
 ## What devices are supported?
 
@@ -43,9 +43,9 @@ sections in the [Build documentation](../docs/building.md).
 
 ### Running using the layer
 
-You can configure a device to run support experiments by using the Android
-helper utility found in the root directory to configure the layer and manage
-the application. You must enable the support layer, and provide a configuration
+You configure a device to run support experiments by using the Android helper
+utility found in the root directory to configure the layer and manage the
+application. You must enable the support layer, and provide a configuration
 file to parameterize it.
 
 ```sh
 python3 lgl_android_install.py --layer layer_gpu_support --config 
 ```
 
 The [`layer_config.json`](layer_config.json) file in this directory is a
-template configuration file you can start from. It does not enable any
+template configuration file you should start from. It does not enable any
 overrides by default, so running the layer using this configuration "as is"
 will not do anything useful. Take a copy and modify it to enable the options
 you want to try. Details of the configuration options in each override group
-are document in the _Behavior overrides_ section below.
+are documented in the _Behavior overrides_ section below. 
 
 The Android helper utility contains many other options for configuring the
 application under test and the capture process. For full instructions see the
 [Android documentation](../docs/running_android.md).
 
 ## Behavior overrides
 
-The current override groups are supported:
+The following override groups are supported:
 
-* **Feature:** control use of optional Vulkan features that can impact
+* **Feature:** control use of optional Vulkan features that might impact
   correctness and performance.
 * **Serialization:** control serialization of GPU workload scheduling to
   diagnose issues caused by missing queue or command stream synchronization.
 * **Shaders and Pipelines:** control shader pipeline compilation to diagnose
   issues caused by shader precision issues.
 * **Framebuffers:** control use of lossy and lossless image compression for
-  uncompressed images that may be used as framebuffer attachments.
+  uncompressed images that could be used as framebuffer attachments.
 
 ### Features
 
@@ -99,8 +99,8 @@ avoid ambiguous settings.
 
 The serialization overrides allow forceful serialization of submitted
 workloads, ensuring that they run in queue submit order. The synchronization
-can be configured per workload type, allowing control over where serialization
-is added to command buffers and queues.
+is configured per workload type, allowing control over where serialization is
+added to command buffers and queues.
 
 #### Configuration options
 
@@ -162,7 +162,7 @@ compiler handles compilation tasks.
 
 ## Framebuffers
 
-The framebuffer overrides allows some control over how the framebuffers are
+The framebuffer overrides allow some control over how the framebuffers are
 allocated and handled by the driver.
 
 * If the `disable_compression` option is `true` then compression is always
diff --git a/layer_gpu_timeline/README_LAYER.md b/layer_gpu_timeline/README_LAYER.md
index d075baa..e9b2c0d 100644
--- a/layer_gpu_timeline/README_LAYER.md
+++ b/layer_gpu_timeline/README_LAYER.md
@@ -1,16 +1,16 @@
 # Layer: GPU Timeline
 
-This layer is used with Arm GPU tooling that can show the scheduling of
-workloads on to the GPU hardware. The layer provides additional semantic
+This layer is used with Arm GPU tooling that shows the scheduling of workloads
+on to the GPU hardware queues. The layer provides additional semantic
 annotation, extending the scheduling data from the Android Perfetto render
 stages telemetry with useful API-aware context.
 
 ![Timeline visualization](./docs/visualize.png)
 
-Visualizations generated using this tooling show the execution of each workload
-event, grouping events by the hardware scheduling stream used. These streams
-can run in parallel on the GPU, and the visualization shows the level of
-parallelization achieved.
+Visualizations generated using this tooling show the execution of each
+workload event, grouping events by the hardware scheduling stream used. These
+streams might run in parallel on the GPU, and the visualization shows the
+level of parallelization achieved.
 
 ## What devices are supported?
 
@@ -42,12 +42,10 @@ Tooling setup steps
 
 * Install the Android platform tools and ensure `adb` is on your `PATH`
   environment variable.
 * Install the Android NDK and set the `ANDROID_NDK_HOME` environment variable
   to its installation path.
-* The viewer uses Python 3.10 or newer, which can be downloaded from the
-  official Python website: https://www.python.org.
-* The viewer uses PyGTK, and requires the native GTK3 libraries and PyGTK to be
-  installed. GTK installation instructions can be found on the official GTK
-  website: https://www.gtk.org/docs/installations.
-* Python dependencies can be installed using the Python 3 `pip` package
+* The viewer uses Python 3.10 or newer. See https://www.python.org.
+* The viewer uses PyGTK, and requires the native GTK3 libraries and PyGTK to
+  be installed. See https://www.gtk.org/docs/installations.
+* Python dependencies might be installed using the Python 3 `pip` package
   manager.
 
 ```
@@ -63,7 +61,7 @@ instructions see the _Build an Android layer_ section in the
 
 ### Layer run
 
-You can record a timeline by using the Android helper utility found in the root
+You record a timeline by using the Android helper utility found in the root
 directory to configure the layer and manage the capture process. You must
 enable the timeline layer, and the base name of the output files that will
 contain the final timeline data.
 
@@ -73,8 +71,8 @@ python3 lgl_android_install.py --layer layer_gpu_timeline --timeline 
 
 The timeline data files will be saved as `.perfetto` and `.metadata`.
-If you want to use different file names for each, you can alternatively specify
-a full file path for each file using `--timeline-perfetto` and
+If you want to use different file names for each, you might alternatively
+specify a full file path for each file using `--timeline-perfetto` and
 `--timeline-metadata`.
 
 The Android helper utility contains many other options for configuring the
@@ -83,8 +81,8 @@ application under test and the capture process. For full instructions see the
 
 ## Timeline visualization
 
-This project includes an experimental Python viewer which can parse and
-visualize the data in the two data files captured earlier.
+This project includes an experimental Python viewer which parses and
+visualizes the data in the two data files captured earlier.
 
 Run the following command to start the tool:
 
@@ -107,9 +105,9 @@ Event boxes show:
 
 ### Controls
 
-The viewer consists of two main areas - the Timeline canvas that shows the
-events, and the Information panel that can show a summary of the current
-active events and time range.
+The viewer consists of two main areas: the Timeline canvas, which shows the
+events, and the Information panel, which shows a summary of the current
+active events and time range.
 
 Navigation uses the mouse:
 
@@ -143,7 +141,7 @@ Selecting an active time range:
 
 ## What workloads are supported?
 
-The Arm GPU scheduler event trace can generate timing events for each
+The Arm GPU scheduler event trace might generate timing events for each
 atomically schedulable workload submitted to the GPU scheduler.
 
 Most workloads submitted to a Vulkan queue by the application are a single
 
@@ -191,20 +189,20 @@ captured and reported, but with unknown workload dimensions.
 
 The current implementation reports the size of a compute workload as the
 number of work groups, because this is the parameter used by the API. We
-eventually want to report this as the number of work items, but the parsing
-of the SPIR-V and pipeline parameters has not yet been implemented.
+eventually want to report this as the number of work items, but the parsing of
+the SPIR-V and pipeline parameters has not yet been implemented.
 
 ### Limitation: Dynamic render passes split over multiple command buffers
 
 The label containing the `tagID` is recorded into the application command
 buffer when the command buffer is recorded. 
The workload-to-metadata mapping -requires that every use of a `tagID` has the same properties, or we will -be unable to associate the correct metadata with its matching workload. +requires that every use of a `tagID` has the same properties, or we will be +unable to associate the correct metadata with its matching workload. -Content that splits a render pass over multiple command buffers that -are not one-time-submit violates this requirement. Multiple submits of a render -pass with a single `tagID` may have different numbers of draw calls, depending -on the number of draws that occur in the later command buffers that resume the +Content that splits a render pass over multiple command buffers that are not +one-time-submit violates this requirement. Multiple submits of a render pass +with a single `tagID` might have different numbers of draw calls, depending on +the number of draws that occur in the later command buffers that resume the render pass. When the layer detects suspended render pass in a multi-submit command buffer, it will still capture and report the workload, but with an unknown draw call count. diff --git a/layer_gpu_timeline/docs/developer-docs.md b/layer_gpu_timeline/docs/developer-docs.md index 229a623..37804b3 100644 --- a/layer_gpu_timeline/docs/developer-docs.md +++ b/layer_gpu_timeline/docs/developer-docs.md @@ -1,7 +1,7 @@ # Layer: GPU Timeline - Developer Documentation -This layer is used with Arm GPU tooling that can show the scheduling of -workloads on to the GPU hardware. The layer provides additional semantic +This layer is used with Arm GPU tooling that shows the scheduling of workloads +on to the GPU hardware queue. The layer provides additional semantic annotation, extending the scheduling data from the Android Perfetto render stages telemetry with useful API-aware context. @@ -9,16 +9,16 @@ stages telemetry with useful API-aware context. Most properties we track are a property of the command buffer recording in isolation. However, the user debug label stack is a property of the queue and -persists across submits. We can therefore only determine the debug label +persists across submits. We therefore only determine the debug label associated with a workload in the command stream at submit time, and must resolve it per workload inside the command buffer. To support this we implement a software command stream that contains simple bytecode actions that represent the sequence of debug label and workload -commands inside each command buffer. This "command stream" can be played to -update the the queue state at submit time, triggering metadata submission -for each workload that can snapshot the current state of the user debug label -stack at that point in the command stream. +commands inside each command buffer. This "command stream" is played to update +the queue state at submit time, triggering metadata submission for each +workload that snapshots the current state of the user debug label stack at +that point in the command stream. ## Updating protobuf @@ -26,7 +26,7 @@ The protocol between the layer and the host tools uses Google Protocol Buffers to implement the message encoding. The layer implementation uses Protopuf, a light-weight implementation which -can be trivially integrated into the layer. Protopuf message definitions are +is easily integrated in to the layer. Protopuf message definitions are defined directly in the C++ code (see `timeline_protobuf_encoder.cpp`) and do not use the `timeline.proto` definitions. 
@@ -34,8 +34,8 @@ The host implementation uses the Google `protoc` compiler to generate native bindings from the `timeline.proto` definition. When updating the protocol buffers you must ensure that the C++ and `proto` definitions match. -To regenerate the Python bindings, run the following command from the -`layer_gpu_timeline` directory: +To regenerate the Python bindings, found in `lglpy/timeline/protos`, run the +following command from the `layer_gpu_timeline` directory: ```sh protoc ./timeline.proto --python_out=../lglpy/timeline/protos/layer_driver/ From 21f3cb949c6781ed2f25c2d588a178da981756bd Mon Sep 17 00:00:00 2001 From: Peter Harris Date: Wed, 10 Dec 2025 20:49:39 +0000 Subject: [PATCH 2/2] Skip Actions runs for Markdown-only changes --- .github/workflows/native_test.yaml | 2 ++ .github/workflows/python_test.yaml | 4 ++++ 2 files changed, 6 insertions(+) diff --git a/.github/workflows/native_test.yaml b/.github/workflows/native_test.yaml index 9dc6aaa..0477a69 100644 --- a/.github/workflows/native_test.yaml +++ b/.github/workflows/native_test.yaml @@ -10,11 +10,13 @@ on: - '*' paths-ignore: - 'lglpy/**' + - '**/*.md' pull_request: branches: - main paths-ignore: - 'lglpy/**' + - '**/*.md' jobs: build-ubuntu-x64-clang: diff --git a/.github/workflows/python_test.yaml b/.github/workflows/python_test.yaml index 1aa6f51..d700acc 100644 --- a/.github/workflows/python_test.yaml +++ b/.github/workflows/python_test.yaml @@ -8,9 +8,13 @@ on: - main tags: - '*' + paths-ignore: + - '**/*.md' pull_request: branches: - main + paths-ignore: + - '**/*.md' jobs: python-test: