From 8ab80d676892360e81f4ed8dcb4dde759b4fcca1 Mon Sep 17 00:00:00 2001
From: Kerry McLaughlin
Date: Thu, 20 Nov 2025 10:13:56 +0000
Subject: [PATCH 1/8] Add intrinsic support for the range prefetch (RPRFM) instruction.

---
 main/acle.md | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/main/acle.md b/main/acle.md
index da4c48d5..e3ee8455 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -471,6 +471,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 * Added support for modal 8-bit floating point matrix multiply-accumulate widening intrinsics.
 * Added support for 16-bit floating point matrix multiply-accumulate widening intrinsics.
+* Added support for range prefetch intrinsic when `__ARM_FEATURE_RPRFM` is defined.
 
 ### References
 
@@ -3613,6 +3614,51 @@ values.
 | KEEP | 0 | Temporal fetch of the addressed location (that is, allocate in cache normally) |
 | STRM | 1 | Streaming fetch of the addressed location (that is, memory used only once) |
 
+The following intrinsic is also available when `__ARM_FEATURE_RPRFM` is defined:
+
+``` c
+  void __rpld(/*constant*/ unsigned int /*access_kind*/,
+              /*constant*/ unsigned int /*retention_policy*/,
+              /*constant*/ unsigned int /*reuse distance*/,
+              /*constant*/ signed int /*stride*/,
+              /*constant*/ unsigned int /*count*/,
+              /*constant*/ signed int /*length*/,
+              void const volatile *addr);
+```
+
+Generates a data prefetch instruction from a range of addresses starting from a
+given base address. Locations within the specified address ranges are prefetched
+into one or more caches. This intrinsic allows the specification of the
+expected access kind (read or write), the data retention policy (temporal or
+streaming) and the reuse distance, stride, count and length metadata values.
+
+The access kind and data retention policy arguments can only be one of the
+following values.
+
+| **Access Kind** | **Value** | **Summary** |
+| --------------- | --------- | ---------------------------------------- |
+| PLD | 0 | Fetch the addressed location for reading |
+| PST | 1 | Fetch the addressed location for writing |
+
+| **Retention Policy** | **Value** | **Summary** |
+| -------------------- | --------- | -------------------------------------------------------------------------- |
+| KEEP | 0 | Temporal fetch of the addressed location (that is, allocate in cache normally) |
+| STRM | 1 | Streaming fetch of the addressed location (that is, memory used only once) |
+
+The table below describes the ranges of the reuse distance, stride, count and length arguments.
+
+| **Metadata** | **Range** | **Summary** |
+| -------------- | ----------------- | -------------------------------------------------------------------- |
+| Reuse Distance | 0 to 15 | Maximum number of bytes to be accessed before executing the |
+| | | next RPRFM instruction that specifies the same range. Values |
+| | | from 1 to 15 represent decreasing powers of two in the range |
+| | | 512MiB to 32KiB. A value of 0 indicates distance not known. |
+| | | Note: This value is ignored if a streaming prefetch is specified. |
+| Stride | -2MiB to +2MiB-1B | Number of bytes to advance the block address by after `Length` |
+| | | bytes have been accessed. Note: This value is ignored if Count is 0. |
+| Count | 0 to 65535 | Number of blocks to be accessed, minus 1. |
+| Length | -2MiB to +2MiB-1B | Number of contiguous bytes to be accessed. |
+
 ### Instruction prefetch
 
 ``` c

From d9b772a8d148501bd281270dd7f365af862248b6 Mon Sep 17 00:00:00 2001
From: Kerry McLaughlin
Date: Mon, 8 Dec 2025 11:50:57 +0000
Subject: [PATCH 2/8] - Define Count as the number of data blocks to be accessed (i.e. 1 - 65536)

---
 main/acle.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/main/acle.md b/main/acle.md
index e3ee8455..8e043ec3 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -3656,7 +3656,7 @@ The table below describes the ranges of the reuse distance, stride, count and le
 | | | Note: This value is ignored if a streaming prefetch is specified. |
 | Stride | -2MiB to +2MiB-1B | Number of bytes to advance the block address by after `Length` |
 | | | bytes have been accessed. Note: This value is ignored if Count is 0. |
-| Count | 0 to 65535 | Number of blocks to be accessed, minus 1. |
+| Count | 1 to 65536 | Number of blocks to be accessed. |
 | Length | -2MiB to +2MiB-1B | Number of contiguous bytes to be accessed. |
 
 ### Instruction prefetch

From a3a51f8226159b68f26fbd28b53443dbc05078c7 Mon Sep 17 00:00:00 2001
From: Kerry McLaughlin
Date: Mon, 8 Dec 2025 13:50:31 +0000
Subject: [PATCH 3/8] - Fix comment "This value is ignored if Count is 0" after changing the range of the Count argument

---
 main/acle.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/main/acle.md b/main/acle.md
index 8e043ec3..2859c484 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -3655,7 +3655,7 @@ The table below describes the ranges of the reuse distance, stride, count and le
 | | | 512MiB to 32KiB. A value of 0 indicates distance not known. |
 | | | Note: This value is ignored if a streaming prefetch is specified. |
 | Stride | -2MiB to +2MiB-1B | Number of bytes to advance the block address by after `Length` |
-| | | bytes have been accessed. Note: This value is ignored if Count is 0. |
+| | | bytes have been accessed. Note: This value is ignored if Count is 1. |
 | Count | 1 to 65536 | Number of blocks to be accessed. |
 | Length | -2MiB to +2MiB-1B | Number of contiguous bytes to be accessed. |
 

From 4616a23a715067037bb0e539b25a5f92b18a7bc7 Mon Sep 17 00:00:00 2001
From: Kerry McLaughlin
Date: Tue, 9 Dec 2025 09:52:11 +0000
Subject: [PATCH 4/8] - Rename __rpld to __pld_range - Remove const from metadata arguments - Change Reuse Distance to take a number of bytes

---
 main/acle.md | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/main/acle.md b/main/acle.md
index 2859c484..e33fc07c 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -3617,16 +3617,16 @@ values.
 The following intrinsic is also available when `__ARM_FEATURE_RPRFM` is defined:
 
 ``` c
-  void __rpld(/*constant*/ unsigned int /*access_kind*/,
-              /*constant*/ unsigned int /*retention_policy*/,
-              /*constant*/ unsigned int /*reuse distance*/,
-              /*constant*/ signed int /*stride*/,
-              /*constant*/ unsigned int /*count*/,
-              /*constant*/ signed int /*length*/,
-              void const volatile *addr);
+  void __pld_range(/*constant*/ unsigned int /*access_kind*/,
+                   /*constant*/ unsigned int /*retention_policy*/,
+                   size_t /*reuse distance*/,
+                   signed int /*stride*/,
+                   unsigned int /*count*/,
+                   signed int /*length*/,
+                   void const volatile *addr);
 ```
 
-Generates a data prefetch instruction from a range of addresses starting from a
+Generates a data prefetch instruction for a range of addresses starting from a
 given base address. Locations within the specified address ranges are prefetched
 into one or more caches. This intrinsic allows the specification of the
 expected access kind (read or write), the data retention policy (temporal or
@@ -3647,17 +3647,17 @@ following values.
 
 The table below describes the ranges of the reuse distance, stride, count and length arguments.
 
-| **Metadata** | **Range** | **Summary** |
-| -------------- | ----------------- | -------------------------------------------------------------------- |
-| Reuse Distance | 0 to 15 | Maximum number of bytes to be accessed before executing the |
-| | | next RPRFM instruction that specifies the same range. Values |
-| | | from 1 to 15 represent decreasing powers of two in the range |
-| | | 512MiB to 32KiB. A value of 0 indicates distance not known. |
-| | | Note: This value is ignored if a streaming prefetch is specified. |
-| Stride | -2MiB to +2MiB-1B | Number of bytes to advance the block address by after `Length` |
-| | | bytes have been accessed. Note: This value is ignored if Count is 1. |
-| Count | 1 to 65536 | Number of blocks to be accessed. |
-| Length | -2MiB to +2MiB-1B | Number of contiguous bytes to be accessed. |
+| **Metadata** | **Range** | **Summary** |
+| -------------- | ------------------- | -------------------------------------------------------------------- |
+| Reuse Distance | 0 or [2**15, 2**29] | Maximum number of bytes to be accessed before executing the |
+| | | next RPRFM instruction that specifies the same range. Values |
+| | | are powers of two representing the number of bytes in the range |
+| | | 32KiB to 512MiB. A value of 0 indicates distance not known. |
+| | | Note: This value is ignored if a streaming prefetch is specified. |
+| Stride | [-2MiB, +2MiB) | Number of bytes to advance the block address by after `Length` |
+| | | bytes have been accessed. Note: This value is ignored if Count is 1. |
+| Count | [1, 65536] | Number of blocks to be accessed. |
+| Length | [-2MiB, +2MiB) | Number of contiguous bytes to be accessed. |
 
 ### Instruction prefetch
 

From e53af7676c94fedbe520ffb1dde46cfcdc381399 Mon Sep 17 00:00:00 2001
From: Kerry McLaughlin
Date: Tue, 9 Dec 2025 17:21:38 +0000
Subject: [PATCH 5/8] - Add a second builtin allowing the metadata to be passed as a single value - Mark metadata arguments to __pldx_range as constant

---
 main/acle.md | 49 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/main/acle.md b/main/acle.md
index e33fc07c..303871f0 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -3617,13 +3617,13 @@ values.
 The following intrinsic is also available when `__ARM_FEATURE_RPRFM` is defined:
 
 ``` c
-  void __pld_range(/*constant*/ unsigned int /*access_kind*/,
-                   /*constant*/ unsigned int /*retention_policy*/,
-                   size_t /*reuse distance*/,
-                   signed int /*stride*/,
-                   unsigned int /*count*/,
-                   signed int /*length*/,
-                   void const volatile *addr);
+  void __pldx_range(/*constant*/ unsigned int /*access_kind*/,
+                    /*constant*/ unsigned int /*retention_policy*/,
+                    /*constant*/ size_t /*reuse distance*/,
+                    /*constant*/ signed int /*stride*/,
+                    /*constant*/ unsigned int /*count*/,
+                    /*constant*/ signed int /*length*/,
+                    void const volatile *addr);
 ```
 
 Generates a data prefetch instruction for a range of addresses starting from a
@@ -3650,15 +3650,44 @@ The table below describes the ranges of the reuse distance, stride, count and le
 | **Metadata** | **Range** | **Summary** |
 | -------------- | ------------------- | -------------------------------------------------------------------- |
 | Reuse Distance | 0 or [2**15, 2**29] | Maximum number of bytes to be accessed before executing the |
-| | | next RPRFM instruction that specifies the same range. Values |
-| | | are powers of two representing the number of bytes in the range |
-| | | 32KiB to 512MiB. A value of 0 indicates distance not known. |
+| | | next RPRFM instruction that specifies the same range. This value |
+| | | represents a number of bytes in the range 32KiB to 512MiB. When the |
+| | | given number of bytes is not a power of 2, the next closest power of |
+| | | 2 higher than the value specified will be chosen. Values exceeding |
+| | | the maximum will be represented by 0, indicating distance not known. |
 | | | Note: This value is ignored if a streaming prefetch is specified. |
 | Stride | [-2MiB, +2MiB) | Number of bytes to advance the block address by after `Length` |
 | | | bytes have been accessed. Note: This value is ignored if Count is 1. |
 | Count | [1, 65536] | Number of blocks to be accessed. |
 | Length | [-2MiB, +2MiB) | Number of contiguous bytes to be accessed. |
 
+``` c
+  void __pld_range(/*constant*/ unsigned int /*access_kind*/,
+                    /*constant*/ unsigned int /*retention_policy*/,
+                    unsigned long /*metadata*/,
+                    void const volatile *addr);
+```
+
+Generates a data prefetch instruction for a range of addresses starting from a
+given base address. Locations within the specified address ranges are prefetched
+into one or more caches. The access kind and retention policy arguments can
+have the same values as in `__pldx_range`. The bits of the metadata argument
+are interpreted as follows:
+
+| **Metadata** | **Bits** | **Range** | **Summary** |
+| -------------- | -------- | --------------- | ------------------------------------------------------------ |
+| Length | 0-21 | [-2MiB, +2MiB) | Signed integer representing the number of contiguous |
+| | | | bytes to be accessed. |
+| Count | 37-22 | [0, 65535] | Unsigned integer representing number of blocks of data |
+| | | | to be accessed, minus 1. |
+| Stride | 59-38 | [-2MiB, +2MiB) | Signed integer representing the number of bytes to advance |
+| | | | the block address by after `Length` bytes have been |
+| | | | accessed. This value is ignored if Count is 0. |
+| Reuse Distance | 63-60 | [0, 15] | Indicates the maximum number of bytes to be accessed before |
+| | | | executing the next RPRFM instruction that specifies the same |
+| | | | range. Bits encode decreasing powers of two in the range |
+| | | | 1 (512MiB) to 15 (32KiB). 0 indicates distance not known. |
+
 ### Instruction prefetch
 
 ``` c

From 3513fc70bdfbfc8a68875112df0da2f438f85939 Mon Sep 17 00:00:00 2001
From: Kerry McLaughlin
Date: Wed, 10 Dec 2025 17:53:01 +0000
Subject: [PATCH 6/8] - Remove Range information from Reuse Distance description

---
 main/acle.md | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/main/acle.md b/main/acle.md
index 303871f0..10b692be 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -3649,12 +3649,11 @@ The table below describes the ranges of the reuse distance, stride, count and le
 
 | **Metadata** | **Range** | **Summary** |
 | -------------- | ------------------- | -------------------------------------------------------------------- |
-| Reuse Distance | 0 or [2**15, 2**29] | Maximum number of bytes to be accessed before executing the |
-| | | next RPRFM instruction that specifies the same range. This value |
-| | | represents a number of bytes in the range 32KiB to 512MiB. When the |
-| | | given number of bytes is not a power of 2, the next closest power of |
-| | | 2 higher than the value specified will be chosen. Values exceeding |
-| | | the maximum will be represented by 0, indicating distance not known. |
+| Reuse Distance | | Maximum number of bytes to be accessed before executing the next |
+| | | RPRFM instruction that specifies the same range. All values are |
+| | | rounded up to the nearest power of 2 in the range 32KiB to 512MiB. |
+| | | Values exceeding the maximum of 512MiB will be represented by 0, |
+| | | indicating distance not known. |
 | | | Note: This value is ignored if a streaming prefetch is specified. |
 | Stride | [-2MiB, +2MiB) | Number of bytes to advance the block address by after `Length` |
 | | | bytes have been accessed. Note: This value is ignored if Count is 1. |
@@ -3663,9 +3662,9 @@ The table below describes the ranges of the reuse distance, stride, count and le
 
 ``` c
   void __pld_range(/*constant*/ unsigned int /*access_kind*/,
-                    /*constant*/ unsigned int /*retention_policy*/,
-                    unsigned long /*metadata*/,
-                    void const volatile *addr);
+                   /*constant*/ unsigned int /*retention_policy*/,
+                   unsigned long /*metadata*/,
+                   void const volatile *addr);
 ```
 
 Generates a data prefetch instruction for a range of addresses starting from a

From cb32719c305877b7da5d43baa180cf8385ed6cea Mon Sep 17 00:00:00 2001
From: Kerry McLaughlin
Date: Thu, 11 Dec 2025 11:58:03 +0000
Subject: [PATCH 7/8] - Remove reference to __ARM_FEATURE_RPRFM - Add description of the __ARM_PREFETCH_RANGE macro

---
 main/acle.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/main/acle.md b/main/acle.md
index 10b692be..c84ee3b6 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -3614,7 +3614,8 @@ values.
 | KEEP | 0 | Temporal fetch of the addressed location (that is, allocate in cache normally) |
 | STRM | 1 | Streaming fetch of the addressed location (that is, memory used only once) |
 
-The following intrinsic is also available when `__ARM_FEATURE_RPRFM` is defined:
+The `__ARM_PREFETCH_RANGE` macro can be used to test for the presence of the
+following range prefetch intrinsics:
 
 ``` c
   void __pldx_range(/*constant*/ unsigned int /*access_kind*/,

From 25ea5c84b39654d81efcf5e7bac3b2be92bc059c Mon Sep 17 00:00:00 2001
From: Kerry McLaughlin
Date: Fri, 12 Dec 2025 13:56:33 +0000
Subject: [PATCH 8/8] - Reorder arguments of __pldx_range to length, count, stride & reuse distance

---
 main/acle.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/main/acle.md b/main/acle.md
index c84ee3b6..dc23a8cc 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -3620,10 +3620,10 @@ following range prefetch intrinsics:
 ``` c
   void __pldx_range(/*constant*/ unsigned int /*access_kind*/,
                     /*constant*/ unsigned int /*retention_policy*/,
-                    /*constant*/ size_t /*reuse distance*/,
-                    /*constant*/ signed int /*stride*/,
-                    /*constant*/ unsigned int /*count*/,
                     /*constant*/ signed int /*length*/,
+                    /*constant*/ unsigned int /*count*/,
+                    /*constant*/ signed int /*stride*/,
+                    /*constant*/ size_t /*reuse distance*/,
                     void const volatile *addr);
 ```
 
@@ -3631,7 +3631,7 @@ Generates a data prefetch instruction for a range of addresses starting from a
 given base address. Locations within the specified address ranges are prefetched
 into one or more caches. This intrinsic allows the specification of the
 expected access kind (read or write), the data retention policy (temporal or
-streaming) and the reuse distance, stride, count and length metadata values.
+streaming) and the length, count, stride and reuse distance metadata values.
 
 The access kind and data retention policy arguments can only be one of the
 following values.
@@ -3646,20 +3646,20 @@ following values.
 | KEEP | 0 | Temporal fetch of the addressed location (that is, allocate in cache normally) |
 | STRM | 1 | Streaming fetch of the addressed location (that is, memory used only once) |
 
-The table below describes the ranges of the reuse distance, stride, count and length arguments.
+The table below describes the ranges of the length, count, stride and reuse distance arguments.
 
 | **Metadata** | **Range** | **Summary** |
 | -------------- | ------------------- | -------------------------------------------------------------------- |
+| Length | [-2MiB, +2MiB) | Number of contiguous bytes to be accessed. |
+| Count | [1, 65536] | Number of blocks to be accessed. |
+| Stride | [-2MiB, +2MiB) | Number of bytes to advance the block address by after `Length` |
+| | | bytes have been accessed. Note: This value is ignored if Count is 1. |
 | Reuse Distance | | Maximum number of bytes to be accessed before executing the next |
 | | | RPRFM instruction that specifies the same range. All values are |
 | | | rounded up to the nearest power of 2 in the range 32KiB to 512MiB. |
 | | | Values exceeding the maximum of 512MiB will be represented by 0, |
 | | | indicating distance not known. |
 | | | Note: This value is ignored if a streaming prefetch is specified. |
-| Stride | [-2MiB, +2MiB) | Number of bytes to advance the block address by after `Length` |
-| | | bytes have been accessed. Note: This value is ignored if Count is 1. |
-| Count | [1, 65536] | Number of blocks to be accessed. |
-| Length | [-2MiB, +2MiB) | Number of contiguous bytes to be accessed. |
 
 ``` c
   void __pld_range(/*constant*/ unsigned int /*access_kind*/,