@@ -466,6 +466,8 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
466466* Added feature test macro for FEAT_SSVE_FEXPA.
467467* Added feature test macro for FEAT_CSSC.
468468* Added support for FEAT_FPRCVT intrinsics and `__ARM_FEATURE_FPRCVT`.
469+ * Added support for modal 8-bit floating point matrix multiply-accumulate widening intrinsics.
470+ * Added support for 16-bit floating point matrix multiply-accumulate widening intrinsics.
469471
470472### References
471473
@@ -2354,6 +2356,26 @@ is hardware support for the SVE forms of these instructions and if the
23542356associated ACLE intrinsics are available. This implies that
23552357`__ARM_FEATURE_MATMUL_INT8` and `__ARM_FEATURE_SVE` are both nonzero.
23562358
2359+ ##### Multiplication of modal 8-bit floating-point matrices
2360+
2361+ This section is in
2362+ [**Alpha** state](#current-status-and-anticipated-changes) and might change or be
2363+ extended in the future.
2364+
2365+ `__ARM_FEATURE_F8F16MM` is defined to `1` if there is hardware support
2366+ for the NEON and SVE modal 8-bit floating-point matrix multiply-accumulate to half-precision (FEAT_F8F16MM)
2367+ instructions and if the associated ACLE intrinsics are available.
2368+
2369+ `__ARM_FEATURE_F8F32MM` is defined to `1` if there is hardware support
2370+ for the NEON and SVE modal 8-bit floating-point matrix multiply-accumulate to single-precision (FEAT_F8F32MM)
2371+ instructions and if the associated ACLE intrinsics are available.
2372+
2373+ ##### Multiplication of 16-bit floating-point matrices
2374+
2375+ `__ARM_FEATURE_SVE_F16F32MM` is defined to `1` if there is hardware support
2376+ for the SVE 16-bit floating-point to 32-bit floating-point matrix multiply-accumulate
2377+ (FEAT_SVE_F16F32MM) instructions and if the associated ACLE intrinsics are available.
2378+
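As an illustration of how the three feature test macros above might be consumed, the following minimal sketch folds each one into a local `HAVE_*` constant and prints the result. The `HAVE_*` names and the `report_matmul_features` helper are hypothetical and are not part of ACLE; only the `__ARM_FEATURE_*` macro names come from the text above.

```c
#include <stdio.h>

/* Hypothetical local constants: 1 if the ACLE macro is defined to a
   nonzero value by the compiler, 0 otherwise. */
#if defined(__ARM_FEATURE_F8F16MM) && __ARM_FEATURE_F8F16MM
#define HAVE_F8F16MM 1
#else
#define HAVE_F8F16MM 0
#endif

#if defined(__ARM_FEATURE_F8F32MM) && __ARM_FEATURE_F8F32MM
#define HAVE_F8F32MM 1
#else
#define HAVE_F8F32MM 0
#endif

#if defined(__ARM_FEATURE_SVE_F16F32MM) && __ARM_FEATURE_SVE_F16F32MM
#define HAVE_SVE_F16F32MM 1
#else
#define HAVE_SVE_F16F32MM 0
#endif

/* Hypothetical helper: prints which widening matrix multiply-accumulate
   extensions were enabled at compile time and returns how many were. */
int report_matmul_features(void)
{
    printf("FEAT_F8F16MM:      %d\n", HAVE_F8F16MM);
    printf("FEAT_F8F32MM:      %d\n", HAVE_F8F32MM);
    printf("FEAT_SVE_F16F32MM: %d\n", HAVE_SVE_F16F32MM);
    return HAVE_F8F16MM + HAVE_F8F32MM + HAVE_SVE_F16F32MM;
}
```

Because the checks are resolved by the preprocessor, the sketch compiles on any target, including ones where none of the macros are defined.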
23572379##### Multiplication of 32-bit floating-point matrices
23582380
23592381`__ARM_FEATURE_SVE_MATMUL_FP32` is defined to `1` if there is hardware support
@@ -2646,6 +2668,9 @@ be found in [[BA]](#BA).
26462668| [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve) | The number of bits in an SVE vector, when known in advance | 256 |
26472669| [`__ARM_FEATURE_SVE_MATMUL_FP32`](#multiplication-of-32-bit-floating-point-matrices) | 32-bit floating-point matrix multiply extension (FEAT_F32MM) | 1 |
26482670| [`__ARM_FEATURE_SVE_MATMUL_FP64`](#multiplication-of-64-bit-floating-point-matrices) | 64-bit floating-point matrix multiply extension (FEAT_F64MM) | 1 |
2671+ | [`__ARM_FEATURE_F8F16MM`](#multiplication-of-modal-8-bit-floating-point-matrices) | Modal 8-bit floating-point matrix multiply-accumulate to half-precision extension (FEAT_F8F16MM) | 1 |
2672+ | [`__ARM_FEATURE_F8F32MM`](#multiplication-of-modal-8-bit-floating-point-matrices) | Modal 8-bit floating-point matrix multiply-accumulate to single-precision extension (FEAT_F8F32MM) | 1 |
2673+ | [`__ARM_FEATURE_SVE_F16F32MM`](#multiplication-of-16-bit-floating-point-matrices) | 16-bit floating-point matrix multiply-accumulate to single-precision extension (FEAT_SVE_F16F32MM) | 1 |
26492674| [`__ARM_FEATURE_SVE_MATMUL_INT8`](#multiplication-of-8-bit-integer-matrices) | SVE support for the integer matrix multiply extension (FEAT_I8MM) | 1 |
26502675| [`__ARM_FEATURE_SVE_PREDICATE_OPERATORS`](#scalable-vector-extension-sve) | Level of support for C and C++ operators on SVE predicate types | 1 |
26512676| [`__ARM_FEATURE_SVE_VECTOR_OPERATORS`](#scalable-vector-extension-sve) | Level of support for C and C++ operators on SVE vector types | 1 |
@@ -9383,6 +9408,31 @@ BFloat16 floating-point multiply vectors.
93839408 uint64_t imm_idx);
93849409 ```
93859410
9411+ ### SVE2 floating-point matrix multiply-accumulate instructions
9412+
9413+ #### FMMLA (widening, FP8 to FP16)
9414+
9415+ Modal 8-bit floating-point matrix multiply-accumulate to half-precision.
9416+ ```c
9417+ // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_F8F16MM)
9418+ svfloat16_t svmmla[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm);
9419+ ```
9420+
9421+ #### FMMLA (widening, FP8 to FP32)
9422+
9423+ Modal 8-bit floating-point matrix multiply-accumulate to single-precision.
9424+ ```c
9425+ // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_F8F32MM)
9426+ svfloat32_t svmmla[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm);
9427+ ```
9428+ #### FMMLA (widening, FP16 to FP32)
9429+
9430+ 16-bit floating-point matrix multiply-accumulate to single-precision.
9431+ ```c
9432+ // Only if __ARM_FEATURE_SVE_F16F32MM
9433+ svfloat32_t svmmla[_f32_f16](svfloat32_t zda, svfloat16_t zn, svfloat16_t zm);
9434+ ```
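The guards in the comments above could be applied in user code along the following lines. This is a sketch only: the wrapper functions `accumulate_f16_product` and `accumulate_mf8_product`, the `USES_F16F32MM` constant, and `f16f32mm_path` are hypothetical names introduced for illustration. The full intrinsic names `svmmla_f32_f16` and `svmmla_f16_mf8_fpm` are spelled out from the bracketed prototypes above, and the sketch assumes that `<arm_sve.h>` declares them whenever the corresponding macros are defined.

```c
#include <stdio.h>

#if defined(__ARM_FEATURE_SVE_F16F32MM) && __ARM_FEATURE_SVE_F16F32MM
#include <arm_sve.h>

/* Hypothetical wrapper around the FP16 to FP32 widening FMMLA intrinsic. */
svfloat32_t accumulate_f16_product(svfloat32_t acc, svfloat16_t a,
                                   svfloat16_t b)
{
    return svmmla_f32_f16(acc, a, b);
}
#define USES_F16F32MM 1
#else
#define USES_F16F32MM 0
#endif

#if defined(__ARM_FEATURE_SVE2) && defined(__ARM_FEATURE_F8F16MM)
#include <arm_sve.h>

/* Hypothetical wrapper around the FP8 to FP16 widening FMMLA intrinsic.
   The caller supplies the fpm_t mode value; building it is out of scope. */
svfloat16_t accumulate_mf8_product(svfloat16_t acc, svmfloat8_t a,
                                   svmfloat8_t b, fpm_t fpm)
{
    return svmmla_f16_mf8_fpm(acc, a, b, fpm);
}
#endif

/* Reports which code path this translation unit was compiled with. */
const char *f16f32mm_path(void)
{
    return USES_F16F32MM ? "svmmla_f32_f16" : "portable fallback";
}
```

On a target without the extensions, both wrappers are compiled out and only `f16f32mm_path` remains, so the same source file builds everywhere and callers can choose a fallback at build time.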
9435+
93869436### SVE2.1 instruction intrinsics
93879437
93889438The specification for SVE2.1 is in