pulp-platform · rgiunti · May 23, 2024 · Jun 3, 2024 · Jun 7, 2024 · Jun 26, 2024
@@ -25,13 +25,30 @@ sources:
   - vendor/opene906/E906_RTL_FACTORY/gen_rtl/fpu/rtl/pa_fpu_dp.v
   - vendor/opene906/E906_RTL_FACTORY/gen_rtl/fpu/rtl/pa_fpu_frbus.v
   - vendor/opene906/E906_RTL_FACTORY/gen_rtl/fpu/rtl/pa_fpu_src_type.v
+#  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/clk/rtl/gated_clk_cell.v # same as the one from E906
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_ctrl.v
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_double.v
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_ff1.v
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_pack.v
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_prepare.v
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_round.v
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_scalar_dp.v
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt_radix16_bound_table.v
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt_radix16_with_sqrt.v
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt.v
+  - vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_top.v
   - src/fpnew_divsqrt_th_32.sv
+  - src/fpnew_divsqrt_th_64_multi.sv
   - src/fpnew_divsqrt_multi.sv
   - src/fpnew_fma.sv
   - src/fpnew_fma_multi.sv
   - src/fpnew_sdotp_multi.sv
   - src/fpnew_sdotp_multi_wrapper.sv
   - src/fpnew_noncomp.sv
+  - src/mxdotp/fpnew_mxdotp_multi_pkg.sv
+  - src/mxdotp/fpnew_mxdotp_multi_modules.sv
+  - src/fpnew_mxdotp_multi.sv
+  - src/fpnew_mxdotp_multi_wrapper.sv
   - src/fpnew_opgroup_block.sv
   - src/fpnew_opgroup_fmt_slice.sv
   - src/fpnew_opgroup_multifmt_slice.sv

@@ -2,4 +2,4 @@
 
 FPnew is released under the *SolderPad Hardware License*, which is a permissive license based on Apache 2.0. Please refer to the [SolderPad license file](LICENSE.solderpad) for further information.
 
-The T-Head E906 DivSqrt unit, integrated into FPnew in [`vendor/opene906`](vendor/opene906), is reseased under the *Apache License, Version 2.0*. Please refer to the [Apache 2.0 license file](LICENSE.apache) for further information.
+The T-Head E906 and C910 DivSqrt units, integrated into FPnew in [`vendor/opene906`](vendor/opene906) and [`vendor/openc910`](vendor/openc910), are released under the *Apache License, Version 2.0*. Please refer to the [Apache 2.0 license file](LICENSE.apache) for further information.
@@ -2,8 +2,9 @@
 
 Parametric floating-point unit with support for standard RISC-V formats and operations as well as transprecision formats, written in SystemVerilog.
 
-Maintainer: Luca Bertaccini <lbertaccini@iis.ee.ethz.ch><br>
-Principal Author: Stefan Mach <smach@iis.ee.ethz.ch>
+Current Maintainer: Gamze İslamoğlu <gislamoglu@iis.ee.ethz.ch><br>
+Past Maintainer: Luca Bertaccini <lbertaccini@iis.ee.ethz.ch><br>
+Main Author: Stefan Mach <smach@iis.ee.ethz.ch>
 
 ## Features
 
@@ -88,8 +89,7 @@ It is discouraged to `import` all of `fpnew_pkg` into your source files. Instead
 fpnew_top #(
   .Features       ( fpnew_pkg::RV64D          ),
   .Implementation ( fpnew_pkg::DEFAULT_NOREGS ),
-  .TagType        ( logic                     ),
-  .PulpDivsqrt    ( 1'b1                      )
+  .TagType        ( logic                     )
 ) i_fpnew_top (
   .clk_i,
   .rst_ni,

@@ -7,6 +7,45 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) a
 In this sense, we interpret the "Public API" of a hardware module as its port/parameter list.
 Versions of the IP in the same major release are "pin-compatible" with each other. Minor releases are permitted to add new parameters as long as their default bindings ensure backwards compatibility.
 
+## [Unreleased]
+
+### Added
+- Add FP6(E3M2), FP6ALT(E2M3), and FP4(E2M1) floating-point formats
+- Add MXDOTP Microscaling dot product multi-format operation group
+  - Supports source formats: FP8, FP8ALT, FP6, FP6ALT, FP4, INT8
+  - Supports destination formats: FP32, FP16ALT
+  - Scaled dot-product and accumulation support with two 8-bit exponent scale factors
+
+### Changed
+- Extend classifier to support MX-specific special cases for FP6, FP6ALT, FP4 formats
+- Increase number of supported FP formats from 6 to 9
+- Increase number of opgroups from 5 to 6
+
+### Notes
+- MXDOTP implementation tested with all element formats enabled, but not yet exhaustively tested with all possible combinations of enabled formats.
+- Known limitations documented in TODO comments (see source files for details)
+
+## [pulp-v0.2.3] - 2024-09-27
+
+### Fixed
+- Fix illegal Verilog `'0`
+
+## [pulp-v0.2.2] - 2024-06-24
+
+### Added
+- Add FP16ALT support to THMULTI DivSqrt
+
+## [pulp-v0.2.1] - 2024-06-07
+
+### Fixed
+- Fix synchronization of THMULTI DivSqrt lanes when FP16ALT, FP8, or FP8ALT are enabled.
+
+## [pulp-v0.2.0] - 2024-05-29
+
+### Added
+- Add support for alternative multi-format DivSqrt unit (from openC910), supporting FP64, FP32, FP16 and SIMD operations
+- Replace `PulpDivsqrt` top-level parameter with `DivSqrtSel` to choose among the legacy PULP DivSqrt unit (`PULP`), the openE906 DivSqrt (`TH32`), and the openC910 DivSqrt (`THMULTI`). The default choice is set to `THMULTI`
+
 ## [pulp-v0.1.3] - 2023-07-19
 
 ### Fixed

@@ -1,2 +1,2 @@
 # Global owners
-*	@lucabertaccini
+*	@gamzeisl
@@ -37,6 +37,7 @@ For more in-depth explanations on how to configure the unit and the layout of th
 |------------------|------------------------------------------------------------------------------------------------------------------------------|
 | `Features`       | Specifies the features of the FPU, such as the set of supported formats and operations.                                      |
 | `Implementation` | Allows to control how the above features are implemented, such as the number of pipeline stages and architecture of subunits |
+| `DivSqrtSel`     | Chooses among the three supported DivSqrt units                                                                              |
 | `TagType`        | The SystemVerilog data type of the operation tag                                                                             |
 | `TrueSIMDClass`  | If enabled, the result of a classify operation in vectorial mode will be RISC-V compliant if each output has at least 10 bits|
 | `EnableSIMDMask` | Enable the RISC-V floating-point status flags masking of inactive vectorial lanes. When disabled, `simd_mask_i` is inactive  |
@@ -108,8 +109,10 @@ Unless noted otherwise, the first operand `op[0]` is used for the operation.
 | `ADD`      | `0`      | Addition (`op[1] + op[2]`) *note the operand indices*                                                                                                                                                            |
 | `ADD`      | `1`      | Subtraction (`op[1] - op[2]`) *note the operand indices*                                                                                                                                                         |
 | `MUL`      | `0`      | Multiplication (`op[0] * op[1]`)                                                                                                                                                                                 |
-| `SDOTP`    | `0`      | Sum of dot product )                                                                                                                                                                                 |
-| `VSUM`     | `0`      | Vector Inner Sum )                                                                                                                                                                                 |
+| `SDOTP`    | `0`      | Sum of dot product                                                                                                                                                                                               |
+| `VSUM`     | `0`      | Vector Inner Sum                                                                                                                                                                                                 |
+| `MXDOTPF`  | `0`      | Microscaling FP scaled dot product and accumulate                                                                                                                                                 |
+| `MXDOTPI`  | `0`      | Microscaling INT scaled dot product and accumulate |
 | `DIV`      | `0`      | Division (`op[0] / op[1]`)                                                                                                                                                                                       |
 | `SQRT`     | `0`      | Square root                                                                                                                                                                                                      |
 | `SGNJ`     | `0`      | Sign injection, operation encoded in rounding mode<br>`RNE`: `op[0]` with `sign(op[1])`<br>`RTZ`: `op[0]` with `~sign(op[1])`<br>`RDN`: `op[0]` with `sign(op[0]) ^ sign(op[1])`<br>`RUP`: `op[0]` (passthrough) |
@@ -129,7 +132,7 @@ Unless noted otherwise, the first operand `op[0]` is used for the operation.
 
 ##### `fp_format_e` - FP Formats
 
-Enumeration of type `logic [2:0]` holding the supported FP formats.
+Enumeration of type `logic [3:0]` holding the supported FP formats.
 
 | Enumerator | Format        | Width  | Exp. Bits | Man. Bits |
 | ---------- | ------------- | -----: | :-------: | :-------: |
@@ -139,10 +142,13 @@ Enumeration of type `logic [2:0]` holding the supported FP formats.
 | `FP8`      | binary8       | 8 bit  | 5         | 2         |
 | `FP16ALT`  | binary16alt   | 16 bit | 8         | 7         |
 | `FP8ALT`   | binary8alt    | 8 bit  | 4         | 3         |
+| `FP6`      | binary6       | 6 bit  | 3         | 2         |
+| `FP6ALT`   | binary6alt    | 6 bit  | 2         | 3         |
+| `FP4`      | binary4       | 4 bit  | 2         | 1         |
 
 The following global parameters associated with FP formats are set in `fpnew_pkg`:
 ```SystemVerilog
-localparam int unsigned NUM_FP_FORMATS = 6;
+localparam int unsigned NUM_FP_FORMATS = 9;
 localparam int unsigned FP_FORMAT_BITS = $clog2(NUM_FP_FORMATS);
 ```
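
As a quick sanity check of the width change, the minimal sketch below shows that `$clog2` of 9 formats yields 4 bits, whereas the previous 6 formats fit in 3 bits. This is why `fp_format_e` widens from `logic [2:0]` to `logic [3:0]`. The module name `fp_format_bits_check` and the `OLD_*` parameters are hypothetical and not part of FPnew.

```SystemVerilog
// Hypothetical standalone check, not part of the FPnew sources.
module fp_format_bits_check;
  // Format counts before and after this change
  localparam int unsigned OLD_NUM_FP_FORMATS = 6;
  localparam int unsigned NUM_FP_FORMATS     = 9;

  // $clog2 returns the ceiling of log2 of its argument
  localparam int unsigned OLD_FP_FORMAT_BITS = $clog2(OLD_NUM_FP_FORMATS);
  localparam int unsigned FP_FORMAT_BITS     = $clog2(NUM_FP_FORMATS);

  initial begin
    // Six formats need 3 bits, nine formats need 4 bits
    assert (OLD_FP_FORMAT_BITS == 3) else $error("expected 3 bits for 6 formats");
    assert (FP_FORMAT_BITS == 4)     else $error("expected 4 bits for 9 formats");
    $display("FP_FORMAT_BITS: %0d -> %0d", OLD_FP_FORMAT_BITS, FP_FORMAT_BITS);
  end
endmodule
```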
 
@@ -285,7 +291,7 @@ Otherwise, synthesis tools can optimize away any logic associated with this form
 
 #### `Implementation` - Implementation Options
 
-The FPU is divided into five operation groups,  `ADDMUL`, `DIVSQRT`, `NONDOMP`, `CONV`, and `DOTP` (see [Architecture: Top-Level](#top-level)).
+The FPU is divided into six operation groups: `ADDMUL`, `DIVSQRT`, `NONCOMP`, `CONV`, `DOTP`, and `MXDOTP` (see [Architecture: Top-Level](#top-level)).
 The `Implementation` parameter controls the implementation of these operation groups.
 It is of type `fpu_implementation_t` which is defined as:
 ```SystemVerilog
@@ -327,18 +333,19 @@ The unit type `unit_type_t` is an enumeration of type `logic [1:0]` holding the
 The `UnitTypes` parameter controls the resources used by the FPU by either removing operation units for certain formats and operations, or merging multiple formats into one.
 Currently, the following unit types are available for the FPU operation groups:
 
-|            |      `ADDMUL`      |     `DIVSQRT`      |     `NONCOMP`      |       `CONV`       |       `DOTP`       |
-|------------|--------------------|--------------------|--------------------|--------------------|--------------------|
-| `PARALLEL` | :heavy_check_mark: |                    | :heavy_check_mark: |                    |                    |
-| `MERGED`   | :heavy_check_mark: | :heavy_check_mark: |                    | :heavy_check_mark: | :heavy_check_mark: |
+|            |      `ADDMUL`      |     `DIVSQRT`      |     `NONCOMP`      |       `CONV`       |       `DOTP`       |      `MXDOTP`      |
+|------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
+| `PARALLEL` | :heavy_check_mark: |                    | :heavy_check_mark: |                    |                    |                    |
+| `MERGED`   | :heavy_check_mark: | :heavy_check_mark: |                    | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
 
 *Default*:
 ```SystemVerilog
-'{'{default: PARALLEL}, // ADDMUL
-  '{default: MERGED},   // DIVSQRT
-  '{default: PARALLEL}, // NONCOMP
-  '{default: MERGED},   // CONV`
-  '{default: DISABLED}} // DOTP`
+'{'{default: PARALLEL},  // ADDMUL
+  '{default: MERGED},    // DIVSQRT
+  '{default: PARALLEL},  // NONCOMP
+  '{default: MERGED},    // CONV
+  '{default: DISABLED},  // DOTP
+  '{default: DISABLED}}  // MXDOTP
 ```
 (all formats within operation group use same type)
 
@@ -358,7 +365,18 @@ The configuration  `pipe_config_t` is an enumeration of type `logic [1:0]` holdi
 | `INSIDE`      | All registers are inserted at roughly the middle of the operational unit (if not possible, `BEFORE`) |
 | `DISTRIBUTED` | Registers are evenly distributed to `INSIDE`, `BEFORE`, and `AFTER` (if no `INSIDE`, all `BEFORE`)   |
 
-### `Stochastic Rounding Implementation`
+#### `Division and Square-Root Unit Selection`
+The `DivSqrtSel` parameter is used to choose among the supported DivSqrt units.
+It is of type `divsqrt_unit_t`, which is defined as:
+```SystemVerilog
+typedef enum logic [1:0] {
+  PULP,    // Instantiates the PULP DivSqrt unit, which supports FP64, FP32, FP16, FP16ALT, FP8, and SIMD operations
+  TH32,    // Instantiates the E906 DivSqrt unit, which supports only FP32 (no SIMD support)
+  THMULTI  // Instantiates the C910 DivSqrt unit, which supports FP64, FP32, FP16, FP16ALT, and SIMD operations
+} divsqrt_unit_t;
+```
+
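The support matrix stated in the enum comments can be turned into a simple configuration guard. The following is a minimal, self-contained sketch under stated assumptions: the package `divsqrt_sel_sketch_pkg`, the module `divsqrt_sel_sketch`, and the helper functions are hypothetical, and the enum is redeclared locally only so that the snippet compiles on its own.

```SystemVerilog
// Hypothetical sketch, not part of FPnew: local copy of the enum for a standalone example.
package divsqrt_sel_sketch_pkg;
  typedef enum logic [1:0] {PULP, TH32, THMULTI} divsqrt_unit_t;

  // Per the enum comments: PULP and THMULTI support FP64, TH32 supports only FP32
  function automatic bit supports_fp64(divsqrt_unit_t sel);
    return (sel == PULP) || (sel == THMULTI);
  endfunction

  // Per the enum comments: TH32 has no SIMD support
  function automatic bit supports_simd(divsqrt_unit_t sel);
    return (sel == PULP) || (sel == THMULTI);
  endfunction
endpackage

module divsqrt_sel_sketch;
  import divsqrt_sel_sketch_pkg::*;

  // Example selection: the E906-based unit
  localparam divsqrt_unit_t DivSqrtSel = TH32;

  initial begin
    // Flag configurations that the selected unit cannot serve
    if (!supports_fp64(DivSqrtSel)) $display("Selected DivSqrt unit lacks FP64 support");
    if (!supports_simd(DivSqrtSel)) $display("Selected DivSqrt unit lacks SIMD support");
  end
endmodule
```

In an actual instantiation, the selection would presumably be passed as a parameter override on `fpnew_top`, for example `.DivSqrtSel ( fpnew_pkg::TH32 )`, assuming the enum is declared in `fpnew_pkg` like the other configuration types.
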
+#### `Stochastic Rounding Implementation`
 
 The `StochasticRndImplementation` parameter is used to configure the RSR support.
 It is of type `rsr_impl_t` which is defined as:
@@ -425,7 +443,7 @@ The *operation group* is the highest level of grouping within FPnew and signifie
 
 ![FPnew](fig/top_block.png)
 
-There are currently five operation groups in FPnew which are enumerated in `opgroup_e` as outlined in the following table:
+There are currently six operation groups in FPnew which are enumerated in `opgroup_e` as outlined in the following table:
 
 | Enumerator |                  Description                  |         Associated Operations         |
 |------------|-----------------------------------------------|---------------------------------------|
@@ -434,6 +452,7 @@ There are currently five operation groups in FPnew which are enumerated in `opgr
 | `NONCOMP`  | Non-Computational Operations like Comparisons | `SGNJ`, `MINMAX`, `CMP`, `CLASS`      |
 | `CONV`     | Conversions                                   | `F2I`, `I2F`, `F2F`, `CPKAB`, `CPKCD` |
 | `DOTP`     | Dot Products                                  | `SDOTP`, `EXVSUM`, `VSUM`             |
+| `MXDOTP`   | Microscaling Dot Products                     | `MXDOTPF`, `MXDOTPI`                  |
 
 Most architectural decisions for FPnew are made at very fine granularity.
 The big exception to this is the generation of vectorial hardware which is decided at top level through the `EnableVectors` parameter.

@@ -16,6 +16,7 @@
 module fpnew_classifier #(
   parameter fpnew_pkg::fp_format_e   FpFormat = fpnew_pkg::fp_format_e'(0),
   parameter int unsigned             NumOperands = 1,
+  parameter int unsigned             MX = 0,
   // Do not change
   localparam int unsigned WIDTH = fpnew_pkg::fp_width(FpFormat)
 ) (
@@ -51,13 +52,30 @@ module fpnew_classifier #(
     // Classify Input
     // ---------------
     always_comb begin : classify_input
-      value         = operands_i[op];
-      is_boxed      = is_boxed_i[op];
-      is_normal     = is_boxed && (value.exponent != '0) && (value.exponent != '1);
+      value    = operands_i[op];
+      is_boxed = is_boxed_i[op];
+
+      if (MX == 1 && FpFormat == fpnew_pkg::fp_format_e'(fpnew_pkg::FP8ALT)) begin
+        // E4M3: No infinity, NaN when exp=all1s and man=all1s
+        is_inf    = 1'b0;
+        is_nan    = !is_boxed || ((value.exponent == '1) && (value.mantissa == '1));
+        is_normal = is_boxed && (value.exponent != '0) && !is_nan;
+      end else if (MX == 1 && (FpFormat == fpnew_pkg::fp_format_e'(fpnew_pkg::FP6) ||
+                                FpFormat == fpnew_pkg::fp_format_e'(fpnew_pkg::FP6ALT) ||
+                                FpFormat == fpnew_pkg::fp_format_e'(fpnew_pkg::FP4))) begin
+        // E3M2, E2M3, E2M1: No infinity or NaN
+        is_inf    = 1'b0;
+        is_nan    = 1'b0;
+        is_normal = is_boxed && (value.exponent != '0);
+      end else begin
+        // Standard IEEE-754 classification (for all other formats and MX=0)
+        is_inf    = is_boxed && ((value.exponent == '1) && (value.mantissa == '0));
+        is_nan    = !is_boxed || ((value.exponent == '1) && (value.mantissa != '0));
+        is_normal = is_boxed && (value.exponent != '0) && (value.exponent != '1);
+      end
+
       is_zero       = is_boxed && (value.exponent == '0) && (value.mantissa == '0);
       is_subnormal  = is_boxed && (value.exponent == '0) && !is_zero;
-      is_inf        = is_boxed && ((value.exponent == '1) && (value.mantissa == '0));
-      is_nan        = !is_boxed || ((value.exponent == '1) && (value.mantissa != '0));
       is_signalling = is_boxed && is_nan && (value.mantissa[MAN_BITS-1] == 1'b0);
       is_quiet      = is_nan && !is_signalling;
       // Assign output for current input
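
To illustrate the FP8ALT (E4M3) branch above, the following standalone sketch contrasts the MX-style classification with the IEEE-754-style one on three bit patterns. The module, type, and function names are hypothetical and only mirror the expressions in the hunk; NaN-boxing (`is_boxed`) is ignored for brevity.

```SystemVerilog
// Hypothetical standalone sketch mirroring the E4M3 expressions above; not the FPnew classifier itself.
module mx_e4m3_classify_sketch;
  localparam int unsigned EXP_BITS = 4;
  localparam int unsigned MAN_BITS = 3;

  typedef struct packed {
    logic                sign;
    logic [EXP_BITS-1:0] exponent;
    logic [MAN_BITS-1:0] mantissa;
  } fp8alt_t;

  // MX-style E4M3: no infinity, NaN only when exponent and mantissa are all ones
  function automatic logic is_nan_mx(fp8alt_t v);
    return (v.exponent == '1) && (v.mantissa == '1);
  endfunction

  // IEEE-754-style: NaN when the exponent is all ones and the mantissa is nonzero
  function automatic logic is_nan_ieee(fp8alt_t v);
    return (v.exponent == '1) && (v.mantissa != '0);
  endfunction

  // IEEE-754-style: infinity when the exponent is all ones and the mantissa is zero
  function automatic logic is_inf_ieee(fp8alt_t v);
    return (v.exponent == '1) && (v.mantissa == '0);
  endfunction

  initial begin
    fp8alt_t v;
    v = 8'h7F;  // exponent 1111, mantissa 111: NaN under both schemes
    assert (is_nan_mx(v) && is_nan_ieee(v));
    v = 8'h78;  // exponent 1111, mantissa 000: infinity in IEEE style, ordinary number in MX
    assert (!is_nan_mx(v) && is_inf_ieee(v));
    v = 8'h79;  // exponent 1111, mantissa 001: NaN in IEEE style, ordinary number in MX
    assert (!is_nan_mx(v) && is_nan_ieee(v));
  end
endmodule
```

The last two patterns show why the MX branch cannot reuse the standard `is_normal` expression: with `MX == 1`, an all-ones exponent still encodes a regular value unless the mantissa is also all ones.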