17 changes: 17 additions & 0 deletions Bender.yml
@@ -25,13 +25,30 @@ sources:
- vendor/opene906/E906_RTL_FACTORY/gen_rtl/fpu/rtl/pa_fpu_dp.v
- vendor/opene906/E906_RTL_FACTORY/gen_rtl/fpu/rtl/pa_fpu_frbus.v
- vendor/opene906/E906_RTL_FACTORY/gen_rtl/fpu/rtl/pa_fpu_src_type.v
# - vendor/openc910/C910_RTL_FACTORY/gen_rtl/clk/rtl/gated_clk_cell.v # same as the one from E906
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_ctrl.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_double.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_ff1.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_pack.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_prepare.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_round.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_scalar_dp.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt_radix16_bound_table.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt_radix16_with_sqrt.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_top.v
- src/fpnew_divsqrt_th_32.sv
- src/fpnew_divsqrt_th_64_multi.sv
- src/fpnew_divsqrt_multi.sv
- src/fpnew_fma.sv
- src/fpnew_fma_multi.sv
- src/fpnew_sdotp_multi.sv
- src/fpnew_sdotp_multi_wrapper.sv
- src/fpnew_noncomp.sv
- src/mxdotp/fpnew_mxdotp_multi_pkg.sv
- src/mxdotp/fpnew_mxdotp_multi_modules.sv
- src/fpnew_mxdotp_multi.sv
- src/fpnew_mxdotp_multi_wrapper.sv
- src/fpnew_opgroup_block.sv
- src/fpnew_opgroup_fmt_slice.sv
- src/fpnew_opgroup_multifmt_slice.sv
2 changes: 1 addition & 1 deletion README.license.md
@@ -2,4 +2,4 @@

FPnew is released under the *SolderPad Hardware License*, which is a permissive license based on Apache 2.0. Please refer to the [SolderPad license file](LICENSE.solderpad) for further information.

The T-Head E906 DivSqrt unit, integrated into FPnew in [`vendor/opene906`](vendor/opene906), is reseased under the *Apache License, Version 2.0*. Please refer to the [Apache 2.0 license file](LICENSE.apache) for further information.
The T-Head E906 and C910 DivSqrt units, integrated into FPnew in [`vendor/opene906`](vendor/opene906) and [`vendor/openc910`](vendor/openc910), are released under the *Apache License, Version 2.0*. Please refer to the [Apache 2.0 license file](LICENSE.apache) for further information.
8 changes: 4 additions & 4 deletions README.md
@@ -2,8 +2,9 @@

Parametric floating-point unit with support for standard RISC-V formats and operations as well as transprecision formats, written in SystemVerilog.

Maintainer: Luca Bertaccini <lbertaccini@iis.ee.ethz.ch><br>
Principal Author: Stefan Mach <smach@iis.ee.ethz.ch>
Current Maintainer: Gamze İslamoğlu <gislamoglu@iis.ee.ethz.ch><br>
Past Maintainer: Luca Bertaccini <lbertaccini@iis.ee.ethz.ch><br>
Main Author: Stefan Mach <smach@iis.ee.ethz.ch>

## Features

@@ -88,8 +89,7 @@ It is discouraged to `import` all of `fpnew_pkg` into your source files. Instead
fpnew_top #(
.Features ( fpnew_pkg::RV64D ),
.Implementation ( fpnew_pkg::DEFAULT_NOREGS ),
.TagType ( logic ),
.PulpDivsqrt ( 1'b1 )
.TagType ( logic )
) i_fpnew_top (
.clk_i,
.rst_ni,
39 changes: 39 additions & 0 deletions docs/CHANGELOG-PULP.md
@@ -7,6 +7,45 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) a
In this sense, we interpret the "Public API" of a hardware module as its port/parameter list.
Versions of the IP in the same major release are "pin-compatible" with each other. Minor releases are permitted to add new parameters as long as their default bindings ensure backwards compatibility.

## [Unreleased]

### Added
- Add FP6(E3M2), FP6ALT(E2M3), and FP4(E2M1) floating-point formats
- Add MXDOTP Microscaling dot product multi-format operation group
- Supports source formats: FP8, FP8ALT, FP6, FP6ALT, FP4, INT8
- Supports destination formats: FP32, FP16ALT
- Scaled dot-product and accumulation support with two 8-bit exponent scale factors

### Changed
- Extend classifier to support MX-specific special cases for FP6, FP6ALT, FP4 formats
- Increase number of supported FP formats from 6 to 9
- Increase number of opgroups from 5 to 6

### Notes
- MXDOTP implementation tested with all element formats enabled, but not yet exhaustively tested with all possible combinations of enabled formats.
- Known limitations documented in TODO comments (see source files for details)

## [pulp-v0.2.3] - 2024-09-27

### Fix
- Fix illegal Verilog `'0`

## [pulp-v0.2.2] - 2024-06-24

### Added
- Add FP16ALT support to THMULTI DivSqrt

## [pulp-v0.2.1] - 2024-06-07

### Fix
- Fix synchronization of THMULTI DivSqrt lanes when FP16ALT, FP8, or FP8ALT are enabled.

## [pulp-v0.2.0] - 2024-05-29

### Added
- Add support for alternative multi-format DivSqrt unit (from openC910), supporting FP64, FP32, FP16 and SIMD operations
- Replace `PulpDivsqrt` top-level parameter with `DivSqrtSel` to choose among the legacy PULP DivSqrt unit (`PULP`), the openE906 DivSqrt (`TH32`), and the openC910 DivSqrt (`THMULTI`). The default choice is set to `THMULTI`

## [pulp-v0.1.3] - 2023-07-19

### Fixed
2 changes: 1 addition & 1 deletion docs/CODEOWNERS
@@ -1,2 +1,2 @@
# Global owners
* @lucabertaccini
* @gamzeisl
51 changes: 35 additions & 16 deletions docs/README.md
@@ -37,6 +37,7 @@ For more in-depth explanations on how to configure the unit and the layout of th
|------------------|------------------------------------------------------------------------------------------------------------------------------|
| `Features` | Specifies the features of the FPU, such as the set of supported formats and operations. |
| `Implementation` | Controls how the above features are implemented, such as the number of pipeline stages and the architecture of subunits |
| `DivSqrtSel` | Chooses among the three supported DivSqrt units |
| `TagType` | The SystemVerilog data type of the operation tag |
| `TrueSIMDClass` | If enabled, the result of a classify operation in vectorial mode will be RISC-V compliant if each output has at least 10 bits|
| `EnableSIMDMask` | Enable the RISC-V floating-point status flags masking of inactive vectorial lanes. When disabled, `simd_mask_i` is inactive |
@@ -108,8 +109,10 @@ Unless noted otherwise, the first operand `op[0]` is used for the operation.
| `ADD` | `0` | Addition (`op[1] + op[2]`) *note the operand indices* |
| `ADD` | `1` | Subtraction (`op[1] - op[2]`) *note the operand indices* |
| `MUL` | `0` | Multiplication (`op[0] * op[1]`) |
| `SDOTP` | `0` | Sum of dot product ) |
| `VSUM` | `0` | Vector Inner Sum ) |
| `SDOTP` | `0` | Sum of dot product |
| `VSUM` | `0` | Vector Inner Sum |
| `MXDOTPF` | `0` | Microscaling FP scaled dot product and accumulate |
| `MXDOTPI` | `0` | Microscaling INT scaled dot product and accumulate |
| `DIV` | `0` | Division (`op[0] / op[1]`) |
| `SQRT` | `0` | Square root |
| `SGNJ` | `0` | Sign injection, operation encoded in rounding mode<br>`RNE`: `op[0]` with `sign(op[1])`<br>`RTZ`: `op[0]` with `~sign(op[1])`<br>`RDN`: `op[0]` with `sign(op[0]) ^ sign(op[1])`<br>`RUP`: `op[0]` (passthrough) |
@@ -129,7 +132,7 @@ Unless noted otherwise, the first operand `op[0]` is used for the operation.

##### `fp_format_e` - FP Formats

Enumeration of type `logic [2:0]` holding the supported FP formats.
Enumeration of type `logic [3:0]` holding the supported FP formats.

| Enumerator | Format | Width | Exp. Bits | Man. Bits |
| ---------- | ------------- | -----: | :-------: | :-------: |
@@ -139,10 +142,13 @@ Enumeration of type `logic [2:0]` holding the supported FP formats.
| `FP8` | binary8 | 8 bit | 5 | 2 |
| `FP16ALT` | binary16alt | 16 bit | 8 | 7 |
| `FP8ALT` | binary8alt | 8 bit | 4 | 3 |
| `FP6` | binary6 | 6 bit | 3 | 2 |
| `FP6ALT` | binary6alt | 6 bit | 2 | 3 |
| `FP4` | binary4 | 4 bit | 2 | 1 |

The following global parameters associated with FP formats are set in `fpnew_pkg`:
```SystemVerilog
localparam int unsigned NUM_FP_FORMATS = 6;
localparam int unsigned NUM_FP_FORMATS = 9;
localparam int unsigned FP_FORMAT_BITS = $clog2(NUM_FP_FORMATS);
```
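
The widening of `fp_format_e` from `logic [2:0]` to `logic [3:0]` follows from the format count: nine formats no longer fit in three bits. A minimal sketch of that arithmetic, using a hypothetical stand-alone module name (`fp_format_bits_check`) that is not part of FPnew:

```SystemVerilog
// Hypothetical check module, for illustration only.
module fp_format_bits_check;
  // Same expressions as in fpnew_pkg, reproduced locally so the sketch stands alone.
  localparam int unsigned NUM_FP_FORMATS = 9;
  localparam int unsigned FP_FORMAT_BITS = $clog2(NUM_FP_FORMATS);

  initial begin
    // $clog2 returns the ceiling of log2, so 9 formats need 4 encoding bits.
    assert (FP_FORMAT_BITS == 4);
    // With the previous 6 formats, 3 bits were sufficient.
    assert ($clog2(6) == 3);
  end
endmodule
```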

@@ -285,7 +291,7 @@ Otherwise, synthesis tools can optimize away any logic associated with this form

#### `Implementation` - Implementation Options

The FPU is divided into five operation groups, `ADDMUL`, `DIVSQRT`, `NONDOMP`, `CONV`, and `DOTP` (see [Architecture: Top-Level](#top-level)).
The FPU is divided into six operation groups: `ADDMUL`, `DIVSQRT`, `NONCOMP`, `CONV`, `DOTP`, and `MXDOTP` (see [Architecture: Top-Level](#top-level)).
The `Implementation` parameter controls the implementation of these operation groups.
It is of type `fpu_implementation_t` which is defined as:
```SystemVerilog
@@ -327,18 +333,19 @@ The unit type `unit_type_t` is an enumeration of type `logic [1:0]` holding the
The `UnitTypes` parameter controls the resources used by the FPU, either by removing operation units for certain formats and operations or by merging multiple formats into one unit.
Currently, the following unit types are available for the FPU operation groups:

| | `ADDMUL` | `DIVSQRT` | `NONCOMP` | `CONV` | `DOTP` |
|------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| `PARALLEL` | :heavy_check_mark: | | :heavy_check_mark: | | |
| `MERGED` | :heavy_check_mark: | :heavy_check_mark: | | :heavy_check_mark: | :heavy_check_mark: |
| | `ADDMUL` | `DIVSQRT` | `NONCOMP` | `CONV` | `DOTP` | `MXDOTP` |
|------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| `PARALLEL` | :heavy_check_mark: | | :heavy_check_mark: | | | |
| `MERGED` | :heavy_check_mark: | :heavy_check_mark: | | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

*Default*:
```SystemVerilog
'{'{default: PARALLEL}, // ADDMUL
'{default: MERGED}, // DIVSQRT
'{default: PARALLEL}, // NONCOMP
'{default: MERGED}, // CONV`
'{default: DISABLED}} // DOTP`
'{'{default: PARALLEL}, // ADDMUL
'{default: MERGED}, // DIVSQRT
'{default: PARALLEL}, // NONCOMP
'{default: MERGED}, // CONV
'{default: DISABLED}, // DOTP
'{default: DISABLED}} // MXDOTP
```
(all formats within operation group use same type)
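
As a rough sketch of how the sixth entry might be used, the fragment below enables `MXDOTP` as a `MERGED` unit and leaves the other groups at their defaults. The name `MxImplementation` is hypothetical, and the field names `PipeRegs`, `UnitTypes`, and `PipeConfig` are assumed to match the `fpu_implementation_t` definition in `fpnew_pkg`:

```SystemVerilog
// Hypothetical configuration: only the MXDOTP entry differs from the default above.
localparam fpnew_pkg::fpu_implementation_t MxImplementation = '{
  PipeRegs:   '{default: 0},                      // no pipeline registers (assumed field name)
  UnitTypes:  '{'{default: fpnew_pkg::PARALLEL},  // ADDMUL
                '{default: fpnew_pkg::MERGED},    // DIVSQRT
                '{default: fpnew_pkg::PARALLEL},  // NONCOMP
                '{default: fpnew_pkg::MERGED},    // CONV
                '{default: fpnew_pkg::DISABLED},  // DOTP
                '{default: fpnew_pkg::MERGED}},   // MXDOTP, MERGED is the only type listed for it
  PipeConfig: fpnew_pkg::BEFORE                   // register placement (assumed field name)
};
```

The entries are positional, so the order has to follow the operation-group order shown in the default.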

@@ -358,7 +365,18 @@ The configuration `pipe_config_t` is an enumeration of type `logic [1:0]` holdi
| `INSIDE` | All registers are inserted at roughly the middle of the operational unit (if not possible, `BEFORE`) |
| `DISTRIBUTED` | Registers are evenly distributed to `INSIDE`, `BEFORE`, and `AFTER` (if no `INSIDE`, all `BEFORE`) |

### `Stochastic Rounding Implementation`
#### `Division and Square-Root Unit Selection`
The `DivSqrtSel` parameter is used to choose among the supported DivSqrt units.
It is of type `divsqrt_unit_t`, which is defined as:
```SystemVerilog
typedef enum logic[1:0] {
PULP, // "PULP" instantiates the PULP DivSqrt unit, which supports FP64, FP32, FP16, FP16ALT, FP8, and SIMD operations
TH32, // "TH32" instantiates the E906 DivSqrt unit, which supports only FP32 (no SIMD support)
THMULTI // "THMULTI" instantiates the C910 DivSqrt unit, which supports FP64, FP32, FP16, FP16ALT, and SIMD operations
} divsqrt_unit_t;
```
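
A minimal sketch of selecting a unit at instantiation, assuming the enumerators are reachable as `fpnew_pkg::PULP`, `fpnew_pkg::TH32`, and `fpnew_pkg::THMULTI`. The local name `DivSqrtUnit` is hypothetical, and the port list is intentionally left incomplete:

```SystemVerilog
// Hypothetical local parameter: TH32 supports only FP32, so it is paired
// here with an FP32-only feature set (RV32F is assumed to be available).
localparam fpnew_pkg::divsqrt_unit_t DivSqrtUnit = fpnew_pkg::TH32;

fpnew_top #(
  .Features       ( fpnew_pkg::RV32F          ),
  .Implementation ( fpnew_pkg::DEFAULT_NOREGS ),
  .DivSqrtSel     ( DivSqrtUnit               ), // takes the place of the former PulpDivsqrt flag
  .TagType        ( logic                     )
) i_fpnew_top (
  .clk_i,
  .rst_ni
  // remaining ports omitted in this sketch
);
```

Leaving `DivSqrtSel` unset should select `THMULTI`, which the changelog names as the default.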

#### `Stochastic Rounding Implementation`

The `StochasticRndImplementation` parameter is used to configure the RSR support.
It is of type `rsr_impl_t` which is defined as:
@@ -425,7 +443,7 @@ The *operation group* is the highest level of grouping within FPnew and signifie

![FPnew](fig/top_block.png)

There are currently five operation groups in FPnew which are enumerated in `opgroup_e` as outlined in the following table:
There are currently six operation groups in FPnew which are enumerated in `opgroup_e` as outlined in the following table:

| Enumerator | Description | Associated Operations |
|------------|-----------------------------------------------|---------------------------------------|
@@ -434,6 +452,7 @@ There are currently five operation groups in FPnew which are enumerated in `opgr
| `NONCOMP` | Non-Computational Operations like Comparisons | `SGNJ`, `MINMAX`, `CMP`, `CLASS` |
| `CONV` | Conversions | `F2I`, `I2F`, `F2F`, `CPKAB`, `CPKCD` |
| `DOTP` | Dot Products | `SDOTP`, `EXVSUM`, `VSUM` |
| `MXDOTP` | Microscaling Dot Products | `MXDOTPF`, `MXDOTPI` |

Most architectural decisions for FPnew are made at very fine granularity.
The big exception to this is the generation of vectorial hardware which is decided at top level through the `EnableVectors` parameter.
28 changes: 23 additions & 5 deletions src/fpnew_classifier.sv
@@ -16,6 +16,7 @@
module fpnew_classifier #(
parameter fpnew_pkg::fp_format_e FpFormat = fpnew_pkg::fp_format_e'(0),
parameter int unsigned NumOperands = 1,
parameter int unsigned MX = 0,
// Do not change
localparam int unsigned WIDTH = fpnew_pkg::fp_width(FpFormat)
) (
@@ -51,13 +52,30 @@ module fpnew_classifier #(
// Classify Input
// ---------------
always_comb begin : classify_input
value = operands_i[op];
is_boxed = is_boxed_i[op];
is_normal = is_boxed && (value.exponent != '0) && (value.exponent != '1);
value = operands_i[op];
is_boxed = is_boxed_i[op];

if (MX == 1 && FpFormat == fpnew_pkg::fp_format_e'(fpnew_pkg::FP8ALT)) begin
// E4M3: No infinity, NaN when exp=all1s and man=all1s
is_inf = 1'b0;
is_nan = !is_boxed || ((value.exponent == '1) && (value.mantissa == '1));
is_normal = is_boxed && (value.exponent != '0) && !is_nan;
end else if (MX == 1 && (FpFormat == fpnew_pkg::fp_format_e'(fpnew_pkg::FP6) ||
FpFormat == fpnew_pkg::fp_format_e'(fpnew_pkg::FP6ALT) ||
FpFormat == fpnew_pkg::fp_format_e'(fpnew_pkg::FP4))) begin
// E3M2, E2M3, E2M1: No infinity or NaN
is_inf = 1'b0;
is_nan = 1'b0;
is_normal = is_boxed && (value.exponent != '0);
end else begin
// Standard IEEE-754 classification (for all other formats and MX=0)
is_inf = is_boxed && ((value.exponent == '1) && (value.mantissa == '0));
is_nan = !is_boxed || ((value.exponent == '1) && (value.mantissa != '0));
is_normal = is_boxed && (value.exponent != '0) && (value.exponent != '1);
end

is_zero = is_boxed && (value.exponent == '0) && (value.mantissa == '0);
is_subnormal = is_boxed && (value.exponent == '0) && !is_zero;
is_inf = is_boxed && ((value.exponent == '1) && (value.mantissa == '0));
is_nan = !is_boxed || ((value.exponent == '1) && (value.mantissa != '0));
is_signalling = is_boxed && is_nan && (value.mantissa[MAN_BITS-1] == 1'b0);
is_quiet = is_nan && !is_signalling;
// Assign output for current input