Division Algorithms in FPGAs
The following two algorithms are being examined:
uint<N>_t numerator = num; // input
uint<N>_t denominator = den; // input
uint<N>_t quotient = 0;
while (numerator >= denominator) {
numerator -= denominator;
++quotient;
}
uint<N>_t quo = quotient; // output
uint<N>_t rmn = numerator; // outputThe implemented algorithm is documented in a .ppt about computer arithmetic from the UCCS.
uint<2*N + 1>_t numerator = num; // input
uint<N>_t denominator = den; // input
uint<N>_t quotient = 0;
uint<N>_t remainder = 0;
uint<log2(N-1)>_t k;
uint<N>_t ap = 0;
uint<N>_t an = 0;
k = leading_zeros(denominator);
denominator <<= k;
numerator <<= k;
for (int i = 0; i < N; ++i) {
switch (top_3_bits(numerator)) {
case 0:
case 7:
ap <<= 1;
an <<= 1;
numerator <<= 1;
break;
case 4:
case 5:
case 6:
ap <<= 1;
an = (an << 1) | 1;
numerator <<= 1;
numerator += (denominator << N);
break;
default: // case 1, 2, 3
ap = (ap << 1) | 1;
an <<= 1;
numerator <<= 1;
numerator -= (denominator << N);
break;
}
}
quotient = ap - an;
remainder = numerator >> N;
if (msb(numerator) == 1) {
remainder += denominator;
--quotient;
}
uint<N>_t quo = quotient; // output
uint<N>_t rmn = remainder >> k; // outputNotes:
apcan be integrated into the N LSBs of thenumeratorto save N flip-flops (see VHDL code)numeratoris namedpin the VHDL code (named(P,A)in the.ppt)numis namedAanddenis namedBin the.ppt
The performance of the algorithms is measured in termns of area, timing, throughput and latency (ATTL).
| 10 bits | 12 bits | N bits | ||||
|---|---|---|---|---|---|---|
| Rept. - | SRT | Rept. - | SRT | Rept. - | SRT | |
| Area (#ff) | a | b | c | d | ? | ? |
| Timing (MHz) | x | y | x | y | x | y |
| Throughput (bits/clk) | 2.3 | 1 | 2.4 | 1 | ? | 1 |
| Avg. Latency (#clk) | 4.3 *) | 10 | 5.0 *) | 12 | ? | N |
*) Determined by Simulation
a, b, c, d, x, y: TBD
| 10 bits | 12 bits | N bits | ||||
|---|---|---|---|---|---|---|
| Rept. - | SRT | Rept. - | SRT | Rept. - | SRT | |
| Area (#ff) | a | b | c | d | ? | ? |
| Timing (MHz) | u | v | w | x | ? | ? |
| Throughput (bits/clk) | 2.3 * M | M | 2.4 * M | M | ? | M |
| Avg. Latency (#clk) | 4.3 / M | 10 / M | 5.0 / M | 12 / M | ? | N / M |
M: Iterations per Cycle
a, b, c, d, u, v, w, x: TBD
Copyright © 2019 Dominik Müller and Nico Canzani
This project is licensed under the terms of the Apache License 2.0 - see the LICENSE file for details