Skip to content

Conversation

@lu-pinto
Copy link
Member

@lu-pinto lu-pinto commented Oct 31, 2025

This PR reimplements DIV and SDIV without using BigInteger from the JDK. Benchmarks included in this PR show it is significantly faster:
image

I've also made some changes to Mod to simplify the algorithm and in unit tests.

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests
  • hive tests: Engine or other RPCs modified?

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
@lu-pinto lu-pinto marked this pull request as draft October 31, 2025 20:14
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
@lu-pinto
Copy link
Member Author

Negligible impact on the MOD operation: MOD benchmarks

Comment on lines -538 to -541
long[] vLimbsAsLong = new long[n];
for (int i = 0; i < n; i++) {
vLimbsAsLong[i] = vLimbs[i] & MASK_L;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed - since this does not produce significant performance improvements

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this, did not observe a difference as well.

@lu-pinto lu-pinto marked this pull request as ready for review November 3, 2025 15:40
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think it looks pretty good. couple of comments

@@ -0,0 +1,64 @@
/*
* Copyright ConsenSys AG.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Copyright ConsenSys AG.
* Copyright contributors to Besu.

### Additions and Improvements
- Update to vertx 4.5.22 [#9375](https://github.com/hyperledger/besu/pull/9375)
- Add `opcodes` optional parameter to RPC methods: `debug_standardTraceBlockToFile`, `debug_standardTraceBadBlockToFile`, `debug_traceBlockByNumber`, `debug_traceBlockByHash`, `debug_traceTransaction`, `debug_traceBlock`, `debug_traceCall` for tracing specified opcodes [#9335](https://github.com/hyperledger/besu/pull/9335)
- Improve DIV and SDIV performance [#9391](https://github.com/hyperledger/besu/pull/9391)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will need to move sections

private static int[] knuthDivision(final int[] numerator, final int[] denominator) {
int n = nSetLimbs(denominator);
if (n == 0) {
throw new ArithmeticException("divided by zero");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new ArithmeticException("divided by zero");
throw new ArithmeticException("division by zero");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: think division by zero is more commonly used wording

private static int[] knuthRemainder(final int[] dividend, final int[] modulus) {
int n = nSetLimbs(modulus);
if (n == 0) {
throw new ArithmeticException("divided by zero");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new ArithmeticException("divided by zero");
throw new ArithmeticException("division by zero");

final Bytes value1 = frame.popStackItem();
Bytes resultBytes;
if (value1.isZero()) {
resultBytes = Bytes.EMPTY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the denominator is zero here we return empty but in UInt256 if denominator is zero we return zero

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I will change just to be consistent. But Bytes.EMPTY and Bytes32.ZERO is the same for the EVM in fairness.


protected abstract Operation.OperationResult invoke(MessageFrame frame);

public <T> void fillPools(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use BenchmarkHelper as it is done on other opCodes ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I think you're right. I was thinking of modifying the aPool and bPool fields in the parent class itself without leaking the fields, but I realize that avoiding the leak is impossible if one wants to do some more customised stuff.

() -> 22 + RANDOM.nextInt(10),
() -> 1 + RANDOM.nextInt(10),
byteArray -> new BigInteger(1, byteArray),
(__, ___) -> true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would imagine false, meaning we don't want to swap operands to keep a larged quotient

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, done!

() -> 5 + RANDOM.nextInt(4),
() -> 1 + RANDOM.nextInt(4),
byteArray -> new BigInteger(1, byteArray),
(__, ___) -> true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see actually a case where true is needed or I could be missing something. Why not change the order of the two first parameters and set the last parameter to false ?

() -> 9 + RANDOM.nextInt(5),
() -> 4 + RANDOM.nextInt(5),
byteArray -> new BigInteger(1, byteArray),
(__, ___) -> true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above.

import org.hyperledger.besu.evm.operation.SDivOperationOptimized;

public class SDivOperationBenchmark extends BinaryOperationBenchmark {
public class SDivOperationBenchmark extends DivOperationBenchmark {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that in the benchmark, we test only on positive values as I see the transformer defined like this

 byteArray -> new BigInteger(1, byteArray),

In DivOperationBenchmark

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transformer has no impact on the randomized bytes, so you will still get negative and positive values. What will happen is that the criteria to whether swap the operands or not will be based on absolute values, e.g., without caring if value is positive or negative. I think this should be fine AFAIK

Copy link
Contributor

@ahamlat ahamlat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved by error.

Copy link
Contributor

@thomas-quadratic thomas-quadratic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the good work.
The new div and sdiv implementations are strong.
However, I believe that a few of the optimisations that are introduced on the past code have subtle problems, which means that we need at least to improve our tests !
Some are on private methods, maybe hard to catch through small code path like our short lived integers for a single operation.
I feel this is positive either way, whether tests will turn out passing or failing, as it will be giving more confidence overall

* @return result of integer division.
*/
public UInt256 div(final UInt256 denominator) {
if (this.isZero() || denominator.isZero()) return ZERO;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently in mod also added if denominator.isOne() -> return this.
If accepted, we should do the same here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about this - not sure if this will be an added benefit because we have to test for Uint256::equals every time we do divisions from now on. So it will make division by single limb when value is 1 faster but it will impact every other division operation. I wonder if that's beneficial given that with denominator = 1 we already go into the single limb fast path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Probably no need for denominator.isOne() for div or mod, but it does make sense for addMod and mulMod, because then you avoid add or mul.

int m = nSetLimbs(dividend);
int cmp = compareLimbs(dividend, m, modulus, n);
int[] result = new int[N_LIMBS];
int divLen = nSetLimbs(dividend);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I personally prefer names that are more meaningful, even for algorithms' internal mathematical variables. Something like numeratorLength and denominatorLength, or numLen, denomLen, is in my opinion easier to read than m and n.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I can revert that, but you did use n below as well, so what did you mean with that? From the Knuth algorithm that I looked at it mentioned n and m as being the number of halfwords (limbs) of the numerator and denominator, respectively, also being u and v the numerator and denominator, respectively. That's why I mostly made the change to match the original algorithm in the book. But again I can revert if you feel strongly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not feel strongly about this, I am pretty bad at naming myself. Keeping it like this is fine by me, especially if we refer to an algorithm in a book and the names match.

Comment on lines -538 to -541
long[] vLimbsAsLong = new long[n];
for (int i = 0; i < n; i++) {
vLimbsAsLong[i] = vLimbs[i] & MASK_L;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this, did not observe a difference as well.

}

private static void shiftLeftInto(
private static void bitShiftLeftInto(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, bitShift is more explicit than shift in this case as there could be confusion between limbs and bits.
I agree with this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, my first impression when I read the name was that it was shifting entire limbs as well which is not the case.

// Unchecked: result length should be at least x.length + limbShift
int limbShift = shift / N_BITS_PER_LIMB;
int bitShift = shift % N_BITS_PER_LIMB;
if (bitShift == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here before there was a shortcut if the bitShift value was 0, not just shift overall.
That is, if the bitShift is 0, the shift is solely at the limbs level, which is I think much faster with arraycopy than with a for loop.
Why did you make this solely for shift == 0 ? Is it because we never use a shift that is >= N_BITS_PER_LIMB, hence you are simplifying the function? If so, I would personally keep it more general, because shifting by N_BITS_PER_LIMB or larger becomes incorrect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you make this solely for shift == 0 ? Is it because we never use a shift that is >= N_BITS_PER_LIMB, hence you are simplifying the function?

Yes, essentially that's it! limbShift == 0 all the time. So why are we doing the modulus operation to figure out bitShift for cases where you don't need to do it? This seems like a waste of CPU cycles. I would rather create another method to shift over limbs separately. And it's not like the compiler will do dead code elimination there, because when you start using this method to shift over limbs it will have to deal with both cases.
That's why I changed the name to bitShiftLeftInto to make it sound like you are not shifting over the limbs but only within limbs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, good point. However:

  1. note that even if limbShift is 0, the shift is not entirely only within limbs, as some part of a limb falls over to the next limb.
  2. We should at least mention that the method should be used only with 0 <= shift < N_BITS_PER_LIMB, otherwise the result will be wrong. It could also be enforced, but then we lose the some performance benefits.
  3. Although it looks like we are saving on a division, in fact the compiler is clever enough to understand they are with respect to powers of 2, so does shift operations which are far less expensive. We could have written them straight as shifts, but division is probably safer.

I would argue here that if the performance is similar, we should go for the more general and safer implementation. If we can gain a little on performance, than I am fine with it.

// Fixed number of bytes per limb.
private static final int N_BYTES_PER_LIMB = 4;
// Fixed number of bits per limb.
private static final int N_BITS_PER_LIMB = 32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to reintroduce this one for the shifts methods.

cmp = Integer.compareUnsigned(0, b[i]);
if (cmp != 0) return cmp;
}
if (aLen != bLen) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect I think: a = <1, 0, 0, 0> with length 4 is smaller than b = <1, 1, 0, 0> with length 2 (4 limbs but extrapolate to 8 limbs if you want). Yet both are valid internal representations of UInt256.
We have to revert back this change.

Copy link
Member Author

@lu-pinto lu-pinto Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so from the javadoc I thought that length was only about the useful limbs and not always equal to N_LIMBS ? This is rather confusing then to whom is working on this class because it's not consistent. Why do we need length at all then, can we remove it? I thought the purpose was to speed up things.
Why do we need to have this case?

a = <1, 0, 0, 0> with length 4

Maybe we shouldn't allow for anyone to construct UInt256 if not it's own class. IMO constructors should be fully private too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is rather confusing I agree, and the subtlety is in the meaning of optional. I should have commented far more on this, as it is a subtle point.

All limbs past length are known to be zero, but it is not necessarily the smallest such number.
If you don't like length, you can always safely ignore it and assume N_LIMBS.

The reason for this is that if length should be the smallest, then you have to trim leading zeros all the time and we are back with part of the problem with BigInteger: trimming at instantiation. To avoid having to trim, we use fixed size integer: 8 limbs. All 8 limbs are set, and the number is defined by those 8 limbs.

However, we lose some performance benefits when we know we have many zeros, in loops in add and mul for example. So this is where we hit a middle ground: we reintroduce a length attribute for optimisation purposes only. The number is still defined by its 8 limbs, we don't trim, but length can optionally indicate to an algorithm that the leading limbs are zero. The algorithm can use this information or not.

Example: mod operation -> we know that the result will have no more limbs than the modulus, but we don't check exactly how many of the leading limbs are zero, because that would be forcing a trim. So mod results has length set to the modulus' length and that is perfectly safe.


int[] x = this.limbs;
int[] y = modulus.limbs;
if (isDividendNegative) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why absInto was done unconditionally is that it had the side-effect of copying the limbs' array. I agree this is not super clear.
However, with proposed change, if numerator is positive, I believe the operation will modify it inplace, which would be definitely unwanted for a public method. We need a property test for this: given x, y, when doing r = x.signedMod(y), then x == x, y == y

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be missing something but aren't you already creating a brand new array during normalization inside the division algorithm itself? The algorithm does not seem to modify any of the arrays passed as parameters - in fact if you look at mod() this is exactly what's happening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, sorry my bad. I definitely think that we should add immutability as property tests, and this I think will pass, so all good!


int[] x = this.limbs;
int[] y = denominator.limbs;
if (isNumeratorNegative) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for signedMod concerning modifying this inplace.

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Open PRs

Development

Successfully merging this pull request may close these issues.

4 participants