reimplement div and sdiv #9391

lu-pinto · 2025-10-31T20:14:04Z

This PR reimplements DIV and SDIV without using BigInteger from the JDK. Benchmarks included in this PR show it is significantly faster:

I've also made some changes to Mod to simplify the algorithm and in unit tests.

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

Checked out our contribution guidelines?
Considered documentation and added the doc-change-required label to this PR if updates are required.
Considered the changelog and included an update if required.
For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

spotless: ./gradlew spotlessApply
unit tests: ./gradlew build
acceptance tests: ./gradlew acceptanceTest
integration tests: ./gradlew integrationTest
reference tests: ./gradlew ethereum:referenceTests:referenceTests
hive tests: Engine or other RPCs modified?

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

lu-pinto · 2025-10-31T23:48:10Z

Negligible impact on the MOD operation: MOD benchmarks

lu-pinto · 2025-11-01T00:03:42Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

-    long[] vLimbsAsLong = new long[n];
-    for (int i = 0; i < n; i++) {
-      vLimbsAsLong[i] = vLimbs[i] & MASK_L;
-    }


removed - since this does not produce significant performance improvements

I agree with this, did not observe a difference as well.

macfarla

think it looks pretty good. couple of comments

macfarla · 2025-11-06T00:59:36Z

evm/src/main/java/org/hyperledger/besu/evm/operation/DivOperationOptimized.java

@@ -0,0 +1,64 @@
+/*
+ * Copyright ConsenSys AG.


Suggested change

* Copyright ConsenSys AG.

* Copyright contributors to Besu.

macfarla · 2025-11-06T01:13:45Z

CHANGELOG.md

 ### Additions and Improvements
 - Update to vertx 4.5.22 [#9375](https://github.com/hyperledger/besu/pull/9375)
 - Add `opcodes` optional parameter to RPC methods: `debug_standardTraceBlockToFile`, `debug_standardTraceBadBlockToFile`, `debug_traceBlockByNumber`, `debug_traceBlockByHash`, `debug_traceTransaction`, `debug_traceBlock`, `debug_traceCall` for tracing specified opcodes [#9335](https://github.com/hyperledger/besu/pull/9335)
+- Improve DIV and SDIV performance [#9391](https://github.com/hyperledger/besu/pull/9391)


will need to move sections

macfarla · 2025-11-06T01:46:04Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

+  private static int[] knuthDivision(final int[] numerator, final int[] denominator) {
+    int n = nSetLimbs(denominator);
+    if (n == 0) {
+      throw new ArithmeticException("divided by zero");


Suggested change

throw new ArithmeticException("divided by zero");

throw new ArithmeticException("division by zero");

nit: think division by zero is more commonly used wording

macfarla · 2025-11-06T01:47:14Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

  private static int[] knuthRemainder(final int[] dividend, final int[] modulus) {
+    int n = nSetLimbs(modulus);
+    if (n == 0) {
+      throw new ArithmeticException("divided by zero");


Suggested change

throw new ArithmeticException("divided by zero");

throw new ArithmeticException("division by zero");

macfarla · 2025-11-06T01:55:37Z

evm/src/main/java/org/hyperledger/besu/evm/operation/SDivOperationOptimized.java

+    final Bytes value1 = frame.popStackItem();
+    Bytes resultBytes;
+    if (value1.isZero()) {
+      resultBytes = Bytes.EMPTY;


if the denominator is zero here we return empty but in UInt256 if denominator is zero we return zero

right, I will change just to be consistent. But Bytes.EMPTY and Bytes32.ZERO is the same for the EVM in fairness.

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

ahamlat · 2025-11-07T10:17:16Z

.../core/src/jmh/java/org/hyperledger/besu/ethereum/vm/operations/BinaryOperationBenchmark.java


  protected abstract Operation.OperationResult invoke(MessageFrame frame);
+
+  public <T> void fillPools(


Why not use BenchmarkHelper as it is done on other opCodes ?

True, I think you're right. I was thinking of modifying the aPool and bPool fields in the parent class itself without leaking the fields, but I realize that avoiding the leak is impossible if one wants to do some more customised stuff.

ahamlat · 2025-11-07T10:24:35Z

...eum/core/src/jmh/java/org/hyperledger/besu/ethereum/vm/operations/DivOperationBenchmark.java

+            () -> 22 + RANDOM.nextInt(10),
+            () -> 1 + RANDOM.nextInt(10),
+            byteArray -> new BigInteger(1, byteArray),
+            (__, ___) -> true);


I would imagine false, meaning we don't want to swap operands to keep a larged quotient

true, done!

ahamlat · 2025-11-07T10:31:22Z

...eum/core/src/jmh/java/org/hyperledger/besu/ethereum/vm/operations/DivOperationBenchmark.java

+            () -> 5 + RANDOM.nextInt(4),
+            () -> 1 + RANDOM.nextInt(4),
+            byteArray -> new BigInteger(1, byteArray),
+            (__, ___) -> true);


I don't see actually a case where true is needed or I could be missing something. Why not change the order of the two first parameters and set the last parameter to false ?

ahamlat · 2025-11-07T10:32:02Z

...eum/core/src/jmh/java/org/hyperledger/besu/ethereum/vm/operations/DivOperationBenchmark.java

+            () -> 9 + RANDOM.nextInt(5),
+            () -> 4 + RANDOM.nextInt(5),
+            byteArray -> new BigInteger(1, byteArray),
+            (__, ___) -> true);


The same as above.

ahamlat · 2025-11-07T10:41:05Z

...um/core/src/jmh/java/org/hyperledger/besu/ethereum/vm/operations/SDivOperationBenchmark.java

+import org.hyperledger.besu.evm.operation.SDivOperationOptimized;

-public class SDivOperationBenchmark extends BinaryOperationBenchmark {
+public class SDivOperationBenchmark extends DivOperationBenchmark {


Does this mean that in the benchmark, we test only on positive values as I see the transformer defined like this

byteArray -> new BigInteger(1, byteArray),

In DivOperationBenchmark

The transformer has no impact on the randomized bytes, so you will still get negative and positive values. What will happen is that the criteria to whether swap the operands or not will be based on absolute values, e.g., without caring if value is positive or negative. I think this should be fine AFAIK

ahamlat

Approved by error.

thomas-quadratic

Thanks for the good work.
The new div and sdiv implementations are strong.
However, I believe that a few of the optimisations that are introduced on the past code have subtle problems, which means that we need at least to improve our tests !
Some are on private methods, maybe hard to catch through small code path like our short lived integers for a single operation.
I feel this is positive either way, whether tests will turn out passing or failing, as it will be giving more confidence overall

thomas-quadratic · 2025-11-10T13:29:29Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

+   * @return result of integer division.
+   */
+  public UInt256 div(final UInt256 denominator) {
+    if (this.isZero() || denominator.isZero()) return ZERO;


Recently in mod also added if denominator.isOne() -> return this.
If accepted, we should do the same here.

thinking about this - not sure if this will be an added benefit because we have to test for Uint256::equals every time we do divisions from now on. So it will make division by single limb when value is 1 faster but it will impact every other division operation. I wonder if that's beneficial given that with denominator = 1 we already go into the single limb fast path.

Good point. Probably no need for denominator.isOne() for div or mod, but it does make sense for addMod and mulMod, because then you avoid add or mul.

thomas-quadratic · 2025-11-10T13:51:41Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

+    int m = nSetLimbs(dividend);
+    int cmp = compareLimbs(dividend, m, modulus, n);
    int[] result = new int[N_LIMBS];
-    int divLen = nSetLimbs(dividend);


nit: I personally prefer names that are more meaningful, even for algorithms' internal mathematical variables. Something like numeratorLength and denominatorLength, or numLen, denomLen, is in my opinion easier to read than m and n.

ok I can revert that, but you did use n below as well, so what did you mean with that? From the Knuth algorithm that I looked at it mentioned n and m as being the number of halfwords (limbs) of the numerator and denominator, respectively, also being u and v the numerator and denominator, respectively. That's why I mostly made the change to match the original algorithm in the book. But again I can revert if you feel strongly.

I do not feel strongly about this, I am pretty bad at naming myself. Keeping it like this is fine by me, especially if we refer to an algorithm in a book and the names match.

thomas-quadratic · 2025-11-10T13:54:55Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

-    long[] vLimbsAsLong = new long[n];
-    for (int i = 0; i < n; i++) {
-      vLimbsAsLong[i] = vLimbs[i] & MASK_L;
-    }


I agree with this, did not observe a difference as well.

thomas-quadratic · 2025-11-10T16:09:38Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

  }

-  private static void shiftLeftInto(
+  private static void bitShiftLeftInto(


Ok, bitShift is more explicit than shift in this case as there could be confusion between limbs and bits.
I agree with this change.

yes, my first impression when I read the name was that it was shifting entire limbs as well which is not the case.

thomas-quadratic · 2025-11-10T16:17:49Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

-    // Unchecked: result length should be at least x.length + limbShift
-    int limbShift = shift / N_BITS_PER_LIMB;
-    int bitShift = shift % N_BITS_PER_LIMB;
-    if (bitShift == 0) {


Here before there was a shortcut if the bitShift value was 0, not just shift overall.
That is, if the bitShift is 0, the shift is solely at the limbs level, which is I think much faster with arraycopy than with a for loop.
Why did you make this solely for shift == 0 ? Is it because we never use a shift that is >= N_BITS_PER_LIMB, hence you are simplifying the function? If so, I would personally keep it more general, because shifting by N_BITS_PER_LIMB or larger becomes incorrect.

Why did you make this solely for shift == 0 ? Is it because we never use a shift that is >= N_BITS_PER_LIMB, hence you are simplifying the function?

Yes, essentially that's it! limbShift == 0 all the time. So why are we doing the modulus operation to figure out bitShift for cases where you don't need to do it? This seems like a waste of CPU cycles. I would rather create another method to shift over limbs separately. And it's not like the compiler will do dead code elimination there, because when you start using this method to shift over limbs it will have to deal with both cases.
That's why I changed the name to bitShiftLeftInto to make it sound like you are not shifting over the limbs but only within limbs.

I understand, good point. However:

note that even if limbShift is 0, the shift is not entirely only within limbs, as some part of a limb falls over to the next limb.

We should at least mention that the method should be used only with 0 <= shift < N_BITS_PER_LIMB, otherwise the result will be wrong. It could also be enforced, but then we lose the some performance benefits.

Although it looks like we are saving on a division, in fact the compiler is clever enough to understand they are with respect to powers of 2, so does shift operations which are far less expensive. We could have written them straight as shifts, but division is probably safer.

I would argue here that if the performance is similar, we should go for the more general and safer implementation. If we can gain a little on performance, than I am fine with it.

thomas-quadratic · 2025-11-10T16:24:09Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

  // Fixed number of bytes per limb.
  private static final int N_BYTES_PER_LIMB = 4;
-  // Fixed number of bits per limb.
-  private static final int N_BITS_PER_LIMB = 32;


Might want to reintroduce this one for the shifts methods.

thomas-quadratic · 2025-11-10T16:30:13Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

-        cmp = Integer.compareUnsigned(0, b[i]);
-        if (cmp != 0) return cmp;
-      }
+    if (aLen != bLen) {


This is incorrect I think: a = <1, 0, 0, 0> with length 4 is smaller than b = <1, 1, 0, 0> with length 2 (4 limbs but extrapolate to 8 limbs if you want). Yet both are valid internal representations of UInt256.
We have to revert back this change.

Ok, so from the javadoc I thought that length was only about the useful limbs and not always equal to N_LIMBS ? This is rather confusing then to whom is working on this class because it's not consistent. Why do we need length at all then, can we remove it? I thought the purpose was to speed up things.
Why do we need to have this case?

a = <1, 0, 0, 0> with length 4

Maybe we shouldn't allow for anyone to construct UInt256 if not it's own class. IMO constructors should be fully private too.

Yes, it is rather confusing I agree, and the subtlety is in the meaning of optional. I should have commented far more on this, as it is a subtle point.

All limbs past length are known to be zero, but it is not necessarily the smallest such number.
If you don't like length, you can always safely ignore it and assume N_LIMBS.

The reason for this is that if length should be the smallest, then you have to trim leading zeros all the time and we are back with part of the problem with BigInteger: trimming at instantiation. To avoid having to trim, we use fixed size integer: 8 limbs. All 8 limbs are set, and the number is defined by those 8 limbs.

However, we lose some performance benefits when we know we have many zeros, in loops in add and mul for example. So this is where we hit a middle ground: we reintroduce a length attribute for optimisation purposes only. The number is still defined by its 8 limbs, we don't trim, but length can optionally indicate to an algorithm that the leading limbs are zero. The algorithm can use this information or not.

Example: mod operation -> we know that the result will have no more limbs than the modulus, but we don't check exactly how many of the leading limbs are zero, because that would be forcing a trim. So mod results has length set to the modulus' length and that is perfectly safe.

thomas-quadratic · 2025-11-10T17:14:22Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

+
+    int[] x = this.limbs;
+    int[] y = modulus.limbs;
+    if (isDividendNegative) {


The reason why absInto was done unconditionally is that it had the side-effect of copying the limbs' array. I agree this is not super clear.
However, with proposed change, if numerator is positive, I believe the operation will modify it inplace, which would be definitely unwanted for a public method. We need a property test for this: given x, y, when doing r = x.signedMod(y), then x == x, y == y

Could be missing something but aren't you already creating a brand new array during normalization inside the division algorithm itself? The algorithm does not seem to modify any of the arrays passed as parameters - in fact if you look at mod() this is exactly what's happening.

You are right, sorry my bad. I definitely think that we should add immutability as property tests, and this I think will pass, so all good!

thomas-quadratic · 2025-11-10T17:16:23Z

evm/src/main/java/org/hyperledger/besu/evm/UInt256.java

+
+    int[] x = this.limbs;
+    int[] y = denominator.limbs;
+    if (isNumeratorNegative) {


Same comment as for signedMod concerning modifying this inplace.

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

lu-pinto added 5 commits October 31, 2025 14:38

reimplement div and sdiv

0c7d4e0

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

refactor UInt256Test to simplify assertions

b168e2f

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

add more tests for division

bdbf06f

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

Wire up division algo with opcodes

249ba6b

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

Add division benchmarks

46fb00e

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

lu-pinto marked this pull request as draft October 31, 2025 20:14

lu-pinto added 2 commits October 31, 2025 20:16

add entry to changelog

303f064

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

spotless

2b9b55b

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

lu-pinto commented Nov 1, 2025

View reviewed changes

lu-pinto added the performance label Nov 3, 2025

lu-pinto marked this pull request as ready for review November 3, 2025 15:40

macfarla reviewed Nov 6, 2025

View reviewed changes

lu-pinto added 2 commits November 6, 2025 12:56

address review comments by macfarla

648b413

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

Merge remote-tracking branch 'upstream/main' into implement-div-sdiv

03141a0

ahamlat approved these changes Nov 7, 2025

View reviewed changes

ahamlat requested changes Nov 7, 2025

View reviewed changes

thomas-quadratic reviewed Nov 10, 2025

View reviewed changes

lu-pinto added 2 commits November 12, 2025 08:58

Refactor fill pool methos in BinaryOperationbenchmark

23b1570

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

Make UInt256 constructors fully private

6edc4e2

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>

ahamlat mentioned this pull request Nov 20, 2025

Implement EVM arithmetic operations using the new UInt256 implementation #9472

Closed

macfarla added this to RC 25.12.0 Dec 4, 2025

github-project-automation bot moved this to Backlog in RC 25.12.0 Dec 4, 2025

	throw new ArithmeticException("divided by zero");
	throw new ArithmeticException("division by zero");


		protected abstract Operation.OperationResult invoke(MessageFrame frame);

		public <T> void fillPools(

reimplement div and sdiv #9391

Are you sure you want to change the base?

reimplement div and sdiv #9391

Uh oh!

Conversation

lu-pinto commented Oct 31, 2025 • edited by macfarla Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

Locally, you can run these tests to catch failures early:

Uh oh!

lu-pinto commented Oct 31, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

macfarla left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ahamlat left a comment

Choose a reason for hiding this comment

Uh oh!

thomas-quadratic left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lu-pinto commented Oct 31, 2025 •

edited by macfarla

Loading

lu-pinto Nov 12, 2025 •

edited

Loading