Arrow like storage over `ByteBuffer` to allow dual JVM buffer exchange #14309

JaroslavTulach · 2025-11-18T10:09:06Z

Pull Request Description

This PR speeds up the Mean.average benchmark in dual JVM mode.

The idea is to reuse the work done in Exchanging direct ByteBuffers between the dual JVMs #13904 and convert a ColumnStorage from a proxy into a local ColumnStorage when a new Column is created (done in 4a052e8).
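A rough sketch of why turning a proxy into a local storage can help, written in Java. Everything here is hypothetical and only illustrates the idea: `Storage`, `ProxyStorage`, `LocalStorage`, `exchangeBuffer` and the `crossings` counter are made-up names, not the Enso classes, and the "other JVM" is merely simulated by counting calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

/** Hypothetical sketch; none of these types are the real Enso classes. */
final class ProxyVsLocalSketch {
  /** Minimal stand-in for a column storage of long values. */
  interface Storage {
    int size();
    long get(int index);
  }

  /** Simulates a storage owned by the other JVM: every call crosses the boundary. */
  static final class ProxyStorage implements Storage {
    private final ByteBuffer shared;
    int crossings; // number of simulated cross-JVM calls

    ProxyStorage(ByteBuffer shared) { this.shared = shared; }

    @Override public int size() { crossings++; return shared.capacity() / Long.BYTES; }
    @Override public long get(int index) { crossings++; return shared.getLong(index * Long.BYTES); }

    /** One crossing hands over a view of the whole buffer (duplicate() resets byte order). */
    ByteBuffer exchangeBuffer() { crossings++; return shared.duplicate().order(shared.order()); }
  }

  /** Reads the exchanged buffer directly, without any further crossings. */
  static final class LocalStorage implements Storage {
    private final LongBuffer values;

    LocalStorage(ByteBuffer buffer) { this.values = buffer.asLongBuffer(); }

    @Override public int size() { return values.limit(); }
    @Override public long get(int index) { return values.get(index); }
  }

  /** Assumed counterpart of the "make local" step: one crossing instead of one per element. */
  static Storage makeLocal(ProxyStorage proxy) { return new LocalStorage(proxy.exchangeBuffer()); }

  static double mean(Storage s) {
    long sum = 0;
    int n = s.size();
    for (int i = 0; i < n; i++) sum += s.get(i);
    return (double) sum / n;
  }

  /** Fills a direct, little-endian buffer with the given values using absolute writes. */
  static ByteBuffer sample(long... values) {
    ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < values.length; i++) buf.putLong(i * Long.BYTES, values[i]);
    return buf;
  }
}
```

In this toy model, computing the mean through the proxy costs one crossing per element plus one for the size, while the local variant pays a single crossing to obtain the buffer.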

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

All code follows the
Scala,
Java,
Unit tests have been written where possible.
Are library benchmarks influenced by Buffer usage?
- benchmark for 1827145 is executed at

JaroslavTulach · 2025-11-18T11:02:44Z

With 6655b7b the computation of Mean is as fast in dual JVM mode as in single JVM

We have the solution, @jdunkerley. Now there is time to design the system properly!

The data were captured with the following command:

sbt:enso> runEngineDistribution 
  --vm.D=polyglot.enso.classLoading=Standard.Generic_JDBC:guest,hosted 
  --run /home/devel/enso-projects/SQLite/03-GenericConnect 
  --profiling-path /tmp/better.npss

std-bits/table/src/main/java/org/enso/table/data/column/builder/LongBuilder.java

std-bits/table/src/main/java/org/enso/table/data/table/Column.java

JaroslavTulach · 2025-11-20T16:25:57Z

There are performance regressions when running stdlib benchmarks. For example:

`org_enso_benchmarks_generated_Column_Arithmetic_1000000_Plus_Nothing`

use

sbt:enso> std-benchmarks/benchOnly Column_Arithmetic_1000000.Plus_Nothing

to run such a benchmark. It should be faster with b084696.

Update on Nov 21: yes, the enormous regression is gone, but there is still a 2-3x slowdown, mostly in the Column_Arithmetic_1000000 benchmarks:

Slowdown is very likely coming from LongIterator.bgv. Time to speed it up!

JaroslavTulach · 2025-11-24T16:17:17Z

Slowdown is very likely coming from LongIterator.bgv. Time to speed it up!

The graph on the left is the new LongBuffer.wrap version. The graph on the right is the current long[] version. As can be seen, the LongBuffer version performs more operations because the code is more generic: it needs to check the limit and it has a concept of an offset. It also always re-reads the hb array (not sure why, that doesn't seem necessary to me). In any case, with these additional operations we just have to expect some slowdown.
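To make the comparison concrete, here is a minimal Java sketch of the two access patterns being discussed. It is not the benchmarked Enso code; the class and method names are made up for illustration, and it says nothing about how the JIT compiles either loop.

```java
import java.nio.LongBuffer;

/** Illustrative only: the same sum over a plain array and over a wrapped buffer. */
final class ArrayVsBufferSum {
  /** Direct array access: one bounds check against data.length per read. */
  static long sumArray(long[] data) {
    long sum = 0;
    for (int i = 0; i < data.length; i++) {
      sum += data[i];
    }
    return sum;
  }

  /** Absolute buffer reads: the index goes through the buffer's own limit check first. */
  static long sumBuffer(LongBuffer data) {
    long sum = 0;
    for (int i = 0; i < data.limit(); i++) {
      sum += data.get(i);
    }
    return sum;
  }
}
```

Both methods return the same result for `sumArray(arr)` and `sumBuffer(LongBuffer.wrap(arr))`; the difference is only in how much work each element read involves.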

…river (#14357)

- _dual JVM_ benchmark to track progress of #13851
- `Column` is loaded by `Standard.Generic_JDBC` module, but the mean is computed by `Standard.Table`
- that (in _dual JVM mode_) means the `Column` data are **crossing the boundary**

# Important Notes

- with c35aa2e we can run both tests at the same time:
```
sbt:enso> std-benchmarks/benchOnly JDBC.mean
[info] Benchmark                     Mode  Cnt    Score   Error  Units
[info] Dual_JVM_Generic_JDBC.mean    avgt       544,708          ms/op
[info] Single_JVM_Generic_JDBC.mean  avgt        32,655          ms/op
```
- e.g. **20x** slower for now
- #14309 will make it faster


JaroslavTulach · 2025-11-27T15:26:53Z

std-bits/table/src/main/java/org/enso/table/data/column/storage/ColumnStorage.java

+   */
+  default long rawCapacity() {
+    return getSize();
+  }


Works Fast!

rawAddress and rawCapacity introduced by 4a052e8

with these changes we get the Dual JVM benchmark computing mean on a table loaded by Generic_JDBC driver #14357 on par:

sbt:std-benchmarks> benchOnly mean
[info] Benchmark                     Mode  Cnt   Score   Error  Units
[info] Dual_JVM_Generic_JDBC.mean    avgt       33,393          ms/op
[info] Single_JVM_Generic_JDBC.mean  avgt       36,639          ms/op

the next step is to ensure the format of data at rawAddress() matches Arrow format as used by Arrow language #8512 & co.

to be continued at...

Appendix: A Vision that Failed

in one of the previous prototypes there was rawAddress and rawValidityAddress

in the process of rewriting the goal shifted to unify these two into one rawAddress and getSize()

however then it turned out that getSize() can be different than capacity of the buffer

hence we are back to two values rawAddress, rawCapacity and also getSize()

it might have been simpler to stick with the two `rawAddress` and `rawValidityAddress` addresses...
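The difference between getSize() and the buffer capacity can be illustrated with a small growable builder in Java. This is a hypothetical sketch, not the LongBuilder from this PR: the class name, the doubling strategy and the choice to count capacity in elements rather than bytes are all assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical builder showing why size and capacity are separate values. */
final class GrowableLongBufferSketch {
  private ByteBuffer buffer;
  private int size; // number of values appended so far

  GrowableLongBufferSketch(int initialCapacity) {
    buffer = allocate(initialCapacity);
  }

  private static ByteBuffer allocate(int longs) {
    return ByteBuffer.allocateDirect(longs * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
  }

  void append(long value) {
    if (size == rawCapacity()) {
      // Grow by doubling (an assumption) and copy the existing values over.
      ByteBuffer bigger = allocate(Math.max(1, size * 2));
      for (int i = 0; i < size; i++) {
        bigger.putLong(i * Long.BYTES, buffer.getLong(i * Long.BYTES));
      }
      buffer = bigger;
    }
    buffer.putLong(size * Long.BYTES, value);
    size++;
  }

  /** Number of valid elements. */
  long getSize() { return size; }

  /** Number of elements the underlying buffer can hold; may exceed getSize(). */
  long rawCapacity() { return buffer.capacity() / Long.BYTES; }

  long get(int index) {
    if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
    return buffer.getLong(index * Long.BYTES);
  }
}
```

After appending five values to a builder created with capacity four, this sketch reports a size of 5 and a capacity of 8, so a reader that is given only an address needs both numbers to interpret the memory safely.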


JaroslavTulach · 2025-11-27T16:17:12Z

std-bits/tests/src/test/java/org/enso/base/polyglot/tests/LongStorageTest.java

+  }
+
+  @Test
+  public void testCreateViaBuilderAndReadViaArrow() {


Tested for Compatibility with Arrow (and Arrow Language)

ca62140 generates a LongStorage with LongBuilder

and reads it via ArrowLanguage, expecting the same result

not only does that guarantee our format is Arrow compatible

but it also ensures that whatever formats are generated by our storage Java code

are also supported by ArrowLanguage

i.e. we have all the pieces we need to drop the Java code and use ArrowLanguage when the time comes

Basic compatibility achieved at 89955a0

more work remaining as aad18a7 demonstrates
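For readers unfamiliar with the Arrow layout of a nullable 64-bit integer column, the following Java sketch writes and reads such data by hand. The validity bitmap convention (bit i set means element i is valid, least significant bit first) and little-endian 8-byte values follow the Arrow columnar format. Placing the bitmap and the values in one buffer, with the bitmap first and padded to a multiple of 8 bytes, is an assumption made for this example; the actual layout used by LongStorage is not shown in this conversation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical single-buffer layout: validity bitmap first, then int64 values. */
final class ArrowLikeLongLayout {
  /** Bytes needed for a validity bitmap of n bits, padded to a multiple of 8 bytes. */
  static int validityBytes(int n) {
    return ((n + 63) / 64) * 8;
  }

  /** Writes the values into one direct buffer; a null entry marks a missing value. */
  static ByteBuffer write(Long[] values) {
    int n = values.length;
    int valuesOffset = validityBytes(n);
    // Direct buffers start zero-filled, so every element is initially "null".
    ByteBuffer buf =
        ByteBuffer.allocateDirect(valuesOffset + n * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < n; i++) {
      if (values[i] != null) {
        int byteIndex = i >> 3;
        buf.put(byteIndex, (byte) (buf.get(byteIndex) | (1 << (i & 7))));
        buf.putLong(valuesOffset + i * Long.BYTES, values[i]);
      }
    }
    return buf;
  }

  /** True when bit i of the validity bitmap is set. */
  static boolean isValid(ByteBuffer buf, int i) {
    return (buf.get(i >> 3) & (1 << (i & 7))) != 0;
  }

  /** Reads element i of an n-element column, or null when it is marked missing. */
  static Long read(ByteBuffer buf, int n, int i) {
    return isValid(buf, i) ? buf.getLong(validityBytes(n) + i * Long.BYTES) : null;
  }
}
```

A round trip through `write` and `read` should return the original values, including the nulls, which is the same kind of check the randomly seeded test performs against ArrowLanguage.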

JaroslavTulach self-assigned this Nov 18, 2025

JaroslavTulach requested review from AdRiley, GregoryTravis, hubertp and jdunkerley as code owners November 18, 2025 10:09

JaroslavTulach added the CI: No changelog needed label Nov 18, 2025

JaroslavTulach linked an issue Nov 18, 2025 that may be closed by this pull request

Use Arrow format for IntegerColumn, exchange between JVMs and Python #13851

Open

JaroslavTulach marked this pull request as draft November 18, 2025 10:40

enso-bot bot mentioned this pull request Nov 19, 2025

Use Arrow format for IntegerColumn, exchange between JVMs and Python #13851

Open

JaroslavTulach commented Nov 20, 2025

std-bits/table/src/main/java/org/enso/table/data/column/builder/LongBuilder.java (outdated)

JaroslavTulach commented Nov 20, 2025

std-bits/table/src/main/java/org/enso/table/data/table/Column.java (outdated)

JaroslavTulach changed the title ~~LongStorage over ByteBuffer~~ LongStorage & co. over ByteBuffer to allow dual JVM buffer exchange Nov 21, 2025

JaroslavTulach changed the title ~~LongStorage & co. over ByteBuffer to allow dual JVM buffer exchange~~ Arrow like storage over ByteBuffer to allow dual JVM buffer exchange Nov 21, 2025

JaroslavTulach mentioned this pull request Nov 22, 2025

Encapsulate access to LongBuilder.data field #14337

Closed

3 tasks

Using LongBuffer in LongStorage. 2,78ms

727631e

JaroslavTulach force-pushed the wip/jtulach/AverageOverByteBuffer13851 branch from 33f6403 to 727631e Compare November 25, 2025 08:13

Start with index = -1

1827145

JaroslavTulach requested a review from Akirathan November 25, 2025 12:12

JaroslavTulach added 3 commits November 25, 2025 18:16

Wrap data only up to current size

354198f

Use long buffer in builder

0d908a8

Allocate single buffer for validity map and values

91d7895

JaroslavTulach mentioned this pull request Nov 26, 2025

Dual JVM benchmark computing mean on a table loaded by Generic_JDBC driver #14357

Merged

2 tasks

Merge remote-tracking branch 'origin/develop' into wip/jtulach/AverageOverByteBuffer13851

c45b605

JaroslavTulach mentioned this pull request Nov 27, 2025

Testing EnsoMeta ability to load Enso types from Standard.Base #14287

Merged

2 tasks

Builder.makeLocal to speed access to Column data of the other LongStorage up

4a052e8

JaroslavTulach force-pushed the wip/jtulach/AverageOverByteBuffer13851 branch from 6464402 to 4a052e8 Compare November 27, 2025 15:21

Merge remote-tracking branch 'origin/develop' into wip/jtulach/AverageOverByteBuffer13851

e03b1df

JaroslavTulach commented Nov 27, 2025

Test reading off-heap memory layout as generated by LongStorage by ArrowLanguage.

ca62140

JaroslavTulach commented Nov 27, 2025

JaroslavTulach added 3 commits November 28, 2025 12:32

Simpler generateAndCompare to start with

5635bbb

Arrow can read LongStorage for 16 elements without nulls

89955a0

Randomly seeded test

aad18a7
