-
Notifications
You must be signed in to change notification settings - Fork 334
Arrow like storage over ByteBuffer to allow dual JVM buffer exchange
#14309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
- we have the solution, @jdunkerley - now there is time to design the system properly!
- data were captured with following command
|
std-bits/table/src/main/java/org/enso/table/data/column/builder/LongBuilder.java
Outdated
Show resolved
Hide resolved
std-bits/table/src/main/java/org/enso/table/data/table/Column.java
Outdated
Show resolved
Hide resolved
|
There are performance regressions when running stdlib benchmarks. For example:
use to run such a benchmark. Should be faster with b084696 - update on Nov 21: Yes, the enormous regression is gone, but still we have 2-3x slowdown mostly in
Slowdown is very likely coming from LongIterator.bgv. Time to speed it up! |
ByteBuffer to allow dual JVM buffer exchange
ByteBuffer to allow dual JVM buffer exchangeByteBuffer to allow dual JVM buffer exchange
The graph on left is the new |
33f6403 to
727631e
Compare
…river (#14357) - _dual JVM_ benchmark to track progress of #13851 - `Column` is loaded by `Standard.Generic_JDBC` module, but the mean is computed by `Standard.Table` - that (in _dual JVM mode_) means the `Column` data are **crossing the boundary** # Important Notes - with c35aa2e we can run both tests at the same time: ``` sbt:enso> std-benchmarks/benchOnly JDBC.mean [info] Benchmark Mode Cnt Score Error Units [info] Dual_JVM_Generic_JDBC.mean avgt 544,708 ms/op [info] Single_JVM_Generic_JDBC.mean avgt 32,655 ms/op ``` - e.g. **20x** slower for now - #14309 will make it faster
…eOverByteBuffer13851
6464402 to
4a052e8
Compare
…eOverByteBuffer13851
| */ | ||
| default long rawCapacity() { | ||
| return getSize(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works Fast!
rawAddressandrawCapacityintroduced by 4a052e8- with these changes we get the Dual JVM benchmark computing mean on a table loaded by Generic_JDBC driver #14357 on par:
sbt:std-benchmarks> benchOnly mean
[info] Benchmark Mode Cnt Score Error Units
[info] Dual_JVM_Generic_JDBC.mean avgt 33,393 ms/op
[info] Single_JVM_Generic_JDBC.mean avgt 36,639 ms/op
- the next step is to ensure the format of data at
rawAddress()matches Arrow format as used by Arrow language #8512 & co.- to be continued at...
Appendix: A Vision that Failed
- in one of the previous prototypes there was
rawAddressandrawValidityAddress- in the process of rewriting the goal shifted to unify these two into one
rawAddressandgetSize() - however then it turned out that
getSize()can be different than capacity of the buffer
- in the process of rewriting the goal shifted to unify these two into one
- hence we are back to two values
rawAddress,rawCapacityand alsogetSize() - it might have been simpler to stick with the two rawAddress
andrawValidityAddress` addresses...
| } | ||
|
|
||
| @Test | ||
| public void testCreateViaBuilderAndReadViaArrow() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested for Compatibility with Arrow (and Arrow Language)
- ca62140 generates
LongStoragebyLongBuilder - and reads it via
ArrowLanguageexpecting the same result - not only that guarantees our format is Arrow compatible
- but it also ensures that whatever formats are generated by our storage Java code
- are also supported by
ArrowLanguage - e.g. we have all the pieces we need to drop the Java code and use
ArrowLanguagewhen the time comes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

](https://private-user-images.githubusercontent.com/26887752/516959851-57df5211-1794-4c14-8386-59630a230baf.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NjQ3NTE5MzYsIm5iZiI6MTc2NDc1MTYzNiwicGF0aCI6Ii8yNjg4Nzc1Mi81MTY5NTk4NTEtNTdkZjUyMTEtMTc5NC00YzE0LTgzODYtNTk2MzBhMjMwYmFmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTEyMDMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUxMjAzVDA4NDcxNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM5ZTAzYzRjMWQxYmJiOWRjMzI2MjQwMDM0MTI0ODJlZWE4NjEwNjM3OWViOWUxMDljZTMxZDEyMjAyNzlhNzImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.pb2x2WXXxmNOV7QRxdlGHgC7337seK7GJ0-65d_aHrI)


Pull Request Description
ByteBuffers between the dual JVMs #13904 toColumnStoragefrom a proxy to localColumnStoragewhen creating a newColumn- done in 4a052e8Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
Bufferusage?