[PECOBLR-1121] Arrow patch to circumvent Arrow issues with JDK 16+ #1243

Merged
tejassp-db merged 161 commits into main from PECOBLR-1121/arrow-patch/stack-6
Mar 2, 2026

@tejassp-db
Collaborator

The Databricks server returns query results in Arrow format for cross-language interoperability. The JDBC driver has compatibility issues with JDK 16 and later when processing Arrow results.

The problem arises from the stricter encapsulation of JDK internal APIs in newer Java versions, which affects how the driver consumes Arrow results through the Apache Arrow library. The JDBC driver is also used in partner solutions, where the partners do not control the runtime environment, so the workaround of setting JVM arguments is not feasible.

This PR patches some of the Arrow code to provide alternative JVM heap-based byte allocators that do not rely on native MemoryUtil-based direct reads from off-heap memory. The implementation uses the native Arrow code path when feasible and otherwise falls back to the patched code.
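
As an illustration of the "native first, patched otherwise" behaviour described above, here is a minimal sketch. It is not the driver's code: `ArrowPathSelector`, `select`, and `usingPatchedArrow` are hypothetical names, and the sketch assumes that an unusable native path shows up as an exception or error when it is first exercised.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the driver or of Apache Arrow.
final class ArrowPathSelector {
    // Records which path was chosen on the most recent call (illustrative flag).
    static volatile boolean usingPatchedArrow = false;

    // Try the native path first; fall back to the heap-based path on any failure.
    static <T> T select(Supplier<T> nativePath, Supplier<T> patchedPath) {
        try {
            T result = nativePath.get();
            usingPatchedArrow = false;
            return result;
        } catch (Throwable t) {
            // Throwable is caught so that errors raised from static initializers
            // are covered too; a real implementation may want to be narrower.
            usingPatchedArrow = true;
            return patchedPath.get();
        }
    }

    public static void main(String[] args) {
        String chosen = select(
            () -> { throw new IllegalStateException("native path unavailable"); },
            () -> "heap-patched");
        System.out.println(chosen + " (patched=" + usingPatchedArrow + ")");
    }
}
```

Making the decision once and recording it in a flag keeps the choice observable, which is similar in spirit to the boolean field mentioned in the commit list.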

All the code has been tested for read compatibility with all Arrow types, latency benchmarks have been run, and automated tests have been added.

During the course of this change it also became necessary to convert the project into a multi-module Maven project.

Patch Arrow to create a Databricks ArrowBuf which allocates memory on
the heap and provides access to it through Java methods. This removes
the need to specify "--add-opens=java.base/java.nio=ALL-UNNAMED" as
JVM args for JDK 16+.
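
A rough sketch of the idea of a heap-allocated buffer accessed only through Java methods follows. `HeapBuf` is a hypothetical class and is not the `DatabricksArrowBuf` from this PR; it only assumes that Arrow data is laid out little-endian and shows that plain `java.nio.ByteBuffer` heap buffers need no `--add-opens` flag.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical heap-backed buffer; uses only public JDK APIs.
final class HeapBuf {
    private final ByteBuffer data;

    HeapBuf(int capacity) {
        // ByteBuffer.allocate returns a heap buffer backed by a byte[].
        // Little-endian order matches Arrow's in-memory layout.
        this.data = ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
    }

    int capacity() { return data.capacity(); }

    // Absolute accessors: each call is bounds-checked by ByteBuffer.
    void setInt(int index, int value) { data.putInt(index, value); }
    int getInt(int index) { return data.getInt(index); }
    void setLong(int index, long value) { data.putLong(index, value); }
    long getLong(int index) { return data.getLong(index); }
    byte getByte(int index) { return data.get(index); }

    public static void main(String[] args) {
        HeapBuf buf = new HeapBuf(16);
        buf.setInt(0, 0x01020304);
        buf.setLong(8, 42L);
        System.out.println(buf.getInt(0) + " " + buf.getLong(8) + " " + buf.getByte(0));
    }
}
```

Because the memory is ordinary heap memory managed by the garbage collector, a buffer like this does not need explicit release, which is consistent with the later commit that removes reference counting from the patch code.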
Added tests to validate Arrow patch code paths. Added Maven profiles
to validate the behaviour across JVM versions and with/without
"--add-opens=java.base/java.nio=ALL-UNNAMED" JVM arguments.

By default, JVM version 11 is assumed. To use other JVM versions,
the toolchain needs to be set up to point to the correct Java versions
on the local machine in .m2/toolchains.xml.
Use native Arrow if available. Otherwise fall back to the patched version.
Remove irrelevant reference counting in patch code. Patch code uses heap
memory for Arrow operations and reference counting is not required.
Add unit tests for all public API.
Remove redundant todos for accounting.
Add a JMH benchmark for Arrow parsing with patched and unpatched Arrow buffers
and buffer allocators.
Convert the code to a multi-module project.
- Cleaner separation of JAR generation for Uber jar and normal/thin JAR
  with some patched Arrow changes.
- Test modules with tests for shaded jars.
Tests to verify that all dependencies are shaded as expected.
Add tests to handle all data types supported by Arrow.
Patch DecimalUtility to not use unsafe methods to set decimal values on DatabricksArrowBuf.
Add tests for Boolean, Null, Fixed size list, UTF-8 view, Binary view,
list view, large list view types.
Remove the default JDK 11 profile. Do not fail on GitHub Actions.
Add a boolean field to specify whether the patched Arrow code is being used in the JVM to parse Arrow responses.
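
To illustrate what setting decimal values without unsafe methods can look like (the DecimalUtility item in the list above), here is a minimal sketch. It is not the patched `DecimalUtility`: `HeapDecimalWriter` and its methods are hypothetical, and the sketch assumes a 128-bit decimal stored as a 16-byte little-endian two's complement value in a plain `byte[]`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper; writes decimals into a heap byte[] with ordinary array stores.
final class HeapDecimalWriter {
    static final int DECIMAL128_BYTES = 16;

    static void writeDecimal128(BigDecimal value, byte[] target, int slot) {
        // toByteArray() yields the minimal big-endian two's complement form.
        byte[] bigEndian = value.unscaledValue().toByteArray();
        if (bigEndian.length > DECIMAL128_BYTES) {
            throw new IllegalArgumentException("value does not fit in 128 bits");
        }
        int start = slot * DECIMAL128_BYTES;
        // Sign-extend: pad with 0xFF for negative values, 0x00 otherwise.
        byte pad = (byte) (value.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(target, start, start + DECIMAL128_BYTES, pad);
        // Reverse into little-endian order, least significant byte first.
        for (int i = 0; i < bigEndian.length; i++) {
            target[start + i] = bigEndian[bigEndian.length - 1 - i];
        }
    }

    static BigDecimal readDecimal128(byte[] source, int slot, int scale) {
        byte[] bigEndian = new byte[DECIMAL128_BYTES];
        int start = slot * DECIMAL128_BYTES;
        for (int i = 0; i < DECIMAL128_BYTES; i++) {
            bigEndian[i] = source[start + DECIMAL128_BYTES - 1 - i];
        }
        return new BigDecimal(new BigInteger(bigEndian), scale);
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[2 * DECIMAL128_BYTES];
        writeDecimal128(new BigDecimal("123.45"), buffer, 0);
        writeDecimal128(new BigDecimal("-1"), buffer, 1);
        System.out.println(readDecimal128(buffer, 0, 2) + " " + readDecimal128(buffer, 1, 0));
    }
}
```

The read method is included only so the round trip can be checked; the scale is passed in separately because it belongs to the column type rather than to the stored bytes.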
@tejassp-db tejassp-db requested a review from gopalldb March 2, 2026 10:33
@tejassp-db tejassp-db self-assigned this Mar 2, 2026
@tejassp-db
Collaborator Author

The current GitHub Actions checks won't pass, because the existing GitHub workflows are set up for a single-module Maven project. I have a separate branch that enables these test runs, and I have run them from there.

@tejassp-db tejassp-db requested a review from samikshya-db March 2, 2026 10:50
@samikshya-db samikshya-db changed the title from "PECOBLR-1121 Arrow patch to circumvent Arrow issues with JDK 16+" to "[PECOBLR-1121] Arrow patch to circumvent Arrow issues with JDK 16+" Mar 2, 2026
Collaborator

@samikshya-db samikshya-db left a comment

Verified that there are no additional changes other than the ones in

#1180 #1162 #1161 #1160 #1156 #1144 (These are already approved.)

@tejassp-db tejassp-db merged commit 5dc1e5a into main Mar 2, 2026
15 of 16 checks passed