Fix primitive type parsing in complex types for Arrow-serialized formats #1250

Open
vikrantpuppala wants to merge 1 commit into databricks:main from vikrantpuppala:fix/complex-type-timestamp-time-epoch-values

@vikrantpuppala (Collaborator) commented Mar 4, 2026

Summary

  • TIMESTAMP / TIMESTAMP_NTZ fields inside complex types (ARRAY, MAP, STRUCT) are serialized by Arrow as epoch microseconds. Added a fallback that converts epoch micros to java.sql.Timestamp. Also handles TIMESTAMP_NTZ serialized as [year, month, day, hour, min, sec] component arrays.
  • BINARY fields inside complex types are serialized as base64-encoded strings by Arrow. Added base64 decoding in convertPrimitive().
  • Added the missing TIMESTAMP_NTZ case to the switch statements in DatabricksStruct.convertSimpleValue() and DatabricksArray.convertValue().
  • Added server-format comments documenting how Arrow serializes each type within nested structures.
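The timestamp conversions described in the summary can be sketched with java.time alone. This is a minimal illustration, not the driver's code: the class and method names are hypothetical, and it assumes TIMESTAMP micros denote an instant on the UTC timeline, while TIMESTAMP_NTZ micros and component arrays denote a wall-clock value that should be kept without any zone shift.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.List;

public class NestedTimestampSketch {

  private static final long MICROS_PER_SECOND = 1_000_000L;

  /** TIMESTAMP: epoch micros read as an instant on the UTC timeline (assumption). */
  static Timestamp fromEpochMicros(long micros) {
    // floorDiv/floorMod keep the fractional part non-negative for pre-1970 values.
    long seconds = Math.floorDiv(micros, MICROS_PER_SECOND);
    long microOfSecond = Math.floorMod(micros, MICROS_PER_SECOND);
    return Timestamp.from(Instant.ofEpochSecond(seconds, microOfSecond * 1_000L));
  }

  /** TIMESTAMP_NTZ: epoch micros read as a wall-clock value with no zone shift (assumption). */
  static Timestamp ntzFromEpochMicros(long micros) {
    long seconds = Math.floorDiv(micros, MICROS_PER_SECOND);
    int nanos = (int) (Math.floorMod(micros, MICROS_PER_SECOND) * 1_000L);
    // Interpreting at UTC recovers the wall clock; Timestamp.valueOf keeps it as local time.
    return Timestamp.valueOf(LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC));
  }

  /** TIMESTAMP_NTZ as [year, month, day, hour, minute, second] components. */
  static Timestamp ntzFromComponents(List<? extends Number> c) {
    if (c.size() != 6) {
      throw new IllegalArgumentException("expected 6 components, got " + c.size());
    }
    return Timestamp.valueOf(LocalDateTime.of(
        c.get(0).intValue(), c.get(1).intValue(), c.get(2).intValue(),
        c.get(3).intValue(), c.get(4).intValue(), c.get(5).intValue()));
  }

  public static void main(String[] args) {
    long micros = 1_700_000_000_123_456L; // 2023-11-14T22:13:20.123456 UTC
    System.out.println(fromEpochMicros(micros).toInstant());          // 2023-11-14T22:13:20.123456Z
    System.out.println(ntzFromEpochMicros(micros).toLocalDateTime()); // 2023-11-14T22:13:20.123456
    System.out.println(ntzFromComponents(List.of(2024, 1, 15, 10, 30, 0)).toLocalDateTime()); // 2024-01-15T10:30
  }
}
```

Using Math.floorDiv/floorMod rather than plain / and % matters for negative epoch values: -1 µs becomes 1969-12-31T23:59:59.999999Z instead of an invalid negative nanosecond field.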

Related: #1247, #1248

Test plan

  • 7 new unit tests covering TIMESTAMP, TIMESTAMP_NTZ, and BINARY across struct, array, and map containers
  • All 20 tests in ComplexDataTypeParserTest pass
  • Server output formats verified via E2E tests against real Databricks warehouse
  • CI passes

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 4, 2026 06:05
Copilot AI (Contributor) left a comment

Pull request overview

Improves parsing of Arrow-serialized primitive values when they appear inside complex types (ARRAY/MAP/STRUCT), aligning driver behavior with Databricks/Spark’s alternate nested encodings.

Changes:

  • Add fallback parsing for nested DATE (epoch days), TIMESTAMP/TIMESTAMP_NTZ (epoch micros) and BINARY (base64) in ComplexDataTypeParser.
  • Handle TIMESTAMP_NTZ explicitly in DatabricksStruct and DatabricksArray simple-value conversion.
  • Add unit tests for the new nested parsing behaviors and update NEXT_CHANGELOG.md.
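The DATE (epoch days) and BINARY (base64) fallbacks listed above map directly onto standard-library calls. The sketch below is illustrative only and is not taken from ComplexDataTypeParser; the class and method names are hypothetical, and it assumes the base64 text uses the standard alphabet (not the URL-safe one).

```java
import java.nio.charset.StandardCharsets;
import java.sql.Date;
import java.time.LocalDate;
import java.util.Base64;

public class NestedDateBinarySketch {

  /** DATE inside a complex type arriving as days since 1970-01-01. */
  static Date fromEpochDays(long epochDays) {
    // Date.valueOf(LocalDate) keeps the calendar date regardless of the JVM time zone.
    return Date.valueOf(LocalDate.ofEpochDay(epochDays));
  }

  /** BINARY inside a complex type arriving as a base64 string (standard alphabet assumed). */
  static byte[] fromBase64(String encoded) {
    // Throws IllegalArgumentException if the input is not valid base64.
    return Base64.getDecoder().decode(encoded);
  }

  public static void main(String[] args) {
    System.out.println(fromEpochDays(19737)); // 2024-01-15
    byte[] bytes = fromBase64("aGVsbG8=");
    System.out.println(new String(bytes, StandardCharsets.UTF_8)); // hello
  }
}
```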

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Per-file summary:
  • src/main/java/com/databricks/jdbc/api/impl/ComplexDataTypeParser.java: Adds fallback conversions for Arrow nested encodings (epoch days/micros, TIMESTAMP_NTZ component arrays, base64 BINARY).
  • src/main/java/com/databricks/jdbc/api/impl/DatabricksStruct.java: Adds TIMESTAMP_NTZ handling (and avoids re-parsing when the value is already a Timestamp).
  • src/main/java/com/databricks/jdbc/api/impl/DatabricksArray.java: Adds TIMESTAMP_NTZ handling (and avoids re-parsing when the value is already a Timestamp).
  • src/test/java/com/databricks/jdbc/api/impl/ComplexDataTypeParserTest.java: Adds tests covering epoch-micros timestamps, TIMESTAMP_NTZ representations, and base64 BINARY within containers.
  • NEXT_CHANGELOG.md: Documents the user-visible parsing fix for complex types with Arrow alternate formats.
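To picture the DatabricksStruct/DatabricksArray change, here is a hypothetical switch in the same spirit: a TIMESTAMP_NTZ label shares the TIMESTAMP branch, and a value that is already a Timestamp is returned as-is rather than re-parsed from its string form. The method name, the string-based type names, and the default branch behavior are all assumptions for illustration; the real methods may differ.

```java
import java.sql.Timestamp;
import java.util.Locale;

public class SimpleValueSwitchSketch {

  /** Hypothetical stand-in for a convertSimpleValue()/convertValue()-style switch. */
  static Object convertSimpleValue(Object value, String typeName) {
    if (value == null) {
      return null;
    }
    switch (typeName.toUpperCase(Locale.ROOT)) {
      case "TIMESTAMP":
      case "TIMESTAMP_NTZ": // without this label, NTZ values would fall through to default
        // Already converted upstream (e.g. from epoch micros): return it instead of re-parsing.
        if (value instanceof Timestamp) {
          return value;
        }
        return Timestamp.valueOf(value.toString());
      case "INT":
        return value instanceof Integer ? value : Integer.valueOf(value.toString());
      default:
        // In this sketch, unhandled types are returned as their string form.
        return value.toString();
    }
  }

  public static void main(String[] args) {
    Timestamp ts = Timestamp.valueOf("2024-01-15 10:30:00");
    System.out.println(convertSimpleValue(ts, "TIMESTAMP_NTZ") == ts);              // true
    System.out.println(convertSimpleValue("2024-01-15 10:30:00", "timestamp_ntz")); // 2024-01-15 10:30:00.0
    System.out.println(convertSimpleValue("42", "int"));                            // 42
  }
}
```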

@vikrantpuppala force-pushed the fix/complex-type-timestamp-time-epoch-values branch 2 times, most recently from 7cd5ce5 to 92b4488 on March 4, 2026 06:19
@gopalldb (Collaborator) left a comment

Minor comments

Arrow serialization uses alternate formats for primitive types within
complex types (ARRAY, MAP, STRUCT): DATE as epoch day integers, TIMESTAMP
and TIMESTAMP_NTZ as epoch microseconds or component arrays, and BINARY
as base64-encoded strings. The parser now correctly handles all these
formats with appropriate fallback logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
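The "fallback logic" in the commit message can be pictured as an ordered chain of parse attempts. The sketch below is an assumption about the shape of that chain, not the PR's actual code: it tries the JDBC timestamp literal format first and falls back to epoch microseconds when the value is a bare integer. The class and method names are hypothetical.

```java
import java.sql.Timestamp;
import java.time.Instant;

public class TimestampFallbackSketch {

  /** Parse a nested TIMESTAMP value: formatted string first, then epoch micros. */
  static Timestamp parseNestedTimestamp(Object raw) {
    if (raw == null) {
      return null;
    }
    if (raw instanceof Number) {
      return fromEpochMicros(((Number) raw).longValue());
    }
    String s = raw.toString().trim();
    try {
      // Primary path: JDBC escape format "yyyy-[m]m-[d]d hh:mm:ss[.f...]".
      return Timestamp.valueOf(s);
    } catch (IllegalArgumentException notFormatted) {
      try {
        // Fallback path: a bare integer is treated as epoch microseconds.
        return fromEpochMicros(Long.parseLong(s));
      } catch (NumberFormatException notNumeric) {
        throw new IllegalArgumentException("Unrecognized nested timestamp: " + s, notFormatted);
      }
    }
  }

  static Timestamp fromEpochMicros(long micros) {
    long seconds = Math.floorDiv(micros, 1_000_000L);
    long microOfSecond = Math.floorMod(micros, 1_000_000L);
    return Timestamp.from(Instant.ofEpochSecond(seconds, microOfSecond * 1_000L));
  }

  public static void main(String[] args) {
    System.out.println(parseNestedTimestamp("2024-01-15 10:30:00"));             // 2024-01-15 10:30:00.0
    System.out.println(parseNestedTimestamp("1700000000123456").toInstant());    // 2023-11-14T22:13:20.123456Z
    System.out.println(parseNestedTimestamp(1_700_000_000_123_456L).toInstant()); // 2023-11-14T22:13:20.123456Z
  }
}
```

Because a bare integer is never a valid JDBC timestamp literal, the two branches cannot both match the same input. Inputs in neither shape (for example ISO strings with a "T" separator) are rejected with an IllegalArgumentException in this sketch.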
@vikrantpuppala force-pushed the fix/complex-type-timestamp-time-epoch-values branch from 82780d2 to 2164a4b on March 9, 2026 14:26
@vikrantpuppala requested a review from madhav-db on March 9, 2026 14:29
@samikshya-db (Collaborator) left a comment

LGTM!

4 participants