@deniskuzZ deniskuzZ commented Oct 23, 2025

What changes were proposed in this pull request?

Adds support for variant shredding, enabling Hive to write shredded variant data into Iceberg tables.

Ideally, this should follow the approach described in the reader/writer API proposal for Iceberg V4, where an execution engine provides the shredded writer schema.

As an interim solution, this PR introduces a writer that infers the shredded schema from a sample record captured before the Parquet writer is initialized.
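
To illustrate the idea of inferring a shredded schema from one sample record, here is a minimal sketch. It is not the code from this PR and uses no Hive, Iceberg, or Parquet APIs: the class `ShreddedSchemaSketch`, its methods, and the type names are hypothetical, and a variant object is modeled as a plain `Map`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Hive or Iceberg API: infers a flat "shredded" schema
// (field name -> illustrative type name) from a single sample variant object.
class ShreddedSchemaSketch {

    // Maps a sample value to a type name. Anything that is not a simple scalar
    // is left unshredded, which this sketch marks as "variant".
    static String typeOf(Object value) {
        if (value instanceof Integer || value instanceof Long) {
            return "long";
        }
        if (value instanceof Float || value instanceof Double) {
            return "double";
        }
        if (value instanceof Boolean) {
            return "boolean";
        }
        if (value instanceof String) {
            return "string";
        }
        return "variant"; // null, nested objects, arrays
    }

    // Builds the schema once, from the sample record only. Field order follows
    // the sample so that the result is deterministic.
    static Map<String, String> inferSchema(Map<String, Object> sample) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : sample.entrySet()) {
            schema.put(field.getKey(), typeOf(field.getValue()));
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, Object> sample = new LinkedHashMap<>();
        sample.put("name", "Alice");
        sample.put("age", 30);
        sample.put("tags", List.of("a", "b"));
        System.out.println(inferSchema(sample));
    }
}
```

A schema derived from a single record reflects only the fields and types that record happens to contain, which is one reason the description calls this an interim solution.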

Why are the changes needed?

Shredding enables data skipping (predicate pushdown).

Does this PR introduce any user-facing change?

No

How was this patch tested?

variant_type_shredding.q

@deniskuzZ deniskuzZ marked this pull request as draft October 23, 2025 14:46
@deniskuzZ deniskuzZ changed the title [DRAFT] HIVE-29287: Variant Shredding [DRAFT] HIVE-29287: Iceberg: Variant Shredding Oct 23, 2025
@deniskuzZ
Member Author

Same approach as apache/iceberg#14297.

@deniskuzZ deniskuzZ changed the title [DRAFT] HIVE-29287: Iceberg: Variant Shredding HIVE-29287: Iceberg: Variant Shredding support Oct 31, 2025
@deniskuzZ deniskuzZ marked this pull request as ready for review October 31, 2025 14:14
TableScan
alias: tbl_shredded_variant
filterExpr: (UDFToDouble(variant_get(data, '$.age')) > 25.0D) (type: boolean)
Statistics: Num rows: 3 Data size: 1020 Basic stats: COMPLETE Column stats: NONE
@deniskuzZ deniskuzZ Oct 31, 2025

PPD is not supported here; it will be addressed in a separate JIRA.

@deniskuzZ deniskuzZ requested a review from ayushtkn October 31, 2025 15:37
sonarqubecloud bot commented Nov 1, 2025
