deps: upgrade to DataFusion 53.0, Arrow to 58.1 #3629
Draft
mbutrovich wants to merge 35 commits into apache:main from
# Conflicts:
#	native/Cargo.lock
#	native/Cargo.toml
#	native/core/Cargo.toml
#	native/core/src/execution/operators/iceberg_scan.rs
This was referenced Mar 17, 2026
Closed
Contributor
Author
So the shuffle failures are related to apache/arrow-rs#9506. I opened an upstream bug for hash join: apache/datafusion#20995
Contributor
Author
Down to
# Conflicts:
#	native/Cargo.lock
#	native/Cargo.toml
#	native/core/Cargo.toml
#	native/core/src/execution/planner.rs
#	native/core/src/parquet/parquet_support.rs
#	native/core/src/parquet/schema_adapter.rs
Contributor
Author
Bumped to released crates. Let's see how CI goes.
Contributor
Most of the tests fail on, checking it:
Contributor
Author
What's odd is that those didn't fail on earlier versions of this branch, I don't think.
# Conflicts:
#	native/Cargo.lock
Which issue does this PR close?
Closes #3574.
Rationale for this change
Upgrade dependencies.
What changes are included in this PR?
Dependency changes:
- `datafusion`, `datafusion-datasource`, `datafusion-physical-expr-adapter`, `datafusion-spark` → 53.0.0
- `datafusion-spark` now with `features = ["core"]`
- `datafusion-functions-nested` (test dep) 52.4.0 → 53.0.0
- `arrow` 57.3.0 → 58.1.0
- `parquet` 57.3.0 → 58.1.0
- `object_store` 0.12.3 → 0.13.1
- `iceberg`, `iceberg-storage-opendal` → git rev `477a1e5` (DF 53 support)
- `opendal`, `object_store_opendal` → git rev `173feb6` (unreleased commit on `main` with object_store 0.13 support, tracking opendal#7237)

API fixes:
- `ExecutionPlan::properties()` returns `&Arc<PlanProperties>`: wrapped `cache` fields in `Arc` across 7 files (expand, iceberg_scan, parquet_writer, scan, shuffle_scan, shuffle_writer, and their `properties()` return types)
- Removed `ExecutionPlan::statistics()` from `parquet_writer` and `shuffle_writer` (no longer in trait)
- `HashJoinExec::try_new` takes new `null_aware: bool` param: added `false` (Spark doesn't use the null-aware anti join path)
- `PhysicalExprAdapterFactory::create` now returns `Result`: updated `SparkPhysicalExprAdapterFactory` and `IcebergScanExec`
- `EncryptionFactory` methods return `Result<...>` instead of `Result<..., DataFusionError>`
- Updated `hdfs` `ObjectStore` impl to `object_store` 0.13 API: removed trait methods moved to `ObjectStoreExt` (`get`, `get_range`, `head`, `delete`, `copy`, `rename`, `copy_if_not_exists`), added new required methods (`delete_stream`, `copy_opts`), rewrote `get_ranges` to open the file once and read all ranges directly
- `RoundFunc` now expects `Int32` for decimal_places: converted point arg from Int64 to Int32 in `spark_round`

Behavioral fixes:
- `fields_with_udf()` aggressively promotes types (e.g. Utf8→Utf8View, Int32→Int64). New 3-tier strategy: (1) try `coerce_types()` for UDFs that implement it, (2) use `fields_with_udf()` only for "well-supported" signatures (Coercible, String, Numeric, Comparable) that preserve input types, (3) keep original types for all other signatures (Variadic, Exact, etc.)
- DF53 changed some functions (e.g. `md5`) to return `Utf8View`/`BinaryView`. Added casts back to `Utf8`/`Binary` since Comet does not yet support view types
- `SparkArrayCompact`: New Comet UDF replacing `array_remove_all(arr, null)`. DF53 changed `array_remove_all` to return NULL when the element arg is NULL, breaking `array_compact` semantics
- `SparkArrayRepeat` not registered: Intentionally skipped because it returns NULL when the element is NULL (e.g. `array_repeat(null, 3)` → NULL instead of `[null, null, null]`). Comet's Scala serde wraps the call in a CaseWhen, so DataFusion's built-in `ArrayRepeat` is sufficient
- `CometFairMemoryPool`: DF53 changed the timing of reservation atomic updates, so `reservation.size()` now reflects post-shrink/pre-grow values. Switched to tracking via `state.used` instead
- Removed `CoalesceBatchesExec` wrapping of SMJ: `CoalesceBatchesExec` is deprecated in DF53; removed the special-case wrapping for filtered sort-merge joins
- `wrap_all_type_mismatches` now resolves logical fields by name instead of column index, and remaps column indices to the physical file schema. Fixes pruned-schema scenarios where filter expressions reference columns at different indices than the full file schema

Feature-gated:
- `hdfs-opendal`: cfg-gated HDFS code paths in `parquet_writer` so it compiles cleanly when the feature is off

Tests:
How are these changes tested?
Existing tests.
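
For reviewers, here is a minimal sketch of the name-based field resolution described under Behavioral fixes for `wrap_all_type_mismatches`. It is an illustration under assumptions, not the code in this PR: `Field`, `ColumnRef`, `Remapped`, and `remap_column` are hypothetical stand-ins built on std only, and data types are plain strings rather than Arrow `DataType` values. It shows why looking a column up by name, then rewriting its index against the physical file schema, avoids comparing the wrong fields when the logical schema has been pruned.

```rust
// Hypothetical stand-in for an Arrow schema field (not the Arrow API).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

fn field(name: &str, data_type: &str) -> Field {
    Field { name: name.to_string(), data_type: data_type.to_string() }
}

// A column reference inside a filter expression: a name plus the index it
// was planned with (an index into the pruned, logical schema).
#[derive(Debug, Clone, PartialEq)]
struct ColumnRef {
    name: String,
    index: usize,
}

// Outcome of resolving a column against the physical file schema.
#[derive(Debug, PartialEq)]
struct Remapped {
    physical_index: usize,
    needs_cast: bool,
}

// Resolve the logical field by name (not by `col.index`), find the same name
// in the physical schema, and report whether the types differ.
fn remap_column(col: &ColumnRef, logical: &[Field], physical: &[Field]) -> Option<Remapped> {
    let logical_field = logical.iter().find(|f| f.name == col.name)?;
    let physical_index = physical.iter().position(|f| f.name == col.name)?;
    let needs_cast = physical[physical_index].data_type != logical_field.data_type;
    Some(Remapped { physical_index, needs_cast })
}

fn main() {
    // The file has three columns; the pruned logical schema keeps only `c`,
    // so `c` sits at index 0 logically but index 2 physically.
    let physical = vec![field("a", "Int32"), field("b", "Utf8"), field("c", "Int32")];
    let logical = vec![field("c", "Int64")];
    let col = ColumnRef { name: "c".to_string(), index: 0 };

    match remap_column(&col, &logical, &physical) {
        Some(r) => println!(
            "{} moves from index {} to {} (cast needed: {})",
            col.name, col.index, r.physical_index, r.needs_cast
        ),
        None => println!("{} not found in both schemas", col.name),
    }
}
```

In this sketch, an index-based lookup would have compared logical `c` against physical `a` (both at index 0) and missed the Int32/Int64 mismatch on `c`; the name-based lookup finds physical index 2 and flags the cast.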