Backend
VL (Velox)
Bug description
Description
Gluten's columnar writer optimization wraps AdaptiveSparkPlanExec with ColumnarToCarrierRow to avoid unnecessary columnar-to-row conversions. However, this breaks the pattern matching used in Apache Spark PR #51432, which relies on:
```scala
queryExecution.executedPlan match {
  case ae: AdaptiveSparkPlanExec =>
    ae.context.shuffleIds.asScala.keys
}
```

When AdaptiveSparkPlanExec is wrapped by ColumnarToCarrierRow, this pattern match fails and the shuffle IDs become inaccessible.
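The failure mode can be reproduced with a minimal Scala sketch. The plan classes below are hypothetical, simplified stand-ins, not the real Spark or Gluten classes, and a plain Set[Int] stands in for ae.context.shuffleIds. Like the snippet above, the match inspects only the root node. As a result, it stops matching once the AQE node is wrapped.

```scala
// Hypothetical, simplified stand-ins for Spark/Gluten plan nodes (NOT the real classes).
sealed trait Plan
final case class Scan(table: String) extends Plan
final case class ColumnarToCarrierRow(child: Plan) extends Plan
final case class AdaptiveSparkPlanExec(
    inputPlan: Plan,
    supportsColumnar: Boolean,
    shuffleIds: Set[Int] // stands in for ae.context.shuffleIds
) extends Plan

// Same shape as the match quoted above: only the root node is inspected.
def shuffleIdsOf(executedPlan: Plan): Option[Set[Int]] = executedPlan match {
  case ae: AdaptiveSparkPlanExec => Some(ae.shuffleIds)
  case _                         => None // the quoted snippet has no fallback and would throw scala.MatchError
}

val aqe = AdaptiveSparkPlanExec(Scan("t"), supportsColumnar = false, shuffleIds = Set(0, 1))
println(shuffleIdsOf(aqe))                       // Some(Set(0, 1))
println(shuffleIdsOf(ColumnarToCarrierRow(aqe))) // None
```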
Root Cause
In GlutenWriterColumnarRules.injectFakeRowAdaptor(), when the child is an AdaptiveSparkPlanExec, the original implementation:
- Created a new AdaptiveSparkPlanExec with supportsColumnar=true.
- Wrapped it with genColumnarToCarrierRow(), producing ColumnarToCarrierRow(AdaptiveSparkPlanExec(...)).
This structure hides AdaptiveSparkPlanExec inside ColumnarToCarrierRow. Any external code that pattern-matches on the root of the executed plan therefore no longer finds it.
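The original wrapping order can be sketched with the same kind of simplified model. In this sketch, genColumnarToCarrierRow is a hypothetical stand-in that only adds a ColumnarToCarrierRow node. The use of copy is an assumption about how the new AdaptiveSparkPlanExec is built. The real injectFakeRowAdaptor() also handles cases this sketch leaves out.

```scala
// Hypothetical, simplified plan model (NOT the real Spark/Gluten classes).
sealed trait Plan
final case class Scan(table: String) extends Plan
final case class ColumnarToCarrierRow(child: Plan) extends Plan
final case class AdaptiveSparkPlanExec(inputPlan: Plan, supportsColumnar: Boolean) extends Plan

// Hypothetical stand-in for genColumnarToCarrierRow(): wraps a columnar plan.
def genColumnarToCarrierRow(plan: Plan): Plan = ColumnarToCarrierRow(plan)

// Sketch of the ORIGINAL order: make the AQE node columnar, then wrap it,
// so the root of the resulting plan is ColumnarToCarrierRow.
def injectBefore(child: Plan): Plan = child match {
  case aqe: AdaptiveSparkPlanExec =>
    genColumnarToCarrierRow(aqe.copy(supportsColumnar = true))
  case other => other // non-AQE children are out of scope for this sketch
}

val before = injectBefore(AdaptiveSparkPlanExec(Scan("t"), supportsColumnar = false))
println(before) // ColumnarToCarrierRow(AdaptiveSparkPlanExec(Scan(t),true))
```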
Solution
Refactored the wrapping logic to:
- Wrap aqe.inputPlan with genColumnarToCarrierRow() first, producing ColumnarToCarrierRow(inputPlan).
- Create a new AdaptiveSparkPlanExec with the wrapped child, producing AdaptiveSparkPlanExec(ColumnarToCarrierRow(...)).
- Set supportsColumnar=false, since the child is already wrapped.
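The refactored order can be sketched under the same assumptions: hypothetical model classes, a stand-in genColumnarToCarrierRow(), and copy as an assumed way to rebuild the node. In the result, AdaptiveSparkPlanExec is back at the root of the plan.

```scala
// Hypothetical, simplified plan model (NOT the real Spark/Gluten classes).
sealed trait Plan
final case class Scan(table: String) extends Plan
final case class ColumnarToCarrierRow(child: Plan) extends Plan
final case class AdaptiveSparkPlanExec(inputPlan: Plan, supportsColumnar: Boolean) extends Plan

// Hypothetical stand-in for genColumnarToCarrierRow(): wraps a columnar plan.
def genColumnarToCarrierRow(plan: Plan): Plan = ColumnarToCarrierRow(plan)

// Sketch of the REFACTORED order: wrap the AQE node's input plan first, then
// rebuild the AQE node around it. supportsColumnar stays false because the
// carrier-row conversion now happens inside the AQE node.
def injectAfter(child: Plan): Plan = child match {
  case aqe: AdaptiveSparkPlanExec =>
    aqe.copy(inputPlan = genColumnarToCarrierRow(aqe.inputPlan), supportsColumnar = false)
  case other => other // non-AQE children are out of scope for this sketch
}

val after = injectAfter(AdaptiveSparkPlanExec(Scan("t"), supportsColumnar = false))
println(after) // AdaptiveSparkPlanExec(ColumnarToCarrierRow(Scan(t)),false)

// A root-level match like the one from Spark PR #51432 succeeds again.
val matched = after match {
  case _: AdaptiveSparkPlanExec => true
  case _                        => false
}
```

The refactoring keeps the conversion but moves it inside the AQE node instead of around it. This keeps the root node type stable for code that inspects the executed plan.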
Gluten version
main branch
Spark version
spark-4.0.x
Spark configurations
No response
System information
No response
Relevant logs