Fix AdaptiveSparkPlanExec wrapped by ColumnarToCarrierRow breaks shuffle IDs retrieval #11752

@wangyum

Backend

VL (Velox)

Bug description

Description

Gluten's columnar writer optimization wraps AdaptiveSparkPlanExec with ColumnarToCarrierRow to avoid unnecessary columnar-to-row conversions. However, this breaks the pattern matching used in Apache Spark PR #51432, which relies on:

queryExecution.executedPlan match {
  case ae: AdaptiveSparkPlanExec =>
    ae.context.shuffleIds.asScala.keys
}

When AdaptiveSparkPlanExec is wrapped by ColumnarToCarrierRow, the pattern matching fails, making shuffle IDs inaccessible.
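The failure can be reproduced in miniature without Spark on the classpath. The sketch below is only an illustration: `Plan`, `Leaf`, `AdaptivePlan`, and `CarrierRow` are hypothetical stand-ins for the real plan nodes, and `shuffleIdsOf` is a made-up helper that mimics a match inspecting only the root of the executed plan.

```scala
object ShuffleIdMatchDemo {
  // Hypothetical stand-ins for Spark/Gluten plan nodes; not the real classes.
  sealed trait Plan
  final case class Leaf(name: String) extends Plan
  final case class AdaptivePlan(inputPlan: Plan, shuffleIds: Set[Int]) extends Plan
  final case class CarrierRow(child: Plan) extends Plan

  // Mimics the top-level match: only the root node is inspected,
  // so an adaptive plan nested under a wrapper is never seen.
  def shuffleIdsOf(executedPlan: Plan): Option[Set[Int]] = executedPlan match {
    case ae: AdaptivePlan => Some(ae.shuffleIds)
    case _                => None
  }

  def main(args: Array[String]): Unit = {
    val aqe = AdaptivePlan(Leaf("scan"), Set(0, 1))
    println(shuffleIdsOf(aqe))             // adaptive plan at the root: IDs found
    println(shuffleIdsOf(CarrierRow(aqe))) // adaptive plan hidden by wrapper: None
  }
}
```

With the adaptive node at the root the IDs are returned; once it sits under the wrapper, the match falls through and nothing is returned.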

Root Cause

In GlutenWriterColumnarRules.injectFakeRowAdaptor(), when the child is an AdaptiveSparkPlanExec, the original implementation:

  1. Created a new AdaptiveSparkPlanExec with supportsColumnar=true
  2. Wrapped this with genColumnarToCarrierRow() → ColumnarToCarrierRow(AdaptiveSparkPlanExec(...))

This structure hides AdaptiveSparkPlanExec inside ColumnarToCarrierRow, breaking any external pattern matching.

Solution

Refactored the wrapping logic to:

  1. Wrap aqe.inputPlan with genColumnarToCarrierRow() first → ColumnarToCarrierRow(inputPlan)
  2. Create a new AdaptiveSparkPlanExec with the wrapped child → AdaptiveSparkPlanExec(ColumnarToCarrierRow(...))
  3. Set supportsColumnar=false since the child is already wrapped
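
The difference between the two wrapping orders can be sketched with a toy plan tree. This is not the Gluten implementation: `Plan`, `Leaf`, `CarrierRow`, `AdaptivePlan`, `wrapOriginal`, `wrapRefactored`, and `rootIsAdaptive` are all hypothetical names used only to show which node ends up at the root in each case.

```scala
object WrapOrderSketch {
  // Hypothetical stand-ins for the plan nodes discussed above; not the real classes.
  sealed trait Plan
  final case class Leaf(name: String) extends Plan
  final case class CarrierRow(child: Plan) extends Plan
  final case class AdaptivePlan(inputPlan: Plan, supportsColumnar: Boolean) extends Plan

  // Original shape: CarrierRow(AdaptivePlan(..., supportsColumnar = true))
  def wrapOriginal(aqe: AdaptivePlan): Plan =
    CarrierRow(aqe.copy(supportsColumnar = true))

  // Refactored shape: AdaptivePlan(CarrierRow(inputPlan), supportsColumnar = false)
  def wrapRefactored(aqe: AdaptivePlan): Plan =
    aqe.copy(inputPlan = CarrierRow(aqe.inputPlan), supportsColumnar = false)

  // Stand-in for an external caller that matches only on the root node.
  def rootIsAdaptive(plan: Plan): Boolean = plan match {
    case _: AdaptivePlan => true
    case _               => false
  }

  def main(args: Array[String]): Unit = {
    val aqe = AdaptivePlan(Leaf("scan"), supportsColumnar = false)
    println(rootIsAdaptive(wrapOriginal(aqe)))   // prints false
    println(rootIsAdaptive(wrapRefactored(aqe))) // prints true
  }
}
```

Keeping the adaptive node at the root means callers that match on the executed plan keep working without needing to know about the carrier-row wrapper.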

Gluten version

main branch

Spark version

spark-4.0.x

Spark configurations

No response

System information

No response

Relevant logs

Labels

bug, triage