Skip to content

[SPARK-55819][SQL] Refactor ExpandExec to be more succinct#54601

Open
eason-yuchen-liu wants to merge 4 commits intoapache:masterfrom
eason-yuchen-liu:refactorExpandExec
Open

[SPARK-55819][SQL] Refactor ExpandExec to be more succinct#54601
eason-yuchen-liu wants to merge 4 commits intoapache:masterfrom
eason-yuchen-liu:refactorExpandExec

Conversation

@eason-yuchen-liu
Copy link
Contributor

What changes were proposed in this pull request?

The implementation of ExpandExec is unnecessarily convoluted. It makes it hard to understand for such a simple logic. This PR tries to simplify it by using all native Iterator transformations.

Why are the changes needed?

Makes the code better, and easier to understand.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UT. (Also manually verified, and asked Claude to verify.)

Was this patch authored or co-authored using generative AI tooling?

No.

@HeartSaVioR HeartSaVioR changed the title [SPARK-55819] Refactor ExpandExec to be more succinct [SPARK-55819][SQL] Refactor ExpandExec to be more succinct Mar 4, 2026

val groupIterator = groups.iterator
iter.flatMap { input =>
groupIterator.map { group =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to create an iterator per input? You only traverse groups once among all inputs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first commit seems to be the right one for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right was think of object reuse but it will not work. Reverting the change.

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Mar 4, 2026

cc. @cloud-fan
I think this is reasonable but I assume there might be a reason of taking a complex approach, hence the ask for reviewing this. If that's just about a micro performance optimization, I wonder whether we leave it to codegen.

@@ -57,31 +57,11 @@ case class ExpandExec(
val numOutputRows = longMetric("numOutputRows")

child.execute().mapPartitionsWithIndexInternal { (index, iter) =>
val groups = projections.map(projection).toArray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can it compile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, my bad. Missing to revert this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants