@gord02
Contributor

@gord02 gord02 commented Oct 28, 2025

This PR changes the substrait-java VirtualTableScan to use nested structs in place of the now-deprecated struct literals.

The issue

Key changes:

  • Add the visitation pattern logic
  • Proto to Rel conversion
  • Rel to proto conversion
  • Spark to Substrait
  • Substrait to Spark
  • SubstraitRelNodeConverter: converts a Substrait VirtualTableScan to a Calcite LogicalValues, which required creating a project for non-literal expressions
  • SubstraitRelVisitor: Calcite to Substrait conversion that removes the extra project added to account for non-literal expressions

Testing:

  • LogicalValuesTest tests the conversion from POJO to Calcite node and back to POJO
  • VirtualTableScanTest tests the creation of a VirtualTableScan using nested structs, and checks that the POJOs built from a proto VirtualTableScan populated via the deprecated values field equal those built via the new fields attribute
  • VirtualTableScanTest also validates that VirtualTableScan proto rows cannot be set using both values and fields

To run the tests:
./gradlew test --tests io.substrait.isthmus.LogicalValuesTest --debug-jvm

@bestbeforetoday
Member

@gord02 you will need to make sure that the subject line and text of all commits in your pull request conform to the structure dictated by conventional commits. You can look at the output of the Lint commits for semantic-release build check to see problems flagged, and also the CONTRIBUTING guide for tips.

When you have lots of commits that need fixing, it can sometimes be easier to squash them into fewer commits to reduce the number of changes you need to make.

@gord02 gord02 force-pushed the gordon.hamilton/nestedStructSupport branch 2 times, most recently from ef39861 to 17119c8 Compare October 30, 2025 14:39
@nielspardon
Member

Does this PR replace #417 which seems to have tried to do a similar change?

@gord02 gord02 marked this pull request as ready for review October 30, 2025 16:01
@gord02
Contributor Author

gord02 commented Oct 30, 2025

Hi @nielspardon! Yes, this PR builds on and completes that one, so it can be closed.

@benbellick
Member

I created an issue to track this particular problem @ #587. @gord02 would you mind including it somewhere in your PR description so that it gets linked?

@benbellick benbellick self-requested a review October 30, 2025 18:18
@gord02 gord02 force-pushed the gordon.hamilton/nestedStructSupport branch from 17119c8 to d799bdf Compare October 30, 2025 18:34
Member

@benbellick benbellick left a comment

I have left a few comments on here so far. I think some of them might cause some reshuffling, so I'll save some of the more in-depth suggestions for the future. One other thing: it would be great if you could introduce a test that specifically demonstrates the logic you introduced, where deserialized protos can handle the presence of either fields or values, but not both. I can imagine a test in which you construct two protos, one via values and one via fields, and just demonstrate that once you render them both as POJOs, they are equal. That's just one idea though.
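The shape of such a test could look roughly like the sketch below. It does not use the substrait-java API: `ProtoVirtualTable`, `VirtualTablePojo`, and `fromProto` are hypothetical stand-ins, chosen only to illustrate the "either values or fields, never both" rule and the equality check.

```java
import java.util.List;

public class VirtualTableSketch {
    // Hypothetical stand-in for the proto message: rows may arrive through
    // the deprecated `values` list or through the newer `fields` list.
    record ProtoVirtualTable(List<List<Object>> values, List<List<Object>> fields) {}

    // Hypothetical stand-in for the POJO: one canonical list of rows.
    record VirtualTablePojo(List<List<Object>> rows) {}

    // Accept either representation, but reject a proto that sets both.
    static VirtualTablePojo fromProto(ProtoVirtualTable proto) {
        boolean hasValues = !proto.values().isEmpty();
        boolean hasFields = !proto.fields().isEmpty();
        if (hasValues && hasFields) {
            throw new IllegalArgumentException("rows set via both values and fields");
        }
        return new VirtualTablePojo(hasFields ? proto.fields() : proto.values());
    }

    public static void main(String[] args) {
        List<List<Object>> rows = List.of(List.<Object>of(1, "a"));
        VirtualTablePojo viaValues = fromProto(new ProtoVirtualTable(rows, List.of()));
        VirtualTablePojo viaFields = fromProto(new ProtoVirtualTable(List.of(), rows));
        // Both protos should render to equal POJOs.
        System.out.println(viaValues.equals(viaFields));
    }
}
```

A real test would build the actual proto messages and assert on the converted POJOs in the same two steps: equality across representations, and rejection when both are set.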

Anyways, thanks for all of your work! Let me know if anything I wrote isn't clear :)

@gord02 gord02 force-pushed the gordon.hamilton/nestedStructSupport branch from 6d13b3f to ea02912 Compare November 7, 2025 14:47
Member

@benbellick benbellick left a comment

Thanks for your patience, both in waiting for my review and for dealing with my comments! I appreciate the work. I didn't get a chance to review all of the PR, but wanted to flush the comments I had so far before running off to a meeting. 🙇

@gord02 gord02 force-pushed the gordon.hamilton/nestedStructSupport branch from b4eab3f to 607877b Compare November 10, 2025 14:45
@gord02 gord02 force-pushed the gordon.hamilton/nestedStructSupport branch from 607877b to 3406d3c Compare November 10, 2025 15:13
Member

@benbellick benbellick left a comment

Okay, got the chance to do another pass. I think I'm just going to review the core section for now until I think it's in a good place, and then I'll continue onwards with the review in isthmus / spark if that works for you! Thanks again for your work 🚀

Member

@benbellick benbellick left a comment

Very minor changes to suggest within core, but after those, everything there looks good! I will come back and review the isthmus stuff in a bit. As mentioned before, I am not as familiar there, so it could be a good idea to rope in @vbarua or someone.

Nice work!

Member

@vbarua vbarua left a comment

Left some comments. The biggest thing I noticed is that the logic for converting Substrait to Calcite doesn't quite work. Your LogicalValuesTest doesn't check for the right invariant to detect it.

Assert.equals(1, logicalValues.tuples.size()); // one row
Assert.equals(2, logicalValues.tuples.get(0).size()); // 2 literal expressions
LogicalProject logicalProject = (LogicalProject) relNode;
Assert.equals(1, logicalProject.getProjects().size()); // one non-literal expression
Member

If you inspect the rowType of the LogicalProject you've created, you'll notice that it only outputs a single column. We should have two columns, because the VirtualTable we're converting has two columns. The issue is that Calcite, unlike Substrait, only outputs the values of project expressions.
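The difference in output width can be illustrated with a small model. This is a sketch only, not Calcite or Substrait code: `substraitProject`, `calciteProject`, and the column strings are hypothetical names, and the Substrait side assumes no emit remapping is applied.

```java
import java.util.ArrayList;
import java.util.List;

public class ProjectWidthSketch {
    // Substrait-style project: all input columns, followed by the new expressions.
    static List<String> substraitProject(List<String> input, List<String> exprs) {
        List<String> out = new ArrayList<>(input);
        out.addAll(exprs);
        return out;
    }

    // Calcite-style project: only the listed expressions; input columns
    // survive only if the project references them explicitly.
    static List<String> calciteProject(List<String> input, List<String> exprs) {
        return new ArrayList<>(exprs);
    }

    public static void main(String[] args) {
        List<String> input = List.of("literalCol");      // column held by the values node
        List<String> nonLiteral = List.of("nonLiteralExpr");

        // Projecting only the non-literal expression drops the literal column.
        System.out.println(calciteProject(input, nonLiteral).size());
        // Adding an explicit input reference ("$0") keeps both columns.
        System.out.println(calciteProject(input, List.of("$0", "nonLiteralExpr")).size());
        // The Substrait-style project keeps the input column without being asked.
        System.out.println(substraitProject(input, nonLiteral).size());
    }
}
```

Under this model, a Calcite project has to list an input reference for every column that should pass through, in addition to the non-literal expressions.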

Contributor Author

The way I wrote the code was just to use the logicalProject to store expressions that could not fit in the logicalValues table, which are all non-literal expressions. Should this be changed such that all the expressions are reflected in the logicalProject?


return LogicalValues.create(
relBuilder.getCluster(), rowTypeWithNames, ImmutableList.copyOf(tuples));
return relBuilder.push(logicalValues).project(projectExpressions).build();
Member

This doesn't actually output the correct number of columns when you use it. I left a comment in LogicalValuesTest about it.

@vbarua vbarua changed the title Gordon.hamilton/nested struct support feat: support Nested Structs Nov 18, 2025
@gord02 gord02 force-pushed the gordon.hamilton/nestedStructSupport branch from be3f127 to a03532c Compare November 18, 2025 15:15