-
Notifications
You must be signed in to change notification settings - Fork 114
Protobuf compliant translation for supporting richer identifiers for tables and columns #3696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protobuf compliant translation for supporting richer identifiers for tables and columns #3696
Conversation
5f3c083 to
baf5cb2
Compare
b5e8523 to
90f2cd6
Compare
📊 Metrics Diff Analysis ReportSummary
ℹ️ About this analysisThis automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:
The last category in particular may indicate planner regressions that should be investigated. New QueriesCount of new queries by file:
|
| ; | ||
|
|
||
| recordConstructor | ||
| : ofTypeClause? '(' (uid DOT STAR | STAR | expressionWithName /* this can be removed */ | expressionWithOptionalName (',' expressionWithOptionalName)*) ')' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed because of ambiguity error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the ambiguity error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both expressionWithName and expressionWithOptionalName are valid matches for <expression> AS <uid>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my question is, what triggered the ambiguity error, was there a specific query that caused it? this is a known limitation (as written in the comment) and I am thinking it is triggered by an explicit test you wrote, right?
I am asking because we do have mechanisms in place that immediately fails when encountering an ambiguity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found the test.
...tional-core/src/main/java/com/apple/foundationdb/relational/recordlayer/query/QueryPlan.java
Show resolved
Hide resolved
| @Override | ||
| public Expression visitSelectQualifierStarElement(@Nonnull RelationalParser.SelectQualifierStarElementContext ctx) { | ||
| final var identifier = visitUid(ctx.uid()); | ||
| final var identifier = Identifier.toProtobufCompliant(visitUid(ctx.uid())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move the invocation of the method that converts a given string to pb-friendly string to visitUid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be possible to push down the translation to visitUid - which means that all identifier are translated to protobuf-friendly. However, uids are used to identify other constructs too, like schema template name, schema name, database name, index name, which I have consciously left to not translate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? It seems sensible to me to do for all identifiers instead of special-casing.
| ; | ||
|
|
||
| recordConstructor | ||
| : ofTypeClause? '(' (uid DOT STAR | STAR | expressionWithName /* this can be removed */ | expressionWithOptionalName (',' expressionWithOptionalName)*) ')' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the ambiguity error?
...ional-core/src/main/java/com/apple/foundationdb/relational/recordlayer/query/Identifier.java
Outdated
Show resolved
Hide resolved
| if (!PseudoColumn.isPseudoColumn(id.getName())) { | ||
| return getDelegate().getSemanticAnalyzer().resolveIdentifier(Identifier.toProtobufCompliant(id), getDelegate().getCurrentPlanFragment()); | ||
| } else { | ||
| return getDelegate().getSemanticAnalyzer().resolveIdentifier(id.replaceQualifier(q -> q.stream().map(DataTypeUtils::toProtoBufCompliantName).collect(Collectors.toList())), getDelegate().getCurrentPlanFragment()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand why we have this special handling of pseudo fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This handling of pseudo fields is needed to match it properly while looking up via Semantic Analyzer. I guess the alternative would be to make the other side of matcher understand about this translation, which I have avoided.
...n/java/com/apple/foundationdb/relational/recordlayer/ddl/RecordLayerCatalogQueryFactory.java
Outdated
Show resolved
Hide resolved
76ac08b to
3f8b547
Compare
...core/src/main/java/com/apple/foundationdb/relational/recordlayer/metadata/DataTypeUtils.java
Show resolved
Hide resolved
|
This looks ok to me, although I think we may be able to push the translation to PB deeper into the identifiers, and back to user identifiers in the Metadata, considering the urgency of this PR, let's do it later and bring in the PR now. |
This PR makes it possible to have some special characters in the identifiers of most of the constructs in Relational DDL and enables querying on them. Since the tables are defined as Descriptor and stored in DynamicMessage, it is imperative for table and column names to be a valid symbol in protoBuf, which is limited to a-z, A-Z, 0-9 and _. Hence, it support (slightly) richer character set, this PR mainly uses surgical translations over names to make the RecordLayer Type system and Protobuf types happy.
However, since the Relational Type and RecordLayer type themselves honour the fact that the Relational name and RecordLayer name should be the same, the translation spills to the topmost layer and is done quite preemptively. To alleviate the issue, we might want to have a more intrinsic understanding of "display" name and "underlying" name.
Another issue, comes from the fact that data flows through the execution plan in protobuf Dynamic Messages, which makes it imperative for the query aliases to be protobuf compliant as well!. We could potentially break out of this by doing some schenanigans around what the recordLayer type puts into the TypeRepository, but the work here takes a relatively simpler approach of just making all aliases and user names protobuf-friendly.
Note that, not all contructs and names are affecting. For example, index names, schema template names, schema and database names should support unicode charset out of the box. However, this PR does not really test around that. I tried testing some part of it - while index names and schema template names do work, databases naming is restrictive currently and does not work. So, the scope of this PR is (or, should be) everything else other than what is mentioned above.
From the implementation POV, the work takes a longer path of doing point translations in many places rather than just one translation in IdentifierVisitor to preemptively translate all identifiers. This is owing to the following reasons:
show,describeitems.