@g31pranjal
Member

@g31pranjal g31pranjal commented Oct 24, 2025

This PR makes it possible to have some special characters in the identifiers of most of the constructs in Relational DDL and enables querying on them. Since tables are defined as a Descriptor and stored in a DynamicMessage, table and column names must be valid Protobuf symbols, which are limited to a-z, A-Z, 0-9 and _. Hence, to support a (slightly) richer character set, this PR mainly uses surgical translations of names to keep the RecordLayer type system and Protobuf types happy.
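To make the idea of a name translation concrete, here is a minimal sketch of a reversible mapping between a user-facing name and a Protobuf-safe name. The class `NameTranslationSketch`, its method names, and the escape codes (`__0`, `__1`, `__2`) are assumptions made for illustration; they are not taken from this PR and may differ from what `Identifier.toProtobufCompliant` actually does.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: reversible mapping between user-facing and Protobuf-safe names. */
final class NameTranslationSketch {
    // Protobuf symbols: letters, digits and underscore, not starting with a digit.
    private static final Pattern PROTOBUF_SYMBOL = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private NameTranslationSketch() { }

    // Assumed escape scheme (illustrative only): "__" -> "__0", "$" -> "__1", "." -> "__2".
    static String toProtobufCompliant(String userName) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < userName.length()) {
            char c = userName.charAt(i);
            if (userName.startsWith("__", i)) {
                out.append("__0"); // escape the escape prefix itself
                i += 2;
            } else if (c == '$') {
                out.append("__1");
                i++;
            } else if (c == '.') {
                out.append("__2");
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        String result = out.toString();
        // Anything this sketch does not know how to escape is rejected.
        if (!PROTOBUF_SYMBOL.matcher(result).matches()) {
            throw new IllegalArgumentException("unsupported identifier: " + userName);
        }
        return result;
    }

    // Inverse of toProtobufCompliant: recovers the name the user originally wrote.
    static String toUserName(String protoName) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < protoName.length()) {
            if (protoName.startsWith("__", i) && i + 2 < protoName.length()) {
                char code = protoName.charAt(i + 2);
                if (code == '0') { out.append("__"); i += 3; continue; }
                if (code == '1') { out.append('$'); i += 3; continue; }
                if (code == '2') { out.append('.'); i += 3; continue; }
            }
            out.append(protoName.charAt(i));
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = toProtobufCompliant("order.items$v1");
        System.out.println(stored);
        System.out.println(toUserName(stored));
    }
}
```

Keeping the mapping reversible is what would let a "display" name be recovered from the "underlying" name, which is the distinction the description goes on to raise.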

However, since the Relational Type and RecordLayer type themselves honour the fact that the Relational name and RecordLayer name should be the same, the translation spills to the topmost layer and is done quite preemptively. To alleviate the issue, we might want to have a more intrinsic understanding of "display" name and "underlying" name.

Another issue comes from the fact that data flows through the execution plan in protobuf Dynamic Messages, which makes it imperative for the query aliases to be protobuf compliant as well. We could potentially break out of this by doing some shenanigans around what the RecordLayer type puts into the TypeRepository, but the work here takes the relatively simpler approach of just making all aliases and user names protobuf-friendly.

Note that not all constructs and names are affected. For example, index names, schema template names, schema names and database names should support the Unicode charset out of the box. However, this PR does not really test that. I tried testing some of it: while index names and schema template names do work, database naming is currently restrictive and does not work. So, the scope of this PR is (or should be) everything other than what is mentioned above.

From the implementation POV, the work takes a longer path of doing point translations in many places rather than just one translation in IdentifierVisitor to preemptively translate all identifiers. This is owing to the following reasons:

  • We would still want to exclude some constructs from this translation: those that do not require it. One could argue that translating even when it is not needed would be harmless. However, that would have complicated customer-facing operations like show and describe.
  • Theoretically, we could very soon move away from using protobuf for the runtime data types flowing through plans. Once that happens, the scope of translation would shrink to little more than the storage access operators, which supports the argument that we should not translate blindly.
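The point-translation approach described above can be sketched as follows. This is only an illustration of the design choice, not the PR's code: the `IdentifierKind` enum, the `resolve` helper, and the placeholder translator are all hypothetical, and the split between translated and untranslated kinds follows the PR description (tables, columns and aliases translated; index, schema template, schema and database names left as written).

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: translate only the identifier kinds that end up as Protobuf symbols. */
final class PointTranslationSketch {
    enum IdentifierKind { TABLE, COLUMN, QUERY_ALIAS, INDEX, SCHEMA_TEMPLATE, SCHEMA, DATABASE }

    // Assumed split, based on the PR description rather than on the actual code.
    private static final Set<IdentifierKind> NEEDS_TRANSLATION =
            EnumSet.of(IdentifierKind.TABLE, IdentifierKind.COLUMN, IdentifierKind.QUERY_ALIAS);

    private PointTranslationSketch() { }

    // Applies the translator at the call site only when the kind requires it,
    // so names shown back to the user (e.g. index names) stay exactly as written.
    static String resolve(IdentifierKind kind, String userName, UnaryOperator<String> translator) {
        return NEEDS_TRANSLATION.contains(kind) ? translator.apply(userName) : userName;
    }

    public static void main(String[] args) {
        // Placeholder translator, standing in for the real Protobuf-compliant translation.
        UnaryOperator<String> translator = name -> name.replace(".", "__2");
        System.out.println(resolve(IdentifierKind.TABLE, "my.table", translator));
        System.out.println(resolve(IdentifierKind.INDEX, "my.index", translator));
    }
}
```

A single translation inside the identifier visitor would amount to dropping the `kind` parameter and always applying the translator, which is the alternative raised later in the review.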

@g31pranjal g31pranjal added the enhancement (New feature or request) label Oct 24, 2025
@g31pranjal g31pranjal force-pushed the support_dot_in_identifiers branch 2 times, most recently from 5f3c083 to baf5cb2 October 28, 2025 16:02
@g31pranjal g31pranjal force-pushed the support_dot_in_identifiers branch from b5e8523 to 90f2cd6 October 29, 2025 11:53
@github-actions

📊 Metrics Diff Analysis Report

Summary

  • New queries: 1
  • Dropped queries: 0
  • Plan changed + metrics changed: 0
  • Plan unchanged + metrics changed: 0
ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

  • New queries: Queries added in this PR
  • Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
  • Plan changed + metrics changed: The query plan has changed along with planner metrics.
  • Plan unchanged + metrics changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

  • yaml-tests/src/test/resources/valid-identifiers.metrics.yaml: 1

@g31pranjal g31pranjal marked this pull request as ready for review October 29, 2025 12:21
@g31pranjal g31pranjal requested a review from hatyo October 29, 2025 12:21
;

recordConstructor
: ofTypeClause? '(' (uid DOT STAR | STAR | expressionWithName /* this can be removed */ | expressionWithOptionalName (',' expressionWithOptionalName)*) ')'
Member Author

removed because of ambiguity error

Contributor

What was the ambiguity error?

Member Author

Both expressionWithName and expressionWithOptionalName are valid matches for <expression> AS <uid>.

Contributor

@hatyo hatyo Oct 30, 2025

I think my question is: what triggered the ambiguity error? Was there a specific query that caused it? This is a known limitation (as written in the comment), and I am thinking it was triggered by an explicit test you wrote, right?

I am asking because we do have mechanisms in place that immediately fail when encountering an ambiguity.

Contributor

Found the test.

  @Override
  public Expression visitSelectQualifierStarElement(@Nonnull RelationalParser.SelectQualifierStarElementContext ctx) {
-     final var identifier = visitUid(ctx.uid());
+     final var identifier = Identifier.toProtobufCompliant(visitUid(ctx.uid()));
Contributor

Can you move the invocation of the method that converts a given string to pb-friendly string to visitUid?

Member Author

It would be possible to push the translation down to visitUid, which would mean that all identifiers are translated to protobuf-friendly names. However, uids are used to identify other constructs too, like schema template names, schema names, database names and index names, which I have consciously left untranslated.

Contributor

Why not? It seems sensible to me to do for all identifiers instead of special-casing.

if (!PseudoColumn.isPseudoColumn(id.getName())) {
    return getDelegate().getSemanticAnalyzer().resolveIdentifier(Identifier.toProtobufCompliant(id), getDelegate().getCurrentPlanFragment());
} else {
    return getDelegate().getSemanticAnalyzer().resolveIdentifier(id.replaceQualifier(q -> q.stream().map(DataTypeUtils::toProtoBufCompliantName).collect(Collectors.toList())), getDelegate().getCurrentPlanFragment());
Contributor

I am not sure I understand why we have this special handling of pseudo fields.

Member Author

This handling of pseudo fields is needed so that they match properly when looked up via the Semantic Analyzer. I guess the alternative would be to make the other side of the matcher aware of this translation, which I have avoided.
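One plausible reading of the special case, sketched under assumptions: if the translation escapes certain character sequences, a pseudo-column name could itself be rewritten and then no longer match what the lookup side expects, so only the qualifier is translated. Everything below is hypothetical, including the pseudo-column name `__ROW_VERSION`, the placeholder `translate` function, and the `toLookupForm` helper; it is not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: translate qualifiers always, but leave pseudo-column names untouched. */
final class PseudoColumnSketch {
    // Assumed pseudo-column name, used only for illustration.
    private static final Set<String> PSEUDO_COLUMNS = Set.of("__ROW_VERSION");

    private PseudoColumnSketch() { }

    // Placeholder translation (an assumption): escapes "__" and "." into Protobuf-safe sequences.
    static String translate(String name) {
        return name.replace("__", "__0").replace(".", "__2");
    }

    // Returns the qualifier segments followed by the name, in the form used for lookup.
    static List<String> toLookupForm(List<String> qualifier, String name) {
        List<String> result = qualifier.stream()
                .map(PseudoColumnSketch::translate)
                .collect(Collectors.toCollection(ArrayList::new));
        // A pseudo-column keeps its literal name; translating it here would yield
        // "__0ROW_VERSION", which the lookup side would not recognise.
        result.add(PSEUDO_COLUMNS.contains(name) ? name : translate(name));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toLookupForm(List.of("my.table"), "__ROW_VERSION"));
        System.out.println(toLookupForm(List.of("my.table"), "col.x"));
    }
}
```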

@g31pranjal g31pranjal requested a review from hatyo October 30, 2025 01:01
@g31pranjal g31pranjal changed the title from "protobuf compliant translations" to "Protobuf compliant translation for supporting richer identifiers for tables and columns" Oct 30, 2025
@g31pranjal g31pranjal force-pushed the support_dot_in_identifiers branch from 76ac08b to 3f8b547 October 30, 2025 11:06
@g31pranjal g31pranjal requested a review from hatyo October 30, 2025 15:31
@g31pranjal g31pranjal merged commit 9a18447 into FoundationDB:main Oct 30, 2025
8 checks passed
@g31pranjal g31pranjal deleted the support_dot_in_identifiers branch October 30, 2025 15:42
@hatyo
Contributor

hatyo commented Oct 30, 2025

This looks ok to me, although I think we may be able to push the translation to PB deeper into the identifiers, and the translation back to user identifiers into the Metadata. Considering the urgency of this PR, let's do that later and bring in the PR now.
