Protobuf compliant translation for supporting richer identifiers for tables and columns #3696

g31pranjal · 2025-10-24T15:00:39Z

This PR makes it possible to use some special characters in the identifiers of most constructs in Relational DDL and enables querying on them. Since tables are defined as Descriptor and stored in DynamicMessage, table and column names must be valid Protobuf symbols, which are limited to a-z, A-Z, 0-9 and _. Hence, to support a (slightly) richer character set, this PR mainly uses surgical translations over names to keep the RecordLayer type system and Protobuf types happy.
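
To make the constraint concrete, here is a minimal sketch of what a name translation of this kind could look like. The class name, the method names and the escape scheme (`__<hex code point>_`) are all assumptions invented for this example; they are not the PR's actual `Identifier.toProtobufCompliant` implementation. The sketch also treats a leading digit as invalid, which matches how Protobuf validates symbol names.

```java
/** Hypothetical sketch only; not the translation implemented in this PR. */
final class ProtoNameSketch {

    /** True for a-z, A-Z and _, and for 0-9 when not in the first position. */
    private static boolean isAllowed(int codePoint, boolean first) {
        boolean letterOrUnderscore = (codePoint >= 'a' && codePoint <= 'z')
                || (codePoint >= 'A' && codePoint <= 'Z')
                || codePoint == '_';
        boolean digit = codePoint >= '0' && codePoint <= '9';
        return letterOrUnderscore || (digit && !first);
    }

    /** Checks whether a name is already a valid Protobuf symbol. */
    static boolean isProtobufCompliant(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            if (!isAllowed(cp, i == 0)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    /**
     * Replaces every disallowed code point with "__<hex>_" (assumed scheme).
     * Note: this sketch does not guard against collisions with names that
     * already contain such a sequence.
     */
    static String toProtobufCompliant(String name) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            if (isAllowed(cp, i == 0)) {
                out.appendCodePoint(cp);
            } else {
                out.append("__").append(Integer.toHexString(cp)).append('_');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Under this assumed scheme, `ProtoNameSketch.toProtobufCompliant("a.b")` yields `a__2e_b`, while a name such as `plain_name` passes through unchanged.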

However, since the Relational type and the RecordLayer type both assume that the Relational name and the RecordLayer name are the same, the translation spills over to the topmost layer and is done quite preemptively. To alleviate this, we might want a more intrinsic notion of a "display" name and an "underlying" name.

Another issue comes from the fact that data flows through the execution plan in Protobuf DynamicMessages, which requires query aliases to be Protobuf compliant as well. We could potentially break out of this with some shenanigans around what the RecordLayer type puts into the TypeRepository, but the work here takes the relatively simpler approach of making all aliases and user names Protobuf-friendly.

Note that not all constructs and names are affected. For example, index names, schema template names, schema names and database names should support the Unicode charset out of the box. However, this PR does not really test that. I tried testing some of it: index names and schema template names do work, but database naming is currently restrictive and does not work. So the scope of this PR is (or should be) everything other than the constructs mentioned above.

From the implementation POV, the work takes a longer path of doing point translations in many places rather than just one translation in IdentifierVisitor to preemptively translate all identifiers. This is owing to the following reasons:

We would still want to exclude some constructs from this translation: those that do not require it. One could argue that translation could be done even where it is not needed. However, this would complicate customer-facing operations such as show and describe.
Theoretically, we could very soon move away from using Protobuf for the runtime data types flowing through plans. Once that happens, the scope of translation would shrink to just the storage access operators, which supports the view that we should not translate blindly.
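
The point-translation idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `PointTranslationSketch`, the `Construct` enum and `underlyingName` are hypothetical names that do not exist in the codebase, and the translator is passed in as a parameter so that no particular escape scheme is implied. The split between translated and untranslated constructs follows the description above.

```java
import java.util.function.UnaryOperator;

/** Hypothetical sketch of translating only the constructs that need it. */
final class PointTranslationSketch {

    /** Kinds of named constructs; the flag says whether the name reaches Protobuf. */
    enum Construct {
        TABLE(true),
        COLUMN(true),
        QUERY_ALIAS(true),
        INDEX(false),
        SCHEMA_TEMPLATE(false),
        SCHEMA(false),
        DATABASE(false);

        final boolean needsTranslation;

        Construct(boolean needsTranslation) {
            this.needsTranslation = needsTranslation;
        }
    }

    /**
     * Returns the name to use underneath: translated for constructs backed by
     * Protobuf symbols, and left as the user wrote it for everything else.
     */
    static String underlyingName(Construct kind, String displayName, UnaryOperator<String> translator) {
        return kind.needsTranslation ? translator.apply(displayName) : displayName;
    }
}
```

With a stand-in translator such as `s -> s.replace(".", "_")`, a table named `my.table` would be translated, while an index named `my.index` would be passed through untouched, so show and describe style operations can still print it as written.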

github-actions · 2025-10-29T12:19:45Z

📊 Metrics Diff Analysis Report

Summary

New queries: 1
Dropped queries: 0
Plan changed + metrics changed: 0
Plan unchanged + metrics changed: 0

ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

New queries: Queries added in this PR
Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
Plan changed + metrics changed: The query plan has changed along with planner metrics.
Plan unchanged + metrics changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

yaml-tests/src/test/resources/valid-identifiers.metrics.yaml: 1

g31pranjal · 2025-10-29T12:29:02Z

fdb-relational-core/src/main/antlr/RelationalParser.g4

    ;

 recordConstructor
-    : ofTypeClause? '(' (uid DOT STAR | STAR | expressionWithName /* this can be removed */ | expressionWithOptionalName (',' expressionWithOptionalName)*) ')'


removed because of ambiguity error

What was the ambiguity error?

Both expressionWithName and expressionWithOptionalName are valid matches for <expression> AS <uid>.

I think my question is: what triggered the ambiguity error? Was there a specific query that caused it? This is a known limitation (as written in the comment), and I am thinking it was triggered by an explicit test you wrote, right?

I am asking because we do have mechanisms in place that immediately fail when encountering an ambiguity.

Found the test.

...tional-core/src/main/java/com/apple/foundationdb/relational/recordlayer/query/QueryPlan.java

hatyo · 2025-10-29T14:47:43Z

...ain/java/com/apple/foundationdb/relational/recordlayer/query/visitors/ExpressionVisitor.java

    @Override
    public Expression visitSelectQualifierStarElement(@Nonnull RelationalParser.SelectQualifierStarElementContext ctx) {
-        final var identifier = visitUid(ctx.uid());
+        final var identifier = Identifier.toProtobufCompliant(visitUid(ctx.uid()));


Can you move the invocation of the method that converts a given string to pb-friendly string to visitUid?

It would be possible to push the translation down to visitUid, which would mean that all identifiers are translated to be protobuf-friendly. However, uids are used to identify other constructs too, such as schema template names, schema names, database names and index names, which I have consciously left untranslated.

Why not? It seems sensible to me to do for all identifiers instead of special-casing.

hatyo · 2025-10-29T14:48:24Z

fdb-relational-core/src/main/antlr/RelationalParser.g4

    ;

 recordConstructor
-    : ofTypeClause? '(' (uid DOT STAR | STAR | expressionWithName /* this can be removed */ | expressionWithOptionalName (',' expressionWithOptionalName)*) ')'


What was the ambiguity error?

...ional-core/src/main/java/com/apple/foundationdb/relational/recordlayer/query/Identifier.java

hatyo · 2025-10-29T14:54:56Z

...ain/java/com/apple/foundationdb/relational/recordlayer/query/visitors/ExpressionVisitor.java

+        if (!PseudoColumn.isPseudoColumn(id.getName())) {
+            return getDelegate().getSemanticAnalyzer().resolveIdentifier(Identifier.toProtobufCompliant(id), getDelegate().getCurrentPlanFragment());
+        } else {
+            return getDelegate().getSemanticAnalyzer().resolveIdentifier(id.replaceQualifier(q -> q.stream().map(DataTypeUtils::toProtoBufCompliantName).collect(Collectors.toList())), getDelegate().getCurrentPlanFragment());


I am not sure I understand why we have this special handling of pseudo fields.

This handling of pseudo fields is needed so that they match properly when looked up via the Semantic Analyzer. I guess the alternative would be to make the other side of the matcher aware of this translation, which I have avoided.
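
A rough sketch of the idea behind the branch in the diff above: translate every part of a regular identifier, but for a pseudo column translate only the qualifier and leave the column name itself alone. Everything here is an assumption made for illustration: the class, the pseudo-column name `__ROW_VERSION`, and the stand-in escape scheme (`__` to `__0`, `$` to `__1`, `.` to `__2`) are not taken from the PR. The stand-in scheme escapes its own `__` prefix, which is what would make a blindly translated pseudo-column name stop matching.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; names and escape scheme are assumptions. */
final class PseudoColumnSketch {

    // Assumed pseudo-column name, for illustration only.
    private static final Set<String> PSEUDO_COLUMNS = Collections.singleton("__ROW_VERSION");

    /** Stand-in translator: "__" -> "__0", "$" -> "__1", "." -> "__2". */
    static String translate(String part) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < part.length()) {
            char c = part.charAt(i);
            if (c == '_' && i + 1 < part.length() && part.charAt(i + 1) == '_') {
                out.append("__0");
                i += 2;
            } else if (c == '$') {
                out.append("__1");
                i++;
            } else if (c == '.') {
                out.append("__2");
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /**
     * Returns the qualifier parts followed by the name. The qualifier is always
     * translated; the name is translated only if it is not a pseudo column.
     */
    static List<String> resolve(List<String> qualifier, String name) {
        List<String> result = new ArrayList<>();
        for (String part : qualifier) {
            result.add(translate(part));
        }
        result.add(PSEUDO_COLUMNS.contains(name) ? name : translate(name));
        return result;
    }
}
```

In this sketch, translating `__ROW_VERSION` directly would produce `__0ROW_VERSION`, which would no longer match the pseudo column on lookup, whereas `resolve` keeps the name intact and still translates a qualifier such as `my.table`.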

...n/java/com/apple/foundationdb/relational/recordlayer/ddl/RecordLayerCatalogQueryFactory.java

...core/src/main/java/com/apple/foundationdb/relational/recordlayer/metadata/DataTypeUtils.java

hatyo · 2025-10-30T15:43:19Z

This looks OK to me. I think we may be able to push the translation to PB deeper into the identifiers, and back to user identifiers in the Metadata, but considering the urgency of this PR, let's do that later and bring in the PR now.

g31pranjal added the enhancement New feature or request label Oct 24, 2025

g31pranjal force-pushed the support_dot_in_identifiers branch 2 times, most recently from 5f3c083 to baf5cb2 Compare October 28, 2025 16:02

g31pranjal added 3 commits October 29, 2025 11:53

protobuf translation for identifiers

ed217ad

more tests and cases

491321f

more tests and cases

90f2cd6

g31pranjal force-pushed the support_dot_in_identifiers branch from b5e8523 to 90f2cd6 Compare October 29, 2025 11:53

cleansing

8a4a135

cleanse yaml tests

57d7841

g31pranjal marked this pull request as ready for review October 29, 2025 12:21

g31pranjal requested a review from hatyo October 29, 2025 12:21

g31pranjal commented Oct 29, 2025

hatyo requested changes Oct 29, 2025

address comments

3f8b547

g31pranjal requested a review from hatyo October 30, 2025 01:01

g31pranjal changed the title ~~protobuf compliant translations~~ Protobuf compliant translation for supporting richer identifiers for tables and columns Oct 30, 2025

g31pranjal force-pushed the support_dot_in_identifiers branch from 76ac08b to 3f8b547 Compare October 30, 2025 11:06

this is not testable

a53ab5b

hatyo reviewed Oct 30, 2025

...core/src/main/java/com/apple/foundationdb/relational/recordlayer/metadata/DataTypeUtils.java

address comments

783e567

g31pranjal requested a review from hatyo October 30, 2025 15:31

hatyo approved these changes Oct 30, 2025

g31pranjal merged commit 9a18447 into FoundationDB:main Oct 30, 2025
8 checks passed

g31pranjal deleted the support_dot_in_identifiers branch October 30, 2025 15:42

g31pranjal mentioned this pull request Oct 31, 2025

omit __ prefixed identifiers from protobuf translation #3706

Open
