Allow time travel reads to Hudi Tables #27140
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Reviewer's Guide
This PR introduces a new session variable
Sequence diagram for time travel read in Hudi connector
sequenceDiagram
actor User
participant TrinoSession
participant HudiSplitSource
participant HudiReadOptimizedDirectoryLister
participant HudiFileSystemView
User->>TrinoSession: Set hudi.time_travel_read_timestamp
TrinoSession->>HudiSplitSource: getTimeTravelReadTimestamp(session)
HudiSplitSource->>HudiReadOptimizedDirectoryLister: Pass timeTravelReadTimestamp
HudiReadOptimizedDirectoryLister->>HudiFileSystemView: getLatestBaseFilesBeforeOrOn(partitionPath, timeTravelReadTimestamp)
HudiFileSystemView-->>HudiReadOptimizedDirectoryLister: Return base files as of timestamp
HudiReadOptimizedDirectoryLister-->>HudiSplitSource: Return filtered file statuses
HudiSplitSource-->>TrinoSession: Return data as of timestamp
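The "before or on" lookup in the sequence diagram can be illustrated with a small, self-contained model. This is only a sketch: `BeforeOrOnSketch`, `BaseFile`, `fileGroupId`, and `latestBaseFilesBeforeOrOn` are hypothetical names chosen for illustration, not Hudi or Trino APIs. It also assumes instant times are fixed-width digit strings, so that lexicographic order matches chronological order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of "latest base file per file group as of an instant".
// All names are hypothetical; this is not the Hudi implementation.
class BeforeOrOnSketch
{
    // One base file: the file group it belongs to and the commit that wrote it.
    record BaseFile(String fileGroupId, String commitTime, String path) {}

    static List<BaseFile> latestBaseFilesBeforeOrOn(List<BaseFile> files, String maxInstant)
    {
        // Sorted by file group id so the result order is deterministic.
        Map<String, BaseFile> latest = new TreeMap<>();
        for (BaseFile file : files) {
            // Skip files written after the requested instant.
            if (file.commitTime().compareTo(maxInstant) > 0) {
                continue;
            }
            // Keep the newest remaining file for each file group.
            latest.merge(file.fileGroupId(), file,
                    (a, b) -> a.commitTime().compareTo(b.commitTime()) >= 0 ? a : b);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args)
    {
        List<BaseFile> files = List.of(
                new BaseFile("fg-1", "20251027183851494", "fg-1_v1.parquet"),
                new BaseFile("fg-1", "20251028090000000", "fg-1_v2.parquet"),
                new BaseFile("fg-2", "20251028090000000", "fg-2_v1.parquet"));
        System.out.println(latestBaseFilesBeforeOrOn(files, "20251027183851494"));
    }
}
```

In this model, an instant earlier than every commit selects no files at all, which is why a read just before the first commit would be expected to return an empty result rather than fail.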
Class diagram for updated HudiReadOptimizedDirectoryLister and HudiSessionProperties
classDiagram
class HudiReadOptimizedDirectoryLister {
- HoodieTableFileSystemView fileSystemView
- List<Column> partitionColumns
- Map<String, HudiPartitionInfo> allPartitionInfoMap
- String timeTravelReadTimestamp
+ HudiReadOptimizedDirectoryLister(..., String timeTravelReadTimestamp)
+ List<HudiFileStatus> listStatus(HudiPartitionInfo partitionInfo)
- static StoragePathInfo getStoragePathInfo(HoodieBaseFile baseFile)
+ void close()
}
class HudiSessionProperties {
- static final String TIME_TRAVEL_READ_TIMESTAMP
+ static String getTimeTravelReadTimestamp(ConnectorSession session)
+ List<PropertyMetadata<?>> getSessionProperties()
}
HudiSplitSource --> HudiReadOptimizedDirectoryLister : passes timeTravelReadTimestamp
HudiSessionProperties <.. HudiSplitSource : static method getTimeTravelReadTimestamp
HudiSessionProperties <.. TrinoSession : session property
Class diagram for HudiSplitSource changes
classDiagram
class HudiSplitSource {
+ HudiSplitSource(..., String timeTravelReadTimestamp)
+ CompletableFuture<ConnectorSplitBatch> getNextBatch(int maxSize)
+ boolean isFinished()
- static HudiSplitWeightProvider createSplitWeightProvider(ConnectorSession session)
}
HudiSplitSource --> HudiReadOptimizedDirectoryLister : instantiates with timeTravelReadTimestamp
|
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSessionProperties.java:150` </location>
<code_context>
- {
- return sessionProperties;
+ false),
+ stringProperty(TIME_TRAVEL_READ_TIMESTAMP, "Read data as of provided timestamp - if empty Trino will read from current snapshot", "", false));
}
</code_context>
<issue_to_address>
**suggestion:** Clarify expected format for time_travel_read_timestamp property.
Consider adding the required timestamp format to the property description to help users avoid mistakes.
```suggestion
stringProperty(
TIME_TRAVEL_READ_TIMESTAMP,
"Read data as of provided timestamp in format 'yyyy-MM-dd HH:mm:ss' - if empty Trino will read from current snapshot",
"",
false));
```
</issue_to_address>
Force-pushed from ef12d42 to 6746772.
Force-pushed from 6746772 to 5102fa4.
Force-pushed from 5102fa4 to 3471a5a.
|
@ebyhr I signed my CLA but nobody has reviewed it in a week. Would it be possible to get anybody to take a look? |
Force-pushed from 3471a5a to 63c448a.
Force-pushed from 63c448a to 7c6ca6c.
|
@cla-bot check |
|
The cla-bot has been summoned, and re-checked this pull request! |
Force-pushed from e6c3331 to 7c9fd62.
voonhous left a comment
Logic looks good to me, no logic changes, just some stylistic suggestions.
Force-pushed from 7c9fd62 to 22db806.
Force-pushed from 22db806 to b62d294.
Force-pushed from 0e3d9a0 to c558142.
Force-pushed from c558142 to a51096a.
Allow time travel reads to Hudi Tables
This commit handles simple "FOR VERSION AS OF X" syntax in the Hudi connector in order to ensure that we can read previous table state. Previously, all reads to Hudi tables would occur at the latest commit timestamp. While a user could technically filter down the data using a predicate on the _hoodie_commit_time column, there is no functionality that pushes this down into the Hudi API to minimize file reads. Timestamps are provided as strings.
Force-pushed from f095231 to 7ad8eb0.
|
|
||
| ConnectorTableVersion version = endVersion.get(); | ||
| if (version.getPointerType() == PointerType.TEMPORAL) { | ||
| throw new TrinoException(NOT_SUPPORTED, "Cannot read 'TIMESTAMP' of Hudi table, use 'VERSION' instead"); |
Suggested change:
- throw new TrinoException(NOT_SUPPORTED, "Cannot read 'TIMESTAMP' of Hudi table, use 'VERSION' instead");
+ throw new TrinoException(NOT_SUPPORTED, "This connector does not support reading tables with TIMESTAMP AS OF");
Also, is this limitation (reading with temporal) coming from Hudi, or is it something that could be supported in the future?
If it's the latter, could you file an issue and add a TODO here with a link to it?
| assertThat(query("SELECT CAST(nationkey AS INT) FROM hudi.tests.nation")).matches(expectedValues); | ||
| assertThat(query("SELECT CAST(nationkey AS INT) FROM hudi.tests.nation FOR VERSION AS OF '" + COMMIT_TIMESTAMP + "'")).matches(expectedValues); | ||
| assertThat(query("SELECT CAST(nationkey AS INT) FROM hudi.tests.nation FOR VERSION AS OF '" + (COMMIT_TIMESTAMP - 1) + "'")).returnsEmptyResult(); | ||
| assertQueryFails("SELECT CAST(nationkey AS INT) FROM hudi.tests.nation FOR VERSION AS OF 0", "Provided read version must be a string"); |
Could you also add a case with a negative number/string?
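As a rough illustration of the cases raised here, the sketch below validates a read version before it is used. `ReadVersionSketch` and `validateReadVersion` are hypothetical names, not code from this PR. The message "Provided read version must be a string" is taken from the test above; the second message is invented for the example. The 17-digit check is an assumption based on the shape of the `COMMIT_TIMESTAMP` value in the test, and other instant formats are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical validator for a "FOR VERSION AS OF" value; not the PR's code.
class ReadVersionSketch
{
    // Assumption: instants are 17-digit strings, like the COMMIT_TIMESTAMP in the test.
    private static final Pattern INSTANT = Pattern.compile("\\d{17}");

    static String validateReadVersion(Object version)
    {
        // Numeric literals such as 0 or -1 arrive as non-String objects and are rejected.
        if (!(version instanceof String text)) {
            throw new IllegalArgumentException("Provided read version must be a string");
        }
        // Strings such as "-1" or "abc" do not match the digit-only pattern.
        if (!INSTANT.matcher(text).matches()) {
            throw new IllegalArgumentException("Read version is not a 17-digit instant: " + text);
        }
        return text;
    }

    public static void main(String[] args)
    {
        System.out.println(validateReadVersion("20251027183851494"));
        for (Object bad : new Object[] {0, -1L, "-1"}) {
            try {
                validateReadVersion(bad);
            }
            catch (IllegalArgumentException e) {
                System.out.println(bad + " rejected: " + e.getMessage());
            }
        }
    }
}
```

Rejecting malformed strings up front keeps a negative or non-numeric version from silently behaving like "before the first commit" and returning an empty result.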
| public class TestHudiConnectorTest | ||
| extends BaseConnectorTest | ||
| { | ||
| private static final long COMMIT_TIMESTAMP = 20251027183851494L; |
Is the naming aligned with the Hudi glossary? Using "commit timestamp" sounds more like a temporal read; it is actually more like a "commit version".
This commit handles simple "FOR VERSION AS OF X" syntax in
the Hudi connector in order to ensure that we can read previous
table state.
Previously, all reads to Hudi tables would occur at the latest
commit timestamp. While a user could technically filter down
the data using a predicate on the _hoodie_commit_time column,
there is no functionality that pushes this down into the Hudi
API to minimize file reads.
Timestamps are provided as strings.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:
Added time travel reads in Hudi connector.