Open
Labels
enhancement: New feature or request
Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
The procedure trigger_tag_automatic_creation was designed to force the creation of an auto-tag once the tag time has passed.
For example, if it is called after hour 0 on a daily auto-tagged table that does not yet have a tag named for yesterday, it should create that tag.
But it doesn't work under either of these conditions:
- No snapshot exists since the auto-tag time;
- The calling time is before the delay time.
The reasons are:
- Unlike Flink, Spark does not commit an empty message to produce a new snapshot, so when the procedure is triggered there may be no suitable snapshot to tag.
- Even if there is a snapshot since the auto-tag time, TagAutoCreation may still refuse to create the tag, because it requires a snapshot taken after the delay time.
For example:
- T is a daily-tagged table with creation-delay=10m.
- spark-sql calls this procedure at 2025-10-22T00:02:00.
- If there is no snapshot between 00:00:00 and 00:02:00, no auto-tag is created.
- If there is a snapshot at 00:01:00, it is still earlier than the delay time (00:10:00), so no auto-tag is created either.
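The two failing cases can be reproduced with a small timing model. This is only a sketch of the behavior described in this issue, not Paimon's actual TagAutoCreation code: the function `pick_snapshot_for_tag` and its parameters are hypothetical names, and the rule "a snapshot qualifies only if it is at or after period end plus delay" is taken from the description above.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def pick_snapshot_for_tag(
    snapshot_times: Iterable[datetime],
    period_end: datetime,
    delay: timedelta,
) -> Optional[datetime]:
    """Hypothetical model: return the earliest snapshot that is at or after
    period_end + delay, or None if no snapshot qualifies."""
    threshold = period_end + delay
    eligible = [t for t in snapshot_times if t >= threshold]
    return min(eligible) if eligible else None


# Daily tag for 2025-10-21: the period ends at midnight, delay is 10 minutes.
period_end = datetime(2025, 10, 22, 0, 0, 0)
delay = timedelta(minutes=10)
older = datetime(2025, 10, 21, 23, 50, 0)  # last snapshot before midnight

# Case 1: no snapshot since the auto-tag time.
case_no_snapshot = pick_snapshot_for_tag([older], period_end, delay)

# Case 2: a snapshot at 00:01:00, still earlier than 00:10:00.
case_early_snapshot = pick_snapshot_for_tag(
    [older, datetime(2025, 10, 22, 0, 1, 0)], period_end, delay
)

# For contrast: a snapshot at 00:11:00 is past the delay and qualifies.
case_late_snapshot = pick_snapshot_for_tag(
    [older, datetime(2025, 10, 22, 0, 11, 0)], period_end, delay
)

print(case_no_snapshot, case_early_snapshot, case_late_snapshot)
```

In this model both calls made at 00:02:00 find nothing to tag, which matches the two bullets above.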
Solution
- Like Flink, Spark will also commit an empty message when the trigger procedure is called, to make sure a snapshot exists.
- Add a new spark-sql conf option, for example
set `spark.paimon.trigger-tag-auto-creation-ignore-delay` = true;
If the option is true, create an auto-tag even if the snapshot is earlier than the delay time.
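The intended effect of the two changes can be sketched with a small timing model. This is an illustration of the proposal under stated assumptions, not an implementation: `trigger_tag` and `ignore_delay` are hypothetical names, the empty commit is modeled as a snapshot taken at the calling time, and "ignore delay" is modeled as lowering the threshold from period end plus delay to period end.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def trigger_tag(
    snapshot_times: Iterable[datetime],
    now: datetime,
    period_end: datetime,
    delay: timedelta,
    ignore_delay: bool = False,
) -> Optional[datetime]:
    """Hypothetical model of the proposed trigger procedure.

    Returns the time of the snapshot that would be tagged, or None.
    """
    # Proposal 1: an empty commit guarantees a snapshot at the calling time.
    snapshots = list(snapshot_times) + [now]
    # Proposal 2: with ignore_delay, snapshots before the delay also qualify.
    threshold = period_end if ignore_delay else period_end + delay
    eligible = [t for t in snapshots if t >= threshold]
    return min(eligible) if eligible else None


period_end = datetime(2025, 10, 22, 0, 0, 0)
delay = timedelta(minutes=10)
now = datetime(2025, 10, 22, 0, 2, 0)
older = datetime(2025, 10, 21, 23, 50, 0)  # last snapshot before midnight

# Option off: the empty commit at 00:02:00 is still before 00:10:00.
without_option = trigger_tag([older], now, period_end, delay)

# Option on: the empty commit at 00:02:00 is tagged.
with_option = trigger_tag([older], now, period_end, delay, ignore_delay=True)

# Option on, with an existing snapshot at 00:01:00: that snapshot is tagged.
with_option_early = trigger_tag(
    [older, datetime(2025, 10, 22, 0, 1, 0)],
    now, period_end, delay, ignore_delay=True,
)

print(without_option, with_option, with_option_early)
```

Under these assumptions the empty commit alone does not help a call made before the delay time; the option is what lets the call at 00:02:00 create the tag.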
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!