
[Feature] Ignore tag.creation-delay with a new spark-sql option when calling trigger_tag_automatic_creation #6450

@JackeyLee007

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

The procedure trigger_tag_automatic_creation is meant to force the creation of an auto-tag once the tag time has passed.
For example, if it is called after 00:00 on a table with daily auto-tagging that has not yet been tagged for the previous day, it should create an auto-tag named after yesterday's date.

However, it does not work under the following conditions:

  • No snapshot exists since the auto-tag time.
  • The procedure is called before the creation delay has elapsed.

The reasons are:

  1. Unlike Flink, which can commit an empty message and thereby produce a new snapshot, Spark does not, so when the procedure is triggered there may be no suitable snapshot to tag.
  2. Even if a snapshot exists after the auto-tag time, TagAutoCreation may still refuse to create the tag, because it requires a snapshot taken after the delay time.

For example:

  1. T is a daily-tagged table with tag.creation-delay = 10m.
  2. spark-sql calls this procedure at 2025-10-22T00:02:00.
  3. If there is no snapshot between 00:00:00 and 00:02:00, no auto-tag is created.
  4. If there is a snapshot at 00:01:00, it is still earlier than the delay time (00:10:00), so no auto-tag is created either.
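The check described above can be sketched as a small Java model that reproduces cases 3 and 4 of the example. This is a simplified, hypothetical illustration, not Paimon's actual TagAutoCreation code: the class and method names are invented, and it assumes that a daily tag for day D is created only if the latest snapshot was committed at or after D+1 00:00 plus tag.creation-delay.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Optional;

// Hypothetical, simplified model of the current daily auto-tag check.
// NOT Paimon's TagAutoCreation implementation; names are illustrative only.
public class CurrentAutoTagCheck {

    /**
     * Assumed rule: a daily tag for tagDay is created only if the latest snapshot
     * was committed at or after (tagDay + 1 day, 00:00) + creationDelay.
     */
    static boolean wouldCreateTag(LocalDate tagDay,
                                  Optional<LocalDateTime> latestSnapshotTime,
                                  Duration creationDelay) {
        // End of the tagged day, i.e. 00:00 of the following day.
        LocalDateTime periodEnd = tagDay.plusDays(1).atStartOfDay();
        LocalDateTime threshold = periodEnd.plus(creationDelay);
        // Reason 1: no snapshot at or after the threshold -> nothing to tag.
        // Reason 2: a snapshot exists after periodEnd but before threshold -> still refused.
        return latestSnapshotTime.map(t -> !t.isBefore(threshold)).orElse(false);
    }

    public static void main(String[] args) {
        LocalDate tagDay = LocalDate.of(2025, 10, 21);   // "yesterday" relative to the call
        Duration delay = Duration.ofMinutes(10);         // tag.creation-delay = 10m

        // Case 3: no snapshot since 00:00, so the latest one is from before midnight.
        Optional<LocalDateTime> beforeMidnight =
                Optional.of(LocalDateTime.of(2025, 10, 21, 23, 50));
        // Case 4: a snapshot at 00:01, earlier than the 00:10 threshold.
        Optional<LocalDateTime> at0001 =
                Optional.of(LocalDateTime.of(2025, 10, 22, 0, 1));

        System.out.println(wouldCreateTag(tagDay, beforeMidnight, delay)); // false
        System.out.println(wouldCreateTag(tagDay, at0001, delay));         // false
    }
}
```

Under this model, the only snapshot that would satisfy the check for the 2025-10-21 tag is one committed at or after 2025-10-22T00:10:00.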

Solution

  1. Like Flink, Spark should also commit an empty message when the trigger procedure is called, so that a snapshot is guaranteed to exist.
  2. Add a new spark-sql conf option, for example:
set `spark.paimon.trigger-tag-auto-creation-ignore-delay` = true;

If the option is true, create an auto-tag even if the snapshot is earlier than the delay time.
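The proposed behavior can be sketched as a small Java model. This is a hypothetical illustration, not Paimon code: the `ignoreDelay` parameter stands in for the proposed `spark.paimon.trigger-tag-auto-creation-ignore-delay` option, and the empty commit from step 1 is represented simply as a snapshot committed at the call time (00:02 in the example).

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical sketch of the proposed trigger behavior.
// ignoreDelay mirrors the proposed spark.paimon.trigger-tag-auto-creation-ignore-delay
// option; nothing here is an existing Paimon API.
public class ProposedTriggerCheck {

    static boolean wouldCreateTag(LocalDate tagDay,
                                  LocalDateTime snapshotTime,
                                  Duration creationDelay,
                                  boolean ignoreDelay) {
        // End of the tagged day, i.e. 00:00 of the following day.
        LocalDateTime periodEnd = tagDay.plusDays(1).atStartOfDay();
        // With ignoreDelay only the period boundary matters; otherwise the delay is added.
        LocalDateTime threshold = ignoreDelay ? periodEnd : periodEnd.plus(creationDelay);
        return !snapshotTime.isBefore(threshold);
    }

    public static void main(String[] args) {
        LocalDate tagDay = LocalDate.of(2025, 10, 21);
        Duration delay = Duration.ofMinutes(10);   // tag.creation-delay = 10m
        // Step 1 of the proposal: the procedure's empty commit yields a snapshot at call time.
        LocalDateTime emptyCommitSnapshot = LocalDateTime.of(2025, 10, 22, 0, 2);

        System.out.println(wouldCreateTag(tagDay, emptyCommitSnapshot, delay, false)); // false
        System.out.println(wouldCreateTag(tagDay, emptyCommitSnapshot, delay, true));  // true
    }
}
```

Note that in this model the option only drops the delay, not the period boundary: a call made before 00:00 still creates no tag for that day, which keeps the procedure consistent with "after the right tag time".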

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests