Skip to content

[proposal] Notifications for triggers#414

Draft
kumare3 wants to merge 10 commits intomainfrom
notifications
Draft

[proposal] Notifications for triggers#414
kumare3 wants to merge 10 commits intomainfrom
notifications

Conversation

@kumare3
Copy link
Contributor

@kumare3 kumare3 commented Dec 11, 2025

No description provided.

Signed-off-by: Ketan Umare <kumare3@users.noreply.github.com>
Comment on lines +28 to +31
on_phase=(
"FAILED",
"TIMED OUT",
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way to pass in list of enums instead of strings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enums we avoid as you have to import so we use literals. these are literals

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make them enums, no problem though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking, if we do this, then all phases will be in a common model for all of user facing apis and we can then use them everywhere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we re-use the phase literals as defined here? https://github.com/flyteorg/flyte-sdk/blob/main/src/flyte/remote/_run.py#L21-L23 (i.e. lower-cased)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh ya - but I think I will
Move them to a common place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved them to a common place now, will update PR

Signed-off-by: Ketan Umare <kumare3@users.noreply.github.com>
@kumare3 kumare3 marked this pull request as draft December 13, 2025 01:57
name="hourly",
auto_activate=True,
automation=flyte.Cron("* * * * *"),
notifications=flyte.notify.Slack(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Is this saying that notifications will ONLY be sent for failed triggered runs?
    What if I want to configure notifications even for runs that are manually created(via Web or CLI). Lets say, its an important workflow on prod domain and everybody in the team must be aware about failures even If I'm the author of the run and I saw the failure.

  2. Are the notifications only attached to individual tasks?
    do we want to allow configuring notifications for entire domain(prod), project+domain(<important_project> on prod) or maybe project+domain+task_regex?
    What if I want to configure your notifications once, without repeating same configs on multiple task definitions?

As a follow up for 1 and 2, should notifications be a separate SDK entity in which you can configure multiple matching criteria independently? project, domain, task_regex, triggered vs manual, maybe others in future?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like it to be possible to notify at the organization, domain, and project-domain levels as this is a frequent request we got in v1 over the past few years. People want to be able to notify on any failure in the prod domain, for instance. I don't know that this makes sense to include in the SDK, though - I feel like this makes sense in the UI for sure, as well as the API/CLI (which can be used by a terraform provider).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Votta. With scoping for now to be triggered runs only

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that we really only do email for now even for slack, should we just do email?

),
flyte.notify.Webhook(
on_phase="SUCCEEDED",
headers={"Content-Type": "application/json"},
Copy link

@iaroslav-ciupin iaroslav-ciupin Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. should webhooks URLs be accessible without any auth credentials? if not, what should be the DevUX for customer to provide credentials for his webhook securely to Union? and I guess Union notifications service(workers) should be able to read those secure credentials during the notification delivery.
  2. same question about Slack API. Not familiar with it, but do we need some token to hit the API? who and how will provide this token to us?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will stick to unauthenticated webhooks (or more specifically, webhooks which embed the auth into the webhook itself)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

flyte.notify.Email(
on_phase="SUCCEEDED",
subject="Hello world! from {task.name}",
body="Hello world! from {task.name}",
Copy link

@iaroslav-ciupin iaroslav-ciupin Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just want to double check the DevUX here: if I want the same email template to be sent for all my failed tasks with some variables like <task_name>, <domain>, <project>, etc. , I can simply store this template in a Python variable, and then re-use it in all Email notification configs?
not sure what email provider are we going to use or if its critical right now, but, for example, AWS SES supports both sending raw emails with entire HTML body and sending emails with template id, that refers to previously uploaded template.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just render on client side. As this may change all the time

Signed-off-by: Ketan Umare <kumare3@users.noreply.github.com>
wild-endeavor
wild-endeavor previously approved these changes Mar 2, 2026
Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have we tried the slack notification webhook? i know for v1 everything was done through email - will the webhook url ever need any templating?

name="hourly",
auto_activate=True,
automation=flyte.Cron("* * * * *"),
notifications=flyte.notify.Slack(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that we really only do email for now even for slack, should we just do email?

on_phase=ActionPhase.FAILED,
webhook_url="https://webhook.site/",
message="Hello world! from {task.name}",
blocks=(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we drop this for now, until customers ask? just wary of supporting too many features out of the gate, and then we have to mark things deprecated if they ever change.

auto_activate=True,
automation=flyte.Cron("* * * * *"),
notifications=flyte.notify.Slack(
on_phase=ActionPhase.FAILED,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just call this on?

automation=flyte.Cron("* * * * *"),
notifications=flyte.notify.Slack(
on_phase=ActionPhase.FAILED,
webhook_url="https://webhook.site/",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we change the name maybe so that it doesn't cause any confusion with the Webhook notification? slack_endpoint or service_url or something?

Supported Phases:
- "SUCCEEDED": Task completed successfully
- "FAILED": Task failed
- "TIMED_OUT": Task timed out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will lease failovers count here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not right now only terminal. I think we should also drop support for Queued and Running, cc @iaroslav-ciupin / @EngHabu

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what is a lease failover.
I think the original requirements was to ONLY send notifications for terminal root actions(when entire run completes).

kumare3 added 2 commits March 2, 2026 21:39
@iaroslav-ciupin
Copy link

@kumare3 I started this IDL PR with my proposal for backend API https://github.com/flyteorg/flyte/pull/6946/changes
This SDK proposal should be possible to implement with one-off notification settings stored directly in RunSpec.
Please also refer to implementation roadmap here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants