Skip to content

feat: notification channels for alerts#37

Merged
TerrifiedBug merged 12 commits intomainfrom
feat/notification-channels
Mar 7, 2026
Merged

feat: notification channels for alerts#37
TerrifiedBug merged 12 commits intomainfrom
feat/notification-channels

Conversation

@TerrifiedBug
Copy link
Copy Markdown
Owner

Summary

  • Add NotificationChannel and AlertRuleChannel Prisma models with SQL migration
  • Implement channel delivery drivers for Slack (Block Kit), Email (nodemailer/SMTP), PagerDuty (Events API v2), and generic Webhook (HMAC-signed)
  • Add tRPC CRUD procedures (listChannels, createChannel, updateChannel, deleteChannel, testChannel) with team-scoped auth and audit logging
  • Update createRule/updateRule to accept optional channelIds for per-rule channel routing
  • Integrate deliverToChannels into the heartbeat alert delivery pipeline alongside existing webhook delivery
  • Build full notification channels UI with type-specific config forms, test/toggle/edit/delete, and channel multi-select on alert rules
  • Expand alerts documentation with setup examples for all four channel types using GitBook tabs

Test plan

  • Run npx prisma migrate deploy against a database to apply the migration
  • Create notification channels of each type (Slack, Email, PagerDuty, Webhook) and verify config forms render correctly
  • Test each channel type using the test button
  • Create an alert rule with specific channels selected; verify AlertRuleChannel records are created
  • Edit an alert rule to change linked channels; verify old links are replaced
  • Trigger an alert and verify delivery to both legacy webhooks and new notification channels
  • Toggle channel enable/disable and verify delivery respects the flag
  • Delete a channel and verify cascade removes AlertRuleChannel links
  • Verify legacy webhooks section only appears when legacy webhooks exist
  • Verify sensitive config fields (smtpPass, integrationKey, hmacSecret) are redacted in listChannels response

@github-actions github-actions bot added feature documentation Improvements or additions to documentation dependencies Pull requests that update a dependency file and removed feature labels Mar 7, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR introduces a full notification channels system as a first-class replacement for legacy webhooks, adding Slack (Block Kit), Email (nodemailer/SMTP), PagerDuty (Events API v2), and generic HMAC-signed Webhook delivery drivers alongside a complete tRPC CRUD API and a polished channel management UI.

Key changes:

  • NotificationChannel and AlertRuleChannel Prisma models added with cascade-deletes and correct indexes; migration is safe to deploy
  • All five new tRPC mutations (createChannel, updateChannel, deleteChannel, testChannel, plus updateRule extension) carry withTeamAccess("EDITOR") and withAudit middleware; listChannels uses withTeamAccess("VIEWER")
  • Channel-environment ownership is validated before AlertRuleChannel records are written, closing the cross-environment channel-link vulnerability
  • updateRule channel link replacement is wrapped in prisma.$transaction() for atomicity
  • PRESERVE_IF_ABSENT logic in updateChannel correctly restores smtpPass, integrationKey, and hmacSecret when a client submits blank values for redacted fields, and the PagerDuty integration-key validation is correctly skipped on the client when editing an existing channel
  • All four drivers re-validate URLs/hosts on every deliver() call, closing the DNS rebinding window for both Slack/Webhook (validatePublicUrl) and Email (validateSmtpHost)
  • HTML in email bodies is entity-escaped via escapeHtml before interpolation
  • The NotificationChannel entity loader in audit.ts intentionally excludes config to avoid persisting SMTP passwords and routing keys in audit-log snapshots
  • detail-panel.tsx fixes a pre-existing bug where componentKey (undefined in scope) was passed instead of storeKey to LiveTailPanel
  • Minor UX issue: the shared testMutation instance causes all channel test buttons to become disabled while any single channel test is in flight (see inline comment)

Confidence Score: 4/5

  • Safe to merge — all previously flagged security and correctness issues have been addressed; only a minor UX polish item remains.
  • The critical and important issues raised in earlier review rounds (plaintext secrets, cross-environment channel linking, missing withAudit, non-atomic updateRule, orphan rule on create failure, SSRF on SMTP host, HTML injection, credential wipe on edit, missing required-field validation, DNS rebinding, missing URL enforcement on update, and the PagerDuty edit deadlock) are all resolved. The remaining finding is a low-impact UX bug where a shared mutation hook disables all test buttons during any single test. Database migration is additive with cascade semantics and no destructive changes.
  • src/app/(dashboard)/alerts/page.tsx — shared testMutation.isPending disables all test buttons simultaneously

Important Files Changed

Filename Overview
src/server/routers/alert.ts Adds full CRUD + test procedures for NotificationChannel with correct team-scoped auth, withAudit on all mutations, channel-environment validation before rule creation, atomic channel-link replacement in updateRule, and PRESERVE_IF_ABSENT logic to protect redacted credential fields on update. Minor: pagerduty validation in updateChannel correctly enforces integrationKey after credential restoration.
src/server/services/channels/index.ts Clean driver dispatch and deliverToChannels implementation. Correctly distinguishes "no links exist" (broadcast to all env channels) from "links exist but all disabled" (deliver to no channels), preventing the previously-flagged silent broadcast regression.
src/server/services/channels/email.ts HTML injection addressed with escapeHtml on all user-controlled fields. DNS rebinding addressed by calling validateSmtpHost on every deliver() invocation, not just at channel creation time. TRPCError thrown by validateSmtpHost is a subclass of Error and is correctly caught and converted to a delivery-failure result.
src/server/services/channels/slack.ts SSRF protection applied on every deliver() call via validatePublicUrl. Block Kit formatting is safe — user-controlled strings passed to plain_text/mrkdwn fields are not HTML-rendered by Slack.
src/server/services/channels/webhook.ts SSRF protection on every deliver() call. HMAC signing correctly applied before the request. User-supplied custom headers are spread in without per-value type checks, but the worst-case outcome is a runtime fetch failure, not a security issue.
src/server/services/channels/pagerduty.ts Correct dedup_key strategy using alertId for PagerDuty incident correlation. Resolve events are sent for resolved alerts. Test method triggers then immediately resolves to avoid leaving open incidents.
src/server/services/url-validation.ts New validateSmtpHost follows the same private-IP block-list logic as validatePublicUrl. IPv6 bracket stripping, DNS resolution fallback to empty array on failure, and re-throw of TRPCError are all handled correctly.
src/app/(dashboard)/alerts/page.tsx Full NotificationChannelsSection UI with per-type config forms, inline test/edit/delete actions, and channel multi-select on alert rules. PagerDuty credential-field validation correctly exempted when editing (editingChannelId check). One minor issue: shared testMutation instance causes all test buttons to disable while any single test is in flight.
src/app/api/agent/heartbeat/route.ts Legacy webhook delivery remains awaited for backward compatibility; notification channel delivery is fire-and-forget with a .catch() error logger. This keeps heartbeat latency low but means in-flight channel deliveries may be interrupted on process shutdown. Error logging is in place.
src/server/middleware/audit.ts NotificationChannel added to resolveTeamId, resolveEnvironmentId, and ENTITY_LOADERS. The entity loader intentionally excludes config to avoid persisting SMTP passwords, PagerDuty keys, and HMAC secrets in the audit log — correct security decision.
src/trpc/init.ts withTeamAccess updated to resolve team context from a NotificationChannel id, enabling the middleware to work correctly for updateChannel, deleteChannel, and testChannel mutations that receive only an id input.
src/components/flow/detail-panel.tsx One-line fix: LiveTailPanel now receives storeKey instead of componentKey. storeKey is the correct variable in scope for this component; componentKey was undefined here and would have silently broken live-tail filtering.
prisma/migrations/20260307100000_add_notification_channels/migration.sql Migration creates NotificationChannel and AlertRuleChannel tables with correct cascade deletes on both foreign keys, a unique index on (alertRuleId, channelId), and an index on environmentId for efficient scoped lookups.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant Heartbeat as /api/agent/heartbeat
    participant AlertEval as alert-evaluator
    participant LegacyWH as deliverWebhooks
    participant Channels as deliverToChannels
    participant DB as PostgreSQL

    Agent->>Heartbeat: POST metrics/statuses
    Heartbeat->>AlertEval: evaluateAlerts()
    AlertEval-->>Heartbeat: alerts[] to deliver

    loop for each alert
        Heartbeat->>LegacyWH: await deliverWebhooks(envId, payload)
        LegacyWH->>DB: findMany AlertWebhook where envId
        LegacyWH-->>Heartbeat: done (awaited)

        Heartbeat->>Channels: deliverToChannels(envId, ruleId, payload) [fire-and-forget]
        Note over Channels: Resolves channel routing
        Channels->>DB: findMany AlertRuleChannel where ruleId
        alt linked channels exist
            Channels->>DB: use enabled linked channels only
        else no links
            Channels->>DB: findMany NotificationChannel where envId+enabled
        end
        Channels->>Channels: dispatch to slack/email/pagerduty/webhook drivers
    end

    Heartbeat-->>Agent: 200 OK (before channel delivery completes)
Loading

Last reviewed commit: 1aa7c05

Add NotificationChannel and AlertRuleChannel models to Prisma schema
with corresponding SQL migration. Create channel delivery service with
drivers for Slack (Block Kit), Email (nodemailer/SMTP), PagerDuty
(Events API v2), and generic Webhook (HMAC-signed). Install nodemailer
dependency.
Add listChannels, createChannel, updateChannel, deleteChannel, and
testChannel tRPC procedures. Update createRule and updateRule to accept
optional channelIds for linking alert rules to specific channels.
Update withTeamAccess and audit middleware to resolve NotificationChannel
entities.
Call deliverToChannels alongside existing deliverWebhooks in the
heartbeat route for each fired/resolved alert. Legacy webhook delivery
is preserved for backward compatibility.
Replace the standalone Webhooks section with a full Notification Channels
section supporting Slack, Email, PagerDuty, and Webhook types. Each type
has a dedicated config form. Channels can be tested, toggled, edited, and
deleted. Alert rule create/edit dialogs now include a multi-select for
notification channels. Legacy webhooks section is preserved but only
shown when legacy webhooks exist.
Cover Slack, Email, PagerDuty, and Webhook channel setup with
type-specific examples in tabbed format. Document channel routing,
legacy webhook migration path, and updated alert rule configuration.
- Validate channelIds belong to the rule's environment before creating
  AlertRuleChannel records (prevents cross-environment channel linking)
- Wrap channel link replacement in updateRule in a $transaction for
  atomicity
- Add withAudit middleware to testChannel mutation
- Fix deliverToChannels fallback: when channels are explicitly linked
  but all disabled, do not fall back to all environment channels
- Rebase onto main to pick up DashboardView, PipelineSli, node labels,
  nodeSelector, and OIDC group sync fields added in recent PRs
- Fix TS error: use storeKey instead of undefined componentKey variable
  in LiveTailPanel prop
- Restructure updateChannel validation to run required-field checks
  AFTER sensitive-field preservation, preventing empty config payloads
  from creating broken channels
@TerrifiedBug TerrifiedBug force-pushed the feat/notification-channels branch from 1b9e587 to a015226 Compare March 7, 2026 15:03
The API redacts integrationKey in responses, but the client-side form
validation required it to be non-empty. This made it impossible to edit
PagerDuty channels. Skip the integrationKey requirement when editing
(the server already preserves the existing key) and show a placeholder
hint so users know they can leave it blank.
<DialogHeader>
<DialogTitle>
{editingChannelId
? "Edit Notification Channel"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared testMutation disables all test buttons simultaneously

testMutation is a single hook instance shared across every channel row. When a test is in-progress for any channel, testMutation.isPending is true for the whole component, so every test button in the table becomes disabled — even for channels that aren't being tested.

A user who wants to test a second channel while the first is still running will see all buttons grayed out with no indication of why. Consider tracking which channel is currently under test:

const [testingChannelId, setTestingChannelId] = useState<string | null>(null);

const testMutation = useMutation(
  trpc.alert.testChannel.mutationOptions({
    onSuccess: (result) => {
      setTestingChannelId(null);
      // ...
    },
    onError: (error) => {
      setTestingChannelId(null);
      // ...
    },
  }),
);

// In the row:
onClick={() => {
  setTestingChannelId(channel.id);
  testMutation.mutate({ id: channel.id });
}}
disabled={testingChannelId === channel.id}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/(dashboard)/alerts/page.tsx
Line: 991

Comment:
**Shared `testMutation` disables all test buttons simultaneously**

`testMutation` is a single hook instance shared across every channel row. When a test is in-progress for any channel, `testMutation.isPending` is `true` for the whole component, so every test button in the table becomes disabled — even for channels that aren't being tested.

A user who wants to test a second channel while the first is still running will see all buttons grayed out with no indication of why. Consider tracking which channel is currently under test:

```typescript
const [testingChannelId, setTestingChannelId] = useState<string | null>(null);

const testMutation = useMutation(
  trpc.alert.testChannel.mutationOptions({
    onSuccess: (result) => {
      setTestingChannelId(null);
      // ...
    },
    onError: (error) => {
      setTestingChannelId(null);
      // ...
    },
  }),
);

// In the row:
onClick={() => {
  setTestingChannelId(channel.id);
  testMutation.mutate({ id: channel.id });
}}
disabled={testingChannelId === channel.id}
```

How can I resolve this? If you propose a fix, please make it concise.

@TerrifiedBug TerrifiedBug merged commit a84f517 into main Mar 7, 2026
12 checks passed
@TerrifiedBug TerrifiedBug deleted the feat/notification-channels branch March 7, 2026 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant