M002: Real-Time Monitoring — SSE push, batch writes, live dashboards#109
Merged
TerrifiedBug merged 19 commits into main on Mar 24, 2026
Greptile Summary
This PR transforms VectorFlow's monitoring layer from poll-based (15–30 s stale) to push-based (5 s live) by adding a full SSE pipeline: a single authenticated …
What's in good shape:
Minor residual from previous threads:
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Agent as Go Agent
participant HB as /api/agent/heartbeat
participant MetricStore as MetricStore
participant SSEReg as SSERegistry
participant SSERoute as /api/sse
participant Browser as Browser (useSSE)
Browser->>SSERoute: GET /api/sse (authenticated)
SSERoute->>SSERoute: Resolve environmentIds
SSERoute->>SSEReg: register(connId, controller, userId, envIds)
SSERoute-->>Browser: text/event-stream (: connected)
loop Every 5 seconds
Agent->>HB: POST /api/agent/heartbeat {metrics, pipelineStatuses}
HB->>HB: batchUpsertPipelineStatuses()
HB->>HB: ingestMetrics() → accumulateRow() delta
HB->>MetricStore: recordTotals() + flush(nodeId, pipelineId)
MetricStore-->>HB: MetricUpdateEvent[]
HB->>SSEReg: broadcast(metric_update, envId)
HB->>SSEReg: broadcast(fleet_status HEALTHY, envId)
HB->>SSEReg: broadcast(status_change if pipeline transitioned, envId)
HB->>SSEReg: broadcast(log_entry if recentLogs, envId)
SSEReg->>Browser: SSE event stream (filtered by envIds)
end
Browser->>Browser: useSSE dispatches to subscribers
Browser->>Browser: useRealtimeInvalidation → debounced React Query invalidation
Browser->>Browser: useSSEToasts → sonner toast on CRASHED/DEPLOYED
Browser->>Browser: useFlowMetrics → updateNodeMetrics in flow store
Browser->>Browser: useStreamingLogs → parsed log buffer
note over Browser: On SSE disconnect → usePollingInterval returns 30s floor
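The environment-scoped broadcast step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in `sse-registry.ts`: the `SSERegistrySketch` class, the `send` callback (a stand-in for a stream controller's `enqueue`), and the argument order of `broadcast` are all assumptions made for the example.

```typescript
// Hypothetical sketch of environment-scoped SSE fan-out.
// Event names come from the diagram; every type and method shape is assumed.
type SSEEventName = "metric_update" | "fleet_status" | "status_change" | "log_entry";

interface SketchConnection {
  userId: string;
  environmentIds: Set<string>;
  // Stand-in for ReadableStreamDefaultController.enqueue in a real route handler.
  send: (chunk: string) => void;
}

class SSERegistrySketch {
  private connections = new Map<string, SketchConnection>();

  register(connId: string, send: (chunk: string) => void, userId: string, envIds: string[]): void {
    this.connections.set(connId, { userId, environmentIds: new Set(envIds), send });
  }

  unregister(connId: string): void {
    this.connections.delete(connId);
  }

  // Deliver the event only to connections permitted to see this environment.
  // Returns the number of connections that received the frame.
  broadcast(event: SSEEventName, data: unknown, envId: string): number {
    // Standard SSE wire format: "event:" line, "data:" line, blank line.
    const frame = `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
    let delivered = 0;
    this.connections.forEach((conn) => {
      if (conn.environmentIds.has(envId)) {
        conn.send(frame);
        delivered++;
      }
    });
    return delivered;
  }
}
```

Filtering at broadcast time keeps the permission check in one place, so the heartbeat handler only needs to know which environment an event belongs to.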
Reviews (2): Last reviewed commit: "fix: accumulate per-minute metric deltas..."
@greptile review
- agent/internal/config/config.go - src/server/services/metric-store.ts
- src/server/services/heartbeat-batch.ts - src/app/api/agent/heartbeat/route.ts - src/server/services/__tests__/heartbeat-batch.test.ts
- src/server/services/metrics-ingest.ts - src/app/api/agent/heartbeat/route.ts - src/server/services/__tests__/metrics-ingest.test.ts
- src/server/services/__tests__/heartbeat-batch.test.ts - src/server/services/__tests__/metrics-ingest.test.ts - src/server/services/metric-store.ts
- src/lib/sse/types.ts - src/server/services/metric-store.ts
- src/server/services/sse-registry.ts - src/app/api/sse/route.ts - src/app/api/agent/heartbeat/route.ts
- src/stores/sse-store.ts - src/hooks/use-sse.ts
- src/server/services/__tests__/metric-store.test.ts - src/server/services/__tests__/sse-registry.test.ts
- src/hooks/use-polling-interval.ts - src/hooks/use-realtime-invalidation.ts - src/app/(dashboard)/layout.tsx - src/hooks/__tests__/use-polling-interval.test.ts - src/hooks/__tests__/use-realtime-invalidation.test.ts
- src/app/(dashboard)/page.tsx - src/app/(dashboard)/analytics/page.tsx - src/app/(dashboard)/fleet/[nodeId]/page.tsx - src/components/fleet/deployment-matrix.tsx - src/components/fleet/uptime-cards.tsx - src/components/fleet/node-metrics-charts.tsx - src/components/fleet/event-log.tsx - src/components/fleet/status-timeline.tsx
- src/hooks/use-sse.ts - src/hooks/__tests__/use-sse.test.ts
- src/hooks/use-flow-metrics.ts - src/hooks/__tests__/use-flow-metrics.test.ts
- src/app/(dashboard)/pipelines/[id]/page.tsx
- src/lib/sse/types.ts - src/app/api/agent/heartbeat/route.ts - src/server/routers/deploy.ts - src/hooks/use-realtime-invalidation.ts - src/lib/log-utils.ts - src/lib/__tests__/log-utils.test.ts - src/hooks/__tests__/use-realtime-invalidation.test.ts
- src/hooks/use-streaming-logs.ts - src/hooks/__tests__/use-streaming-logs.test.ts - src/components/pipeline/pipeline-logs.tsx - src/components/fleet/node-logs.tsx
- src/hooks/use-sse-toasts.ts - src/hooks/__tests__/use-sse-toasts.test.ts - src/app/(dashboard)/layout.tsx
- src/server/services/__tests__/sse-integration.test.ts - src/hooks/__tests__/sse-lifecycle.test.ts
…ranch, fix deploy fromStatus
Fixes three issues flagged by Greptile code review:
1. metrics-ingest.ts: The batch rewrite (S01) replaced findFirst+increment with deleteMany+createMany, which discarded accumulated deltas within the same minute. At 5s heartbeats, only the last delta survived instead of all ~12 per minute, causing a ~12x undercount in historical analytics. Fix: read existing rows first, add deltas in-memory (accumulateRow), then delete+create with accumulated totals. 5 new tests.
2. use-sse-toasts.ts: The status_change OFFLINE branch was dead code. The heartbeat handler only emits status_change for recovery (HEALTHY) and pipeline transitions (always has pipelineId). A node going offline means no heartbeats, so no status_change is emitted. Removed the dead branch. The fleet_status OFFLINE branch is retained with a comment noting it needs a future server-side watchdog to emit OFFLINE events.
3. deploy.ts: Replaced hardcoded fromStatus 'PENDING' with empty string since the deploy action doesn't know the previous pipeline status. Added descriptive reason strings distinguishing direct deploy from deploy-request approval.
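The read-then-accumulate fix for the per-minute undercount can be illustrated with a small in-memory sketch. Only the name `accumulateRow` comes from the commit message; the `MetricRow` fields, the `rowKey` and `mergeDeltas` helpers, and the function signatures are assumptions for illustration, and the database read and delete+create steps are replaced by plain arrays.

```typescript
// Hypothetical row shape: one row per pipeline per minute bucket.
interface MetricRow {
  pipelineId: string;
  minute: string;
  eventsIn: number;
  eventsOut: number;
}

function rowKey(row: MetricRow): string {
  return `${row.pipelineId}|${row.minute}`;
}

// Add a new delta onto whatever is already stored for the same bucket.
function accumulateRow(existing: MetricRow | undefined, delta: MetricRow): MetricRow {
  if (!existing) return { ...delta };
  return {
    ...existing,
    eventsIn: existing.eventsIn + delta.eventsIn,
    eventsOut: existing.eventsOut + delta.eventsOut,
  };
}

// Stand-in for "read existing rows, accumulate in memory, then delete+create".
function mergeDeltas(existingRows: MetricRow[], deltas: MetricRow[]): MetricRow[] {
  const byKey = new Map<string, MetricRow>();
  existingRows.forEach((r) => byKey.set(rowKey(r), { ...r }));
  deltas.forEach((d) => byKey.set(rowKey(d), accumulateRow(byKey.get(rowKey(d)), d)));
  const out: MetricRow[] = [];
  byKey.forEach((r) => out.push(r));
  return out;
}

// Twelve 5s heartbeats landing in the same minute, each reporting +100 in / +90 out.
let stored: MetricRow[] = [];
for (let i = 0; i < 12; i++) {
  stored = mergeDeltas(stored, [
    { pipelineId: "p1", minute: "2026-03-24T10:00", eventsIn: 100, eventsOut: 90 },
  ]);
}
console.log(stored[0].eventsIn, stored[0].eventsOut); // 1200 1080
```

Without the accumulation step, each write would replace the bucket with the latest delta alone, leaving 100 and 90 instead of 1200 and 1080.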
b782148 to
b453d34
Summary
Transforms VectorFlow monitoring from poll-based (15–30s stale) to push-based (5s live). Agent heartbeats speed up to 5s, all hot-path DB writes are batched, and a single authenticated SSE endpoint streams metrics, fleet status, logs, and status changes to every dashboard page, the pipeline editor, and log viewers. Toast notifications surface critical state transitions. Graceful fallback to 30s polling on SSE disconnect.
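The polling fallback described above can be reduced to a small pure function. This is a sketch under assumptions: the `SSEStatus` union, the `pollingInterval` name, and the treatment of a still-connecting stream as disconnected are hypothetical, and the real hook presumably reads the status from the SSE store. The `number | false` return type matches what React Query accepts for `refetchInterval`.

```typescript
// Hypothetical connection states; the real store may use different names.
type SSEStatus = "connected" | "connecting" | "disconnected";

const POLLING_FLOOR_MS = 30000;

// Returns false to suppress polling while SSE is live, otherwise an interval
// that is never shorter than the 30s floor.
function pollingInterval(status: SSEStatus, baseMs: number): number | false {
  if (status === "connected") return false;
  return Math.max(baseMs, POLLING_FLOOR_MS);
}

console.log(pollingInterval("connected", 15000));    // false
console.log(pollingInterval("disconnected", 15000)); // 30000
console.log(pollingInterval("disconnected", 60000)); // 60000
```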
73 files changed, 4702 insertions, 256 tests across 20 files.
What Changed
S01: Agent Heartbeat Speedup & Batch Writes
- batchUpsertPipelineStatuses(): single INSERT...ON CONFLICT raw SQL for any pipeline count
- ingestMetrics() rewritten with $transaction(deleteMany + createMany)

S02: SSE Infrastructure & MetricStore Pub/Sub
- src/lib/sse/types.ts: 4 typed SSE events + discriminated union
- src/server/services/metric-store.ts: subscribe/unsubscribe/flush pub/sub
- src/server/services/sse-registry.ts: connection registry with environment-scoped permission filtering
- src/app/api/sse/route.ts: authenticated SSE endpoint with ReadableStream + 30s keepalive
- src/hooks/use-sse.ts: EventSource lifecycle with exponential backoff reconnect (1s→30s)
- src/stores/sse-store.ts: Zustand connection status store

S03: Dashboard & Fleet Pages Wired to SSE
- usePollingInterval: suppresses polling when SSE connected, 30s floor on disconnect
- useRealtimeInvalidation: maps SSE events → React Query key invalidation (500ms debounce)
- refetchInterval to SSE-aware polling

S04: Pipeline Editor Live Overlays
- useFlowMetrics(pipelineId): bridges SSE metric_update events to flow store
- deriveMetrics: maps source/transform/sink node kinds to correct rate fields

S05: Streaming Logs & Toast Notifications
- useStreamingLogs: SSE log_entry subscription with parse/dedup/200-entry buffer
- parseLogLine: handles JSON (Vector msg/message convention) and plain-text formats
- useSSEToasts: CRASHED→error, DEPLOYED→success, OFFLINE→warning with 30s cooldown dedup

S06: Integration Verification
New Files (25)
- src/app/api/sse/route.ts
- src/lib/sse/types.ts, src/lib/log-utils.ts
- src/server/services/heartbeat-batch.ts, src/server/services/sse-registry.ts
- src/hooks/use-sse.ts, use-polling-interval.ts, use-realtime-invalidation.ts, use-flow-metrics.ts, use-streaming-logs.ts, use-sse-toasts.ts
- src/stores/sse-store.ts

Requirements Validated
R013 (5s heartbeats), R014 (MetricStore pub/sub), R015 (SSE endpoint), R016 (live dashboard), R017 (editor overlays), R018 (streaming logs), R019 (toast notifications), R020 (polling fallback), R021 (batch DB writes), R022 (SSE permission scoping)
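As an illustration of the reconnect behavior listed for `use-sse.ts` (exponential backoff from 1s to 30s), the delay schedule might look like the sketch below. The doubling factor, the `reconnectDelayMs` name, and the constants are assumptions; the PR only states the 1s and 30s bounds.

```typescript
const BASE_DELAY_MS = 1000;  // first retry after 1s
const MAX_DELAY_MS = 30000;  // never wait longer than 30s

// attempt is 0 for the first reconnect, 1 for the second, and so on.
// Assumes a doubling schedule; the actual hook may use a different factor or add jitter.
function reconnectDelayMs(attempt: number): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(BASE_DELAY_MS * Math.pow(2, safeAttempt), MAX_DELAY_MS);
}

const schedule = [0, 1, 2, 3, 4, 5, 6].map(reconnectDelayMs);
console.log(schedule); // [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

A hook built on this would typically reset the attempt counter to 0 once the EventSource reports an open connection, so a brief outage does not leave later reconnects waiting the full 30s.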
Verification
- tsc --noEmit ✅
- eslint src/ ✅
- pnpm test: 256/256 pass ✅