feat(M009): Fleet-Wide Observability — KPI dashboard, throughput comparison, data loss detection#110
TerrifiedBug wants to merge 5 commits into main
- src/server/services/fleet-data.ts
- src/server/services/__tests__/fleet-data.test.ts

- src/server/routers/fleet.ts
- src/hooks/use-realtime-invalidation.ts
- src/hooks/__tests__/use-realtime-invalidation.test.ts
…d chart, time range selector

- Fleet overview page with 15s auto-refresh polling and time range selector
- 4 KPI cards: bytes in/out, events in/out, fleet health %, node count
- Volume trend AreaChart with bytesIn/bytesOut series using Recharts
- Added "Fleet Overview" nav link to fleet page

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…arts

Add node-level throughput bar chart and per-node capacity utilization trend charts (memory %, disk %, CPU load) to the fleet overview page. Includes getNodeThroughput/getNodeCapacity service functions, tRPC procedures with team access control, SSE invalidation, bottleneck highlighting, and an 8-color node palette. 6 new tests (505 total).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…verlays

Add data loss detection with a configurable threshold (default 5%) that lists pipelines where the gap between events out and events in is suspicious. Enhance the deployment matrix with per-cell throughput rates and red highlighting on cells with data loss. New tRPC endpoints: fleet.dataLoss, fleet.matrixThroughput.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile Summary

This PR introduces the M009 Fleet-Wide Observability milestone: a new …

Confidence Score: 4/5

Safe to merge after fixing the one-line formatPercent call in data-loss-table.tsx; all backend code is correct and secure.

Strong PR overall: secure backend, proper RBAC, parameterized SQL, good test coverage. One concrete P1 display bug in data-loss-table, where loss rates will render 100× too small, making the feature misleading in production. A single one-line fix resolves it.

src/components/fleet/data-loss-table.tsx: the formatPercent call needs * 100 scaling.
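To make the scaling issue concrete, here is a minimal sketch of the kind of mismatch the review describes. It is not the PR's code: `PipelineCounts`, `lossRate`, `detectDataLoss`, and `formatPercentSketch` are hypothetical names, and the sketch assumes the loss rate is stored as a fraction (0.05 for 5%) while the formatter expects a value already on the 0-100 scale. Whether the threshold comparison is strict is also an assumption.

```typescript
// Hypothetical row shape; the PR's real types are not shown on this page.
interface PipelineCounts {
  pipelineId: string;
  eventsIn: number;
  eventsOut: number;
}

// Loss rate as a fraction in [0, 1]. Returns 0 when nothing came in,
// so idle pipelines are never flagged.
function lossRate(c: PipelineCounts): number {
  if (c.eventsIn <= 0) return 0;
  return Math.max(0, (c.eventsIn - c.eventsOut) / c.eventsIn);
}

// Keep only pipelines whose loss rate exceeds the threshold
// (default 0.05, i.e. the 5% default mentioned in the commit message).
function detectDataLoss(rows: PipelineCounts[], threshold = 0.05) {
  return rows
    .map((r) => ({ ...r, lossRate: lossRate(r) }))
    .filter((r) => r.lossRate > threshold);
}

// Stand-in formatter (assumed contract): takes a value on the 0-100 scale.
function formatPercentSketch(value: number): string {
  return `${value.toFixed(1)}%`;
}

const flagged = detectDataLoss([
  { pipelineId: "a", eventsIn: 1000, eventsOut: 900 },
  { pipelineId: "b", eventsIn: 1000, eventsOut: 990 },
]);

// Passing the raw fraction under-reports the loss by a factor of 100.
const wrong = formatPercentSketch(flagged[0].lossRate);
// Scaling to the 0-100 range first gives the intended label.
const right = formatPercentSketch(flagged[0].lossRate * 100);
```

Under these assumptions pipeline `a` loses 10% of its events, so the unscaled call would label it as a fraction of a percent, which is the misleading display the review warns about.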
Sequence Diagram

sequenceDiagram
participant Page as FleetOverviewPage
participant tRPC as tRPC (fleet router)
participant MW as withTeamAccess("VIEWER")
participant Svc as fleet-data.ts
participant DB as PostgreSQL
Page->>tRPC: fleet.overview(environmentId, range)
Page->>tRPC: fleet.volumeTrend(environmentId, range)
Page->>tRPC: fleet.nodeThroughput(environmentId, range)
Page->>tRPC: fleet.nodeCapacity(environmentId, range)
Page->>tRPC: fleet.dataLoss(environmentId, range, threshold)
Page->>tRPC: fleet.matrixThroughput(environmentId, range)
tRPC->>MW: validate team membership
MW-->>tRPC: authorized
tRPC->>Svc: getFleetOverview / getVolumeTrend / etc.
Svc->>DB: Prisma.$queryRaw (date_trunc aggregation)
DB-->>Svc: BigInt rows
Svc-->>tRPC: typed results (BigInt to Number)
tRPC-->>Page: serialized JSON
Page->>Page: render KPI cards, charts, data-loss table, matrix
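The "BigInt to Number" step in the diagram matters because aggregate columns from a raw query can arrive as `bigint`, which plain `JSON.stringify` refuses to serialize. A minimal sketch of such a conversion follows. The row shape, the field names, and the safe-integer guard are assumptions for illustration, not the contents of fleet-data.ts.

```typescript
// Hypothetical shape of one row from a raw aggregation query.
interface RawVolumeRow {
  bucket: Date;
  bytes_in: bigint;
  bytes_out: bigint;
}

// JSON-safe shape handed back to the router (assumed field names).
interface VolumePoint {
  bucket: string; // ISO-8601 timestamp
  bytesIn: number;
  bytesOut: number;
}

// Convert a bigint to a number, refusing values that would lose precision.
function bigintToNumber(value: bigint): number {
  const max = BigInt(Number.MAX_SAFE_INTEGER);
  if (value > max || value < -max) {
    throw new RangeError(`value ${value} is outside the safe integer range`);
  }
  return Number(value);
}

function toVolumePoint(row: RawVolumeRow): VolumePoint {
  return {
    bucket: row.bucket.toISOString(),
    bytesIn: bigintToNumber(row.bytes_in),
    bytesOut: bigintToNumber(row.bytes_out),
  };
}

const point = toVolumePoint({
  bucket: new Date("2024-01-01T00:00:00Z"),
  bytes_in: BigInt(1024),
  bytes_out: BigInt(512),
});
const serialized = JSON.stringify(point);
```

Throwing on out-of-range values is one possible design choice; a service could instead clamp the value or return it as a string, depending on how large the byte counters are expected to grow.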
Summary

/fleet/overview page with 4 KPI cards (bytes in/out, events in/out, fleet health), volume trend AreaChart with time range selector (1h/6h/1d/7d/30d), backed by SQL-level date_trunc aggregation in the fleet-data.ts service. 19 unit tests.

Key decisions

- Prisma.$queryRaw with date_trunc (not JS-side bucketing)
- /fleet/overview as sibling page to /fleet list, with tab navigation

Stats

Test plan

- /fleet/overview renders KPI cards and volume trend chart
- vitest run src/server/services/__tests__/fleet-data.test.ts: 19/19 pass
- tsc --noEmit: 0 errors

🤖 Generated with Claude Code
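For readers unfamiliar with the bucketing approach named in the summary, the sketch below shows how a time range selection (1h/6h/1d/7d/30d) might map to a query window and a `date_trunc` unit. The mapping itself is an assumption, since the PR page does not state which granularity each range uses. `RANGE_CONFIG`, `resolveRange`, and `truncUtc` are hypothetical names. `truncUtc` only illustrates what truncation does to a timestamp; the PR performs this step in SQL, and PostgreSQL truncates `timestamptz` values in the session time zone rather than always in UTC.

```typescript
type TimeRange = "1h" | "6h" | "1d" | "7d" | "30d";
type TruncUnit = "minute" | "hour" | "day"; // valid date_trunc field names

const HOUR_MS = 60 * 60 * 1000;

// Assumed mapping from selector value to window length and bucket size.
const RANGE_CONFIG: Record<TimeRange, { windowMs: number; unit: TruncUnit }> = {
  "1h": { windowMs: 1 * HOUR_MS, unit: "minute" },
  "6h": { windowMs: 6 * HOUR_MS, unit: "minute" },
  "1d": { windowMs: 24 * HOUR_MS, unit: "hour" },
  "7d": { windowMs: 7 * 24 * HOUR_MS, unit: "hour" },
  "30d": { windowMs: 30 * 24 * HOUR_MS, unit: "day" },
};

// Resolve a selector value into a lower time bound and a truncation unit.
function resolveRange(range: TimeRange, now: Date = new Date()) {
  const { windowMs, unit } = RANGE_CONFIG[range];
  return { since: new Date(now.getTime() - windowMs), unit };
}

// Illustration of truncation semantics, computed in UTC.
function truncUtc(d: Date, unit: TruncUnit): Date {
  const t = new Date(d.getTime());
  t.setUTCSeconds(0, 0); // drop seconds and milliseconds
  if (unit === "minute") return t;
  t.setUTCMinutes(0);
  if (unit === "hour") return t;
  t.setUTCHours(0);
  return t;
}

const resolved = resolveRange("6h", new Date("2024-01-02T12:00:00Z"));
const hourBucket = truncUtc(new Date("2024-01-02T12:34:56.789Z"), "hour");
```

Doing the truncation in the database, as the key decisions list states, means only one aggregated row per bucket crosses the wire instead of every raw sample.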