fix: use idle-based CPU formula in dashboard charts and alert evaluator by TerrifiedBug · Pull Request #54 · TerrifiedBug/vectorflow

TerrifiedBug · 2026-03-07T18:56:48Z

Summary

The fleet detail page correctly uses (total - idle) / total for CPU%, but three other code paths still used the old cpuDelta / wallClockSeconds formula, producing values like 787% on multi-core machines.
Aligns all CPU calculations to the same idle-based formula, clamped to 0-100%.
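
To make the difference between the two formulas concrete, here is a minimal sketch. The `CpuSample` shape and both function names are hypothetical; only the field names `cpuSecondsTotal` and `cpuSecondsIdle` come from this PR, and the old formula is assumed to divide the `cpuSecondsTotal` delta by elapsed wall-clock seconds.

```typescript
// Hypothetical sample shape; field names follow the PR description.
interface CpuSample {
  cpuSecondsTotal: number; // cumulative CPU seconds, summed over all cores
  cpuSecondsIdle: number; // cumulative idle CPU seconds, summed over all cores
}

// Idle-based usage between two samples, clamped to 0-100.
// Returns null when no CPU time elapsed or the counter went backwards.
function idleBasedCpuPercent(older: CpuSample, newer: CpuSample): number | null {
  const totalDelta = newer.cpuSecondsTotal - older.cpuSecondsTotal;
  if (totalDelta <= 0) return null;
  const idleDelta = newer.cpuSecondsIdle - older.cpuSecondsIdle;
  const percent = ((totalDelta - idleDelta) / totalDelta) * 100;
  return Math.max(0, Math.min(100, percent));
}

// Assumed shape of the old formula: CPU seconds divided by wall-clock seconds.
// On an N-core host the numerator grows by about N seconds per second.
function wallClockCpuPercent(
  older: CpuSample,
  newer: CpuSample,
  wallClockSeconds: number,
): number {
  return ((newer.cpuSecondsTotal - older.cpuSecondsTotal) / wallClockSeconds) * 100;
}

// 8 cores over a 10-second window: 80 CPU seconds elapsed, 40 of them idle.
const older: CpuSample = { cpuSecondsTotal: 0, cpuSecondsIdle: 0 };
const newer: CpuSample = { cpuSecondsTotal: 80, cpuSecondsIdle: 40 };

console.log(idleBasedCpuPercent(older, newer)); // 50
console.log(wallClockCpuPercent(older, newer, 10)); // 800
```

Because the idle-based ratio divides by the same counter it measures against, the core count cancels out, which is why no wall-clock timestamp is needed.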

Affected code paths

alert-evaluator.ts getCpuUsage() — alert threshold checks (the 787.80% firing alert)
dashboard.ts chartMetrics — dashboard CPU chart time-series
dashboard.ts nodeCards sparkline — was passing raw cumulative cpuSecondsTotal, now computes percentage

Test plan

Dashboard CPU chart shows realistic values (~43% matching top output), not pinned at 100%
Node cards sparkline reflects actual CPU usage
CPU alert rules fire at correct thresholds (e.g., >50% fires at real 50%, not 400%)

GET /api/v1/pipelines/:id and /api/v1/nodes/:id return 500 because NodePipelineStatus contains BigInt fields (eventsIn, eventsOut, etc.) that JSON.stringify cannot serialize. Adds a jsonResponse() helper using a custom replacer to convert BigInts to numbers.
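
The serialization failure and the replacer-based fix can be sketched without any framework. This is an illustration only: `bigIntReplacer` and `stringifyBigIntSafe` are hypothetical names, and the real `jsonResponse()` helper presumably wraps the resulting string in a response object, which is omitted here.

```typescript
// Replacer that converts BigInt values to Number and passes everything else through.
function bigIntReplacer(_key: string, value: unknown): unknown {
  return typeof value === "bigint" ? Number(value) : value;
}

// Hypothetical helper: JSON.stringify that tolerates BigInt fields.
function stringifyBigIntSafe(data: unknown): string {
  return JSON.stringify(data, bigIntReplacer);
}

// Example payload with BigInt counters, loosely modelled on NodePipelineStatus.
const status = {
  pipelineId: "p1",
  eventsIn: BigInt(1200),
  eventsOut: BigInt(1187),
};

// Plain JSON.stringify throws a TypeError on BigInt values.
let plainStringifyThrew = false;
try {
  JSON.stringify(status);
} catch {
  plainStringifyThrew = true;
}

console.log(plainStringifyThrew); // true
console.log(stringifyBigIntSafe(status));
// {"pipelineId":"p1","eventsIn":1200,"eventsOut":1187}
```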

The fleet detail page correctly uses (total - idle) / total for CPU%, but the dashboard charts and alert evaluator still used the old cpuDelta / wallClockSeconds formula, which reports usage summed across cores, where 100% is one full core (e.g. 787% on an 8-core machine). This aligns all CPU calculations to the same idle-based formula clamped to 0-100%.

Affected code paths:
- dashboard.chartMetrics: chart CPU time-series
- dashboard.nodeCards: sparkline CPU values
- alert-evaluator getCpuUsage: alert threshold checks

greptile-apps · 2026-03-07T19:01:17Z

Greptile Summary

This PR aligns CPU usage calculations across three previously-inconsistent code paths — alert-evaluator.ts and two sections of dashboard.ts — to use the idle-based formula (total - idle) / total, matching the fleet detail page that was already correct. It also adds a jsonResponse BigInt-safe helper and wires it into the /api/v1/nodes/[id] and /api/v1/pipelines/[id] routes, where NextResponse.json() would throw on BigInt fields from Prisma.

Changes:

alert-evaluator.ts: getCpuUsage() now selects cpuSecondsIdle instead of timestamp (wall-clock delta is no longer needed), computes totalDelta/idleDelta, guards against totalDelta ≤ 0, and clamps the result to 0–100%.
dashboard.ts nodeCards: Sparkline was incorrectly passing raw cumulative cpuSecondsTotal (could be millions of seconds); now correctly computes a percentage delta between consecutive rows.
dashboard.ts chartMetrics: Replaces cpuDelta / dtSec (yields >100% on multi-core hosts) with the idle-based formula; dtSec ≤ 0 guard is preserved as a sanity check on data ordering.
api-handler.ts: Introduces jsonResponse() with a JSON.stringify replacer that converts BigInt → Number; the any type is explicitly suppressed with an ESLint comment.
Route files: Both public API routes now use jsonResponse() so that BigInt metric fields (eventsIn, eventsOut, errorsTotal) are serialized correctly instead of throwing at runtime.
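
The per-row delta described for the sparkline and chart series might look roughly like the sketch below. `MetricRow` and `cpuSeries` are hypothetical names, and emitting 0 for the first point and for rows after a counter reset follows the behaviour described in this review rather than the actual source.

```typescript
// Hypothetical row shape: cumulative counters, ordered oldest to newest.
interface MetricRow {
  cpuSecondsTotal: number;
  cpuSecondsIdle: number;
}

// Convert cumulative counters into one CPU percentage per row.
function cpuSeries(rows: MetricRow[]): number[] {
  return rows.map((row, i, arr) => {
    const prev = arr[i - 1];
    if (!prev) return 0; // first point has nothing to diff against
    const totalDelta = row.cpuSecondsTotal - prev.cpuSecondsTotal;
    if (totalDelta <= 0) return 0; // counter reset or duplicate sample
    const idleDelta = row.cpuSecondsIdle - prev.cpuSecondsIdle;
    const percent = ((totalDelta - idleDelta) / totalDelta) * 100;
    return Math.max(0, Math.min(100, percent));
  });
}

const rows: MetricRow[] = [
  { cpuSecondsTotal: 1000, cpuSecondsIdle: 600 },
  { cpuSecondsTotal: 1080, cpuSecondsIdle: 660 }, // 80 total, 60 idle
  { cpuSecondsTotal: 1160, cpuSecondsIdle: 680 }, // 80 total, 20 idle
  { cpuSecondsTotal: 10, cpuSecondsIdle: 5 }, // counters reset after a reboot
];

console.log(cpuSeries(rows)); // [ 0, 25, 75, 0 ]
```

Passing the raw `cpuSecondsTotal` values instead would have plotted 1000, 1080, 1160, ... which is the bug the nodeCards change removes.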

No correctness bugs were identified. The formulas are mathematically correct, counter-reset edge cases are properly guarded, and the Prisma schema confirms cpuSecondsTotal/cpuSecondsIdle are Float (TypeScript number), so no BigInt coercion is needed for those fields.

Confidence Score: 5/5

This PR is safe to merge — the formula changes are mathematically correct, all edge cases (counter resets, division by zero) are properly guarded, and the BigInt serialization fix prevents runtime errors in the public API routes.
All three CPU formula changes consistently apply the same idle-based calculation that the fleet detail page already used. The Prisma schema confirms cpuSecondsIdle is a Float with a default of 0, so no type coercion issues exist. The Math.max/Math.min clamp correctly handles any pathological input (negative idleDelta from an unlikely isolated counter reset). The jsonResponse helper is a straightforward fix for BigInt serialization that was previously crashing the API routes silently.
No files require special attention.

Important Files Changed

Filename	Overview
src/server/services/alert-evaluator.ts	Switches getCpuUsage() from wall-clock division to idle-based formula; removes unused timestamp field from select; correctly guards against totalDelta ≤ 0; clamps result to 0-100%.
src/server/routers/dashboard.ts	Adds cpuSecondsIdle to both the nodeCards sparkline query and chartMetrics query; applies idle-based formula correctly in both places; first sparkline point is always cpu=0 by design (acceptable); no BigInt issues since cpuSeconds fields are Float.
src/app/api/v1/_lib/api-handler.ts	Adds jsonResponse() helper that converts BigInt values to Number before JSON serialization; uses any type with explicit ESLint disable; Number(bigInt) could lose precision for values > 2^53 but is acceptable for metrics in practice.
src/app/api/v1/nodes/[id]/route.ts	Switches final response from NextResponse.json() to jsonResponse() to handle any BigInt fields in the node payload; no functional logic changes.
src/app/api/v1/pipelines/[id]/route.ts	Switches final response to jsonResponse(); needed because nodeStatuses includes eventsIn, eventsOut, errorsTotal BigInt fields that JSON.stringify would throw on.
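
The precision caveat noted for api-handler.ts can be demonstrated with a small sketch. The helper name `isSafelyConvertible` is hypothetical and is not part of this PR; it only illustrates where `Number(bigInt)` stops being exact.

```typescript
const MAX_SAFE = BigInt(Number.MAX_SAFE_INTEGER); // 2^53 - 1

// True when a BigInt round-trips through Number without loss.
function isSafelyConvertible(value: bigint): boolean {
  return value <= MAX_SAFE && value >= -MAX_SAFE;
}

const typicalCounter = BigInt(1187);
const hugeCounter = MAX_SAFE + BigInt(2); // 2^53 + 1, not representable as a double

console.log(isSafelyConvertible(typicalCounter)); // true
console.log(isSafelyConvertible(hugeCounter)); // false
console.log(BigInt(Number(hugeCounter)) === hugeCounter); // false
```

An event counter would need to exceed roughly 9 × 10^15 before this matters, which is consistent with the review's judgement that the conversion is acceptable for metrics in practice.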

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Agent Heartbeat] --> B[metrics-ingest: Store NodeMetric with cpuSecondsTotal and cpuSecondsIdle]

    B --> C[alert-evaluator getCpuUsage]
    B --> D[dashboard nodeCards sparkline]
    B --> E[dashboard chartMetrics CPU series]

    C --> F["totalDelta = newer.total - older.total\nif totalDelta le 0 return null\ncpu% = clamp totalDelta - idleDelta / totalDelta x 100"]
    D --> G["per-row delta between arr i and arr i-1\nif totalDelta > 0 compute cpu%\nelse cpu = 0"]
    E --> H["per-row delta between curr and prev\nif cpuTotalDelta > 0 compute cpu%\nelse cpu = 0"]

    F --> I[AlertEvent fire or resolve]
    G --> J[Node card sparkline chart]
    H --> K[Dashboard CPU time-series chart]

    L[Public REST routes nodes and pipelines] --> M[jsonResponse: BigInt to Number replacer]

Last reviewed commit: e18b43d

TerrifiedBug added 2 commits March 7, 2026 18:39

github-actions bot added the fix label Mar 7, 2026

TerrifiedBug merged commit de45e24 into main Mar 7, 2026
12 checks passed

TerrifiedBug deleted the fix/api-v1-bigint-serialization branch March 7, 2026 19:01