
Throughput forecasting UI and API#69

Merged
jamby77 merged 21 commits into master from feature/61-throughput-forecasting on Apr 1, 2026

Conversation


@jamby77 jamby77 commented Mar 31, 2026

Summary

  • Add throughput forecasting APIs and shared types
  • Modularize forecasting components with reusable settings panel
  • Frontend UI for throughput forecasting with configurable settings

Test plan

  • Verify forecasting page loads and displays metrics
  • Verify settings panel saves rolling window, ceiling, and alert threshold
  • Run unit tests: pnpm test

🤖 Generated with Claude Code

jamby77 added 5 commits March 27, 2026 11:09
Introduced new APIs for throughput forecasting, including endpoints for retrieving forecasts, accessing settings, and updating settings. Added corresponding types and a comprehensive test suite for the `ThroughputForecastingService`. Enhanced `WebhookForm` event options to support throughput limit alerts.
Split throughput forecasting page into reusable components (`ForecastCard`, `ThroughputChart`, `SettingsPanel`, etc.) and utility functions to improve code organization and maintainability. Updated imports to use shared types from `@betterdb/shared`.
@claude

This comment was marked as outdated.

Replaced custom polling hooks with `react-query` for data fetching, caching, and synchronization in throughput forecasting workflows. Adjusted API calls, updated components to use `react-query` hooks, and removed redundant `usePolling` implementation. Enhanced error handling and refetch mechanisms for improved user experience.

Extract data-fetching logic from page components into dedicated hooks
using @tanstack/react-query for caching, deduplication, and polling.
Add Vitest test infrastructure and unit tests for all new hooks.
Generalize throughput forecasting into a MetricForecastingService
parameterized by MetricKind. Add metric extractors, ceiling resolvers
(with auto-detect for memory maxmemory), unified storage table, REST
endpoints, Prometheus export, and a tabbed frontend page at /forecasting.

Add tests for checkAlerts dispatch, ceiling-exceeded paths, falling/
stable trends for non-ops metrics, zero-slope behavior, connection
isolation, data sufficiency boundaries, formatter edge cases, and
validation pipe edge cases.

Add `docker-compose.test.yml` for isolated test environments using dedicated containers and ports. Refactor metric forecasting to use actual values for fast spike detection. Enhance tier validation with `DEV_LICENSE_TIER` override for development. Adjust global test setup/teardown scripts for test containers.

Aligns setting names with the generic metric forecasting feature,
replacing the old throughput-specific naming. DB column names are
unchanged to avoid requiring a migration.

Improve logging in `checkAlerts` for uninitialized services. Add safe fallback values for hysteresis recovery during webhook dispatch. Refactor to use `WebhookEventType` for consistency. Update tests to validate dispatch behavior with recovery values.

---------

Co-authored-by: Kristiyan Ivanov <k.ivanow@gmail.com>
@claude

This comment was marked as outdated.

The storage-port interface re-exports it as 'export type', which
cannot be used as a runtime value. Import directly from shared.
@claude

This comment was marked as outdated.

…tional ceilings, clamp trend line

- buildInsufficientForecast now receives actual sorted.length instead
  of hardcoding 0
- Change ceiling @Min(1) to @Min(0.01) so memFragmentation ceiling of
  1.0 is valid
- Clamp trend line projection to Math.max(0) to prevent negative values

jamby77 commented Apr 1, 2026

Thanks for the review. Here's the status of each item:

HIGH — checkAlerts dispatches unconditionally: Intentional for hysteresis recovery. The dispatcher's shouldFireAlert handles rate-limiting. Guarded by !forecast.enabled, insufficientData, and ceiling === null checks. The downstream contract is documented in the test name.

HIGH — dataPointCount: 0 hardcoded: Fixed — buildInsufficientForecast now receives and uses the actual sorted.length.

HIGH — No auth guards: Verified — CloudAuthGuard is registered as APP_GUARD via CloudAuthModule, protecting all controllers globally.

MEDIUM — No explicit ValidationPipe on @Body: Global ValidationPipe with whitelist: true is configured in main.ts. Adding per-endpoint pipes would be redundant.

MEDIUM — @Min(1) blocks memFragmentation 1.0: Fixed — changed to @Min(0.01).

LOW — windowMs: 0 in disabled forecast: Cosmetic — the disabled state shows a different component (MetricDisabled) that doesn't render the window label.

LOW — Negative trend line: Fixed — clamped to Math.max(0, ...).

Minor — Column naming: Intentional to avoid DB migration. Documented in code comments.


claude bot commented Apr 1, 2026

Review: Throughput Forecasting UI and API

Good overall structure — the linear regression is numerically sound (x-normalisation avoids epoch cancellation), the storage adapters all use UPSERT correctly, and the test coverage is solid. Five issues need attention before merge: one high, three medium, and one low-severity but tracked.


High

checkAlerts dispatches for stable/falling metrics every minute (metric-forecasting.service.ts:379)

After the forecast.ceiling === null guard the code falls through and calls dispatchMetricForecastLimit even when timeToLimitMs === null (stable/falling trend). It sends Number.MAX_SAFE_INTEGER as the sentinel. The webhook dispatcher correctly suppresses the alert, but the unnecessary call runs every 60 s for every active metric kind with a ceiling whose trend isn't rising. More importantly, if the dispatcher's threshold logic ever changes this silently becomes a spurious-alert bug.

Fix: add if (forecast.timeToLimitMs === null) continue; immediately after the ceiling guard. See inline comment.


Medium

TOCTOU race in getOrCreateSettings can silently overwrite user settings (metric-forecasting.service.ts:237)

The read-then-conditional-write is not atomic. If a concurrent updateSettings call runs between the getMetricForecastSettings read (which returns null) and the subsequent saveMetricForecastSettings write, the user's changes are overwritten with bare defaults. The adapters use UPSERT so data is never corrupted, but user-configured ceilings or thresholds can be silently lost. See inline comment for a suggested mitigation.
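The mitigation can be sketched framework-free. The following is a hypothetical in-memory store, not the project's storage adapters: the idea is to replace read-then-conditional-write with an insert that is a no-op on conflict, followed by a re-read, so defaults can never clobber a concurrent user update.

```typescript
// Hypothetical settings shape for illustration only.
type Settings = { ceiling: number | null; alertThresholdMs: number };

class SettingsStore {
  private rows = new Map<string, Settings>();

  // Emulates INSERT ... ON CONFLICT DO NOTHING: writes only if absent.
  insertIfAbsent(key: string, value: Settings): void {
    if (!this.rows.has(key)) this.rows.set(key, value);
  }

  save(key: string, value: Settings): void {
    this.rows.set(key, value);
  }

  get(key: string): Settings | null {
    return this.rows.get(key) ?? null;
  }
}

const DEFAULTS: Settings = { ceiling: null, alertThresholdMs: 3_600_000 };

// Race-safe getOrCreate: if a concurrent updateSettings wins the race,
// the default insert is a no-op and the re-read returns the user's values.
function getOrCreateSettings(store: SettingsStore, key: string): Settings {
  store.insertIfAbsent(key, DEFAULTS);
  return store.get(key)!;
}
```

Against a real database the same shape is `INSERT ... ON CONFLICT DO NOTHING` followed by a `SELECT`, both of which the existing adapters can express.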

Hard-coded limit: 1500 silently truncates regression input (metric-forecasting.service.ts:95)

rollingWindowMs goes up to 86 400 000 ms (24 h). At a 30 s poll interval that's ~2880 snapshots — nearly 2× the cap. The query silently returns only the 1500 most-recent rows, so the regression runs on ~12.5 h of data while the UI reports a 24 h window. The slope is wrong and there's no indication anything was truncated. See inline comment.

No cross-field validation: alertThresholdMs can exceed rollingWindowMs (dto/update-metric-forecast-settings.dto.ts:24)

A 24 h alert threshold with a 1 h rolling window will never trigger because the regression can't project that far ahead. The combination is silently accepted and produces a perpetually-inactive alert. See inline comment.


Low

'env-default' hardcoded in the controller (metric-forecasting.controller.ts:21)

The registry service already has const ENV_DEFAULT_ID = 'env-default' with a TODO to export and share it. Three occurrences in the new controller should use that constant once it's exported. See inline comment.

- Prevent concurrent checkAlerts runs with a boolean guard
- Export ENV_DEFAULT_ID from connection-registry and use it in the
  metric forecasting controller

claude bot commented Apr 1, 2026

Review: Throughput Forecasting UI and API

Overall the feature is well-structured — the linear regression is numerically sound (normalised timestamps to avoid floating-point cancellation), the hysteresis path for webhook alerts goes through the existing dispatchThresholdAlert mechanism correctly, schema migrations are additive and idempotent in both adapters, and the test coverage across slices is solid.

Four issues need attention before merge.


High — Snapshot limit: 1500 silently truncates the regression window

getMemorySnapshots({ limit: 1500 }) is applied regardless of rollingWindowMs. For a 24 h window with sub-minute polling this returns only the most recent fraction of the window. The windowMs field in the returned MetricForecast still reports the configured window, so the mismatch is invisible to callers and to the UI. The same limit is applied to the frontend chart query, so chart and forecast can diverge from each other.

Fix: increase the limit significantly (e.g. 10 000), or derive it from rollingWindowMs / minExpectedIntervalMs, and log a warning when the limit is hit.
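The suggested derivation can be sketched as a small helper. `snapshotLimit`, `minExpectedIntervalMs`, and the cap value are illustrative names and numbers, not the project's API:

```typescript
// Size the snapshot query from the configured window and the fastest
// expected poll interval, instead of a fixed 1500, with integer-math
// headroom for jitter and a hard cap to bound memory.
function snapshotLimit(
  rollingWindowMs: number,
  minExpectedIntervalMs: number,
  hardCap = 50_000,
): number {
  const expected = Math.ceil(rollingWindowMs / minExpectedIntervalMs);
  const withHeadroom = Math.ceil((expected * 6) / 5); // 20% headroom
  return Math.min(withHeadroom, hardCap);
}
```

A caller that receives exactly `hardCap` rows knows the window may be truncated and can log the warning the review asks for.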


Medium — Number.MAX_SAFE_INTEGER leaks into the webhook payload

When forecast.timeToLimitMs is null (stable / falling trend), the checkAlerts method substitutes Number.MAX_SAFE_INTEGER and still calls dispatchMetricForecastLimit. The dispatchThresholdAlert hysteresis will prevent the alert from firing, but the sentinel value (9007199254740991) is embedded verbatim in the payload that reaches the pro service and any downstream webhook consumer.

Fix: short-circuit before calling dispatchMetricForecastLimit when timeToLimitMs is null or already exceeds the threshold.


Medium — updateSetting not memoised; stale-closure risk on connection change

The function is recreated on every render. The debounce closure captures connectionId at creation time, so if the active connection changes while a save is debouncing, queryClient.invalidateQueries can target the wrong query key. Wrap it in useCallback with [connectionId, activeTab, queryClient] as the dependency array.
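The hazard generalises beyond React. Below is a framework-free sketch, with hypothetical `makeStaleSaver`/`makeRefSaver` helpers, of why a callback that captures `connectionId` at creation time keeps targeting the old connection after a switch, while reading through a live reference (which a correctly-memoised hook achieves) does not:

```typescript
// Captures the value at creation time: later calls see the stale value.
function makeStaleSaver(connectionId: string): () => string {
  return () => connectionId;
}

// Reads the live value at call time via a mutable ref.
function makeRefSaver(ref: { current: string }): () => string {
  return () => ref.current;
}
```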


Low — Cross-field and env-var nits

  • alertThresholdMs can exceed rollingWindowMs via the API even though the UI prevents it — worth a class-level validator.
  • METRIC_FORECASTING_ENABLED silently stays false for '1', 'TRUE', etc. A toLowerCase() call or a note in .env.example would help.
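A tolerant parser for such flags is a one-liner; `envFlag` is an illustrative helper, not code from this PR:

```typescript
// Accepts common truthy spellings in any case; unset or anything
// unrecognised is treated as false.
function envFlag(value: string | undefined): boolean {
  if (value === undefined) return false;
  return ['true', '1', 'yes', 'on'].includes(value.trim().toLowerCase());
}
```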

What looks good:

  • Normalised-timestamp linear regression correctly avoids catastrophic cancellation with epoch-scale x values.
  • getOrCreateSettings upsert is safe under concurrent requests (DB-level ON CONFLICT handles races).
  • MetricKindValidationPipe properly rejects unknown path params before they reach storage.
  • Alert hysteresis correctly routes through dispatchThresholdAlert — no alert spam.
  • Both SQLite and Postgres migrations are IF NOT EXISTS / ADD COLUMN IF NOT EXISTS — safe for existing deployments.
  • The || slope <= 0 guard in the forecast path is necessary (handles the edge case where predictedStart === 0 and predictedEnd < 0 gives growthPercent = 100) — keep it.
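For reference, the normalised-timestamp fit praised above can be sketched as follows. This is an illustrative least-squares implementation, not the service's actual code; the key step is shifting x by its minimum so the sums stay small enough that epoch-millisecond x values (~1.7e12) do not cancel catastrophically in float64:

```typescript
// Ordinary least squares over (timestamp, value) points with x normalised
// to start at zero. The intercept is relative to t0; evaluate the line as
// intercept + slope * (t - t0).
function linearFit(
  points: Array<{ t: number; v: number }>,
): { slope: number; intercept: number } {
  const t0 = Math.min(...points.map((p) => p.t));
  const xs = points.map((p) => p.t - t0);
  const n = points.length;
  const sumX = xs.reduce((a, x) => a + x, 0);
  const sumY = points.reduce((a, p) => a + p.v, 0);
  const sumXY = points.reduce((a, p, i) => a + xs[i] * p.v, 0);
  const sumX2 = xs.reduce((a, x) => a + x * x, 0);
  const denom = n * sumX2 - sumX * sumX;
  const slope = denom === 0 ? 0 : (n * sumXY - sumX * sumY) / denom;
  const intercept = (sumY - slope * sumX) / n;
  return { slope, intercept };
}
```

Without the `- t0` shift, terms like `sumX * sumX` reach ~1e25 and the subtraction in `denom` loses most of its significant digits.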

@IsNumber()
@Min(60_000)
@Max(86_400_000)
alertThresholdMs?: number;


Medium — Silent misconfiguration: alertThresholdMs can exceed rollingWindowMs

Both fields are validated independently, but there is no cross-field constraint. If a user sets rollingWindowMs = 3_600_000 (1 h) and alertThresholdMs = 86_400_000 (24 h), the forecast can project at most ~1 h of growth, so timeToLimitMs will essentially never drop below a 24 h threshold. The alert silently never fires.

Add a custom validator (or a simple @ValidateIf-based check) that enforces alertThresholdMs <= rollingWindowMs:

import { registerDecorator, ValidationOptions, ValidationArguments } from 'class-validator';

function IsLessThanOrEqual(property: string, opts?: ValidationOptions) {
  return (object: object, propertyName: string) => {
    registerDecorator({
      name: 'isLessThanOrEqual',
      target: (object as any).constructor,
      propertyName,
      constraints: [property],
      options: opts,
      validator: {
        validate(value: any, args: ValidationArguments) {
          const other = (args.object as any)[args.constraints[0]];
          return typeof value === 'number' && typeof other === 'number' ? value <= other : true;
        },
        defaultMessage: (args) => `${args.property} must be ≤ ${args.constraints[0]}`,
      },
    });
  };
}

Then annotate alertThresholdMs with @IsLessThanOrEqual('rollingWindowMs'). Because both fields are optional in a PATCH, you also need to validate at the service layer (reading existing settings to resolve the final values) if only one of the two is updated.
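That service-layer check can be sketched as a merge-then-validate step; `validateMergedSettings` and the field shapes are illustrative, not the project's API:

```typescript
// Only the fields relevant to the cross-field rule.
type ForecastSettings = { rollingWindowMs: number; alertThresholdMs: number };

// Merge a partial PATCH body with the stored settings, then enforce the
// constraint on the effective values, catching the case where only one
// of the two fields is present in the request.
function validateMergedSettings(
  stored: ForecastSettings,
  patch: Partial<ForecastSettings>,
): ForecastSettings {
  const merged = { ...stored, ...patch };
  if (merged.alertThresholdMs > merged.rollingWindowMs) {
    throw new Error('alertThresholdMs must not exceed rollingWindowMs');
  }
  return merged;
}
```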

Comment on lines +130 to +135
const growthPercent =
  predictedStart !== 0
    ? ((predictedEnd - predictedStart) / Math.abs(predictedStart)) * 100
    : predictedEnd !== 0
      ? 100 // growing from zero — treat as significant rise
      : 0;

Medium — Misleading trendDirection when metric falls from near-zero

When predictedStart === 0 (regression line starts at zero) and predictedEnd < 0 (negative slope — metric falling away from zero), the ternary returns 100, making trendDirection = 'rising'. The slope <= 0 guard on line 187 correctly prevents a negative timeToLimitMs, so no incorrect alert fires, but the returned MetricForecast object has the contradictory state:

trendDirection: 'rising'
timeToLimitHuman: 'Not projected to reach ceiling'

Any UI or downstream consumer that displays trendDirection directly will show an upward trend indicator for a metric that is actually falling.

Fix the ternary so a negative predictedEnd with zero predictedStart is treated as falling rather than rising:

const growthPercent =
  predictedStart !== 0
    ? ((predictedEnd - predictedStart) / Math.abs(predictedStart)) * 100
    : predictedEnd > 0
      ? 100  // growing from zero — treat as significant rise
      : predictedEnd < 0
        ? -100 // falling from zero — treat as significant fall
        : 0;

for (const settings of activeSettings) {
  try {
    const forecast = await this.getForecast(settings.connectionId, settings.metricKind);
    if (!forecast.enabled || forecast.insufficientData) continue;

Low — Unnecessary work for stable/falling metrics on every alert tick

When timeToLimitMs === null (metric is stable or falling), checkAlerts still calls dispatchMetricForecastLimit with Number.MAX_SAFE_INTEGER. shouldFireAlert will return false immediately because MAX_SAFE_INTEGER <= threshold is never true, but the call still incurs a getWebhooksByEvent DB query and an LRU cache lookup for every stable metric on every 60-second tick.

Consider adding an early-exit after the debug log:

if (forecast.timeToLimitMs === null) continue; // stable/falling — no alert needed

This makes the intent explicit and avoids the wasted DB query per stable metric per minute.


claude bot commented Apr 1, 2026

Review Summary

Overall the implementation is solid: linear regression is properly normalized (avoiding the catastrophic cancellation that was fixed in an earlier commit), the hysteresis logic in shouldFireAlert is correct for both above and below threshold variants, SQL queries use parameterized statements throughout, and the re-entry guard on checkAlerts prevents concurrent runs. Good test coverage for the new service and utilities.
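The hysteresis behaviour described above can be sketched as a small state machine. `makeHysteresisAlert` is illustrative only; the real shouldFireAlert also rate-limits per webhook:

```typescript
// Fire once when timeToLimitMs crosses below the threshold, then stay
// silent until the value recovers above the threshold, which re-arms the
// alert. Values hovering at the edge cannot cause alert spam.
function makeHysteresisAlert(thresholdMs: number): (timeToLimitMs: number) => boolean {
  let inAlert = false;
  return (timeToLimitMs: number): boolean => {
    if (!inAlert && timeToLimitMs <= thresholdMs) {
      inAlert = true;
      return true; // crossed below threshold: fire
    }
    if (inAlert && timeToLimitMs > thresholdMs) {
      inAlert = false; // recovered: re-arm, but do not fire
    }
    return false;
  };
}
```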

Four issues worth addressing before merge:


Medium — Missing cross-field validation: alertThresholdMs can exceed rollingWindowMs

File: apps/api/src/metric-forecasting/dto/update-metric-forecast-settings.dto.ts

Both fields are validated independently with @Min/@Max, but nothing prevents alertThresholdMs > rollingWindowMs. If a user sets a 1-hour rolling window but a 24-hour alert threshold, the regression can project at most ~1 h of growth, so timeToLimitMs will never fall below 24 h and the alert silently never fires.

Because both fields are optional in a PATCH, the constraint needs to be enforced at the service layer too (merging the update with existing stored settings before comparing). The DTO alone can't catch the case where only one of the two fields is updated.


Medium — trendDirection: 'rising' when metric is actually falling from near-zero

File: apps/api/src/metric-forecasting/metric-forecasting.service.ts lines 130–135

When predictedStart === 0 and predictedEnd < 0 (negative slope), growthPercent is set to 100 and trendDirection becomes 'rising'. The slope <= 0 guard on line 187 correctly prevents a negative timeToLimitMs, so no incorrect webhook fires, but the returned MetricForecast object has a contradictory state (trendDirection: 'rising' + timeToLimitHuman: 'Not projected to reach ceiling'). Any UI component or API consumer that displays the trend arrow will show "rising" for a falling metric.

Fix: treat predictedEnd < 0 as -100 (falling) in the zero-start branch:

: predictedEnd > 0
  ? 100   // growing from zero
  : predictedEnd < 0
    ? -100  // falling from zero
    : 0;

Low — Header values not sanitized for CRLF injection in sanitizeHeaders

File: apps/api/src/webhooks/webhook-dispatcher.service.ts (sanitizeHeaders method, unchanged in this PR)

The method blocks forbidden header names but does not strip \r\n from header values. A webhook configured with a header value like legit\r\nX-Injected: evil would inject an extra HTTP header on any HTTP/1.1 client that does not validate values. Node.js 18+ native fetch throws TypeError on CRLF in values, so the current runtime mitigates this, but the protection is implicit. Add an explicit strip as defense-in-depth:

sanitized[key] = value.replace(/[\r\n]/g, '');

Low — Unnecessary DB query per stable metric per checkAlerts tick

File: apps/api/src/metric-forecasting/metric-forecasting.service.ts line 384

When forecast.timeToLimitMs === null (stable or falling metric), the code calls dispatchMetricForecastLimit with Number.MAX_SAFE_INTEGER. shouldFireAlert returns false immediately, so no webhook fires, but the call still incurs a getWebhooksByEvent DB query and an LRU lookup for every stable metric on every 60-second tick. Adding if (forecast.timeToLimitMs === null) continue; before the dispatch call makes the intent explicit and avoids the unnecessary queries.

@jamby77 jamby77 merged commit a90238f into master Apr 1, 2026
6 checks passed
@jamby77 jamby77 deleted the feature/61-throughput-forecasting branch April 1, 2026 08:25
@github-actions github-actions bot locked and limited conversation to collaborators Apr 1, 2026