
Throughput forecasting UI and API#69

Merged
jamby77 merged 21 commits into master from feature/61-throughput-forecasting on Apr 1, 2026

Conversation


@jamby77 jamby77 commented Mar 31, 2026

Summary

  • Add throughput forecasting APIs and shared types
  • Modularize forecasting components with reusable settings panel
  • Frontend UI for throughput forecasting with configurable settings

Test plan

  • Verify forecasting page loads and displays metrics
  • Verify settings panel saves rolling window, ceiling, and alert threshold
  • Run unit tests: pnpm test

🤖 Generated with Claude Code

jamby77 added 5 commits March 27, 2026 11:09
Introduced new APIs for throughput forecasting, including endpoints for retrieving forecasts, accessing settings, and updating settings. Added corresponding types and a comprehensive test suite for the `ThroughputForecastingService`. Enhanced `WebhookForm` event options to support throughput limit alerts.
Split throughput forecasting page into reusable components (`ForecastCard`, `ThroughputChart`, `SettingsPanel`, etc.) and utility functions to improve code organization and maintainability. Updated imports to use shared types from `@betterdb/shared`.
@claude

This comment was marked as outdated.

Replaced custom polling hooks with `react-query` for data fetching, caching, and synchronization in throughput forecasting workflows. Adjusted API calls, updated components to use `react-query` hooks, and removed redundant `usePolling` implementation. Enhanced error handling and refetch mechanisms for improved user experience.

Extract data-fetching logic from page components into dedicated hooks
using @tanstack/react-query for caching, deduplication, and polling.
Add Vitest test infrastructure and unit tests for all new hooks.
Generalize throughput forecasting into a MetricForecastingService
parameterized by MetricKind. Add metric extractors, ceiling resolvers
(with auto-detect for memory maxmemory), unified storage table, REST
endpoints, Prometheus export, and a tabbed frontend page at /forecasting.

Add tests for checkAlerts dispatch, ceiling-exceeded paths, falling/
stable trends for non-ops metrics, zero-slope behavior, connection
isolation, data sufficiency boundaries, formatter edge cases, and
validation pipe edge cases.

Add `docker-compose.test.yml` for isolated test environments using dedicated containers and ports. Refactor metric forecasting to use actual values for fast spike detection. Enhance tier validation with `DEV_LICENSE_TIER` override for development. Adjust global test setup/teardown scripts for test containers.

Aligns setting names with the generic metric forecasting feature,
replacing the old throughput-specific naming. DB column names are
unchanged to avoid requiring a migration.

Improve logging in `checkAlerts` for uninitialized services. Add safe fallback values for hysteresis recovery during webhook dispatch. Refactor to use `WebhookEventType` for consistency. Update tests to validate dispatch behavior with recovery values.

---------

Co-authored-by: Kristiyan Ivanov <k.ivanow@gmail.com>
@claude

This comment was marked as outdated.

The storage-port interface re-exports it as 'export type', which
cannot be used as a runtime value. Import directly from shared.
@claude

This comment was marked as outdated.

…tional ceilings, clamp trend line

- buildInsufficientForecast now receives actual sorted.length instead
  of hardcoding 0
- Change ceiling @Min(1) to @Min(0.01) so memFragmentation ceiling of
  1.0 is valid
- Clamp trend line projection to Math.max(0) to prevent negative values

jamby77 commented Apr 1, 2026

Thanks for the review. Here's the status of each item:

HIGH — checkAlerts dispatches unconditionally: Intentional for hysteresis recovery. The dispatcher's shouldFireAlert handles rate-limiting. Guarded by !forecast.enabled, insufficientData, and ceiling === null checks. The downstream contract is documented in the test name.

HIGH — dataPointCount: 0 hardcoded: Fixed — buildInsufficientForecast now receives and uses the actual sorted.length.

HIGH — No auth guards: Verified — CloudAuthGuard is registered as APP_GUARD via CloudAuthModule, protecting all controllers globally.

MEDIUM — No explicit ValidationPipe on @Body: Global ValidationPipe with whitelist: true is configured in main.ts. Adding per-endpoint pipes would be redundant.

MEDIUM — @Min(1) blocks memFragmentation 1.0: Fixed — changed to @Min(0.01).

LOW — windowMs: 0 in disabled forecast: Cosmetic — the disabled state shows a different component (MetricDisabled) that doesn't render the window label.

LOW — Negative trend line: Fixed — clamped to Math.max(0, ...).

Minor — Column naming: Intentional to avoid DB migration. Documented in code comments.


claude bot commented Apr 1, 2026

Review: Throughput Forecasting UI and API

Good overall structure — the linear regression is numerically sound (x-normalisation avoids epoch cancellation), the storage adapters all use UPSERT correctly, and the test coverage is solid. Five issues need attention before merge: one high, three medium, and one low-severity but tracked.


High

checkAlerts dispatches for stable/falling metrics every minute (metric-forecasting.service.ts:379)

After the forecast.ceiling === null guard the code falls through and calls dispatchMetricForecastLimit even when timeToLimitMs === null (stable/falling trend). It sends Number.MAX_SAFE_INTEGER as the sentinel. The webhook dispatcher correctly suppresses the alert, but the unnecessary call runs every 60 s for every active metric kind with a ceiling whose trend isn't rising. More importantly, if the dispatcher's threshold logic ever changes this silently becomes a spurious-alert bug.

Fix: add if (forecast.timeToLimitMs === null) continue; immediately after the ceiling guard. See inline comment.


Medium

TOCTOU race in getOrCreateSettings can silently overwrite user settings (metric-forecasting.service.ts:237)

The read-then-conditional-write is not atomic. If a concurrent updateSettings call runs between the getMetricForecastSettings read (which returns null) and the subsequent saveMetricForecastSettings write, the user's changes are overwritten with bare defaults. The adapters use UPSERT so data is never corrupted, but user-configured ceilings or thresholds can be silently lost. See inline comment for a suggested mitigation.
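The mitigation can be sketched framework-free. The following is a hypothetical in-memory store, not the project's storage adapters: the idea is to replace read-then-conditional-write with an insert that is a no-op on conflict, followed by a re-read, so defaults can never clobber a concurrent user update.

```typescript
// Hypothetical settings shape for illustration only.
type Settings = { ceiling: number | null; alertThresholdMs: number };

class SettingsStore {
  private rows = new Map<string, Settings>();

  // Emulates INSERT ... ON CONFLICT DO NOTHING: writes only if absent.
  insertIfAbsent(key: string, value: Settings): void {
    if (!this.rows.has(key)) this.rows.set(key, value);
  }

  save(key: string, value: Settings): void {
    this.rows.set(key, value);
  }

  get(key: string): Settings | null {
    return this.rows.get(key) ?? null;
  }
}

const DEFAULTS: Settings = { ceiling: null, alertThresholdMs: 3_600_000 };

// Race-safe getOrCreate: if a concurrent updateSettings wins the race,
// the default insert is a no-op and the re-read returns the user's values.
function getOrCreateSettings(store: SettingsStore, key: string): Settings {
  store.insertIfAbsent(key, DEFAULTS);
  return store.get(key)!;
}
```

Against a real database the same shape is `INSERT ... ON CONFLICT DO NOTHING` followed by a `SELECT`, both of which the existing adapters can express.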

Hard-coded limit: 1500 silently truncates regression input (metric-forecasting.service.ts:95)

rollingWindowMs goes up to 86 400 000 ms (24 h). At a 30 s poll interval that's ~2880 snapshots — nearly 2× the cap. The query silently returns only the 1500 most-recent rows, so the regression runs on ~12.5 h of data while the UI reports a 24 h window. The slope is wrong and there's no indication anything was truncated. See inline comment.

No cross-field validation: alertThresholdMs can exceed rollingWindowMs (dto/update-metric-forecast-settings.dto.ts:24)

A 24 h alert threshold with a 1 h rolling window will never trigger because the regression can't project that far ahead. The combination is silently accepted and produces a perpetually-inactive alert. See inline comment.


Low

'env-default' hardcoded in the controller (metric-forecasting.controller.ts:21)

The registry service already has const ENV_DEFAULT_ID = 'env-default' with a TODO to export and share it. Three occurrences in the new controller should use that constant once it's exported. See inline comment.

- Prevent concurrent checkAlerts runs with a boolean guard
- Export ENV_DEFAULT_ID from connection-registry and use it in the
  metric forecasting controller

claude bot commented Apr 1, 2026

Review: Throughput Forecasting UI and API

Overall the feature is well-structured — the linear regression is numerically sound (normalised timestamps to avoid floating-point cancellation), the hysteresis path for webhook alerts goes through the existing dispatchThresholdAlert mechanism correctly, schema migrations are additive and idempotent in both adapters, and the test coverage across slices is solid.

Four issues need attention before merge.


High — Snapshot limit: 1500 silently truncates the regression window

getMemorySnapshots({ limit: 1500 }) is applied regardless of rollingWindowMs. For a 24 h window with sub-minute polling this returns only the most recent fraction of the window. The windowMs field in the returned MetricForecast still reports the configured window, so the mismatch is invisible to callers and to the UI. The same limit is applied to the frontend chart query, so chart and forecast can diverge from each other.

Fix: increase the limit significantly (e.g. 10 000), or derive it from rollingWindowMs / minExpectedIntervalMs, and log a warning when the limit is hit.
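The suggested derivation can be sketched as a small helper. `snapshotLimit`, `minExpectedIntervalMs`, and the cap value are illustrative names and numbers, not the project's API:

```typescript
// Size the snapshot query from the configured window and the fastest
// expected poll interval, instead of a fixed 1500, with integer-math
// headroom for jitter and a hard cap to bound memory.
function snapshotLimit(
  rollingWindowMs: number,
  minExpectedIntervalMs: number,
  hardCap = 50_000,
): number {
  const expected = Math.ceil(rollingWindowMs / minExpectedIntervalMs);
  const withHeadroom = Math.ceil((expected * 6) / 5); // 20% headroom
  return Math.min(withHeadroom, hardCap);
}
```

A caller that receives exactly `hardCap` rows knows the window may be truncated and can log the warning the review asks for.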


Medium — Number.MAX_SAFE_INTEGER leaks into the webhook payload

When forecast.timeToLimitMs is null (stable / falling trend), the checkAlerts method substitutes Number.MAX_SAFE_INTEGER and still calls dispatchMetricForecastLimit. The dispatchThresholdAlert hysteresis will prevent the alert from firing, but the sentinel value (9007199254740991) is embedded verbatim in the payload that reaches the pro service and any downstream webhook consumer.

Fix: short-circuit before calling dispatchMetricForecastLimit when timeToLimitMs is null or already exceeds the threshold.


Medium — updateSetting not memoised; stale-closure risk on connection change

The function is recreated on every render. The debounce closure captures connectionId at creation time, so if the active connection changes while a save is debouncing, queryClient.invalidateQueries can target the wrong query key. Wrap it in useCallback with [connectionId, activeTab, queryClient] as the dependency array.
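The hazard generalises beyond React. Below is a framework-free sketch, with hypothetical `makeStaleSaver`/`makeRefSaver` helpers, of why a callback that captures `connectionId` at creation time keeps targeting the old connection after a switch, while reading through a live reference (which a correctly-memoised hook achieves) does not:

```typescript
// Captures the value at creation time: later calls see the stale value.
function makeStaleSaver(connectionId: string): () => string {
  return () => connectionId;
}

// Reads the live value at call time via a mutable ref.
function makeRefSaver(ref: { current: string }): () => string {
  return () => ref.current;
}
```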


Low — Cross-field and env-var nits

  • alertThresholdMs can exceed rollingWindowMs via the API even though the UI prevents it — worth a class-level validator.
  • METRIC_FORECASTING_ENABLED silently stays false for '1', 'TRUE', etc. A toLowerCase() call or a note in .env.example would help.
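A tolerant parser for such flags is a one-liner; `envFlag` is an illustrative helper, not code from this PR:

```typescript
// Accepts common truthy spellings in any case; unset or anything
// unrecognised is treated as false.
function envFlag(value: string | undefined): boolean {
  if (value === undefined) return false;
  return ['true', '1', 'yes', 'on'].includes(value.trim().toLowerCase());
}
```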

What looks good:

  • Normalised-timestamp linear regression correctly avoids catastrophic cancellation with epoch-scale x values.
  • getOrCreateSettings upsert is safe under concurrent requests (DB-level ON CONFLICT handles races).
  • MetricKindValidationPipe properly rejects unknown path params before they reach storage.
  • Alert hysteresis correctly routes through dispatchThresholdAlert — no alert spam.
  • Both SQLite and Postgres migrations are IF NOT EXISTS / ADD COLUMN IF NOT EXISTS — safe for existing deployments.
  • The || slope <= 0 guard in the forecast path is necessary (handles the edge case where predictedStart === 0 and predictedEnd < 0 gives growthPercent = 100) — keep it.
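For reference, the normalised-timestamp fit praised above can be sketched as follows. This is an illustrative least-squares implementation, not the service's actual code; the key step is shifting x by its minimum so the sums stay small enough that epoch-millisecond x values (~1.7e12) do not cancel catastrophically in float64:

```typescript
// Ordinary least squares over (timestamp, value) points with x normalised
// to start at zero. The intercept is relative to t0; evaluate the line as
// intercept + slope * (t - t0).
function linearFit(
  points: Array<{ t: number; v: number }>,
): { slope: number; intercept: number } {
  const t0 = Math.min(...points.map((p) => p.t));
  const xs = points.map((p) => p.t - t0);
  const n = points.length;
  const sumX = xs.reduce((a, x) => a + x, 0);
  const sumY = points.reduce((a, p) => a + p.v, 0);
  const sumXY = points.reduce((a, p, i) => a + xs[i] * p.v, 0);
  const sumX2 = xs.reduce((a, x) => a + x * x, 0);
  const denom = n * sumX2 - sumX * sumX;
  const slope = denom === 0 ? 0 : (n * sumXY - sumX * sumY) / denom;
  const intercept = (sumY - slope * sumX) / n;
  return { slope, intercept };
}
```

Without the `- t0` shift, terms like `sumX * sumX` reach ~1e25 and the subtraction in `denom` loses most of its significant digits.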

@IsNumber()
@Min(60_000)
@Max(86_400_000)
alertThresholdMs?: number;


Medium — Silent misconfiguration: alertThresholdMs can exceed rollingWindowMs

Both fields are validated independently, but there is no cross-field constraint. If a user sets rollingWindowMs = 3_600_000 (1 h) and alertThresholdMs = 86_400_000 (24 h), the forecast can project at most ~1 h of growth, so timeToLimitMs will essentially never drop below a 24 h threshold. The alert silently never fires.

Add a custom validator (or a simple @ValidateIf-based check) that enforces alertThresholdMs <= rollingWindowMs:

import { registerDecorator, ValidationOptions, ValidationArguments } from 'class-validator';

function IsLessThanOrEqual(property: string, opts?: ValidationOptions) {
  return (object: object, propertyName: string) => {
    registerDecorator({
      name: 'isLessThanOrEqual',
      target: (object as any).constructor,
      propertyName,
      constraints: [property],
      options: opts,
      validator: {
        validate(value: any, args: ValidationArguments) {
          const other = (args.object as any)[args.constraints[0]];
          return typeof value === 'number' && typeof other === 'number' ? value <= other : true;
        },
        defaultMessage: (args) => `${args.property} must be ≤ ${args.constraints[0]}`,
      },
    });
  };
}

Then annotate alertThresholdMs with @IsLessThanOrEqual('rollingWindowMs'). Because both fields are optional in a PATCH, you also need to validate at the service layer (reading existing settings to resolve the final values) if only one of the two is updated.
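That service-layer check can be sketched as a merge-then-validate step; `validateMergedSettings` and the field shapes are illustrative, not the project's API:

```typescript
// Only the fields relevant to the cross-field rule.
type ForecastSettings = { rollingWindowMs: number; alertThresholdMs: number };

// Merge a partial PATCH body with the stored settings, then enforce the
// constraint on the effective values, catching the case where only one
// of the two fields is present in the request.
function validateMergedSettings(
  stored: ForecastSettings,
  patch: Partial<ForecastSettings>,
): ForecastSettings {
  const merged = { ...stored, ...patch };
  if (merged.alertThresholdMs > merged.rollingWindowMs) {
    throw new Error('alertThresholdMs must not exceed rollingWindowMs');
  }
  return merged;
}
```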

Comment on lines +130 to +135
const growthPercent =
  predictedStart !== 0
    ? ((predictedEnd - predictedStart) / Math.abs(predictedStart)) * 100
    : predictedEnd !== 0
      ? 100 // growing from zero — treat as significant rise
      : 0;

Medium — Misleading trendDirection when metric falls from near-zero

When predictedStart === 0 (regression line starts at zero) and predictedEnd < 0 (negative slope — metric falling away from zero), the ternary returns 100, making trendDirection = 'rising'. The slope <= 0 guard on line 187 correctly prevents a negative timeToLimitMs, so no incorrect alert fires, but the returned MetricForecast object has the contradictory state:

trendDirection: 'rising'
timeToLimitHuman: 'Not projected to reach ceiling'

Any UI or downstream consumer that displays trendDirection directly will show an upward trend indicator for a metric that is actually falling.

Fix the ternary so a negative predictedEnd with zero predictedStart is treated as falling rather than rising:

const growthPercent =
  predictedStart !== 0
    ? ((predictedEnd - predictedStart) / Math.abs(predictedStart)) * 100
    : predictedEnd > 0
      ? 100  // growing from zero — treat as significant rise
      : predictedEnd < 0
        ? -100 // falling from zero — treat as significant fall
        : 0;

for (const settings of activeSettings) {
  try {
    const forecast = await this.getForecast(settings.connectionId, settings.metricKind);
    if (!forecast.enabled || forecast.insufficientData) continue;

Low — Unnecessary work for stable/falling metrics on every alert tick

When timeToLimitMs === null (metric is stable or falling), checkAlerts still calls dispatchMetricForecastLimit with Number.MAX_SAFE_INTEGER. shouldFireAlert will return false immediately because MAX_SAFE_INTEGER <= threshold is never true, but the call still incurs a getWebhooksByEvent DB query and an LRU cache lookup for every stable metric on every 60-second tick.

Consider adding an early-exit after the debug log:

if (forecast.timeToLimitMs === null) continue; // stable/falling — no alert needed

This makes the intent explicit and avoids the wasted DB query per stable metric per minute.


claude bot commented Apr 1, 2026

Review Summary

Overall the implementation is solid: linear regression is properly normalized (avoiding the catastrophic cancellation that was fixed in an earlier commit), the hysteresis logic in shouldFireAlert is correct for both above and below threshold variants, SQL queries use parameterized statements throughout, and the re-entry guard on checkAlerts prevents concurrent runs. Good test coverage for the new service and utilities.
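The hysteresis behaviour described above can be sketched as a small state machine. `makeHysteresisAlert` is illustrative only; the real shouldFireAlert also rate-limits per webhook:

```typescript
// Fire once when timeToLimitMs crosses below the threshold, then stay
// silent until the value recovers above the threshold, which re-arms the
// alert. Values hovering at the edge cannot cause alert spam.
function makeHysteresisAlert(thresholdMs: number): (timeToLimitMs: number) => boolean {
  let inAlert = false;
  return (timeToLimitMs: number): boolean => {
    if (!inAlert && timeToLimitMs <= thresholdMs) {
      inAlert = true;
      return true; // crossed below threshold: fire
    }
    if (inAlert && timeToLimitMs > thresholdMs) {
      inAlert = false; // recovered: re-arm, but do not fire
    }
    return false;
  };
}
```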

Four issues worth addressing before merge:


Medium — Missing cross-field validation: alertThresholdMs can exceed rollingWindowMs

File: apps/api/src/metric-forecasting/dto/update-metric-forecast-settings.dto.ts

Both fields are validated independently with @Min/@Max, but nothing prevents alertThresholdMs > rollingWindowMs. If a user sets a 1-hour rolling window but a 24-hour alert threshold, the regression can project at most ~1 h of growth, so timeToLimitMs will never fall below 24 h and the alert silently never fires.

Because both fields are optional in a PATCH, the constraint needs to be enforced at the service layer too (merging the update with existing stored settings before comparing). The DTO alone can't catch the case where only one of the two fields is updated.


Medium — trendDirection: 'rising' when metric is actually falling from near-zero

File: apps/api/src/metric-forecasting/metric-forecasting.service.ts lines 130–135

When predictedStart === 0 and predictedEnd < 0 (negative slope), growthPercent is set to 100 and trendDirection becomes 'rising'. The slope <= 0 guard on line 187 correctly prevents a negative timeToLimitMs, so no incorrect webhook fires, but the returned MetricForecast object has a contradictory state (trendDirection: 'rising' + timeToLimitHuman: 'Not projected to reach ceiling'). Any UI component or API consumer that displays the trend arrow will show "rising" for a falling metric.

Fix: treat predictedEnd < 0 as -100 (falling) in the zero-start branch:

: predictedEnd > 0
  ? 100   // growing from zero
  : predictedEnd < 0
    ? -100  // falling from zero
    : 0;

Low — Header values not sanitized for CRLF injection in sanitizeHeaders

File: apps/api/src/webhooks/webhook-dispatcher.service.ts (sanitizeHeaders method, unchanged in this PR)

The method blocks forbidden header names but does not strip \r\n from header values. A webhook configured with a header value like legit\r\nX-Injected: evil would inject an extra HTTP header on any HTTP/1.1 client that does not validate values. Node.js 18+ native fetch throws TypeError on CRLF in values, so the current runtime mitigates this, but the protection is implicit. Add an explicit strip as defense-in-depth:

sanitized[key] = value.replace(/[\r\n]/g, '');

Low — Unnecessary DB query per stable metric per checkAlerts tick

File: apps/api/src/metric-forecasting/metric-forecasting.service.ts line 384

When forecast.timeToLimitMs === null (stable or falling metric), the code calls dispatchMetricForecastLimit with Number.MAX_SAFE_INTEGER. shouldFireAlert returns false immediately, so no webhook fires, but the call still incurs a getWebhooksByEvent DB query and an LRU lookup for every stable metric on every 60-second tick. Adding if (forecast.timeToLimitMs === null) continue; before the dispatch call makes the intent explicit and avoids the unnecessary queries.

@jamby77 jamby77 merged commit a90238f into master Apr 1, 2026
6 checks passed
@jamby77 jamby77 deleted the feature/61-throughput-forecasting branch April 1, 2026 08:25
@github-actions github-actions bot locked and limited conversation to collaborators Apr 1, 2026