feat: per-node maintenance mode by TerrifiedBug · Pull Request #26 · TerrifiedBug/vectorflow

TerrifiedBug · 2026-03-07T11:23:56Z

Summary

Add per-node maintenance mode that stops all pipelines on a specific agent without affecting other nodes in the environment
New maintenanceMode boolean and maintenanceModeAt timestamp on VectorNode, with Prisma migration
Server-side only: agent config endpoint returns empty pipelines: [] when node is in maintenance (agent stops pipelines naturally, zero agent changes)
New setMaintenanceMode fleet router mutation (ADMIN-only, audit logged)
Fleet table: orange "Maintenance" status badge + toggle button with confirmation dialog
Agent detail page: toggle button with running pipeline count in confirmation, prominent orange banner
Deployment matrix: dimmed columns with "Maintenance" label for nodes in maintenance

Test Plan

Apply migration, verify maintenanceMode and maintenanceModeAt columns exist on VectorNode
Toggle maintenance mode on a node from the fleet table — verify confirmation dialog appears
Confirm the node shows orange "Maintenance" badge in status column
Check agent detail page shows the orange maintenance banner
Verify deployment matrix dims the maintenance node's column
With an agent connected: enter maintenance mode and verify the agent stops its pipelines on next poll
Exit maintenance mode and verify the agent restarts pipelines on next poll
Verify audit log records the maintenance toggle events
Confirm self-update (pendingAction) still works while in maintenance mode
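
The agent's own code is not part of this PR, so the following is only a sketch of why an empty pipelines list is enough to stop everything. It assumes the agent reconciles its running pipelines against whatever list the config endpoint returns. The names reconcile, PipelineConfig, and ReconcilePlan are hypothetical.

```typescript
// Hypothetical agent-side reconciliation; the real agent logic is not shown in this PR.
type PipelineConfig = { id: string };

interface ReconcilePlan {
  start: string[]; // desired but not currently running
  stop: string[];  // running but no longer desired
}

function reconcile(
  running: ReadonlySet<string>,
  desired: readonly PipelineConfig[],
): ReconcilePlan {
  const desiredIds = new Set(desired.map((p) => p.id));
  return {
    start: Array.from(desiredIds).filter((id) => !running.has(id)),
    stop: Array.from(running).filter((id) => !desiredIds.has(id)),
  };
}

// Entering maintenance: the server returns pipelines: [], so every running pipeline is stopped.
const entering = reconcile(new Set(["logs", "metrics"]), []);

// Exiting maintenance: the full list is served again, so every pipeline is started.
const exiting = reconcile(new Set<string>(), [{ id: "logs" }, { id: "metrics" }]);
```

Under that assumption, the two agent-facing steps of the test plan correspond to entering.stop listing every pipeline and exiting.start listing them all again.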


greptile-apps · 2026-03-07T11:28:45Z

Greptile Summary

This PR implements per-node maintenance mode for VectorFlow's fleet management. When enabled, the agent config endpoint returns pipelines: [], causing the connected agent to drain and stop all its pipelines on the next poll — with zero changes required to the agent binary. The feature is gated behind ADMIN-only RBAC, fully audit-logged, and surfaces in the fleet table, node detail page, and deployment matrix.

Key changes:

Schema/migration: Two new columns on VectorNode: maintenanceMode BOOLEAN NOT NULL DEFAULT false and a nullable maintenanceModeAt TIMESTAMP(3)
setMaintenanceMode tRPC mutation: Correctly guarded by withTeamAccess("ADMIN") and withAudit; team context is properly resolved from nodeId via the existing middleware path
Agent config route: Early return with pipelines: [] for maintenance nodes; pendingAction is preserved so self-updates still work during maintenance — but secretBackendConfig is omitted for non-BUILTIN backends (see inline comment)
Fleet table: Per-node isPending guard correctly scoped to the in-flight nodeId; orange badge; the three required query invalidations are all present on both the fleet page and the node detail page mutations

Confidence Score: 4/5

Safe to merge — one logic gap around secretBackendConfig for non-BUILTIN backends, but harmless for BUILTIN (the common case) and the overall feature is correct and well-guarded.
Authorization is correctly enforced via withTeamAccess("ADMIN") with proper nodeId resolution. Audit logging is present. The previously reported bugs (shared isPending state, missing query invalidations) are both fixed. The only new issue is the omission of secretBackendConfig in the maintenance-mode early return for non-BUILTIN secret backends — this is a correctness gap but has no impact unless the environment is configured with Vault/AWS SM and the agent uses that field to maintain a live connection to the backend.
src/app/api/agent/config/route.ts — the maintenance mode early return should include secretBackendConfig for non-BUILTIN backends to match the normal response shape.

Important Files Changed

Filename	Overview
prisma/migrations/20260307000000_add_node_maintenance_mode/migration.sql	Adds `maintenanceMode BOOLEAN NOT NULL DEFAULT false` and nullable `maintenanceModeAt TIMESTAMP(3)` to `VectorNode`. Correct and safe — default false means existing rows are unaffected.
prisma/schema.prisma	Adds `maintenanceMode Boolean @default(false)` and `maintenanceModeAt DateTime?` to the `VectorNode` model. Matches the migration exactly.
src/server/routers/fleet.ts	Adds `setMaintenanceMode` mutation with correct `withTeamAccess("ADMIN")` + `withAudit` middleware, Zod-validated input, proper NOT_FOUND guard, and sets `maintenanceModeAt` to `null` on exit. Authorization is fully correct: `withTeamAccess` resolves team from `nodeId` at lines 242-250 of `init.ts`.
src/app/api/agent/config/route.ts	Adds maintenance mode early return that serves `pipelines: []` to halt the agent. One issue: the early return omits `secretBackendConfig` that the normal path includes for non-BUILTIN backends, which could cause agents to lose external secret-backend initialization data during maintenance.
src/app/(dashboard)/fleet/[nodeId]/page.tsx	Adds maintenance toggle button and orange banner. `maintenanceMutation.onSuccess` now correctly invalidates all three relevant queries (`fleet.get`, `fleet.list`, `listWithPipelineStatus`) — addressing the previously reported gap.
src/app/(dashboard)/fleet/page.tsx	Fleet table gets orange "Maintenance" badge and toggle button. The per-node pending-state guard (`setMaintenance.isPending && setMaintenance.variables?.nodeId === node.id`) correctly scopes the disabled state to the in-flight node only, resolving the previously reported shared-state bug.
src/components/fleet/deployment-matrix.tsx	Dims maintenance-mode columns with `opacity-30` and adds an orange "Maintenance" label under node header. Correct and self-contained change.
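
The per-node pending guard quoted in the fleet table row can be illustrated in isolation. This is a minimal sketch, not the component code: MutationState models only the isPending and variables fields named above, and isRowPending is a hypothetical helper.

```typescript
// Minimal shape of the mutation state the guard reads; only the two
// fields mentioned in the review are modelled here.
interface MutationState {
  isPending: boolean;
  variables?: { nodeId: string; enabled: boolean };
}

// True only for the row whose toggle request is currently in flight.
function isRowPending(mutation: MutationState, rowNodeId: string): boolean {
  return mutation.isPending && mutation.variables?.nodeId === rowNodeId;
}

const inFlight: MutationState = {
  isPending: true,
  variables: { nodeId: "node-a", enabled: true },
};

const rowA = isRowPending(inFlight, "node-a"); // the targeted row is disabled
const rowB = isRowPending(inFlight, "node-b"); // other rows stay interactive
const idle = isRowPending({ isPending: false }, "node-a"); // nothing in flight
```

Checking isPending alone would disable every row's button while one request is in flight, which is the shared-state bug the review refers to.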

Sequence Diagram

sequenceDiagram
    participant Admin as Admin Browser
    participant tRPC as tRPC (fleet.setMaintenanceMode)
    participant DB as PostgreSQL
    participant Agent as Vector Agent

    Admin->>tRPC: setMaintenanceMode({ nodeId, enabled: true })
    tRPC->>DB: withTeamAccess resolves nodeId → teamId
    tRPC->>DB: UPDATE VectorNode SET maintenanceMode=true, maintenanceModeAt=now()
    DB-->>tRPC: updated node
    tRPC-->>Admin: success → invalidate fleet.list, listWithPipelineStatus, fleet.get

    Note over Agent: next config poll (≤15s)
    Agent->>DB: GET /api/agent/config (bearer token)
    DB-->>Agent: { pipelines: [], pollIntervalMs, secretBackend, pendingAction }
    Note over Agent: stops all running pipelines naturally

    Admin->>tRPC: setMaintenanceMode({ nodeId, enabled: false })
    tRPC->>DB: UPDATE VectorNode SET maintenanceMode=false, maintenanceModeAt=null
    DB-->>tRPC: updated node
    tRPC-->>Admin: success

    Note over Agent: next config poll
    Agent->>DB: GET /api/agent/config
    DB-->>Agent: { pipelines: [...full config...], ... }
    Note over Agent: resumes all pipelines

Last reviewed commit: b23ff14
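
The two UPDATE statements in the diagram follow one rule: the timestamp is set when maintenance is entered and cleared when it is exited. A minimal sketch of that rule as a pure function follows. The names maintenanceFields and MaintenanceFields are hypothetical, and the actual mutation body is not reproduced here.

```typescript
interface MaintenanceFields {
  maintenanceMode: boolean;
  maintenanceModeAt: Date | null;
}

// Derives the two column values from the requested state.
// `now` is injectable so the result is deterministic in tests.
function maintenanceFields(enabled: boolean, now: Date = new Date()): MaintenanceFields {
  return {
    maintenanceMode: enabled,
    maintenanceModeAt: enabled ? now : null,
  };
}

const at = new Date("2026-03-07T11:00:00Z");
const entered = maintenanceFields(true, at);  // timestamp recorded
const exited = maintenanceFields(false, at);  // timestamp cleared
```

An object of this shape could be passed as the data argument of a Prisma update, though that wiring is an assumption rather than something shown in the PR.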


greptile-apps · 2026-03-07T11:42:31Z

src/app/api/agent/config/route.ts

+    if (node?.maintenanceMode) {
+      const environment = await prisma.environment.findUnique({
+        where: { id: agent.environmentId },
+        select: { secretBackend: true },
+      });
+      const settings = await prisma.systemSettings.findUnique({
+        where: { id: "singleton" },
+        select: { fleetPollIntervalMs: true },
+      });
+      return NextResponse.json({
+        pipelines: [],
+        pollIntervalMs: settings?.fleetPollIntervalMs ?? 15_000,
+        secretBackend: environment?.secretBackend ?? "BUILTIN",
+        pendingAction: node.pendingAction ?? undefined,
+      });
+    }


secretBackendConfig omitted for non-BUILTIN backends in maintenance response

When the node is in maintenance mode, the early return fetches the environment with only { secretBackend: true } and never includes secretBackendConfig in the response. The normal path includes it for all non-BUILTIN backends:

    // normal path
    ...(environment.secretBackend !== "BUILTIN"
      ? { secretBackendConfig: environment.secretBackendConfig }
      : {}),

If an environment uses Vault, AWS SM, or another external backend and the agent relies on receiving secretBackendConfig to maintain or re-initialize its connection to that backend on each config poll, it would lose that initialization data for the duration of maintenance mode. When maintenance ends, the first poll would restore the full config, but if the agent's secret-backend client has any transient state derived from that field it could fail to reconnect cleanly.

The fix is to select secretBackendConfig alongside secretBackend in the maintenance-mode environment query, and then conditionally include it in the early-return payload to match the normal path:

    const environment = await prisma.environment.findUnique({
      where: { id: agent.environmentId },
      select: { secretBackend: true, secretBackendConfig: true },
    });
    // ...
    return NextResponse.json({
      pipelines: [],
      pollIntervalMs: settings?.fleetPollIntervalMs ?? 15_000,
      secretBackend: environment?.secretBackend ?? "BUILTIN",
      ...(environment?.secretBackend !== "BUILTIN" && environment?.secretBackendConfig
        ? { secretBackendConfig: environment.secretBackendConfig }
        : {}),
      pendingAction: node.pendingAction ?? undefined,
    });
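
One way to keep the two response paths from drifting apart again would be to derive the secret-backend fields in a single helper that both call. The sketch below is an illustration under assumptions, not part of the PR: secretFields and EnvironmentSecrets are hypothetical names, and "VAULT" is an assumed backend value used only for the example.

```typescript
// Hypothetical subset of the environment row; only the two fields
// discussed in this comment are modelled.
interface EnvironmentSecrets {
  secretBackend: string;         // "BUILTIN" or an external backend name
  secretBackendConfig?: unknown; // opaque config for external backends
}

interface SecretFields {
  secretBackend: string;
  secretBackendConfig?: unknown;
}

// Returns the secret-related part of the agent config response.
// secretBackendConfig is included only for non-BUILTIN backends that have one.
function secretFields(env: EnvironmentSecrets | null): SecretFields {
  if (env && env.secretBackend !== "BUILTIN" && env.secretBackendConfig != null) {
    return {
      secretBackend: env.secretBackend,
      secretBackendConfig: env.secretBackendConfig,
    };
  }
  return { secretBackend: env?.secretBackend ?? "BUILTIN" };
}

const builtin = secretFields({ secretBackend: "BUILTIN", secretBackendConfig: { unused: true } });
// "VAULT" is an assumed value for illustration only.
const external = secretFields({ secretBackend: "VAULT", secretBackendConfig: { address: "https://vault.example" } });
const missing = secretFields(null);
```

Both the maintenance early return and the normal path could then spread the result into their payloads, so the response shape for external backends would be the same in either state.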


TerrifiedBug added 8 commits March 7, 2026 11:08

feat: add maintenanceMode fields to VectorNode schema

6b7786b

feat: add setMaintenanceMode mutation to fleet router

96ebb6f

feat: return empty pipeline list for nodes in maintenance mode

69a355a

feat: dim deployment matrix columns for nodes in maintenance mode

f1ab2dc

feat: add maintenance mode toggle to fleet table

bf6c29a

feat: add maintenance mode toggle and banner to agent detail page

244bf50

fix: pass nodeId to queryKey in maintenance mutation invalidation

62fecca

fix: review cleanup - drop unused select field, invalidate deployment matrix on toggle

c81f685

github-actions bot added the feature label Mar 7, 2026

greptile-apps bot reviewed Mar 7, 2026

src/app/(dashboard)/fleet/page.tsx (resolved)

fix: scope maintenance button disabled state to targeted node

75836ef

greptile-apps bot reviewed Mar 7, 2026

src/app/(dashboard)/fleet/[nodeId]/page.tsx (resolved)

fix: invalidate fleet list and matrix queries on detail page maintenance toggle

b23ff14

greptile-apps bot reviewed Mar 7, 2026


TerrifiedBug merged commit c0b2229 into main Mar 7, 2026
10 checks passed

TerrifiedBug deleted the feat/node-maintenance-mode branch March 7, 2026 11:52

TerrifiedBug mentioned this pull request Mar 7, 2026

docs: update public docs for recent features #27

Merged

3 tasks

TerrifiedBug merged 10 commits into main from feat/node-maintenance-mode
