feat: M017 Enterprise Scale — 5 specs, 154 files, 12K lines #117

Open
TerrifiedBug wants to merge 57 commits into main from feat/m017-enterprise-scale

@TerrifiedBug
Owner

Summary

Comprehensive enterprise scalability milestone making VectorFlow production-ready for corporate teams managing 100+ pipelines across multi-node fleets.

1. Scale UX

  • Server-side cursor pagination for pipeline list with URL-synced filters
  • Fleet matrix redesigned: summary cards + filtered matrix (replaces 100-column grid)
  • Saved filter presets (FilterPreset model with CRUD, auto-apply defaults)
  • Pipeline table: density toggle, keyboard navigation, column sorting

2. API v1 Completeness

  • 25 new REST endpoints (pipeline CRUD, graph manipulation, YAML import, fleet, metrics, logs, promotions, deploy requests, node groups, environments)
  • Rate limiting middleware (sliding window, 3 tiers, per-account override)
  • 7 new service account permissions
  • OpenAPI spec v2.0.0

3. GitOps Enhancement

  • GitProvider abstraction: GitHub + GitLab + Bitbucket support
  • Git sync retry with exponential backoff (30s, 2m, 10m) and error visibility
  • Bidirectional sync approval gate (respects requireDeployApproval)
  • Pipeline name/filename decoupling via gitPath field
  • Sync status dashboard in environment settings

4. Monitoring & Compliance

  • Version drift detection (fleet-wide, 30s polling)
  • Config drift detection (per-node, checksum comparison)
  • Go agent reports config checksum in heartbeat (backward-compatible)
  • Alert rule templates for drift detection
  • Compliance page: drift counts, overall compliance score, matrix drift indicators

5. Production Polish

  • Log viewer: virtual scrolling, highlight-all search, server-side search, time-range filter, export, copy
  • Pipeline editor: minimap, canvas search with node highlighting, Dagre auto-layout, collapsible detail panel
  • Alert delivery: manual retry, aggregate failed deliveries view
  • Settings: overview landing page, back navigation context preservation
  • Dashboard: tab visibility optimization (pause polling when hidden), MetricStore LRU memory cap (5000 streams), React Query tuning

Stats

  • 154 files changed (+11,992 / -735)
  • 188 new tests across all specs
  • 57 commits (5 parallel worktree branches merged)

Test plan

  • All existing tests pass except 1 pre-existing leader-guard timing failure
  • Pipeline list pagination works with 100+ pipelines
  • Fleet matrix summary cards render correctly
  • REST API endpoints respond with correct auth/permissions
  • GitOps webhook works for GitHub, GitLab, Bitbucket
  • Drift alerts fire on version/config mismatch
  • Log viewer virtual scrolling handles 1000+ entries
  • Editor minimap and canvas search work on large pipelines
  • MetricStore evicts at 5000 stream cap
  • Tab visibility pauses/resumes polling correctly

Add gitProvider to Environment, gitPath to Pipeline, GitSyncJob model
for retry queue, and git_sync_failed alert metric. Includes backfill
script for existing pipelines with active git sync.

Add configChecksum nullable field to NodePipelineStatus model for
tracking agent-reported config checksums. Add version_drift and
config_drift values to AlertMetric enum for drift detection alerting.

Extract GitHub-specific logic (webhook HMAC verification, event parsing,
file fetching, PR creation) into a GitHubProvider class behind a common
GitProvider interface. Registry auto-detects provider from repo URL.

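
For reviewers, a minimal sketch of the webhook HMAC check that a GitHub provider typically performs. GitHub sends the signature as `sha256=<hex>` in the `X-Hub-Signature-256` header; the function name `verifyGitHubSignature` and its parameters are hypothetical and are not taken from this PR.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper (not the PR's code): returns true when the header value
// matches the HMAC-SHA256 of the raw request body under the shared secret.
function verifyGitHubSignature(
  rawBody: string,
  signatureHeader: string | null,
  secret: string,
): boolean {
  const prefix = "sha256=";
  if (!signatureHeader || !signatureHeader.startsWith(prefix)) return false;

  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHeader.slice(prefix.length), "hex");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The comparison runs over the raw body bytes, so the handler would need to read the body before any JSON parsing.
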
Phase 0 of API v1 completeness:
- Add rateLimit field to ServiceAccount model
- Implement sliding window rate limiter with read/default/deploy tiers
- Integrate rate limiting into apiRoute() wrapper
- Add VALID_PERMISSIONS constant with new permission strings
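
A rough sketch of a sliding window limiter with the read/default/deploy tiers and a per-account override, to illustrate the idea. The tier limits, the 60s window, and the class name are assumptions; the PR text does not state the real values.

```typescript
type Tier = "read" | "default" | "deploy";

// Assumed requests-per-window limits; the real values are not given in the PR.
const TIER_LIMITS: Record<Tier, number> = { read: 200, default: 100, deploy: 20 };
const WINDOW_MS = 60_000; // assumed window length

class SlidingWindowRateLimiter {
  private hits = new Map<string, number[]>();

  // Returns true and records the request when it fits in the window.
  // `override` stands in for a per-account rateLimit value, when set.
  check(accountId: string, tier: Tier, now: number, override?: number | null): boolean {
    const limit = override ?? TIER_LIMITS[tier];
    const key = `${accountId}:${tier}`;
    const cutoff = now - WINDOW_MS;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }

  // Drops keys whose timestamps have all left the window.
  cleanup(now: number): void {
    const cutoff = now - WINDOW_MS;
    this.hits.forEach((stamps, key) => {
      if (stamps.every((t) => t <= cutoff)) this.hits.delete(key);
    });
  }

  get size(): number {
    return this.hits.size;
  }
}
```

Rejected requests are not recorded here, so a client that keeps hammering the API does not extend its own lockout; counting them instead is an equally valid design.
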
Add configChecksum to the PipelineStatusInput interface and batch
upsert SQL. Update heartbeat route Zod schema to accept optional
configChecksum string from agents.

GitLab REST API v4 for webhook verification (X-Gitlab-Token), push/MR
event parsing, file fetching, branch creation, commits, and merge
request creation. Supports self-hosted instances and nested groups.

Add ListPipelinesOptions to listPipelinesForEnvironment with cursor,
limit, search, status, tags, groupId, sortBy, sortOrder parameters.
Returns { pipelines, nextCursor, totalCount } using Prisma cursor
pagination (same pattern as audit.list). Backward-compatible: all
existing call sites updated to destructure new return shape.

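
To illustrate the `{ pipelines, nextCursor, totalCount }` shape, here is an in-memory sketch of the "fetch one extra row" cursor pattern. It stands in for the Prisma query and is not the PR's implementation; the default page size, the `Pipeline` fields, and the convention that the cursor is the id of the first row of the next page are assumptions.

```typescript
interface Pipeline { id: string; name: string }
interface ListPipelinesOptions { cursor?: string; limit?: number; search?: string }
interface ListPipelinesResult {
  pipelines: Pipeline[];
  nextCursor: string | null;
  totalCount: number;
}

function listPipelines(all: Pipeline[], opts: ListPipelinesOptions = {}): ListPipelinesResult {
  const limit = opts.limit ?? 50; // assumed default page size
  const term = (opts.search ?? "").toLowerCase();
  const filtered = all
    .filter((p) => p.name.toLowerCase().includes(term))
    .sort((a, b) => a.id.localeCompare(b.id)); // cursors need a stable order

  // Assumed convention: the cursor is the id of the first row of the requested page.
  let start = 0;
  if (opts.cursor !== undefined) {
    const idx = filtered.findIndex((p) => p.id === opts.cursor);
    start = idx === -1 ? filtered.length : idx; // unknown cursor yields an empty page
  }

  // Fetch one extra row to learn whether another page exists.
  const rows = filtered.slice(start, start + limit + 1);
  const hasMore = rows.length > limit;
  return {
    pipelines: hasMore ? rows.slice(0, limit) : rows,
    nextCursor: hasMore ? rows[limit].id : null,
    totalCount: filtered.length,
  };
}
```

A caller such as a Load More button would pass the previous `nextCursor` back as `cursor` until it comes back as `null`.
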
Bitbucket Cloud REST API 2.0 for HMAC-SHA256 webhook verification,
push/PR event parsing, file fetching, branch creation, commits via
multipart form, and pull request creation. Includes diffstat helper
for supplementing push events that lack file-level changes.

Add ConfigChecksum field to PipelineStatus struct and ProcessInfo.
Store checksum when agent applies pipeline config (start/restart).
Include checksum in every heartbeat payload for drift detection.

Replace single query with useInfiniteQuery for cursor pagination.
Add usePipelineListFilters hook for URL-synced filter state.
Add Load More button at bottom of pipeline list table.

Replace hardcoded GitHub HMAC verification and API calls with the
GitProvider interface. Webhook handler now auto-resolves the correct
provider per environment, normalizes events, and handles Bitbucket
push events (which lack file-level changes) via diffstat fallback.
Adds bidirectional import approval gate when requireDeployApproval
is enabled. YAML import errors are logged to audit trail.

New Prisma model stores serialized filter state per environment and
scope (pipeline_list or fleet_matrix). Supports one default preset
per scope that auto-applies on page load.

- POST /api/v1/pipelines - create pipeline
- PUT /api/v1/pipelines/{id} - update pipeline metadata
- DELETE /api/v1/pipelines/{id} - delete draft pipeline
- GET /api/v1/pipelines/{id}/config - get generated YAML config
- POST /api/v1/pipelines/{id}/nodes - add node to pipeline graph
- PUT/DELETE /api/v1/pipelines/{id}/nodes/{nodeId} - update/remove node
- POST /api/v1/pipelines/{id}/edges - add edge
- DELETE /api/v1/pipelines/{id}/edges/{edgeId} - remove edge
- POST /api/v1/pipelines/import - import from YAML

Replace direct Octokit usage in createPromotionPR with the provider
interface. Now supports GitHub, GitLab, and Bitbucket for PR-based
pipeline promotion. Uses gitPath when available for stable file paths.

Create drift-metrics service with getVersionDrift (fleet-wide) and
getConfigDrift (per-node) functions. Register version_drift as a
fleet metric in alert-evaluator. Add version_drift evaluation to
FleetAlertService. Populate expected checksum cache from config
endpoint.

New filterPreset router supports list, create, update, delete,
setDefault, and clearDefault. Enforces 20 preset limit per
environment+scope. Scoped to environment and shared across team.

Add config_drift case to readMetricValue in alert-evaluator.ts. When
config_drift is evaluated, it calls getConfigDrift to compare the
agent-reported checksum against the server's expected checksum.

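
The checksum comparison can be pictured with the sketch below. It is only an illustration under stated assumptions: `countConfigDrift` and the `NodePipelineStatus` shape are hypothetical, and treating a missing checksum (for example from an older agent) as "unknown" rather than as drift is a guess at how backward compatibility is kept.

```typescript
interface NodePipelineStatus {
  nodeId: string;
  pipelineId: string;
  configChecksum: string | null; // null when the agent does not report one
}

// Hypothetical helper: counts statuses whose reported checksum differs from
// the server's expected checksum for the same pipeline.
function countConfigDrift(
  statuses: NodePipelineStatus[],
  expectedByPipeline: Map<string, string>,
): number {
  let drifted = 0;
  for (const status of statuses) {
    const expected = expectedByPipeline.get(status.pipelineId);
    // Assumption: skip rows where either side is unknown instead of flagging them.
    if (expected === undefined || status.configChecksum === null) continue;
    if (status.configChecksum !== expected) drifted += 1;
  }
  return drifted;
}
```

An alert rule on this value would then compare the count against a threshold such as "greater than 0".
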
Failed git sync operations are queued as GitSyncJob records and
retried at 30s, 2m, 10m intervals by a leader-only singleton
service. After max retries, the job is marked failed and a
git_sync_failed event alert is fired. Deploy-agent now creates
retry jobs automatically when git sync fails.
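
The retry schedule above can be sketched as a small pure function. The names `RETRY_DELAYS_MS` and `scheduleAfterFailure` are hypothetical, and the status strings are assumptions about how a GitSyncJob record might be marked.

```typescript
// Delays from the commit message: 30s, 2m, 10m.
const RETRY_DELAYS_MS = [30_000, 120_000, 600_000];

interface RetryState {
  status: "pending" | "failed"; // assumed status values
  nextRetryAt: number | null;   // epoch milliseconds, or null once retries are exhausted
}

// `retriesDone` is how many retries have already been attempted for the job.
function scheduleAfterFailure(retriesDone: number, now: number): RetryState {
  const delay = RETRY_DELAYS_MS[retriesDone];
  if (delay === undefined) {
    // Max retries reached: the caller would mark the job failed and fire the alert.
    return { status: "failed", nextRetryAt: null };
  }
  return { status: "pending", nextRetryAt: now + delay };
}
```

A leader-only worker could then poll for pending jobs whose `nextRetryAt` is in the past and call this again whenever an attempt fails.
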
- POST /api/v1/nodes - register node manually
- DELETE /api/v1/nodes/{id} - remove node
- PUT /api/v1/nodes/{id}/labels - update node labels
- GET /api/v1/nodes/{id}/metrics - node metrics
- GET /api/v1/pipelines/{id}/metrics - pipeline metrics
- GET /api/v1/pipelines/{id}/logs - cursor-paginated pipeline logs
- GET /api/v1/pipelines/{id}/health - pipeline health/SLI status
- GET /api/v1/fleet/overview - fleet-wide summary

Test that version_drift fires when drift is detected and resolves
when drift drops to zero. Mocks getVersionDrift from drift-metrics.

git-sync functions now accept an optional gitPath parameter. On first
successful sync, the derived path is persisted to pipeline.gitPath.
Subsequent renames update the pipeline name but preserve the stable
git file path, preventing sync breakage.

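
A small sketch of why a persisted gitPath survives renames. The `<environment>/<slug>.yaml` layout, the `slugify` rule, and the function names are assumptions for illustration; the PR does not show how the path is derived.

```typescript
interface PipelineRef {
  name: string;
  gitPath: string | null; // null until the first successful sync
}

// Assumed slug rule: lowercase, runs of non-alphanumerics collapsed to "-".
function slugify(name: string): string {
  return name.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Prefer the persisted path; derive one from the name only when none is stored.
function resolveGitPath(pipeline: PipelineRef, environmentSlug: string): string {
  return pipeline.gitPath ?? `${environmentSlug}/${slugify(pipeline.name)}.yaml`;
}

// Usage: the path derived on first sync is stored, so a later rename keeps it.
const pipeline: PipelineRef = { name: "Nginx Logs", gitPath: null };
pipeline.gitPath = resolveGitPath(pipeline, "prod");
pipeline.name = "Web Access Logs";
const pathAfterRename = resolveGitPath(pipeline, "prod");
```
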
Add version_drift to FLEET_METRIC_VALUES and GLOBAL_METRICS. Add
drift metric labels to METRIC_LABELS for UI display.

Add two new alert templates to the template picker: Version Drift
(fires immediately when any pipeline version differs from latest)
and Config Drift (fires after 60s when config checksum mismatches).

FilterPresetBar renders saved presets as quick-access chips.
SaveFilterDialog captures preset name with optional default flag.
Integrated into pipeline list toolbar and fleet matrix toolbar.

- POST /api/v1/pipelines/{id}/promote - promote pipeline to target env
- GET /api/v1/deploy-requests - list deploy requests
- POST /api/v1/deploy-requests/{id}/approve - approve deploy request
- POST /api/v1/deploy-requests/{id}/reject - reject deploy request
- GET/POST /api/v1/node-groups - list/create node groups
- GET /api/v1/environments - list environments in team

New router exposes sync status summary, recent job list, retry-all,
retry-single, and import error queries. Mounted as gitSync on the
app router for frontend consumption.

Update groupHealthStats to compute versionDriftCount, configDriftCount,
and overallCompliance per node group. Display version drift count and
overall compliance score in NodeGroupHealthCard alongside existing
label compliance and alert counts.

Returns per-node pipeline count, error count, version drift count,
and health status. Used by the redesigned fleet matrix top section.

New GitSyncStatus component shows sync health badge, last successful
sync timestamp, pending/failed job counts, error details, recent jobs
table with per-job retry, and YAML import errors from audit log.
Renders below the Git Integration settings on the environment page.

Show an AlertTriangle icon next to outdated pipeline version badges
in the deployment matrix to visually indicate version drift.

Add versionDriftCount and configDriftCount to FleetOverview interface
and getFleetOverview function. Add a fifth KPI card showing drift
counts alongside existing bytes, events, and fleet health cards.

Add git provider selector (auto-detect, GitHub, GitLab, Bitbucket)
to the environment Git Integration settings. Webhook setup help text
updated to be provider-agnostic. Provider preference is persisted to
the Environment.gitProvider field.

Comfortable mode shows descriptions and normal padding. Compact mode
reduces row height and hides descriptions for scanning large lists.
Preference persisted in localStorage.

Add integration test confirming that configChecksum from the agent
heartbeat payload is passed through to batchUpsertPipelineStatuses.

Tests cover URL-based auto-detection for GitHub, GitLab, and Bitbucket,
explicit gitProvider override, null/invalid inputs, and self-hosted
instance edge cases.

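
For context, URL-based detection with an explicit override could look roughly like the sketch below. The function name `detectProvider` and the host heuristics (in particular, treating any host containing "gitlab" as GitLab) are assumptions and may differ from the registry in this PR.

```typescript
type GitProviderName = "github" | "gitlab" | "bitbucket";

// Hypothetical helper: an explicit setting wins, otherwise guess from the host.
function detectProvider(
  repoUrl: string | null | undefined,
  explicit?: GitProviderName | null,
): GitProviderName | null {
  if (explicit) return explicit;
  if (!repoUrl) return null;

  // Handles https://host/org/repo, ssh://git@host/org/repo and git@host:org/repo.
  const match = repoUrl.match(/^(?:[a-z+]+:\/\/)?(?:[^@\/]+@)?([^:\/]+)/i);
  const host = match ? match[1].toLowerCase() : "";

  if (host === "github.com") return "github";
  if (host === "bitbucket.org") return "bitbucket";
  if (host.includes("gitlab")) return "gitlab"; // assumed heuristic for self-hosted GitLab
  return null; // unknown host: an explicit override is needed
}
```

Returning `null` for unrecognized hosts keeps self-hosted instances from being silently misrouted; the explicit gitProvider setting covers those cases.
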
…lity

Document GitLab and Bitbucket support, provider auto-detection,
explicit provider override for self-hosted instances, sync status
dashboard, retry mechanism, and pipeline name/filename decoupling.

Arrow Up/Down moves row focus, Enter navigates to pipeline editor.
Focused row auto-scrolls into view.

The groupHealthStats query now includes pipeline status and pipeline
version lookups for drift detection. Add beforeEach mocks to return
empty arrays for these new queries in existing tests.

When pipeline list or fleet page loads with no active URL filters,
the default preset (if set) is automatically applied.

…gs, perf

Log viewer improvements:
- Highlight all search matches with match count display
- Server-side log search for terms >= 3 chars
- Time-range filter (15m/1h/6h/1d/7d/All) with server-side since param
- Virtual scrolling via @tanstack/react-virtual for large log sets
- Log export (download as .log file) and copy support (per-line + all visible)

Pipeline editor enhancements:
- MiniMap with color-coded nodes (source/transform/sink)
- Canvas node search with match cycling and visual highlighting
- Dagre auto-layout (all nodes or selected subset) with undo support
- Collapsible detail panel with localStorage persistence

Alert delivery improvements:
- Manual retry mutation for failed delivery attempts
- Aggregate failed deliveries section on alerts page grouped by channel
- Retry-all-for-channel bulk operation

Settings UX:
- Settings overview landing page replacing auto-redirect to /settings/version
- Back navigation uses router.back() with "/" fallback

Dashboard performance:
- useDocumentVisibility hook pauses polling when tab is hidden
- SSE event buffering when tab is hidden, flush on visibility restore
- MetricStore LRU eviction with configurable maxKeys (default 5000)
- Prometheus gauges for MetricStore stream count and memory usage
- React Query staleTime tuned to 30s (from 5s) with explicit gcTime
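
As an illustration of the LRU cap, a Map-based store can evict its least recently used key once `maxKeys` is exceeded. This is a generic sketch, not the MetricStore code; the class name and methods are hypothetical, and only the default of 5000 comes from the commit message.

```typescript
class LruStore<V> {
  // Map preserves insertion order, so the first key is the least recently used.
  private entries = new Map<string, V>();
  private readonly maxKeys: number;

  constructor(maxKeys: number = 5000) {
    this.maxKeys = maxKeys;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxKeys) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Both reads and writes count as "use" here; a store that should evict by last write only would skip the re-insert in `get`.
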

Verify matrix summary computation completes within 500ms budget
with 2000 pipeline-node combinations.

Update pr-merge tests to work with provider-based signature
verification (ping events now require environment lookup first).

Add git-sync-retry service mock to leader-guard test. Note: the
leader-guard failover test was already failing before these changes
due to a pre-existing fake timer issue with async imports.

…handler

Remove unused loop variable in Bitbucket push event parsing and fix
dynamic import pattern for BitbucketProvider in webhook handler.

…type SSE events

- Pass pipeline.gitPath to gitSyncCommitPipeline and gitSyncDeletePipeline
  in the retry service so retried syncs write to the correct file path
- Add GitSyncStatusEvent to the SSE event type union and remove the
  unsafe `as never` type assertions from broadcastSSE calls
- Add SSRF consideration comment to GitLab provider apiBase method
- Update retry test to expect the new gitPath parameter

…eanup, and consistent userEmail

- Add writeAuditLog calls to POST /nodes, DELETE /nodes/:id, and PUT /nodes/:id/labels
- Validate `since` query parameter in pipeline and node metrics endpoints to prevent unhandled 500 errors
- Wire up rateLimiter.cleanup() on a 120s interval to prevent memory leaks from stale sliding windows
- Add userEmail: null to all writeAuditLog calls in API v1 endpoints for consistency
- Use trimmed pipeline name in POST /pipelines audit metadata to match stored value

… proper alert retry tests

- Move useVirtualizer call above the useEffect that references it in
  pipeline-logs.tsx to fix a ReferenceError at runtime (React hooks
  must be called before any code that references their return value)
- Add visual search feedback to source-node, transform-node, and
  sink-node: matching nodes get ring-2 ring-yellow-400, non-matching
  nodes get opacity-40 when a canvas search is active
- Rewrite alert-retry-delivery tests to actually call the tRPC
  procedure through a caller, covering NOT_FOUND, BAD_REQUEST,
  webhook retry, channel retry, and missing target scenarios

C1/C2: Pipeline list page now uses usePipelineListFilters hook for
URL-synced filter state and passes all filter parameters (search,
status, tags, groupId, sortBy, sortOrder) to the server via
useInfiniteQuery. Removed redundant client-side filtering that
duplicated server work — only client-side sort for status and
throughput (not available server-side) is retained.

I1: Fleet deployment matrix conditionally renders based on filter
state — shows a dashed-border prompt when no filters are active,
renders the matrix only after the user applies a filter.

I2: Added "Show exceptions only" toggle to DeploymentMatrixToolbar
that filters the matrix to pipelines with version mismatch, crashed
status, or missing deployment on some nodes.
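
The "exceptions only" rule can be read as a row predicate like the sketch below. The row and cell shapes are hypothetical, and interpreting "missing deployment on some nodes" as "deployed on at least one node but not on all of them" is an assumption.

```typescript
type CellStatus = "running" | "stopped" | "crashed";

interface MatrixCell { nodeId: string; version: number; status: CellStatus }
interface MatrixRow { pipelineId: string; latestVersion: number; cells: MatrixCell[] }

// A row is an exception when any cell is outdated or crashed, or when the
// pipeline is deployed on some of the listed nodes but not all of them.
function isException(row: MatrixRow, nodeIds: string[]): boolean {
  const versionMismatch = row.cells.some((c) => c.version !== row.latestVersion);
  const crashed = row.cells.some((c) => c.status === "crashed");
  const deployedOn = new Set(row.cells.map((c) => c.nodeId));
  const missing = deployedOn.size > 0 && nodeIds.some((id) => !deployedOn.has(id));
  return versionMismatch || crashed || missing;
}

function filterExceptions(rows: MatrixRow[], nodeIds: string[]): MatrixRow[] {
  return rows.filter((row) => isException(row, nodeIds));
}
```
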

I3: Added withAudit middleware to setDefault and clearDefault
mutations in filter-preset router for audit trail compliance.

I4: Wrapped setDefault clear+set operations in prisma.$transaction
to prevent race conditions from concurrent default-setting. Updated
test to mock the transaction callback.

Adds cursor pagination to pipeline.list(), fleet matrix redesign with summary
cards + filtered matrix, FilterPreset model with CRUD, density toggle,
keyboard navigation, and exceptions-only filter.

…sions

Adds pipeline CRUD, graph manipulation, YAML import, fleet management, metrics,
logs, health, promotions, deploy requests, node groups, environments. Includes
rate limiting middleware and OpenAPI v2.0.0 spec.

…atus

Adds GitProvider abstraction (GitHub, GitLab, Bitbucket), git sync retry with
exponential backoff, bidirectional sync approval gate, pipeline gitPath
decoupling, and sync status dashboard.

…board perf

Adds virtual scrolling and search improvements to log viewer, minimap + canvas
search + auto-layout to pipeline editor, manual retry and aggregate view for
alert deliveries, settings overview page, tab visibility optimization, and
MetricStore memory cap with LRU eviction.

@greptile-apps
Contributor

greptile-apps bot commented Mar 28, 2026

Too many files changed for review. (151 files found, 100 file limit)

github-actions bot added the feature, documentation, dependencies, and agent labels and removed the feature label on Mar 28, 2026

- pipelines/page.tsx: replace manual useInfiniteQuery with tRPC v11
  infiniteQueryOptions pattern; add Ref type to StaggerItemProps to
  allow ref callback on polymorphic tr elements
- settings-overview.tsx: cast session user to include isSuperAdmin and
  role fields that are not yet in the NextAuth session type
- node-groups/route.ts, pipelines/[id]/nodes/route.ts,
  filter-preset.ts: cast Record<string,unknown> to Prisma.InputJsonValue
  for JSON column assignments
- import-pipeline.test.ts: use (fn: unknown) mock pattern for
  $transaction to match Prisma's typed callback signature
- git-sync-status.tsx: wrap unknown commitRef guard in Boolean() to
  produce ReactNode-compatible conditional expression
- alert-evaluator.ts: add git_sync_failed to METRIC_LABELS record
  (enum value added to Prisma schema but record was not updated)

Also regenerated Prisma client (src/generated/) to include git_sync_failed
and other enum values present in schema.prisma but missing from generated
client, and ran pnpm install to restore @tanstack/react-virtual and
@testing-library/react which were missing from node_modules.

@TerrifiedBug
Owner Author

@codex


Labels

agent, dependencies, documentation
