feat: M017 Enterprise Scale — 5 specs, 154 files, 12K lines#117
Open
TerrifiedBug wants to merge 57 commits into main
Add gitProvider to Environment, gitPath to Pipeline, GitSyncJob model for retry queue, and git_sync_failed alert metric. Includes backfill script for existing pipelines with active git sync.
Add configChecksum nullable field to NodePipelineStatus model for tracking agent-reported config checksums. Add version_drift and config_drift values to AlertMetric enum for drift detection alerting.
Extract GitHub-specific logic (webhook HMAC verification, event parsing, file fetching, PR creation) into a GitHubProvider class behind a common GitProvider interface. Registry auto-detects provider from repo URL.
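A minimal sketch of how URL-based provider auto-detection could work. The PR does not show the registry code, so `detectProvider`, `GitProviderName`, and the host-matching rules here are hypothetical; the "host contains gitlab" rule in particular is only a heuristic for self-hosted instances.

```typescript
// Hypothetical names throughout: the real registry code is not shown in this PR.
type GitProviderName = "github" | "gitlab" | "bitbucket";

// Detect the provider from the repository host. An explicit override wins,
// which is how a self-hosted instance on a custom domain could be handled.
function detectProvider(
  repoUrl: string | null | undefined,
  override?: GitProviderName | null,
): GitProviderName | null {
  if (override) return override;
  if (!repoUrl) return null;

  let host: string;
  try {
    // Rewrite scp-style SSH remotes (git@host:owner/repo.git) into a parseable URL.
    const normalized = /^[\w.-]+@[\w.-]+:/.test(repoUrl)
      ? "ssh://" + repoUrl.replace(":", "/")
      : repoUrl;
    host = new URL(normalized).hostname.toLowerCase();
  } catch {
    return null; // not something we can parse as a URL
  }

  if (host === "github.com" || host.endsWith(".github.com")) return "github";
  if (host === "bitbucket.org" || host.endsWith(".bitbucket.org")) return "bitbucket";
  // Heuristic (assumption): self-hosted GitLab often has "gitlab" in its hostname.
  if (host.includes("gitlab")) return "gitlab";
  return null;
}
```

Returning `null` instead of throwing lets the caller decide whether an undetectable provider is an error or a prompt to set the explicit override.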
Phase 0 of API v1 completeness:
- Add rateLimit field to ServiceAccount model
- Implement sliding window rate limiter with read/default/deploy tiers
- Integrate rate limiting into apiRoute() wrapper
- Add VALID_PERMISSIONS constant with new permission strings
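For reviewers unfamiliar with the technique, here is a rough sketch of a sliding window limiter with tiers. The tier limits, the 60 second window, and the class and method names are assumptions made for illustration; the PR only names the tiers and the per-account `rateLimit` field.

```typescript
// Hypothetical limits: the PR names the tiers but not their values.
type Tier = "read" | "default" | "deploy";
const TIER_LIMITS: Record<Tier, number> = { read: 200, default: 100, deploy: 20 };
const WINDOW_MS = 60_000; // assumed window length

class SlidingWindowRateLimiter {
  // One list of request timestamps per (account, tier) key.
  private hits = new Map<string, number[]>();

  // Returns true and records the request if the caller is under its limit.
  // `override` stands in for a per-account rateLimit value, when set.
  check(accountId: string, tier: Tier, now: number = Date.now(), override?: number | null): boolean {
    const limit = override ?? TIER_LIMITS[tier];
    const key = `${accountId}:${tier}`;
    const cutoff = now - WINDOW_MS;
    // Keep only timestamps that still fall inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }

  // Drop keys whose timestamps have all aged out, so idle accounts do not leak memory.
  cleanup(now: number = Date.now()): void {
    const cutoff = now - WINDOW_MS;
    this.hits.forEach((times, key) => {
      if (times.every((t) => t <= cutoff)) this.hits.delete(key);
    });
  }

  get size(): number {
    return this.hits.size;
  }
}
```

Unlike a fixed window counter, this approach never allows a burst of twice the limit across a window boundary, at the cost of storing one timestamp per accepted request.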
Add configChecksum to the PipelineStatusInput interface and batch upsert SQL. Update heartbeat route Zod schema to accept optional configChecksum string from agents.
GitLab REST API v4 for webhook verification (X-Gitlab-Token), push/MR event parsing, file fetching, branch creation, commits, and merge request creation. Supports self-hosted instances and nested groups.
Add ListPipelinesOptions to listPipelinesForEnvironment with cursor,
limit, search, status, tags, groupId, sortBy, sortOrder parameters.
Returns { pipelines, nextCursor, totalCount } using Prisma cursor
pagination (same pattern as audit.list). Backward-compatible — all
existing call sites updated to destructure new return shape.
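The return shape above can be illustrated with an in-memory sketch. This is not the Prisma query from the PR: `listPage` and `PipelineRow` are hypothetical, and the sketch only mirrors the common "fetch limit + 1, use the extra row as the next cursor" pattern.

```typescript
// Hypothetical row type; the real model has many more fields.
interface PipelineRow {
  id: string;
  name: string;
}

interface ListResult {
  pipelines: PipelineRow[];
  nextCursor: string | undefined;
  totalCount: number;
}

// In-memory stand-in for a cursor-paginated query.
function listPage(
  rows: PipelineRow[],
  opts: { cursor?: string; limit?: number; search?: string } = {},
): ListResult {
  const limit = opts.limit ?? 50;
  const needle = opts.search?.toLowerCase();

  // Filter first, then sort by id so the cursor position is stable.
  const filtered = rows
    .filter((r) => !needle || r.name.toLowerCase().includes(needle))
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));

  // The cursor is the id of the first row of the requested page.
  const start = opts.cursor ? filtered.findIndex((r) => r.id === opts.cursor) : 0;
  const page = start < 0 ? [] : filtered.slice(start, start + limit + 1);

  // One extra row was fetched: if it exists, it becomes the next cursor.
  let nextCursor: string | undefined;
  if (page.length > limit) {
    nextCursor = page.pop()!.id;
  }
  return { pipelines: page, nextCursor, totalCount: filtered.length };
}
```

A caller keeps requesting pages with the returned `nextCursor` until it comes back `undefined`, which is the contract an infinite-query hook expects.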
Bitbucket Cloud REST API 2.0 for HMAC-SHA256 webhook verification, push/PR event parsing, file fetching, branch creation, commits via multipart form, and pull request creation. Includes diffstat helper for supplementing push events that lack file-level changes.
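As a reference point for the HMAC-SHA256 check, a minimal sketch using Node's built-in `crypto` module. The function name and the handling of an optional `sha256=` prefix are assumptions; the provider's actual header parsing is not shown in the PR.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: verifies that `signatureHeader` is the hex HMAC-SHA256
// of the raw request body under the shared webhook secret.
function verifyHmacSha256(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false;

  // Assumption: the digest may arrive prefixed with "sha256=".
  const received = signatureHeader.startsWith("sha256=")
    ? signatureHeader.slice("sha256=".length)
    : signatureHeader;

  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  const a = Buffer.from(received.toLowerCase(), "utf8");
  const b = Buffer.from(expected, "utf8");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The signature must be computed over the raw request body, before any JSON parsing, because re-serialized JSON is not guaranteed to be byte-identical.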
Add ConfigChecksum field to PipelineStatus struct and ProcessInfo. Store checksum when agent applies pipeline config (start/restart). Include checksum in every heartbeat payload for drift detection.
Replace single query with useInfiniteQuery for cursor pagination. Add usePipelineListFilters hook for URL-synced filter state. Add Load More button at bottom of pipeline list table.
Replace hardcoded GitHub HMAC verification and API calls with the GitProvider interface. Webhook handler now auto-resolves the correct provider per environment, normalizes events, and handles Bitbucket push events (which lack file-level changes) via diffstat fallback. Adds bidirectional import approval gate when requireDeployApproval is enabled. YAML import errors are logged to audit trail.
New Prisma model stores serialized filter state per environment and scope (pipeline_list or fleet_matrix). Supports one default preset per scope that auto-applies on page load.
- POST /api/v1/pipelines - create pipeline
- PUT /api/v1/pipelines/{id} - update pipeline metadata
- DELETE /api/v1/pipelines/{id} - delete draft pipeline
- GET /api/v1/pipelines/{id}/config - get generated YAML config
- POST /api/v1/pipelines/{id}/nodes - add node to pipeline graph
- PUT/DELETE /api/v1/pipelines/{id}/nodes/{nodeId} - update/remove node
- POST /api/v1/pipelines/{id}/edges - add edge
- DELETE /api/v1/pipelines/{id}/edges/{edgeId} - remove edge
- POST /api/v1/pipelines/import - import from YAML
Replace direct Octokit usage in createPromotionPR with the provider interface. Now supports GitHub, GitLab, and Bitbucket for PR-based pipeline promotion. Uses gitPath when available for stable file paths.
Create drift-metrics service with getVersionDrift (fleet-wide) and getConfigDrift (per-node) functions. Register version_drift as a fleet metric in alert-evaluator. Add version_drift evaluation to FleetAlertService. Populate expected checksum cache from config endpoint.
New filterPreset router supports list, create, update, delete, setDefault, and clearDefault. Enforces 20 preset limit per environment+scope. Scoped to environment and shared across team.
Add config_drift case to readMetricValue in alert-evaluator.ts. When config_drift is evaluated, it calls getConfigDrift to compare the agent-reported checksum against the server's expected checksum.
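The comparison could look roughly like the sketch below. `countConfigDrift` and `ReportedStatus` are hypothetical stand-ins (the body of getConfigDrift is not shown here), and skipping missing checksums is an assumption based on the field being nullable.

```typescript
// Hypothetical shapes; the PR names getConfigDrift but does not show its body.
interface ReportedStatus {
  pipelineId: string;
  configChecksum: string | null; // nullable: an agent may not report one
}

// Count pipelines on one node whose reported checksum differs from the
// checksum the server expects. Missing data on either side is skipped
// (an assumption) so agents that predate the field do not raise alerts.
function countConfigDrift(
  expected: Map<string, string>,
  reported: ReportedStatus[],
): number {
  let drifted = 0;
  for (const status of reported) {
    const want = expected.get(status.pipelineId);
    if (want === undefined || status.configChecksum === null) continue;
    if (want !== status.configChecksum) drifted += 1;
  }
  return drifted;
}
```

A count of zero means every pipeline with a known checksum matches; any positive value is what an alert rule on config_drift would compare against its threshold.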
Failed git sync operations are queued as GitSyncJob records and retried at 30s, 2m, 10m intervals by a leader-only singleton service. After max retries, the job is marked failed and a git_sync_failed event alert is fired. Deploy-agent now creates retry jobs automatically when git sync fails.
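The retry schedule described above can be sketched as a small pure function. The names `scheduleNextRetry` and `RetryDecision` are hypothetical; only the 30s, 2m, 10m intervals and the "mark failed after max retries" rule come from the commit message.

```typescript
// Delays taken from the commit message: 30s, 2m, 10m.
const RETRY_DELAYS_MS = [30_000, 120_000, 600_000];

// Hypothetical result shape for illustration.
interface RetryDecision {
  status: "pending" | "failed";
  nextRetryAt: Date | null;
}

// `retriesDone` is the number of retries already attempted for this job
// (0 right after the original sync failure).
function scheduleNextRetry(retriesDone: number, now: Date): RetryDecision {
  const delay = RETRY_DELAYS_MS[retriesDone];
  if (delay === undefined) {
    // Schedule exhausted: the caller would mark the job failed and fire the alert.
    return { status: "failed", nextRetryAt: null };
  }
  return { status: "pending", nextRetryAt: new Date(now.getTime() + delay) };
}
```

Keeping the decision pure (time is passed in) makes it straightforward to test without fake timers.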
- POST /api/v1/nodes - register node manually
- DELETE /api/v1/nodes/{id} - remove node
- PUT /api/v1/nodes/{id}/labels - update node labels
- GET /api/v1/nodes/{id}/metrics - node metrics
- GET /api/v1/pipelines/{id}/metrics - pipeline metrics
- GET /api/v1/pipelines/{id}/logs - cursor-paginated pipeline logs
- GET /api/v1/pipelines/{id}/health - pipeline health/SLI status
- GET /api/v1/fleet/overview - fleet-wide summary
Test that version_drift fires when drift is detected and resolves when drift drops to zero. Mocks getVersionDrift from drift-metrics.
git-sync functions now accept an optional gitPath parameter. On first successful sync, the derived path is persisted to pipeline.gitPath. Subsequent renames update the pipeline name but preserve the stable git file path, preventing sync breakage.
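To make the rename behaviour concrete, a minimal sketch of the path resolution. The `<environment>/<pipeline>.yaml` layout, `slugify`, and `resolveGitPath` are all assumptions for illustration; the PR does not state how the path is derived.

```typescript
// Hypothetical subset of the pipeline record.
interface PipelineRef {
  name: string;
  gitPath: string | null;
}

// Assumed slug rule: lowercase, non-alphanumerics collapsed to single dashes.
function slugify(value: string): string {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Prefer the stored path; derive one only when none has been persisted yet.
// `persist` tells the caller to save the derived path after a successful sync.
function resolveGitPath(
  pipeline: PipelineRef,
  environmentName: string,
): { path: string; persist: boolean } {
  if (pipeline.gitPath) {
    return { path: pipeline.gitPath, persist: false };
  }
  const derived = `${slugify(environmentName)}/${slugify(pipeline.name)}.yaml`;
  return { path: derived, persist: true };
}
```

Once the path is stored, a rename only changes `name`, so later syncs keep writing to the same file instead of creating a second one.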
Add version_drift to FLEET_METRIC_VALUES and GLOBAL_METRICS. Add drift metric labels to METRIC_LABELS for UI display.
Add two new alert templates to the template picker: Version Drift (fires immediately when any pipeline version differs from latest) and Config Drift (fires after 60s when config checksum mismatches).
FilterPresetBar renders saved presets as quick-access chips. SaveFilterDialog captures preset name with optional default flag. Integrated into pipeline list toolbar and fleet matrix toolbar.
- POST /api/v1/pipelines/{id}/promote - promote pipeline to target env
- GET /api/v1/deploy-requests - list deploy requests
- POST /api/v1/deploy-requests/{id}/approve - approve deploy request
- POST /api/v1/deploy-requests/{id}/reject - reject deploy request
- GET/POST /api/v1/node-groups - list/create node groups
- GET /api/v1/environments - list environments in team
New router exposes sync status summary, recent job list, retry-all, retry-single, and import error queries. Mounted as gitSync on the app router for frontend consumption.
Update groupHealthStats to compute versionDriftCount, configDriftCount, and overallCompliance per node group. Display version drift count and overall compliance score in NodeGroupHealthCard alongside existing label compliance and alert counts.
Returns per-node pipeline count, error count, version drift count, and health status. Used by the redesigned fleet matrix top section.
New GitSyncStatus component shows sync health badge, last successful sync timestamp, pending/failed job counts, error details, recent jobs table with per-job retry, and YAML import errors from audit log. Renders below the Git Integration settings on the environment page.
Show an AlertTriangle icon next to outdated pipeline version badges in the deployment matrix to visually indicate version drift.
Add versionDriftCount and configDriftCount to FleetOverview interface and getFleetOverview function. Add a fifth KPI card showing drift counts alongside existing bytes, events, and fleet health cards.
Add git provider selector (auto-detect, GitHub, GitLab, Bitbucket) to the environment Git Integration settings. Webhook setup help text updated to be provider-agnostic. Provider preference is persisted to the Environment.gitProvider field.
Comfortable mode shows descriptions and normal padding. Compact mode reduces row height and hides descriptions for scanning large lists. Preference persisted in localStorage.
Add integration test confirming that configChecksum from the agent heartbeat payload is passed through to batchUpsertPipelineStatuses.
Tests cover URL-based auto-detection for GitHub, GitLab, and Bitbucket, explicit gitProvider override, null/invalid inputs, and self-hosted instance edge cases.
…lity Document GitLab and Bitbucket support, provider auto-detection, explicit provider override for self-hosted instances, sync status dashboard, retry mechanism, and pipeline name/filename decoupling.
Arrow Up/Down moves row focus, Enter navigates to pipeline editor. Focused row auto-scrolls into view.
The groupHealthStats query now includes pipeline status and pipeline version lookups for drift detection. Add beforeEach mocks to return empty arrays for these new queries in existing tests.
When pipeline list or fleet page loads with no active URL filters, the default preset (if set) is automatically applied.
…gs, perf
Log viewer improvements:
- Highlight all search matches with match count display
- Server-side log search for terms >= 3 chars
- Time-range filter (15m/1h/6h/1d/7d/All) with server-side since param
- Virtual scrolling via @tanstack/react-virtual for large log sets
- Log export (download as .log file) and copy support (per-line + all visible)
Pipeline editor enhancements:
- MiniMap with color-coded nodes (source/transform/sink)
- Canvas node search with match cycling and visual highlighting
- Dagre auto-layout (all nodes or selected subset) with undo support
- Collapsible detail panel with localStorage persistence
Alert delivery improvements:
- Manual retry mutation for failed delivery attempts
- Aggregate failed deliveries section on alerts page grouped by channel
- Retry-all-for-channel bulk operation
Settings UX:
- Settings overview landing page replacing auto-redirect to /settings/version
- Back navigation uses router.back() with "/" fallback
Dashboard performance:
- useDocumentVisibility hook pauses polling when tab is hidden
- SSE event buffering when tab is hidden, flush on visibility restore
- MetricStore LRU eviction with configurable maxKeys (default 5000)
- Prometheus gauges for MetricStore stream count and memory usage
- React Query staleTime tuned to 30s (from 5s) with explicit gcTime
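For the MetricStore LRU eviction item, a rough sketch of the general technique using a `Map`, which iterates in insertion order. `LruStore` is a hypothetical class, not the MetricStore from this PR; only the default of 5000 keys comes from the commit message.

```typescript
// Hypothetical LRU container; relies on Map preserving insertion order.
class LruStore<V> {
  private entries = new Map<string, V>();
  private readonly maxKeys: number;

  constructor(maxKeys: number = 5000) {
    this.maxKeys = maxKeys;
  }

  // Reading a key marks it as most recently used by re-inserting it.
  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  // Writing a key marks it as most recently used and evicts the oldest on overflow.
  set(key: string, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxKeys) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Both operations stay O(1), which matters when the store is touched on every incoming metric sample.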
Verify matrix summary computation completes within 500ms budget with 2000 pipeline-node combinations.
Update pr-merge tests to work with provider-based signature verification (ping events now require environment lookup first). Add git-sync-retry service mock to leader-guard test. Note: the leader-guard failover test was already failing before these changes due to a pre-existing fake timer issue with async imports.
…handler Remove unused loop variable in Bitbucket push event parsing and fix dynamic import pattern for BitbucketProvider in webhook handler.
…type SSE events
- Pass pipeline.gitPath to gitSyncCommitPipeline and gitSyncDeletePipeline in the retry service so retried syncs write to the correct file path
- Add GitSyncStatusEvent to the SSE event type union and remove the unsafe `as never` type assertions from broadcastSSE calls
- Add SSRF consideration comment to GitLab provider apiBase method
- Update retry test to expect the new gitPath parameter
…eanup, and consistent userEmail
- Add writeAuditLog calls to POST /nodes, DELETE /nodes/:id, and PUT /nodes/:id/labels
- Validate `since` query parameter in pipeline and node metrics endpoints to prevent unhandled 500 errors
- Wire up rateLimiter.cleanup() on a 120s interval to prevent memory leaks from stale sliding windows
- Add userEmail: null to all writeAuditLog calls in API v1 endpoints for consistency
- Use trimmed pipeline name in POST /pipelines audit metadata to match stored value
… proper alert retry tests
- Move useVirtualizer call above the useEffect that references it in pipeline-logs.tsx to fix a ReferenceError at runtime (React hooks must be called before any code that references their return value)
- Add visual search feedback to source-node, transform-node, and sink-node: matching nodes get ring-2 ring-yellow-400, non-matching nodes get opacity-40 when a canvas search is active
- Rewrite alert-retry-delivery tests to actually call the tRPC procedure through a caller, covering NOT_FOUND, BAD_REQUEST, webhook retry, channel retry, and missing target scenarios
C1/C2: Pipeline list page now uses usePipelineListFilters hook for URL-synced filter state and passes all filter parameters (search, status, tags, groupId, sortBy, sortOrder) to the server via useInfiniteQuery. Removed redundant client-side filtering that duplicated server work; only client-side sort for status and throughput (not available server-side) is retained.
I1: Fleet deployment matrix conditionally renders based on filter state: it shows a dashed-border prompt when no filters are active and renders the matrix only after the user applies a filter.
I2: Added "Show exceptions only" toggle to DeploymentMatrixToolbar that filters the matrix to pipelines with version mismatch, crashed status, or missing deployment on some nodes.
I3: Added withAudit middleware to setDefault and clearDefault mutations in filter-preset router for audit trail compliance.
I4: Wrapped setDefault clear+set operations in prisma.$transaction to prevent race conditions from concurrent default-setting. Updated test to mock the transaction callback.
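The "exceptions only" rule from I2 might be expressed along these lines. The `MatrixRow` and `MatrixCell` shapes and both function names are hypothetical, and treating a pipeline that is absent from any node in scope as "missing" is an assumption about the intended semantics.

```typescript
// Hypothetical matrix shapes for illustration only.
interface MatrixCell {
  nodeId: string;
  version: number;
  status: string; // e.g. "running", "stopped", "crashed"
}

interface MatrixRow {
  pipelineId: string;
  latestVersion: number;
  cells: MatrixCell[]; // one cell per node the pipeline is deployed on
}

// A row is an exception if any of the three conditions from I2 holds.
function isException(row: MatrixRow, nodeIds: string[]): boolean {
  const deployedOn = new Set(row.cells.map((c) => c.nodeId));
  const missingOnSomeNode = nodeIds.some((id) => !deployedOn.has(id));
  const versionMismatch = row.cells.some((c) => c.version !== row.latestVersion);
  const crashed = row.cells.some((c) => c.status === "crashed");
  return missingOnSomeNode || versionMismatch || crashed;
}

// Keep only the rows that need attention.
function exceptionsOnly(rows: MatrixRow[], nodeIds: string[]): MatrixRow[] {
  return rows.filter((row) => isException(row, nodeIds));
}
```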
Adds cursor pagination to pipeline.list(), fleet matrix redesign with summary cards + filtered matrix, FilterPreset model with CRUD, density toggle, keyboard navigation, and exceptions-only filter.
…sions Adds pipeline CRUD, graph manipulation, YAML import, fleet management, metrics, logs, health, promotions, deploy requests, node groups, environments. Includes rate limiting middleware and OpenAPI v2.0.0 spec.
…atus Adds GitProvider abstraction (GitHub, GitLab, Bitbucket), git sync retry with exponential backoff, bidirectional sync approval gate, pipeline gitPath decoupling, and sync status dashboard.
…board perf Adds virtual scrolling and search improvements to log viewer, minimap + canvas search + auto-layout to pipeline editor, manual retry and aggregate view for alert deliveries, settings overview page, tab visibility optimization, and MetricStore memory cap with LRU eviction.
Contributor
Too many files changed for review.
- pipelines/page.tsx: replace manual useInfiniteQuery with tRPC v11 infiniteQueryOptions pattern; add Ref type to StaggerItemProps to allow ref callback on polymorphic tr elements
- settings-overview.tsx: cast session user to include isSuperAdmin and role fields that are not yet in the NextAuth session type
- node-groups/route.ts, pipelines/[id]/nodes/route.ts, filter-preset.ts: cast Record<string,unknown> to Prisma.InputJsonValue for JSON column assignments
- import-pipeline.test.ts: use (fn: unknown) mock pattern for $transaction to match Prisma's typed callback signature
- git-sync-status.tsx: wrap unknown commitRef guard in Boolean() to produce ReactNode-compatible conditional expression
- alert-evaluator.ts: add git_sync_failed to METRIC_LABELS record (enum value added to Prisma schema but record was not updated)
Also regenerated Prisma client (src/generated/) to include git_sync_failed and other enum values present in schema.prisma but missing from generated client, and ran pnpm install to restore @tanstack/react-virtual and @testing-library/react which were missing from node_modules.
Summary
Comprehensive enterprise scalability milestone making VectorFlow production-ready for corporate teams managing 100+ pipelines across multi-node fleets.
1. Scale UX
2. API v1 Completeness
3. GitOps Enhancement
4. Monitoring & Compliance
5. Production Polish
Stats
Test plan