Modern PostgreSQL Control Plane for Enterprise Cluster Management
pgControlPlane is a comprehensive, production-ready control plane for managing PostgreSQL clusters with high availability, automated failover, backup orchestration, and advanced observability. It seamlessly integrates all pgElephant components (pgbalancer, pgBackRest, pgraft, FauxDB, pgSentinel) into a unified cluster management platform.
- Multi-Cluster Management: Manage multiple PostgreSQL clusters from a single control plane
- Automated Failover: Intelligent failover with configurable policies and safety checks
- Blue-Green Deployments: Zero-downtime upgrades and migrations
- Backup Orchestration: Automated backup scheduling with pgBackRest and WAL-G integration
- Configuration Management: Centralized configuration with drift detection and auto-remediation
- Connection Pooling: Integrated pgbalancer management with automatic configuration updates
- Dual API: gRPC for performance + REST for convenience with OpenAPI specs
- PostgreSQL Persistence: Production-ready state storage with migrations
- OpenTelemetry: Complete observability with traces, metrics, and logs
- WebSocket Streaming: Real-time cluster status and event notifications
- Kubernetes Native: CRDs, operator, and Helm charts for K8s deployments
- Multi-Cloud: Works on VMs, bare metal, and Kubernetes across all cloud providers
- Smart Reconciliation: Continuous state reconciliation with configurable intervals
- Health Scoring: Advanced health metrics for intelligent decision-making
- Quorum-Based Operations: Safe promotions with quorum requirements
- Automated Healing: Self-healing clusters with automatic node recovery
- Point-in-Time Recovery: PITR support with backup/restore orchestration
- Monitoring Integration: Built-in Prometheus, Grafana, and pgSentinel integration
- mTLS: Mutual TLS between control plane and agents
- RBAC: Role-based access control with fine-grained permissions
- Vault Integration: Secret management with HashiCorp Vault
- Audit Logging: Complete audit trail for compliance
- JWT Authentication: Modern token-based authentication with refresh tokens
⚡ Deploy a complete cluster in one command:
cd pgControlPlane
./scripts/deploy-full-cluster.sh --name mycluster --nodes 3
This single command deploys:
- ✅ 3-node PostgreSQL cluster with automated failover
- ✅ pgbalancer for connection pooling and load balancing
- ✅ pgBackRest for automated backups
- ✅ pgSentinel for real-time monitoring
- ✅ FauxDB as a MongoDB compatibility layer
- ✅ Control plane agents on all nodes
See QUICKSTART.md for detailed instructions.
- Go 1.22+
- PostgreSQL 14+ (optional, managed by control plane)
- Docker & Docker Compose (for local deployment)
- Kubernetes 1.27+ (optional, for production K8s deployment)
Install from a release binary:
# Download latest release
curl -L https://github.com/pgElephant/pgControlPlane/releases/latest/download/pgcp-linux-amd64.tar.gz | tar xz
sudo mv pgcp /usr/local/bin/
# Verify installation
pgcp version
Or build from source:
git clone https://github.com/pgElephant/pgControlPlane
cd pgControlPlane
make build
sudo make install
Docker Compose (complete stack):
# Deploy the complete pgElephant stack
cd pgControlPlane
docker-compose -f deployments/complete-stack.yaml up -d
# Verify all services are running
docker-compose -f deployments/complete-stack.yaml ps
# Access the cluster
psql postgresql://postgres:postgres@localhost:5435/production
Services included:
- pgControlPlane API: http://localhost:8080
- pgbalancer: localhost:5435
- pgSentinel Dashboard: http://localhost:3000
- Grafana: http://localhost:3001 (admin/admin)
- Prometheus: http://localhost:9091
- 3 PostgreSQL nodes with streaming replication
Kubernetes (Helm):
# Add Helm repository
helm repo add pgelephant https://pgelephant.github.io/charts
helm repo update
# Install with Helm
helm install pgcontrolplane pgelephant/pgcontrolplane \
--namespace pgcontrolplane \
--create-namespace \
--set database.url="postgres://user:pass@host:5432/pgcp"
Local development:
# Set up database
createdb pgcontrolplane
make migrate
# Configure
export PGCP_DATABASE_URL="postgres://localhost:5432/pgcontrolplane"
export PGCP_LOG_LEVEL="info"
# Run control plane
make run
# Or run with Docker Compose
docker-compose up -d
Deploying a PostgreSQL cluster:
Option 1: Automated Deployment (Recommended)
# Deploy complete cluster with all components
./scripts/deploy-full-cluster.sh \
--name production \
--nodes 3 \
--version 16.1 \
--replication async
# Output includes connection info and all service URLs
Option 2: API-based Deployment
# Provision via orchestrator API
curl -X POST http://localhost:8080/api/v1/orchestrator/provision \
-H "Content-Type: application/json" \
-d '{
"name": "production",
"postgres_version": "16.1",
"node_count": 3,
"region": "us-east-1",
"enable_pgbalancer": true,
"enable_pgbackrest": true,
"enable_pgraft": false,
"enable_fauxdb": true,
"enable_pgsentinel": true,
"replication_mode": "async",
"backup_schedule": "0 2 * * *",
"instance_type": "m5.large",
"storage_gb": 100,
"extensions": ["pg_stat_statements", "pg_stat_insights"]
}'
# Check cluster status
curl http://localhost:8080/api/v1/clusters/production/status
Option 3: Kubernetes CRD
apiVersion: controlplane.pgelephant.com/v1
kind: PgCluster
metadata:
  name: production
spec:
  name: production
  postgresVersion: "16.1"
  nodeCount: 3
  enablePgBalancer: true
  enablePgBackRest: true
  enablePgSentinel: true
pgControlPlane seamlessly orchestrates all pgElephant components:
pgbalancer:
- Purpose: Connection pooling and load distribution
- Features: Round-robin/least-connected balancing, REST API control, MQTT clustering
- Integration: Auto-configured with backend nodes, health checks, automatic failover
- Access: Clients connect through pgbalancer rather than directly to the PostgreSQL nodes
pgBackRest:
- Purpose: Automated backup and point-in-time recovery
- Features: Full/incremental/differential backups, encryption, compression
- Integration: Scheduled backups, retention policies, restore automation
- Storage: S3, Azure Blob, GCS, or local filesystem
pgraft:
- Purpose: Strong consistency with distributed consensus
- Features: Leader election, log replication, etcd-compatible API
- Integration: Alternative to streaming replication when CP (consistency over availability) guarantees are required
- Use Case: Financial systems, inventory management, critical data
pgSentinel:
- Purpose: Cluster monitoring and alerting
- Features: Query analytics, replication monitoring, performance insights
- Integration: Auto-discovers all nodes, tracks metrics, generates alerts
- Dashboard: Web UI with real-time updates
FauxDB:
- Purpose: MongoDB API compatibility for PostgreSQL
- Features: MongoDB wire protocol, JSON document storage, MongoDB query language
- Integration: Transparently translates MongoDB requests to PostgreSQL
- Use Case: Migrate MongoDB applications to PostgreSQL without code changes
pg_stat_insights:
- Purpose: Deep query performance analysis
- Features: Query tracking, execution plans, performance trends
- Integration: Installed on all nodes, data aggregated by pgSentinel
- Benefits: Identify slow queries, optimize performance
Architecture:
┌─────────────┐
│   Clients   │
└──────┬──────┘
       │
┌──────▼─────────┐        ┌────────────────┐
│   pgbalancer   │◄───────┤ pgControlPlane │
│  (Port 5433)   │        │  (Port 8080)   │
└───┬───┬───┬────┘        └───────┬────────┘
    │   │   │                     │
┌───▼───▼───▼────┐      ┌─────────▼─────────┐
│   PostgreSQL   │◄─────┤      Agents       │
│  Nodes (1-N)   │      │  (on each node)   │
│ + pg_stat_     │      └───────────────────┘
│   insights     │
└───────┬────┬───┘
        │    └────────────────────┐
        │                         │
┌───────▼──────────┐     ┌────────▼─────────┐
│    pgBackRest    │     │    pgSentinel    │
│    (Backups)     │     │   (Monitoring)   │
└───────┬──────────┘     └──────────────────┘
        │
┌───────▼────────┐
│     FauxDB     │
│   (Testing)    │
└────────────────┘
Complete stack:
                        ┌─────────────────┐
                        │     Clients     │
                        │  Applications   │
                        └────────┬────────┘
                                 │
                  ┌──────────────┼──────────────┐
                  │              │              │
        ┌─────────▼────────┐     │    ┌─────────▼────────┐
        │    pgbalancer    │     │    │  pgControlPlane  │
        │ Connection Pool  │     │    │   Control API    │
        │  Load Balancer   │     │    │   (Port 8080)    │
        │   (Port 5433)    │◄────┼────┤                  │
        └─────────┬────────┘     │    └─────────┬────────┘
                  │              │              │
     ┌────────────┼──────────────┘              │
     │            │                             │
     │  ┌─────────▼───────────────────────┐     │
     │  │    PostgreSQL Cluster Nodes     │     │
     │  │  ┌───────────────────────────┐  │     │
     │  │  │ Node 1 (Primary)          │◄─┼─────┤
     │  │  │ + Agent                   │  │     │
     │  │  │ + pg_stat_insights        │  │     │
     │  │  └─────────────┬─────────────┘  │     │
     │  │                │ Replication    │     │
     │  │  ┌─────────────▼─────────────┐  │     │
     │  │  │ Node 2 (Replica)          │◄─┼─────┤
     │  │  │ + Agent                   │  │     │
     │  │  │ + pg_stat_insights        │  │     │
     │  │  └─────────────┬─────────────┘  │     │
     │  │                │ Replication    │     │
     │  │  ┌─────────────▼─────────────┐  │     │
     │  │  │ Node 3 (Replica)          │◄─┼─────┤
     │  │  │ + Agent                   │  │     │
     │  │  │ + pg_stat_insights        │  │     │
     │  │  └───────────────────────────┘  │     │
     │  └─────────┬───────────────────────┘     │
     │            │                             │
     │  ┌─────────▼─────────┐        ┌──────────▼──────────┐
     │  │    pgBackRest     │        │     pgSentinel      │
     │  │   Backup System   │        │   Monitoring Hub    │
     │  │   - Full/Incr     │        │   - Dashboard       │
     │  │   - PITR          │        │   - Alerts          │
     │  │   - S3 Storage    │        │   - Analytics       │
     │  └───────────────────┘        └──────────┬──────────┘
     │                                          │
     │                               ┌──────────▼──────────┐
     │                               │       FauxDB        │
     │                               │    MongoDB Compat   │
     └───────────────────────────────┤   - Wire Protocol   │
                                     │   - JSON Documents  │
                                     │   - Query Trans.    │
                                     └─────────────────────┘
Legend:
  ───▼  Data Flow        ◄───┤  Management/Control
   │    Replication      ┌─┐    Component/Service
Control plane internals:
┌──────────────────────── pgControlPlane ─────────────────────────┐
│                                                                 │
│  ┌──────────────────────── API Layer ────────────────────────┐  │
│  │                                                           │  │
│  │  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    │  │
│  │  │    REST     │    │    gRPC     │    │  WebSocket  │    │  │
│  │  │   (8080)    │    │   (9090)    │    │   (8081)    │    │  │
│  │  └──────┬──────┘    └──────┬──────┘    └──────┬──────┘    │  │
│  └─────────┼──────────────────┼──────────────────┼───────────┘  │
│            │                  │                  │              │
│  ┌─────────▼──────────────────▼──────────────────▼───────────┐  │
│  │                       Service Layer                       │  │
│  │                                                           │  │
│  │  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │  │
│  │  │    Cluster    │  │  Reconciler   │  │     Agent     │  │  │
│  │  │    Manager    │  │     Loop      │  │  Coordinator  │  │  │
│  │  └───────────────┘  └───────────────┘  └───────────────┘  │  │
│  │                                                           │  │
│  │  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │  │
│  │  │    Backup     │  │ Orchestrator  │  │   WebSocket   │  │  │
│  │  │    Manager    │  │   (Deploy)    │  │    Manager    │  │  │
│  │  └───────────────┘  └───────────────┘  └───────────────┘  │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
│                                │                                │
│  ┌─────────────────────────────▼─────────────────────────────┐  │
│  │         PostgreSQL State Store (Control Plane DB)         │  │
│  │                                                           │  │
│  │    Tables: clusters | nodes | agents | backups            │  │
│  │            events | configurations                        │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌─────────────────── Observability Stack ───────────────────┐  │
│  │                                                           │  │
│  │  • OpenTelemetry    • Prometheus    • Metrics (Port 2112) │  │
│  │  • Traces           • Logs          • Grafana Dashboards  │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
- API Layer: Accepts requests via REST, gRPC, or WebSocket
- Service Layer: Business logic for cluster operations
- Persistence Layer: PostgreSQL for state + event sourcing
- Agent Communication: mTLS-secured commands to agents
- Observability: OpenTelemetry spans and metrics throughout
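Agent traffic is secured with mutual TLS. A minimal Go sketch of the server side follows, assuming the certificate, key and CA paths are the ones supplied through `PGCP_TLS_CERT`, `PGCP_TLS_KEY` and `PGCP_MTLS_CA` (listed under the environment variables); `mtlsServerConfig` is a hypothetical helper, and how pgControlPlane wires TLS internally is not documented here.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// mtlsServerConfig builds a TLS config that only accepts clients (agents)
// presenting a certificate signed by the CA in caFile. Hypothetical helper;
// the mapping to PGCP_* variables is an assumption.
func mtlsServerConfig(certFile, keyFile, caFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("load server key pair: %w", err)
	}
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, fmt.Errorf("read CA bundle: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("CA bundle contains no PEM certificates")
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    pool,
		ClientAuth:   tls.RequireAndVerifyClientCert, // reject agents without a valid cert
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	cfg, err := mtlsServerConfig(os.Getenv("PGCP_TLS_CERT"), os.Getenv("PGCP_TLS_KEY"), os.Getenv("PGCP_MTLS_CA"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("client auth mode:", cfg.ClientAuth)
}
```

`tls.RequireAndVerifyClientCert` is what makes the connection mutual: the handshake fails unless the agent presents a certificate that chains to the CA. An agent would mirror this with its own `Certificates` and a `RootCAs` pool holding the same CA.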
Cluster Manager:
- CRUD operations for clusters and nodes
- Health check orchestration
- Topology management
Reconciler:
- Continuous reconciliation loop (default interval: 30s)
- Detects and repairs drift
- Automated failover when the primary fails
- Configuration synchronization
Agent Coordinator:
- Agent registration and heartbeat
- Command dispatch with retry logic
- Health monitoring and pruning of stale agents
Backup Manager:
- Scheduled backup orchestration
- PITR support
- Backup verification
- Retention policy enforcement
Environment variables:
# Server
PGCP_HTTP_PORT=8080
PGCP_GRPC_PORT=9090
PGCP_WS_PORT=8081
# Database
PGCP_DATABASE_URL=postgres://localhost:5432/pgcontrolplane
PGCP_DATABASE_MAX_CONNS=100
PGCP_DATABASE_MAX_IDLE_CONNS=10
# Security
PGCP_JWT_SECRET=your-secret-key
PGCP_JWT_EXPIRY=24h
PGCP_TLS_CERT=/path/to/cert.pem
PGCP_TLS_KEY=/path/to/key.pem
PGCP_MTLS_CA=/path/to/ca.pem
# Reconciliation
PGCP_RECONCILE_INTERVAL=30s
PGCP_PROMOTION_TIMEOUT=30s
PGCP_SAFE_PROMOTE=true
# Agent
PGCP_AGENT_TTL=5m
PGCP_AGENT_PRUNE_INTERVAL=1m
# Observability
PGCP_LOG_LEVEL=info
PGCP_LOG_FORMAT=json
PGCP_METRICS_PORT=2112
PGCP_TRACING_ENDPOINT=http://jaeger:14268/api/traces
# Features
PGCP_AUTO_FAILOVER=true
PGCP_AUTO_HEALING=true
PGCP_CONFIG_DRIFT_DETECTION=true
Or configure with a YAML file:
# config.yaml
server:
  http_port: 8080
  grpc_port: 9090
  ws_port: 8081
  read_timeout: 30s
  write_timeout: 30s
database:
  url: postgres://localhost:5432/pgcontrolplane
  max_connections: 100
  max_idle_connections: 10
  connection_timeout: 10s
security:
  jwt:
    secret: ${JWT_SECRET}
    expiry: 24h
  tls:
    enabled: true
    cert_file: /etc/pgcp/tls/cert.pem
    key_file: /etc/pgcp/tls/key.pem
    ca_file: /etc/pgcp/tls/ca.pem
  rbac:
    enabled: true
reconciler:
  interval: 30s
  promotion_timeout: 30s
  safe_promote: true
  max_concurrent_reconciles: 5
agents:
  ttl: 5m
  prune_interval: 1m
  command_timeout: 2m
observability:
  logging:
    level: info
    format: json
  metrics:
    enabled: true
    port: 2112
  tracing:
    enabled: true
    endpoint: http://jaeger:14268/api/traces
    sample_rate: 0.1
features:
  auto_failover: true
  auto_healing: true
  config_drift_detection: true
  backup_orchestration: true
REST API: full OpenAPI/Swagger documentation is available at /api/docs.
# Login
POST /api/v1/auth/login
{
"username": "admin",
"password": "secret"
}
# Response
{
"access_token": "eyJ...",
"refresh_token": "eyJ...",
"expires_in": 86400
}
# List clusters
GET /api/v1/clusters
# Get cluster
GET /api/v1/clusters/{id}
# Create cluster
POST /api/v1/clusters
{
"name": "production",
"region": "us-east-1",
"postgres_version": "16.1",
"replication_mode": "sync",
"auto_failover": true
}
# Update cluster
PUT /api/v1/clusters/{id}
# Delete cluster
DELETE /api/v1/clusters/{id}
# Get cluster status
GET /api/v1/clusters/{id}/status
# Get cluster topology
GET /api/v1/clusters/{id}/topology
# Get cluster metrics
GET /api/v1/clusters/{id}/metrics
# List nodes
GET /api/v1/clusters/{cluster_id}/nodes
# Add node
POST /api/v1/clusters/{cluster_id}/nodes
{
"host": "192.168.1.10",
"port": 5432,
"role": "replica",
"priority": 100
}
# Remove node
DELETE /api/v1/clusters/{cluster_id}/nodes/{node_id}
# Promote node
POST /api/v1/clusters/{cluster_id}/nodes/{node_id}/promote
{
"force": false
}
# List backups
GET /api/v1/clusters/{cluster_id}/backups
# Create backup
POST /api/v1/clusters/{cluster_id}/backups
{
"type": "full",
"compression": true
}
# Restore backup
POST /api/v1/clusters/{cluster_id}/restore
{
"backup_id": "backup-123",
"point_in_time": "2024-01-15T10:30:00Z"
}
gRPC API: see api/proto/controlplane.proto for the full service definitions.
service ControlPlane {
rpc CreateCluster(CreateClusterRequest) returns (Cluster);
rpc GetCluster(GetClusterRequest) returns (Cluster);
rpc ListClusters(ListClustersRequest) returns (ListClustersResponse);
rpc UpdateCluster(UpdateClusterRequest) returns (Cluster);
rpc DeleteCluster(DeleteClusterRequest) returns (Empty);
rpc AddNode(AddNodeRequest) returns (Node);
rpc RemoveNode(RemoveNodeRequest) returns (Empty);
rpc PromoteNode(PromoteNodeRequest) returns (PromoteNodeResponse);
rpc StreamClusterEvents(StreamClusterEventsRequest) returns (stream ClusterEvent);
}
WebSocket streaming (JavaScript client):
// Connect
const ws = new WebSocket('ws://localhost:8081/api/v1/ws');
// Subscribe to cluster events
ws.send(JSON.stringify({
action: 'subscribe',
cluster_id: 'production'
}));
// Receive events
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log('Event:', data);
};
Prometheus metrics are exposed on port 2112:
# Control Plane Metrics
pgcp_clusters_total
pgcp_nodes_total
pgcp_reconcile_runs_total
pgcp_reconcile_duration_seconds
pgcp_promotions_total
pgcp_promotions_failed_total
pgcp_failovers_total
pgcp_agent_commands_total
pgcp_agent_commands_duration_seconds
pgcp_backup_operations_total
pgcp_backup_size_bytes
# Per-Cluster Metrics
pgcp_cluster_health_score
pgcp_cluster_replication_lag_seconds
pgcp_cluster_nodes_up
pgcp_cluster_nodes_down
Structured JSON logs with correlation IDs:
{
"level": "info",
"timestamp": "2024-01-15T10:30:45Z",
"correlation_id": "req-123-abc",
"component": "reconciler",
"cluster_id": "production",
"message": "promoting node to primary",
"node_id": "node-2",
"reason": "primary_down"
}
OpenTelemetry traces for all operations:
Promote Node
├── Check Quorum (15ms)
├── Validate Candidate (8ms)
├── Send Promote Command (120ms)
│   ├── Agent Call (100ms)
│   └── Retry Logic (20ms)
├── Update State (25ms)
└── Notify Watchers (10ms)
Total: 178ms
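The "Check Quorum" and "Validate Candidate" steps in this trace can be sketched as follows. This is a hypothetical illustration, not the control plane's promotion logic: the `Member` fields, the strict-majority rule over data nodes and the lag threshold are assumptions; `Priority` echoes the `priority` field used when adding a node.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Member is a hypothetical view of one node during a failover decision.
type Member struct {
	ID        string
	Reachable bool  // the control plane can reach this node's agent
	Healthy   bool  // PostgreSQL is running and accepting connections
	LagBytes  int64 // how far replay trails the last known primary position
	Priority  int   // higher is preferred
}

// hasQuorum reports whether a strict majority of members is reachable.
func hasQuorum(ms []Member) bool {
	up := 0
	for _, m := range ms {
		if m.Reachable {
			up++
		}
	}
	return up > len(ms)/2
}

// pickCandidate refuses without quorum, keeps reachable healthy members within
// maxLag, then prefers higher priority and, on ties, lower lag.
func pickCandidate(ms []Member, maxLag int64) (Member, error) {
	if !hasQuorum(ms) {
		return Member{}, errors.New("no quorum: refusing to promote")
	}
	var cands []Member
	for _, m := range ms {
		if m.Reachable && m.Healthy && m.LagBytes <= maxLag {
			cands = append(cands, m)
		}
	}
	if len(cands) == 0 {
		return Member{}, errors.New("no eligible promotion candidate")
	}
	sort.Slice(cands, func(i, j int) bool {
		if cands[i].Priority != cands[j].Priority {
			return cands[i].Priority > cands[j].Priority
		}
		return cands[i].LagBytes < cands[j].LagBytes
	})
	return cands[0], nil
}

func main() {
	members := []Member{
		{ID: "node-1"}, // failed primary: unreachable
		{ID: "node-2", Reachable: true, Healthy: true, LagBytes: 4096, Priority: 100},
		{ID: "node-3", Reachable: true, Healthy: true, LagBytes: 0, Priority: 50},
	}
	c, err := pickCandidate(members, 16<<20) // tolerate up to 16 MiB of lag
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("promote", c.ID)
}
```

Here node-2 is chosen over node-3 despite more lag because its priority is higher; with equal priorities the least-lagged replica wins. A production quorum may also count the control plane or a witness, which this sketch does not model.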
Docker Compose:
version: '3.8'
services:
  pgcontrolplane:
    image: pgelephant/pgcontrolplane:latest
    ports:
      - "8080:8080"
      - "9090:9090"
      - "2112:2112"
    environment:
      PGCP_DATABASE_URL: postgres://pgcp:secret@postgres:5432/pgcontrolplane
      PGCP_LOG_LEVEL: info
    depends_on:
      - postgres
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: pgcontrolplane
      POSTGRES_USER: pgcp
      POSTGRES_PASSWORD: secret
    volumes:
      - pgcp-data:/var/lib/postgresql/data
volumes:
  pgcp-data:
Kubernetes:
apiVersion: v1
kind: Namespace
metadata:
  name: pgcontrolplane
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgcontrolplane
  namespace: pgcontrolplane
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pgcontrolplane
  template:
    metadata:
      labels:
        app: pgcontrolplane
    spec:
      containers:
        - name: pgcontrolplane
          image: pgelephant/pgcontrolplane:latest
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 9090
              name: grpc
            - containerPort: 2112
              name: metrics
          env:
            - name: PGCP_DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: pgcp-secrets
                  key: database-url
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
Deployment scenarios:
# Minimal single-node cluster for development
export ENABLE_PGSENTINEL=false
export ENABLE_FAUXDB=false
export INSTANCE_TYPE=t3.small
./scripts/deploy-full-cluster.sh --name dev --nodes 1
# 5-node cluster with synchronous replication
./scripts/deploy-full-cluster.sh \
--name production \
--nodes 5 \
--version 16.1 \
--replication sync \
--region us-east-1
# Strong consistency with pgraft
./scripts/deploy-full-cluster.sh \
--name financial \
--nodes 5 \
--with-raft
# Primary region
./scripts/deploy-full-cluster.sh \
--name primary \
--nodes 3 \
--region us-east-1
# Standby region
./scripts/deploy-full-cluster.sh \
--name standby \
--nodes 3 \
--region us-west-2
# Full stack with all testing tools
export ENABLE_FAUXDB=true
export ENABLE_PGSENTINEL=true
./scripts/deploy-full-cluster.sh \
--name testing \
--nodes 3
# Run automated tests
curl -X POST http://localhost:5000/api/tests/run
See the examples/ directory for more:
- basic-cluster/ - Simple 3-node cluster setup
- ha-cluster/ - High-availability configuration
- kubernetes/ - Complete Kubernetes deployment
Full documentation: QUICKSTART.md
Running tests:
# Run unit tests
make test
# Run integration tests
make test-integration
# Run end-to-end tests
make test-e2e
# Run with coverage
make test-coverage
# Run benchmarks
make bench
We welcome contributions! Please see CONTRIBUTING.md for details.
Apache License 2.0 - see LICENSE file for details.
- Website: https://www.pgelephant.com/pgcontrolplane
- Documentation: https://docs.pgelephant.com/pgcontrolplane
- GitHub: https://github.com/pgElephant/pgControlPlane
- Discord: https://discord.gg/pgelephant
- Blog: https://www.pgelephant.com/blog
Built with ❤️ by the pgElephant team and contributors.
Special thanks to the PostgreSQL community and the following projects:
- PostgreSQL
- pgBackRest
- Patroni
- etcd
- OpenTelemetry