A high-performance, fault-tolerant distributed job processing system built in Go. It implements multiple consensus algorithms, advanced load balancing strategies, and enterprise-grade monitoring.
This system was built to solve real distributed computing challenges. While most job processors are simple single-node affairs, this one handles the complexities of distributed coordination, leader election, and fault tolerance that you'd encounter in production environments.
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Node 1        │    │   Node 2        │    │   Node 3        │
│  (Leader)       │◄──►│                 │◄──►│                 │
│                 │    │                 │    │                 │
│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
│ │  Workers    │ │    │ │  Workers    │ │    │ │  Workers    │ │
│ │  ├─Worker-1 │ │    │ │  ├─Worker-1 │ │    │ │  ├─Worker-1 │ │
│ │  ├─Worker-2 │ │    │ │  ├─Worker-2 │ │    │ │  ├─Worker-2 │ │
│ │  └─Worker-N │ │    │ │  └─Worker-N │ │    │ │  └─Worker-N │ │
│ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                        ┌─────────────────┐
                        │   Redis Queue   │
                        │ ┌─────────────┐ │
                        │ │ Jobs Queue  │ │
                        │ │ Processing  │ │
                        │ │ Completed   │ │
                        │ │ Failed      │ │
                        │ │ Delayed     │ │
                        │ └─────────────┘ │
                        └─────────────────┘
                                 │
                        ┌─────────────────┐
                        │   MongoDB       │
                        │ ┌─────────────┐ │
                        │ │ Jobs        │ │
                        │ │ Nodes       │ │
                        │ │ Elections   │ │
                        │ │ Metrics     │ │
                        │ └─────────────┘ │
                        └─────────────────┘
- Multi-Node Architecture: True distributed processing across multiple nodes
- Leader Election: Three consensus algorithms (Bully, Raft, Gossip); a sketch of the pluggable interface follows this list
- Fault Tolerance: Automatic failover and cluster reconfiguration
- Dynamic Membership: Nodes can join/leave the cluster seamlessly
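
As a rough sketch of how a pluggable election layer can be shaped (the names below are illustrative assumptions, not the repository's actual API; the real code lives in `internal/election`):

```go
package election

import "context"

// Election is a hypothetical contract that each algorithm (bully, raft,
// gossip) could satisfy, letting the server swap implementations based
// on the ELECTION_ALGORITHM setting.
type Election interface {
	// Start joins the cluster and begins participating in elections.
	Start(ctx context.Context) error
	// IsLeader reports whether this node currently holds leadership.
	IsLeader() bool
	// LeaderID returns the current leader's node ID, if one is known.
	LeaderID() (string, bool)
}
```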
 
- Priority-Based Scheduling: High, Normal, Low priority job execution
- Smart Retry Logic: Exponential backoff with jitter to prevent a thundering herd (see the sketch after this list)
- Delayed Job Execution: Schedule jobs for future execution
- Job State Tracking: Complete lifecycle management (pending → running → completed/failed)
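
To make the retry behavior concrete, here is a minimal, self-contained sketch of exponential backoff with jitter, mirroring the documented RETRY_* settings (illustrative only, not the repository's exact implementation):

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// backoffDelay grows the delay exponentially from a base, caps it at a
// maximum, and adds random jitter so retries from many workers don't
// synchronize (the thundering-herd problem).
func backoffDelay(attempt int, base, max time.Duration, multiplier, jitterFactor float64) time.Duration {
	d := float64(base) * math.Pow(multiplier, float64(attempt))
	if d > float64(max) {
		d = float64(max)
	}
	// Jitter in [-jitterFactor, +jitterFactor] of the computed delay.
	jitter := (rand.Float64()*2 - 1) * jitterFactor * d
	return time.Duration(d + jitter)
}

func main() {
	// With the defaults (1s base, 2.0 multiplier, 60s cap, 0.1 jitter),
	// delays grow roughly 1s, 2s, 4s, 8s, ... up to 60s.
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Println(backoffDelay(attempt, time.Second, 60*time.Second, 2.0, 0.1))
	}
}
```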
 
- Round Robin: Fair rotation across workers
- Least Loaded: Route to the workers with the fewest active jobs (see the sketch after this list)
- Random: Distribute jobs randomly with minimal selection overhead
- Priority-Based: Match high-priority jobs to the least-loaded workers
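
A minimal sketch of the two most common strategies, assuming a hypothetical `Worker` type with an active-job counter (the real types live in `internal/loadbalancer` and `internal/worker`):

```go
package loadbalancer

// Worker is a hypothetical stand-in for illustration.
type Worker struct {
	ID         string
	ActiveJobs int
}

// roundRobin cycles through workers in order
// (LOAD_BALANCER_STRATEGY=round_robin).
type roundRobin struct{ next int }

func (r *roundRobin) Pick(workers []Worker) *Worker {
	if len(workers) == 0 {
		return nil
	}
	w := &workers[r.next%len(workers)]
	r.next++
	return w
}

// leastLoaded picks the worker with the fewest active jobs
// (LOAD_BALANCER_STRATEGY=least_loaded).
func leastLoaded(workers []Worker) *Worker {
	if len(workers) == 0 {
		return nil
	}
	best := &workers[0]
	for i := 1; i < len(workers); i++ {
		if workers[i].ActiveJobs < best.ActiveJobs {
			best = &workers[i]
		}
	}
	return best
}
```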
 
- JWT-Based Authentication: Role-based access control (Admin, User, Worker); see the token sketch after this list
- TLS Encryption: End-to-end encrypted communication
- Configurable Security: Authentication and TLS can be disabled for development environments
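
For a feel of what role-carrying tokens look like, here is a hedged sketch using the widely used github.com/golang-jwt/jwt/v5 package; the claim names are assumptions, and the repository's `internal/auth` package may differ:

```go
package main

import (
	"fmt"
	"time"

	"github.com/golang-jwt/jwt/v5"
)

// issueToken creates a role-bearing HS256 token. The claim layout here
// is an illustrative assumption, not the repository's actual schema.
func issueToken(username, role, secret string) (string, error) {
	claims := jwt.MapClaims{
		"sub":  username,
		"role": role, // "admin", "user", or "worker"
		"exp":  time.Now().Add(24 * time.Hour).Unix(),
	}
	token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
	return token.SignedString([]byte(secret)) // signed with JWT_SECRET
}

func main() {
	tok, err := issueToken("admin", "admin", "your-secret-key")
	fmt.Println(tok, err)
}
```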
 
- Prometheus Metrics: Comprehensive system metrics collection
- Grafana Dashboards: Pre-configured visualization dashboards
- Health Checks: Built-in health endpoints for orchestration
- Real-time Statistics: Live job queue and worker statistics
 
- Hot Configuration: Environment variable-based configuration (see the loader sketch after this list)
- Docker Ready: Multi-service Docker Compose setups
- Comprehensive Testing: Unit, integration, and benchmark tests
- Client Libraries: Go client example with HTTP API
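
Configuration loading of this kind usually boils down to small helpers around `os.Getenv`; a minimal sketch (the actual loader lives in `internal/config` and may differ):

```go
package config

import (
	"os"
	"time"
)

// getEnv returns an environment variable or a fallback default.
func getEnv(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// getDuration parses duration-valued settings such as ELECTION_TIMEOUT=10s.
func getDuration(key string, fallback time.Duration) time.Duration {
	if v := os.Getenv(key); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return fallback
}
```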
 
- Docker and Docker Compose
- MongoDB (or use Docker Compose)
- Redis (or use Docker Compose)
 
git clone <repository-url>
cd distributed-job-processor-go

Copy the example configuration file and customize it:
cp .env.example .env
# Edit .env with your preferred settings

The .env.example file contains all available configuration options with detailed explanations. The configuration is organized into logical sections:
Server Configuration
SERVER_PORT=8080              # HTTP API port
SERVER_HOST=0.0.0.0          # Bind address (0.0.0.0 = all interfaces)
WORKER_COUNT=10              # Worker threads per node
NODE_ID=node-1               # Unique node identifier (MUST be unique!)

Database Configuration
# MongoDB (persistent job storage)
MONGODB_URI=mongodb://localhost:27017
MONGODB_DATABASE=jobprocessor
MONGODB_TIMEOUT=30s
# Redis (job queue)
REDIS_ADDR=localhost:6379
REDIS_PASSWORD=                # Leave empty if no password
REDIS_DB=0                    # Use different numbers for different environments

Election Algorithm Settings
ELECTION_ALGORITHM=bully      # bully, raft, or gossip
ELECTION_TIMEOUT=10s         # How long to wait for an election
ELECTION_INTERVAL=30s        # Heartbeat frequency

Load Balancing Configuration
LOAD_BALANCER_STRATEGY=round_robin  # round_robin, least_loaded, random, priority

Retry Policy Settings
RETRY_POLICY=exponential     # fixed or exponential
MAX_RETRIES=3               # Maximum retry attempts
RETRY_BASE_DELAY=1s         # Initial retry delay
RETRY_MAX_DELAY=60s         # Maximum retry delay
RETRY_MULTIPLIER=2.0        # Exponential backoff multiplier
RETRY_JITTER_FACTOR=0.1     # Randomness to prevent thundering herd

Security Configuration
AUTH_ENABLED=false          # Enable JWT authentication
JWT_SECRET=your-secret-key  # Change this in production!
TLS_ENABLED=false          # Enable TLS encryption
TLS_CERT_FILE=             # Path to certificate file
TLS_KEY_FILE=              # Path to private key file

Monitoring Settings
METRICS_ENABLED=true       # Enable Prometheus metrics
METRICS_PORT=9090         # Metrics endpoint port

Development Setup
AUTH_ENABLED=false
TLS_ENABLED=false
WORKER_COUNT=5
ELECTION_TIMEOUT=5s

Production Setup
AUTH_ENABLED=true
TLS_ENABLED=true
WORKER_COUNT=20
JWT_SECRET=your-secure-production-secret-generated-with-openssl
TLS_CERT_FILE=/etc/ssl/certs/app.crt
TLS_KEY_FILE=/etc/ssl/private/app.key
MONGODB_URI=mongodb+srv://user:pass@prod-cluster.mongodb.net

Small Cluster (Bully Algorithm)
ELECTION_ALGORITHM=bully
ELECTION_TIMEOUT=10s
ELECTION_INTERVAL=30s
# Best for: 2-10 stable nodes

Medium Cluster (Raft Consensus)
ELECTION_ALGORITHM=raft
ELECTION_TIMEOUT=5s
ELECTION_INTERVAL=2s
# Best for: 3-7 nodes requiring strong consistency

Large Cluster (Gossip Protocol)
ELECTION_ALGORITHM=gossip
ELECTION_TIMEOUT=15s
ELECTION_INTERVAL=1s
# Best for: 10+ nodes with dynamic membership

Pro Tip: Start with the default configuration and adjust based on your specific needs. The system is designed to work well out of the box with sensible defaults.
# For Bully Algorithm (2 nodes)
docker-compose up
# For Raft Algorithm (3 nodes)
docker-compose -f docker-compose.raft.yml up
# For Gossip Algorithm (3 nodes)
docker-compose -f docker-compose.gossip.yml up

# Create a job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "type": "email",
    "priority": 5,
    "payload": {
      "recipient": "user@example.com",
      "subject": "Test Job"
    },
    "max_retries": 3
  }'
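
The same request from Go, using only the standard library (a minimal sketch; see examples/ for the full client):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Same job payload as the curl example above.
	body, _ := json.Marshal(map[string]any{
		"type":     "email",
		"priority": 5,
		"payload": map[string]any{
			"recipient": "user@example.com",
			"subject":   "Test Job",
		},
		"max_retries": 3,
	})
	resp, err := http.Post("http://localhost:8080/api/v1/jobs",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```
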
# Check system status
curl http://localhost:8080/api/v1/stats
# View health status
curl http://localhost:8080/health

This system implements three different consensus algorithms, each optimized for different scenarios:
Best for: Small, stable clusters (2-10 nodes)
- Simple priority-based leader election (see the sketch below)
- Fast convergence with stable membership
- Lower overhead for small clusters
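
The core rule is easy to state in code: among the reachable nodes, the one with the highest priority wins. A stripped-down illustration (message exchange, timeouts, and failure detection omitted; not the repository's actual code):

```go
package election

// bullyLeader returns the highest-priority node among those currently
// reachable, with a deterministic tie-break on node ID. This is the
// essence of the bully algorithm.
func bullyLeader(alive map[string]int) string {
	leader, best := "", -1
	for nodeID, priority := range alive {
		if priority > best || (priority == best && nodeID > leader) {
			leader, best = nodeID, priority
		}
	}
	return leader
}
```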
 
# Configuration
ELECTION_ALGORITHM=bully
ELECTION_TIMEOUT=10s
ELECTION_INTERVAL=30s

Best for: Medium clusters requiring strong consistency (3-7 nodes)
- Industry-standard consensus protocol
- Strong consistency guarantees
- Handles network partitions gracefully
- Requires a majority quorum
 
# Configuration
ELECTION_ALGORITHM=raft
ELECTION_TIMEOUT=5s
ELECTION_INTERVAL=2s

Best for: Large, dynamic clusters (10+ nodes)
- Scalable to hundreds of nodes
- Eventually consistent
- Built-in failure detection
- Handles network partitions well
 
# Configuration
ELECTION_ALGORITHM=gossip
ELECTION_TIMEOUT=15s
ELECTION_INTERVAL=1s

| Variable | Default | Description |
|---|---|---|
| `SERVER_PORT` | `8080` | HTTP server port |
| `SERVER_HOST` | `0.0.0.0` | Server bind address |
| `WORKER_COUNT` | `10` | Number of worker threads per node |
| `NODE_ID` | `node-1` | Unique identifier for this node |
| Variable | Default | Description |
|---|---|---|
| `MONGODB_URI` | `mongodb://localhost:27017` | MongoDB connection string |
| `MONGODB_DATABASE` | `jobprocessor` | Database name |
| `MONGODB_TIMEOUT` | `30s` | Connection timeout |
| `REDIS_ADDR` | `localhost:6379` | Redis server address |
| `REDIS_PASSWORD` | (empty) | Redis password (if required) |
| `REDIS_DB` | `0` | Redis database number |
| Variable | Default | Description |
|---|---|---|
| `ELECTION_ALGORITHM` | `bully` | Algorithm: `bully`, `raft`, `gossip` |
| `ELECTION_TIMEOUT` | `10s` | Election timeout duration |
| `ELECTION_INTERVAL` | `30s` | Heartbeat/election interval |
| Variable | Default | Description |
|---|---|---|
| `RETRY_POLICY` | `exponential` | Policy: `fixed`, `exponential` |
| `MAX_RETRIES` | `3` | Maximum retry attempts |
| `RETRY_BASE_DELAY` | `1s` | Initial retry delay |
| `RETRY_MAX_DELAY` | `60s` | Maximum retry delay |
| `RETRY_MULTIPLIER` | `2.0` | Exponential backoff multiplier |
| `RETRY_JITTER_FACTOR` | `0.1` | Jitter factor (0.0-1.0) |
| Variable | Default | Description |
|---|---|---|
| `AUTH_ENABLED` | `false` | Enable JWT authentication |
| `JWT_SECRET` | `your-secret-key` | JWT signing secret |
| `TLS_ENABLED` | `false` | Enable TLS encryption |
| `TLS_CERT_FILE` | (empty) | TLS certificate file path |
| `TLS_KEY_FILE` | (empty) | TLS private key file path |
POST /api/v1/jobs
Content-Type: application/json
{
  "type": "email",
  "priority": 5,
  "payload": {
    "recipient": "user@example.com",
    "subject": "Hello World"
  },
  "max_retries": 3,
  "scheduled_at": "2024-01-01T12:00:00Z"
}

GET /api/v1/jobs/{job_id}
GET /api/v1/jobs?status=pending&type=email&limit=50
DELETE /api/v1/jobs/{job_id}
GET /api/v1/stats
GET /api/v1/workers
GET /api/v1/nodes
GET /api/v1/leader
GET /api/v1/election/algorithms/{algorithm}
GET /health
GET /metrics  # Port 9090

POST /auth/login
Content-Type: application/json
{
  "username": "admin",
  "password": "admin123"
}

Default Users:
- `admin/admin123` (Admin role)
- `user/user123` (User role)
- `worker/worker123` (Worker role)
# Single node development
go run cmd/server/main.go
# Or with Docker
docker build -t job-processor .
docker run -p 8080:8080 -p 9090:9090 job-processor

docker-compose up
# Nodes: localhost:8080, localhost:8081

docker-compose -f docker-compose.raft.yml up
# Nodes: localhost:8080, localhost:8081, localhost:8082

docker-compose -f docker-compose.gossip.yml up
# Nodes: localhost:8080, localhost:8081, localhost:8082

All deployments include:
- Prometheus: http://localhost:9093 (or 9092)
- Grafana: http://localhost:3000 (admin/admin)
- Redis: localhost:6379
# Unit tests
go test ./tests -v
# Integration tests (requires MongoDB & Redis)
go test ./tests -v -tags=integration
# Benchmarks
go test ./tests -bench=. -benchmem
# Election algorithm tests
go test ./tests -run=TestElection -v

- Unit Tests: Load balancing, retry policies, job processing
- Integration Tests: Full system workflows, API endpoints
- Election Tests: All three consensus algorithms
- Benchmarks: Performance testing for critical paths
 
BenchmarkRoundRobinSelection-8        1000000    1043 ns/op      0 B/op    0 allocs/op
BenchmarkLeastLoadedSelection-8        500000    2856 ns/op      0 B/op    0 allocs/op
BenchmarkRetryPolicyCalculation-8    5000000     267 ns/op      0 B/op    0 allocs/op
├── cmd/server/          # Main application entry point
├── pkg/job/             # Job definitions and interfaces
├── internal/
│   ├── auth/           # Authentication & authorization
│   ├── config/         # Configuration management
│   ├── election/       # Consensus algorithms
│   ├── loadbalancer/   # Load balancing strategies
│   ├── logger/         # Structured logging
│   ├── metrics/        # Prometheus metrics
│   ├── queue/          # Redis queue implementation
│   ├── retry/          # Retry policy implementations
│   ├── server/         # HTTP server & API handlers
│   ├── storage/        # MongoDB storage layer
│   ├── tls/            # TLS configuration
│   └── worker/         # Worker pool management
├── tests/              # Test suites
├── examples/           # Client examples
├── scripts/            # Development scripts
└── monitoring/         # Grafana dashboards & Prometheus config
type EmailProcessor struct{}

func (e *EmailProcessor) Process(ctx context.Context, job *job.Job) error {
    // Check payload fields before use so a malformed job fails cleanly
    // instead of panicking on a bad type assertion.
    recipient, ok := job.Payload["recipient"].(string)
    if !ok {
        return fmt.Errorf("email job: missing or non-string recipient")
    }
    subject, ok := job.Payload["subject"].(string)
    if !ok {
        return fmt.Errorf("email job: missing or non-string subject")
    }

    // Send email logic here
    return sendEmail(recipient, subject)
}

func (e *EmailProcessor) Type() string {
    return "email"
}

// Register the processor
server.RegisterProcessor(&EmailProcessor{})

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Add tests for your changes
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- The system is optimized for throughput over latency
- Raft provides the strongest consistency at the highest overhead
- Gossip scales best, with eventual consistency
- Redis pipelining is used for queue operations
- MongoDB operations are optimized with proper indexing
 
The system exposes comprehensive metrics via Prometheus:
- `job_processing_duration_seconds`: Job processing times
- `election_leader_changes_total`: Leadership changes
- `worker_active_gauge`: Active workers per node
- `queue_depth_gauge`: Jobs in each queue state
- `node_uptime_seconds`: Node uptime tracking
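
Instrumentation of this sort is typically done with the Prometheus Go client; a minimal sketch of registering and recording one of the histograms above (the repository's `internal/metrics` package may differ in detail):

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// jobDuration mirrors the documented job_processing_duration_seconds metric.
var jobDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: "job_processing_duration_seconds",
	Help: "Job processing times",
})

func main() {
	prometheus.MustRegister(jobDuration)

	start := time.Now()
	// ... process a job ...
	jobDuration.Observe(time.Since(start).Seconds())

	// Expose /metrics on the configured METRICS_PORT.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```
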
Leader Election Issues
- Check MongoDB connectivity between nodes
- Verify `NODE_ID` is unique per node
- Ensure election timeouts are appropriate for network latency
 
Job Processing Issues
- Verify Redis connectivity
- Check if job processors are registered
- Monitor worker pool status via `/api/v1/workers`
Performance Issues
- Increase `WORKER_COUNT` for CPU-bound jobs
- Tune retry policies to avoid overwhelming downstream services
- Monitor queue depth and adjust cluster size accordingly
 
# Test MongoDB connectivity
./scripts/test-mongodb.sh
# Monitor cluster status
./scripts/monitor.sh
# Test different algorithms
./scripts/test-algorithms.sh

This distributed job processor demonstrates enterprise-grade distributed systems engineering. It handles the complex challenges of distributed coordination, fault tolerance, and scalability that you'd encounter in production environments. Whether you're processing background jobs, implementing workflow orchestration, or building distributed systems, this codebase provides a solid foundation with battle-tested patterns.
For questions, issues, or contributions, please check the issue tracker or reach out to swe.robertkibet@gmail.com.