Configuration Guide

Comprehensive guide for configuring Charon before and after deployment.


Configuration Files Overview

Charon uses two primary configuration mechanisms:

  1. .env file - Environment variables for local development and scripts
  2. terraform/terraform.tfvars - Terraform variable overrides for deployment

Security Best Practices:

  • Never commit these files to version control (both are listed in .gitignore)
  • Set file permissions to 600 (owner read/write only)
  • Use example files as templates: .env.example, terraform.tfvars.example
  • Rotate credentials periodically (90 days recommended)
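The permission rule above can be checked before each deployment. The following is a minimal sketch, not part of Charon itself: the function names `file_mode` and `check_secret_file` are hypothetical, and it assumes either GNU or BSD `stat` is available.

```shell
#!/bin/sh
# Print the octal permission bits of a file.
# GNU stat uses -c, BSD/macOS stat uses -f; try GNU first, then BSD.
file_mode() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Succeed only if the file exists and its mode is exactly 600.
check_secret_file() {
    [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
    mode=$(file_mode "$1")
    if [ "$mode" != "600" ]; then
        echo "$1 has mode $mode, expected 600" >&2
        return 1
    fi
}
```

A typical call would be `check_secret_file .env && check_secret_file terraform/terraform.tfvars`. To confirm a file is ignored by Git, `git check-ignore -q .env` exits with status 0 when the path matches an ignore rule.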

Creating .env File

The .env file is used by local scripts and development tools.

Required Environment Variables

Minimal .env for local development:

# Kubernetes configuration
export KUBECONFIG=~/.kube/config

# Domain configuration
export DOMAIN_NAME=example.org
export HEADSCALE_BASE_DOMAIN=vpn.example.org

# Cloudflare DNS
export CLOUDFLARE_API_TOKEN=your-cloudflare-api-token
export CLOUDFLARE_ZONE_ID=your-zone-id

# GitHub authentication
export GITHUB_TOKEN=github_pat_xxxxx
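After sourcing the file, a quick check can confirm that none of the required variables is missing. This is a sketch under the assumption that the six variables above are the complete required set; `check_required_env` is a hypothetical helper, and it only sees exported variables.

```shell
#!/bin/sh
# Names taken from the minimal .env shown in this guide.
REQUIRED_VARS="KUBECONFIG DOMAIN_NAME HEADSCALE_BASE_DOMAIN CLOUDFLARE_API_TOKEN CLOUDFLARE_ZONE_ID GITHUB_TOKEN"

# Report every required variable that is unset or empty in the environment.
# Returns non-zero if at least one is missing.
check_required_env() {
    missing=0
    for name in $REQUIRED_VARS; do
        if [ -z "$(printenv "$name")" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `. ./.env && check_required_env` prints one line per missing variable to stderr and returns a non-zero status if any are absent.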

Optional Environment Variables

Extended configuration:

# Namespace overrides
export CORE_NAMESPACE=core
export MONITORING_NAMESPACE=monitoring
export APPS_NAMESPACE=apps

# Service URLs
export NETBOX_URL=https://netbox.example.org
export GRAFANA_URL=https://grafana.example.org
export FREEIPA_URL=https://ipa.example.org

# Python development
export PYTHONPATH="${PYTHONPATH}:$(pwd)/scripts"

# Terraform workspace
export TF_WORKSPACE=production

Security Best Practices

File permissions:

chmod 600 .env

Loading environment variables:

# In bash/zsh
source .env

# Or use direnv (recommended)
# Add to .envrc:
source_env .env

Credential management:

  • Use a password manager (1Password, Bitwarden) to generate and store secrets
  • Never share .env files via email or chat
  • Rotate tokens on team member departure
  • Use separate tokens for production and development

Configuring terraform.tfvars

The terraform.tfvars file contains all deployment configuration.

Core Configuration

Minimal terraform.tfvars:

# Kubernetes cluster
kubeconfig_path = "~/.kube/config"

# Domain configuration
domain_name          = "example.org"
headscale_base_domain = "vpn.example.org"

# Required secrets
cloudflare_api_token = "your-cloudflare-token"
github_token         = "github_pat_xxxxx"

# Service passwords
netbox_superuser_password = "secure-random-password-32-chars"
freeipa_admin_password   = "secure-random-password-32-chars"
grafana_admin_password   = "secure-random-password-32-chars"
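Because the domain appears in both .env and terraform.tfvars, the two files can drift apart. The sketch below compares them; `tfvars_value` and `check_domain_matches` are hypothetical helpers, and the parser only handles simple single-line `name = "value"` assignments like the ones in this guide.

```shell
#!/bin/sh
# Extract the quoted value of a `name = "value"` line from a tfvars file.
# $1 = variable name, $2 = path to the tfvars file. Only the first match is used.
tfvars_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$2" | head -n 1
}

# Compare DOMAIN_NAME from the environment with domain_name in the tfvars file.
check_domain_matches() {
    tf_domain=$(tfvars_value domain_name "$1")
    if [ "$tf_domain" != "$DOMAIN_NAME" ]; then
        echo "mismatch: environment has '$DOMAIN_NAME', tfvars has '$tf_domain'" >&2
        return 1
    fi
}
```

For example, `. ./.env && check_domain_matches terraform/terraform.tfvars` fails when the two values differ.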

Generate secure passwords:

# Using openssl
openssl rand -base64 32

# Using pwgen
pwgen -s 32 1

# Using 1Password CLI
op item create --category=password --title="NetBox Admin" --generate-password=32,letters,digits
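If none of these tools is installed, a password can also be drawn from /dev/urandom. This is a sketch assuming a Unix-like system; `gen_password` and `print_service_passwords` are hypothetical names, and the output is alphanumeric only.

```shell
#!/bin/sh
# Generate a random alphanumeric string of the given length (default 32).
# LC_ALL=C keeps tr byte-oriented when reading raw random bytes.
gen_password() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-32}"
}

# Print tfvars lines for the three service passwords from the minimal configuration.
print_service_passwords() {
    for name in netbox_superuser_password freeipa_admin_password grafana_admin_password; do
        printf '%s = "%s"\n' "$name" "$(gen_password 32)"
    done
}
```

The output of `print_service_passwords` contains secrets in plain text, so paste it directly into terraform.tfvars instead of leaving it in shell history or logs.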

Service-Specific Settings

NetBox Configuration

# NetBox settings
netbox_enabled            = true
netbox_storage           = "10Gi"
netbox_superuser_name    = "admin"
netbox_superuser_email   = "admin@example.org"
netbox_superuser_password = "secure-password"

# NetBox plugins (external repository)
netbox_plugins_repo   = "https://github.com/vegcom/netbox-plugins-bundled.git"
netbox_plugins_branch = "main"
netbox_plugins_token  = "github_pat_xxxxx"  # Read access to plugin repo

# NetBox Tailscale integration
netbox_tailscale_enabled = true

Headscale VPN Configuration

# Headscale settings
headscale_enabled       = true
headscale_version      = "latest"
headscale_nodeport     = 30080  # Public HTTP port
headscale_grpc_nodeport = 30443 # Internal gRPC port
headscale_ip_prefix    = "100.64.0.0/10"  # VPN subnet (CGNAT range)
headscale_base_domain  = "vpn.example.org"
headscale_magic_dns    = true
headscale_storage      = "1Gi"

# Public URL for Headscale (used by clients)
headscale_server_url = "http://node-ip:30080"
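Kubernetes only accepts NodePort values inside the cluster's configured range, which defaults to 30000-32767. The sketch below validates a port against that default; it assumes the cluster has not changed its service node port range, and `valid_nodeport` is a hypothetical helper.

```shell
#!/bin/sh
# Succeed if the argument is a non-negative integer inside the default
# Kubernetes NodePort range (30000-32767).
valid_nodeport() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]
}
```

Both values used above pass the check: `valid_nodeport 30080` and `valid_nodeport 30443` return 0, while a port such as 8080 is rejected.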

FreeIPA LDAP Configuration

# FreeIPA settings
freeipa_enabled       = true
freeipa_domain        = "ipa.example.org"
freeipa_realm         = "IPA.EXAMPLE.ORG"  # Uppercase
freeipa_admin_password = "secure-password"
freeipa_storage       = "10Gi"
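Since the realm is the domain in uppercase, it can be derived instead of typed by hand. A minimal sketch, with `realm_from_domain` and `check_realm` as hypothetical helper names:

```shell
#!/bin/sh
# Derive the Kerberos realm from the FreeIPA domain by upper-casing it.
realm_from_domain() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Succeed if the given realm ($2) is the upper-cased form of the domain ($1).
check_realm() {
    [ "$(realm_from_domain "$1")" = "$2" ]
}
```

With the values above, `check_realm ipa.example.org IPA.EXAMPLE.ORG` succeeds, and a lowercase or mistyped realm fails.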

Grafana Monitoring Configuration

# Grafana settings
grafana_enabled         = true
grafana_admin_user     = "admin"
grafana_admin_password = "secure-password"
grafana_storage        = "5Gi"

Redmine Project Management

# Redmine settings
redmine_enabled         = true
redmine_db_password    = "secure-password"
redmine_storage        = "5Gi"
redmine_tailscale_enabled = true

Open-WebUI AI Interface

# Open-WebUI settings
openwebui_enabled      = true
openwebui_storage     = "5Gi"
openwebui_ollama_base_url = "http://ollama.inference.svc.cluster.local:11434"

Resource Allocation

CPU and memory limits:

# NetBox resources
netbox_cpu_request    = "500m"
netbox_cpu_limit      = "2000m"
netbox_memory_request = "1Gi"
netbox_memory_limit   = "4Gi"

# Grafana resources
grafana_cpu_request    = "250m"
grafana_cpu_limit      = "1000m"
grafana_memory_request = "512Mi"
grafana_memory_limit   = "2Gi"

# FreeIPA resources (requires more RAM)
freeipa_cpu_request    = "1000m"
freeipa_cpu_limit      = "2000m"
freeipa_memory_request = "2Gi"
freeipa_memory_limit   = "4Gi"

Storage configuration:

# Persistent volume sizes
netbox_storage    = "10Gi"
freeipa_storage   = "10Gi"
grafana_storage   = "5Gi"
redmine_storage   = "5Gi"
gitea_storage     = "10Gi"
loki_storage      = "20Gi"  # Monitoring logs

Network Configuration

Ingress and DNS:

# Domain names
domain_name = "example.org"

# Service hostnames (auto-generated from domain_name)
# netbox.example.org
# grafana.example.org
# ipa.example.org
# etc.

# Cloudflare integration
cloudflare_api_token = "your-token"
cloudflare_proxied   = true  # DDoS protection

Headscale VPN networking:

# VPN subnet (CGNAT shared address space, RFC 6598)
headscale_ip_prefix = "100.64.0.0/10"

# MagicDNS domain
headscale_base_domain = "vpn.example.org"

# DNS servers for VPN clients
headscale_dns_servers = ["1.1.1.1", "8.8.8.8"]  # TODO: Make configurable

Enabling/Disabling Services

Control which services are deployed using boolean flags:

Core Infrastructure (Always Recommended)

headscale_enabled  = true  # VPN mesh network
cert_manager_enabled = true  # TLS certificate automation

Infrastructure Services

netbox_enabled    = true   # DCIM and IPAM
freeipa_enabled   = true   # LDAP and Kerberos
grafana_enabled   = true   # Monitoring dashboards
loki_enabled      = true   # Log aggregation
prometheus_enabled = true   # Metrics collection

Application Services

redmine_enabled   = true   # Project management
gitea_enabled     = true   # Git hosting
vaultwarden_enabled = true   # Password manager
paperless_enabled = true   # Document management
jellyfin_enabled  = true   # Media server
immich_enabled    = true   # Photo management
openwebui_enabled = true   # AI chat interface

Example: Minimal Deployment

# Only essential services
headscale_enabled = true
netbox_enabled    = true
grafana_enabled   = true

# Disable everything else
freeipa_enabled    = false
redmine_enabled    = false
gitea_enabled      = false
vaultwarden_enabled = false
paperless_enabled  = false
jellyfin_enabled   = false
immich_enabled     = false
openwebui_enabled  = false
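To see at a glance which flags a tfvars file turns on, the `_enabled` lines can be filtered. This is a rough sketch that assumes one assignment per line, as in the examples here; `enabled_flags` is a hypothetical helper, and it also lists feature flags such as `netbox_tailscale_enabled`.

```shell
#!/bin/sh
# Print the prefix of every `<name>_enabled = true` assignment in a tfvars file.
# Commented-out lines are skipped because they do not start with an identifier.
enabled_flags() {
    sed -n 's/^[[:space:]]*\([a-z_]*\)_enabled[[:space:]]*=[[:space:]]*true.*/\1/p' "$1"
}
```

Run against the minimal deployment above, `enabled_flags terraform.tfvars` prints `headscale`, `netbox`, and `grafana`, one per line.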

Customizing Resource Limits

Understanding Resource Requests vs Limits

  • Request: Guaranteed resources (used for scheduling)
  • Limit: Maximum resources (enforced: a container that exceeds its memory limit is OOM-killed, while one that reaches its CPU limit is throttled)
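Kubernetes rejects a pod whose request is larger than its limit, so it can help to compare the two before applying. The sketch below handles only the Mi and Gi suffixes used in this guide; `to_mib` and `request_within_limit` are hypothetical helpers.

```shell
#!/bin/sh
# Convert a memory quantity with a Mi or Gi suffix to a number of MiB.
# Other Kubernetes suffixes (Ki, M, G, ...) are deliberately not handled.
to_mib() {
    case "$1" in
        *Gi) echo $(( ${1%Gi} * 1024 )) ;;
        *Mi) echo "${1%Mi}" ;;
        *)   echo "unsupported quantity: $1" >&2; return 1 ;;
    esac
}

# Succeed if the request ($1) does not exceed the limit ($2).
request_within_limit() {
    req=$(to_mib "$1") || return 1
    lim=$(to_mib "$2") || return 1
    [ "$req" -le "$lim" ]
}
```

For example, `request_within_limit 512Mi 2Gi` succeeds, while `request_within_limit 4Gi 512Mi` fails.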

Tuning for Different Cluster Sizes

Small Cluster (3 nodes, 8GB RAM each):

# Conservative limits
netbox_memory_request  = "512Mi"
netbox_memory_limit    = "2Gi"
grafana_memory_request = "256Mi"
grafana_memory_limit   = "1Gi"
freeipa_memory_request = "1Gi"
freeipa_memory_limit   = "2Gi"

Medium Cluster (6 nodes, 16GB RAM each):

# Balanced limits
netbox_memory_request  = "1Gi"
netbox_memory_limit    = "4Gi"
grafana_memory_request = "512Mi"
grafana_memory_limit   = "2Gi"
freeipa_memory_request = "2Gi"
freeipa_memory_limit   = "4Gi"

Large Cluster (10+ nodes, 32GB+ RAM each):

# Generous limits
netbox_memory_request  = "2Gi"
netbox_memory_limit    = "8Gi"
grafana_memory_request = "1Gi"
grafana_memory_limit   = "4Gi"
freeipa_memory_request = "4Gi"
freeipa_memory_limit   = "8Gi"

GPU Configuration (for AI workloads)

# vLLM inference with GPU
vllm_enabled = true
vllm_gpu_count = 1
vllm_gpu_node_selector = "gpu"
vllm_gpu_node_value   = "nvidia-rtx-3090"

# Ollama CPU inference
ollama_enabled = true
ollama_cpu_count = 4
ollama_memory_limit = "8Gi"

Domain and Hostname Configuration

Primary Domain Configuration

# Base domain for all services
domain_name = "example.org"

This creates:

  • netbox.example.org
  • grafana.example.org
  • ipa.example.org
  • redmine.example.org
  • etc.
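The naming pattern is simply a service prefix joined to the base domain. The sketch below prints the four hostnames listed above; `service_hostnames` is a hypothetical helper, and the prefix list covers only the services named in this section.

```shell
#!/bin/sh
# Print the default hostname for each listed service under a base domain.
# Note that FreeIPA is served under the "ipa" prefix.
service_hostnames() {
    domain=$1
    for prefix in netbox grafana ipa redmine; do
        printf '%s.%s\n' "$prefix" "$domain"
    done
}
```

Calling `service_hostnames example.org` prints one hostname per line, which can be fed to a DNS lookup loop to confirm the records exist.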

Custom Hostnames (Advanced)

# Override default hostname patterns
netbox_hostname  = "dcim.example.org"  # Instead of netbox.example.org
grafana_hostname = "monitoring.example.org"  # Instead of grafana.example.org

VPN Domain Configuration

# Separate domain for VPN mesh
headscale_base_domain = "vpn.example.org"

Creates MagicDNS names:

  • netbox.vpn.example.org
  • grafana.vpn.example.org
  • Individual node names: node-01.vpn.example.org

Storage Configuration

Storage Classes

Charon uses dynamic provisioning with storage classes:

# Default storage class (auto-detected from cluster)
# K3s default: local-path
# Linode LKE: linode-block-storage
# AWS EKS: gp2/gp3

Per-Service Storage Sizing

# Database-heavy services need more storage
netbox_storage  = "20Gi"  # DCIM data, network diagrams
gitea_storage   = "50Gi"  # Git repositories
immich_storage  = "100Gi" # Photo storage

# Log aggregation needs lots of space
loki_storage = "50Gi"  # Retention depends on cluster size

# Application data
freeipa_storage    = "10Gi"  # LDAP directory
grafana_storage    = "5Gi"   # Dashboards and config
vaultwarden_storage = "1Gi"   # Encrypted vault data

Storage Expansion

To increase storage for a running service:

# In terraform.tfvars
netbox_storage = "30Gi"  # Increased from 10Gi

# Apply changes
# terraform apply

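Note: Storage can only be expanded, not shrunk, and expansion requires a storage class with allowVolumeExpansion enabled. To reduce the size, the StatefulSet PVC must be deleted manually and recreated, which discards its data, so back up first.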
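A small guard can catch an accidental shrink before it reaches Terraform. This sketch assumes sizes are whole numbers with a Gi suffix, as in this guide; `is_expansion` is a hypothetical helper.

```shell
#!/bin/sh
# Succeed only if the new size ($2) is strictly larger than the current size ($1).
# Both arguments must be whole numbers with a Gi suffix, e.g. 10Gi.
is_expansion() {
    old=${1%Gi}
    new=${2%Gi}
    [ "$new" -gt "$old" ]
}
```

For example, `is_expansion 10Gi 30Gi` succeeds and `is_expansion 30Gi 10Gi` fails. Whether the cluster permits expansion at all can be checked with `kubectl get storageclass -o custom-columns=NAME:.metadata.name,EXPANDABLE:.allowVolumeExpansion`.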


Configuration Validation

Pre-Deployment Checks

# Validate Terraform configuration
cd terraform
terraform validate

# Check formatting (syntax errors are caught by terraform validate above)
terraform fmt -check

# Review planned changes
terraform plan
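The three checks can be wrapped in one function that stops at the first failure. This is a sketch, with `pre_deploy_checks` as a hypothetical name; it assumes the Terraform directory has already been initialized with `terraform init`.

```shell
#!/bin/sh
# Run the pre-deployment checks in order, stopping at the first failure.
# $1 = Terraform directory (default: terraform).
# The body runs in a subshell, so the caller's working directory is unchanged.
pre_deploy_checks() (
    cd "${1:-terraform}" || exit 1
    terraform validate   || exit 1
    terraform fmt -check || exit 1
    terraform plan
)
```

Calling `pre_deploy_checks` from the repository root returns non-zero as soon as one step fails, which makes it usable in a CI job or a Git pre-push hook.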

Post-Deployment Verification

# Verify all services configured correctly
kubectl get pods -A

# Check ConfigMaps applied
kubectl get configmaps -n core

# Verify Secrets created
kubectl get secrets -n core
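On a cluster with many pods, it is easier to list only the ones that need attention. The sketch below filters the pod list by its STATUS column; `unhealthy_pods` is a hypothetical helper and relies on the default column order of `kubectl get pods -A`. A pod reported as Running may still have containers that are not ready, so the READY column is worth a look as well.

```shell
#!/bin/sh
# List pods whose STATUS is neither Running nor Completed.
# Default columns with -A: NAMESPACE NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
    kubectl get pods -A --no-headers |
        awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}
```

An empty result from `unhealthy_pods` means every pod is in the Running or Completed state.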

Advanced Configuration

Lifecycle Automation

# Tailscale lifecycle automation (external repository)
lifecycle_automation_repo   = "https://github.com/vegcom/tailscale-lifecycle-automation.git"
lifecycle_automation_branch = "main"
lifecycle_automation_token  = "github_pat_xxxxx"

Image Build Configuration

# NetBox custom image build
netbox_plugins_repo   = "https://github.com/vegcom/netbox-plugins-bundled.git"
netbox_plugins_branch = "main"
netbox_plugins_token  = "github_pat_xxxxx"

LDAP Integration

# FreeIPA LDAP connection details
freeipa_ldap_base_dn = "dc=ipa,dc=example,dc=org"
freeipa_bind_dn      = "uid=admin,cn=users,cn=accounts,dc=ipa,dc=example,dc=org"
freeipa_bind_password = "secure-password"
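The base DN follows mechanically from the FreeIPA domain: each dot-separated label becomes a `dc=` component. A minimal sketch, with `base_dn_from_domain` and `bind_dn_for_user` as hypothetical helper names; the `cn=users,cn=accounts` container is taken from the example above.

```shell
#!/bin/sh
# Turn a DNS domain into an LDAP base DN, e.g. a.b.c -> dc=a,dc=b,dc=c.
base_dn_from_domain() {
    printf 'dc=%s\n' "$(printf '%s' "$1" | sed 's/\./,dc=/g')"
}

# Build the bind DN for a user ($1) in the given domain ($2).
bind_dn_for_user() {
    printf 'uid=%s,cn=users,cn=accounts,%s\n' "$1" "$(base_dn_from_domain "$2")"
}
```

With the domain used in this guide, `bind_dn_for_user admin ipa.example.org` reproduces the `freeipa_bind_dn` value shown above.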

Configuration Examples

Example 1: Production Deployment

# Production-ready configuration
domain_name = "mycompany.org"

# All core services enabled
headscale_enabled = true
netbox_enabled    = true
freeipa_enabled   = true
grafana_enabled   = true
loki_enabled      = true

# Production resource limits
netbox_memory_limit  = "4Gi"
freeipa_memory_limit = "4Gi"
grafana_memory_limit = "2Gi"

# Large storage allocations
netbox_storage  = "50Gi"
freeipa_storage = "20Gi"
loki_storage    = "100Gi"

# Security: All services use Tailscale VPN
netbox_tailscale_enabled   = true
redmine_tailscale_enabled  = true
grafana_tailscale_enabled  = true

Example 2: Development Environment

# Minimal development setup
domain_name = "dev.example.org"

# Only essential services
headscale_enabled = true
netbox_enabled    = true
grafana_enabled   = false
freeipa_enabled   = false

# Minimal resource usage
netbox_memory_request = "256Mi"
netbox_memory_limit   = "1Gi"

# Small storage
netbox_storage = "5Gi"

# No VPN overhead
netbox_tailscale_enabled = false

Example 3: AI/ML Focused

# AI inference cluster
domain_name = "ai.example.org"

# Minimal infrastructure
headscale_enabled = true
netbox_enabled    = false
freeipa_enabled   = false

# AI services enabled
openwebui_enabled = true
ollama_enabled    = true
vllm_enabled      = true

# GPU resources
vllm_gpu_count = 2
vllm_memory_limit = "16Gi"

# Large model storage
ollama_storage = "100Gi"
vllm_storage   = "200Gi"

Troubleshooting Configuration

Common Issues

Issue: Service not deploying

  • Check <service>_enabled = true in terraform.tfvars
  • Verify required dependencies are enabled (e.g., Headscale for VPN services)

Issue: Out of memory errors

  • Increase <service>_memory_limit
  • Check cluster available resources: kubectl top nodes

Issue: Persistent volume claim pending

  • Verify storage class exists: kubectl get storageclass
  • Check PVC status: kubectl get pvc -A

Issue: Certificate provisioning fails

  • Verify cloudflare_api_token is correct
  • Check cert-manager logs: kubectl logs -n cert-manager deploy/cert-manager
