Commit d20471e

chore: update README
Signed-off-by: Michael Fornaro <20387402+xUnholy@users.noreply.github.com>
1 parent be884ce commit d20471e

2 files changed

+234
-75
lines changed

CLAUDE.md

Lines changed: 224 additions & 68 deletions
@@ -12,29 +12,45 @@ This is a Kubernetes GitOps repository for a personal homelab cluster managed wi
1212
- **GitOps**: FluxCD with Flux Operator for declarative cluster management
1313
- **Container Runtime**: containerd
1414
- **Networking**: Cilium CNI with Istio service mesh
15-
- **Storage**: OpenEBS for container-attached storage
16-
- **Monitoring**: Prometheus, Grafana, Loki, Jaeger for observability
17-
- **Security**: Kyverno for policy management, Falco for runtime security
15+
- **Storage**: Rook-Ceph, OpenEBS, democratic-csi for container-attached storage
16+
- **Monitoring**: Prometheus, Grafana, Loki, Jaeger, Thanos for observability
17+
- **Security**: Kyverno, OPA Gatekeeper for policy management, Falco & Tetragon for runtime security
1818
- **Load Balancing**: MetalLB for bare metal load balancing
19+
- **Chaos Engineering**: Litmus for chaos testing
1920

2021
## Directory Structure
2122

2223
```
23-
├── kubernetes/ # Kubernetes manifests and configurations
24-
│ ├── apps/ # Application deployments (base + overlays)
25-
│ ├── bootstrap/ # Initial cluster bootstrap configuration
26-
│ ├── clusters/ # Per-cluster configurations
27-
│ ├── components/ # Shared components and alerts
28-
│ └── tenants/ # Multi-tenant configurations
29-
├── talos/ # Talos Linux configuration files
30-
│ ├── generated/ # Generated Talos configs (encrypted)
31-
│ ├── integrations/ # Integration configurations
32-
│ └── patches/ # Talos configuration patches
33-
├── terraform/ # Infrastructure as Code
34-
│ ├── cloudflare/ # Cloudflare DNS/CDN configuration
35-
│ └── gcp/ # Google Cloud Platform resources
36-
├── .taskfiles/ # Task automation definitions
37-
└── docs/ # Documentation
24+
├── kubernetes/ # Kubernetes manifests and configurations
25+
│ ├── apps/
26+
│ │ ├── base/ # Base application configurations (DRY principle)
27+
│ │ │ └── [system-name]/ # e.g., observability, kube-system, home-system
28+
│ │ │ ├── [app-name]/
29+
│ │ │ │ ├── app/ # HelmRelease, OCIRepository, secrets, values
30+
│ │ │ │ └── ks.yaml # Flux Kustomization with dependencies
31+
│ │ │ ├── namespace.yaml
32+
│ │ │ └── kustomization.yaml
33+
│ │ └── overlays/
34+
│ │ └── cluster-00/ # Cluster-specific overrides
35+
│ ├── bootstrap/
36+
│ │ └── helmfile.yaml # Bootstrap Flux Operator and dependencies
37+
│ ├── clusters/
38+
│ │ └── cluster-00/
39+
│ │ ├── flux-system/ # Flux Operator and FluxInstance configs
40+
│ │ ├── secrets/ # Cluster secrets (SOPS encrypted)
41+
│ │ └── ks.yaml # Root Kustomization
42+
│ ├── components/
43+
│ │ └── common/alerts/ # Shared monitoring alerts
44+
│ └── tenants/ # Multi-tenant configurations
45+
├── talos/ # Talos Linux configuration files
46+
│ ├── generated/ # Generated Talos configs (encrypted)
47+
│ ├── integrations/ # Cilium, cert-approver integrations
48+
│ └── patches/ # iSCSI, metrics patches
49+
├── terraform/ # Infrastructure as Code
50+
│ ├── cloudflare/ # Cloudflare DNS/CDN configuration
51+
│ └── gcp/ # GCP KMS, Thanos storage, Velero backups
52+
├── .taskfiles/ # Task automation definitions
53+
└── docs/ # Documentation
3854
```
3955

4056
## Common Commands
@@ -43,17 +59,29 @@ This is a Kubernetes GitOps repository for a personal homelab cluster managed wi
4359
The repository uses [Task](https://taskfile.dev) for automation. All commands should be run via `task`:
4460

4561
```bash
46-
# Core FluxCD operations
47-
task flux:bootstrap # Bootstrap FluxCD in the cluster
48-
task flux:secrets # Install cluster secrets and configs
62+
# FluxCD Operations
63+
task flux:bootstrap # Bootstrap Flux Operator via Helmfile
64+
task flux:secrets # Install cluster secrets (SOPS decrypt + apply)
65+
task fluxcd:bootstrap # Alternative bootstrap path
66+
task fluxcd:diff # Preview FluxCD operator changes
4967

50-
# Talos operations
51-
task talos:config # Decrypt and load Talos config
68+
# Talos Operations
69+
task talos:config # Decrypt and load talosconfig to ~/.talos/config
70+
71+
# Core Operations
72+
task core:gpg # Import SOPS PGP keys
73+
task core:lint # Run yamllint
5274

5375
# View available tasks
5476
task --list
5577
```
5678

79+
**Important Variables:**
80+
- `CLUSTER`: cluster-00 (default cluster ID)
81+
- `GITHUB_USER`: xunholy
82+
- `GITHUB_REPO`: k8s-gitops
83+
- `GITHUB_BRANCH`: main
84+
5785
### Pre-commit Hooks
5886
The repository uses pre-commit for code quality:
5987
```bash
@@ -67,72 +95,200 @@ Active hooks include:
6795
- Trailing whitespace and EOF fixes
6896

6997
### Secret Management
70-
Secrets are encrypted using [SOPS](https://github.com/mozilla/sops):
98+
Secrets are encrypted using [SOPS](https://github.com/mozilla/sops) with dual encryption (PGP + GCP KMS):
7199
```bash
72-
# Decrypt secrets (requires proper age key setup)
73-
sops -d path/to/encrypted.yaml
100+
# Edit encrypted files (automatically decrypts/encrypts)
101+
sops path/to/file.enc.yaml
74102

75-
# Edit encrypted files
76-
sops path/to/encrypted.yaml
103+
# Decrypt for viewing only
104+
sops -d path/to/file.enc.yaml
77105
```
78106

107+
**SOPS Configuration:**
108+
- **PGP Key**: `0635B8D34037A9453003FB7B93CAA682FF4C9014`
109+
- **Age Key**: `age19gj66fq5v2veu940ftyj4pkw0w5tgxgddlyqnd00pnjzyndevurqx70g4t`
110+
- **GCP KMS**: Used for stored PGP keys
111+
- Encrypted files use `.enc.yaml` or `.enc.age.yaml` suffix
112+
79113
## Key Technologies & Patterns
80114

81115
### GitOps with FluxCD
82-
- **Flux Operator**: Manages FluxCD installation via FluxInstance CRDs
83-
- **Kustomizations**: Define how to apply Kubernetes manifests
84-
- **HelmReleases**: Manage Helm chart deployments
85-
- **GitRepository/OCIRepository**: Source definitions for manifests
116+
This repository uses **Flux Operator** instead of traditional `flux bootstrap`:
117+
- **FluxInstance CRDs**: Declaratively manage FluxCD components
118+
- **OCIRepository**: Used for Helm charts instead of HelmRepository (e.g., `oci://ghcr.io/prometheus-community/charts`)
119+
- **Kustomizations**: Define manifest application with SOPS decryption, post-build substitution, and dependency chains
120+
- **HelmReleases**: Reference charts via `chartRef` pointing to OCIRepository
121+
- **Root Kustomization**: Located at `kubernetes/clusters/cluster-00/ks.yaml`
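A minimal sketch of the `chartRef` pattern described above. The resource name `example-app`, the namespace, the registry path, and the tag are placeholders rather than values from this repository, and `source.toolkit.fluxcd.io/v1` assumes a recent Flux release (older releases serve `OCIRepository` under `v1beta2`):

```yaml
# Chart source: an OCI artifact containing a Helm chart (placeholder URL and tag)
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: example-app
  namespace: example-system
spec:
  interval: 1h
  url: oci://ghcr.io/example-org/charts/example-app
  ref:
    tag: "1.2.3"
  layerSelector:
    # Select the Helm chart layer so helm-controller can consume it
    mediaType: application/vnd.cncf.helm.chart.content.v1.tar+gzip
    operation: copy
---
# Release: points at the OCIRepository via chartRef instead of spec.chart
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-app
  namespace: example-system
spec:
  interval: 30m
  chartRef:
    kind: OCIRepository
    name: example-app
```

With `chartRef`, the chart version is pinned on the `OCIRepository` (`ref.tag`, `ref.semver`, or `ref.digest`) rather than on the HelmRelease.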
122+
123+
### Application Deployment Pattern
124+
Each application follows this structure:
125+
1. **Base configuration** in `kubernetes/apps/base/[system-name]/[app-name]/`:
126+
- `app/helmrelease.yaml`: Helm release definition
127+
- `app/ocirepository.yaml`: Chart source
128+
- `app/secret.enc.yaml`: Encrypted secrets
129+
- `app/values.yaml`: Helm values
130+
- `ks.yaml`: Flux Kustomization with `dependsOn`, SOPS settings, substitutions
131+
132+
2. **Cluster overlays** in `kubernetes/apps/overlays/cluster-00/`: Cluster-specific customizations using Kustomize patches
86133

87-
### Cluster Configuration
88-
- **Bootstrap**: Initial cluster setup in `kubernetes/bootstrap/`
89-
- **Apps**: Application deployments with base configurations and cluster-specific overlays
90-
- **Components**: Shared components like monitoring alerts
91-
- **Tenants**: Multi-tenant namespace configurations
134+
3. **System categories**: Apps organized into logical systems:
135+
- `kube-system`: Core Kubernetes (Cilium, metrics-server, reflector)
136+
- `network-system`: Networking (cert-manager, external-dns, oauth2-proxy, dex)
137+
- `observability`: Monitoring (Prometheus, Grafana, Loki, Jaeger, Thanos)
138+
- `security-system`: Security (Kyverno, Falco, Gatekeeper, Crowdsec)
139+
- `istio-system` & `istio-ingress`: Service mesh
140+
- `home-system`: Home automation & media
141+
- `rook-ceph`: Storage
142+
143+
### HelmRelease Global Defaults
144+
All HelmReleases are patched with these defaults via Kustomization:
145+
```yaml
146+
install:
147+
crds: CreateReplace
148+
createNamespace: true
149+
replace: true
150+
strategy: RetryOnFailure
151+
timeout: 10m
152+
rollback:
153+
recreate: true
154+
force: true
155+
cleanupOnFail: true
156+
upgrade:
157+
cleanupOnFail: true
158+
crds: CreateReplace
159+
remediation:
160+
remediateLastFailure: true
161+
retries: 3
162+
strategy: rollback
163+
```
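One way defaults like these could be applied is a `patches` entry on a Flux Kustomization that targets every HelmRelease. The sketch below illustrates that mechanism only and is not necessarily how this repository wires it up; the Kustomization name and `sourceRef` name are assumptions, and only a subset of the defaults is shown:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-apps            # placeholder name
  namespace: flux-system
spec:
  interval: 30m
  path: ./kubernetes/apps/overlays/cluster-00
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system           # assumed source name
  patches:
    - target:
        # Match every HelmRelease rendered by this Kustomization
        group: helm.toolkit.fluxcd.io
        kind: HelmRelease
      patch: |-
        apiVersion: helm.toolkit.fluxcd.io/v2
        kind: HelmRelease
        metadata:
          name: all             # ignored; the target selector decides what is patched
        spec:
          install:
            crds: CreateReplace
            createNamespace: true
          upgrade:
            cleanupOnFail: true
            crds: CreateReplace
            remediation:
              retries: 3
              strategy: rollback
```

A patch like this only reaches HelmReleases that this Kustomization renders directly. HelmReleases applied by nested Flux Kustomizations need the patch set on those child Kustomizations instead.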
92164
93165
### Security Practices
94-
- All secrets encrypted with SOPS using age encryption
95-
- Kyverno policies enforce security standards
96-
- Falco provides runtime security monitoring
97-
- Talos Linux provides immutable, minimal attack surface
166+
- **Dual encryption**: SOPS with PGP (primary) + GCP KMS backup
167+
- **Never commit unencrypted secrets**: Encrypted files use the `.enc.yaml` or `.enc.age.yaml` suffix
168+
- **Policy enforcement**: Kyverno & OPA Gatekeeper
169+
- **Runtime security**: Falco & Tetragon
170+
- **Pod security labels**: Applied to all namespaces
171+
- **Immutable OS**: Talos Linux provides an immutable OS with a minimal attack surface
98172

99173
## Development Workflow
100174

101-
1. **Making Changes**:
102-
- Edit YAML manifests in appropriate directories
103-
- Ensure proper directory structure (base + overlays pattern)
104-
- Follow existing naming conventions
175+
### Bootstrap New Cluster
176+
```bash
177+
# 1. Set environment variables (CLUSTER_ID defaults to cluster-00)
178+
# 2. Bootstrap Flux Operator
179+
task fluxcd:bootstrap # Installs flux-operator, flux-instance, cert-manager, kustomize-mutating-webhook
180+
181+
# 3. Install cluster secrets
182+
task flux:secrets # Decrypts and applies sops-gpg, sops-age, cluster-secrets, github-auth, cluster-config
183+
184+
# 4. Configure Talos
185+
task talos:config # Decrypts talosconfig to ~/.talos/config
186+
```
187+
188+
### Making Changes to Applications
189+
1. **Edit base configuration** in `kubernetes/apps/base/[system-name]/[app-name]/`
190+
2. **Use overlays** for cluster-specific customization in `kubernetes/apps/overlays/cluster-00/`
191+
3. **Follow naming conventions**:
192+
- `ks.yaml`: Flux Kustomization resources
193+
- `kustomization.yaml`: Kustomize configuration
194+
- `*.enc.yaml`: SOPS encrypted files
195+
- `helmrelease.yaml`: Helm release definitions
196+
- `ocirepository.yaml`: OCI repository sources
197+
4. **Ensure secrets are encrypted** before committing (use `sops` command)
198+
5. **Run pre-commit hooks**: `pre-commit run --all-files`
199+
6. **FluxCD auto-reconciles** from main branch after push
200+
201+
### Adding New Applications
202+
1. Create directory structure: `kubernetes/apps/base/[system-name]/[app-name]/`
203+
2. Add `app/` directory with:
204+
- `helmrelease.yaml` (with `chartRef` to OCIRepository)
205+
- `ocirepository.yaml` (chart source)
206+
- `values.yaml` (Helm values)
207+
- `secret.enc.yaml` (if needed, encrypted with SOPS)
208+
- `kustomization.yaml`
209+
3. Create `ks.yaml` with:
210+
- `dependsOn` for dependency chain
211+
- `decryption` for SOPS secrets
212+
- `postBuild.substituteFrom` for ConfigMap/Secret references
213+
4. Add to parent `kustomization.yaml`
214+
5. Create an overlay if cluster-specific customization is needed
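As a rough sketch of step 3, a `ks.yaml` combining `dependsOn`, `decryption`, and `postBuild.substituteFrom` might look like the following. The app name, path, and `sourceRef` name are hypothetical; `sops-gpg`, `cluster-config`, and `cluster-secrets` are the names applied by `task flux:secrets`, and treating them as a Secret, a ConfigMap, and a Secret respectively is an assumption:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-app             # hypothetical app name
  namespace: flux-system
spec:
  interval: 30m
  path: ./kubernetes/apps/base/example-system/example-app/app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system           # assumed source name
  # Wait for cert-manager to be ready before applying this app
  dependsOn:
    - name: cert-manager
      namespace: flux-system
  # Decrypt *.enc.yaml files with the SOPS key held in this Secret
  decryption:
    provider: sops
    secretRef:
      name: sops-gpg
  # Replace ${VAR} placeholders in manifests from cluster-wide config
  postBuild:
    substituteFrom:
      - kind: ConfigMap
        name: cluster-config
      - kind: Secret
        name: cluster-secrets
```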
105215

106-
2. **Testing**:
107-
- Use `task` commands to validate configurations
108-
- Run pre-commit hooks before committing
109-
- FluxCD will automatically reconcile changes after push
216+
## Important Patterns & Conventions
110217

111-
3. **Secrets Management**:
112-
- Never commit unencrypted secrets
113-
- Use SOPS for any sensitive data
114-
- Reference encrypted secrets in `.sops.yaml`
218+
### File Naming
219+
- `ks.yaml`: Flux Kustomization resources (defines how to apply manifests)
220+
- `kustomization.yaml`: Kustomize configuration (defines what resources to include)
221+
- `*.enc.yaml`: SOPS-encrypted with PGP
222+
- `*.enc.age.yaml`: SOPS-encrypted with Age
223+
- `helmfile.yaml`: Helmfile configurations (used in bootstrap)
224+
- `helmrelease.yaml`: Helm release definitions
225+
- `ocirepository.yaml`: OCI repository sources for Helm charts
226+
- `namespace.yaml`: Namespace definitions with pod security labels
115227

116-
## File Patterns to Understand
228+
### Kustomization Labels
229+
- `substitution.flux/enabled=true`: Enables SOPS decryption and variable substitution
230+
- Patches applied globally to all Kustomizations for HelmRelease defaults
117231

118-
- `kustomization.yaml`: Kustomize configuration files
119-
- `*.enc.yaml`: SOPS-encrypted files
120-
- `helmfile.yaml`: Helmfile configurations for chart management
121-
- `app/`: Directory containing application-specific configurations
122-
- `resources/`: Directory for Kubernetes resource definitions
232+
### Namespace Conventions
233+
Labels applied to namespaces:
234+
- `pod-security.kubernetes.io/enforce: privileged` (or `restricted`/`baseline`)
235+
- `goldilocks.fairwinds.com/enabled: "true"` (enables Goldilocks resource recommendations)
236+
- `kustomize.toolkit.fluxcd.io/prune: disabled` (on flux-system)
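For illustration, a `namespace.yaml` carrying these labels could look like this. The namespace name and the chosen `baseline` level are placeholders:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example-system          # placeholder namespace name
  labels:
    # Pod Security Admission level enforced for pods in this namespace
    pod-security.kubernetes.io/enforce: baseline
    # Opt the namespace in to Goldilocks
    goldilocks.fairwinds.com/enabled: "true"
```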
237+
238+
### Dependency Management
239+
Flux Kustomizations use `dependsOn` to establish deployment order:
240+
```yaml
241+
dependsOn:
242+
- name: cert-manager
243+
namespace: flux-system
244+
```
123245

124246
## Important Notes
125247

126-
- The cluster uses cluster ID "cluster-00" as the default
127-
- Talos config is stored encrypted in `talos/generated/`
128-
- FluxCD manages all application deployments automatically
129-
- Changes to `main` branch trigger automatic reconciliation
130-
- The repository follows enterprise GitOps patterns suitable for production use
248+
- **Cluster ID**: "cluster-00" is the default cluster identifier
249+
- **Branch**: `main` is the primary branch (auto-reconciled by FluxCD)
250+
- **Talos configs**: Stored encrypted in `talos/generated/`
251+
- **Bootstrap method**: Uses Flux Operator (not traditional `flux bootstrap`)
252+
- **Chart sources**: Uses OCIRepository instead of HelmRepository
253+
- **Yamllint config**: Line length warning at 240 characters, 2-space indentation
254+
- **Renovate automation**: Auto-merge enabled for digests, ignores encrypted files
255+
- **Multi-cluster ready**: Designed with overlay pattern for multiple clusters
256+
- **Enterprise patterns**: Production-grade GitOps implementation showcasing the CNCF ecosystem
131257

132258
## External Dependencies
133259

134-
- **Cloudflare**: DNS and CDN services
135-
- **Google Cloud Platform**: OAuth, backup storage
136-
- **GitHub**: Source control and authentication
137-
- **SOPS/age**: Secret encryption (requires age key setup)
260+
- **Cloudflare**: DNS management and CDN services
261+
- **Google Cloud Platform**:
262+
- GCP KMS for SOPS encryption
263+
- Google Cloud Storage for Thanos long-term metrics storage
264+
- Google Cloud Storage for Velero backups
265+
- OAuth for authentication
266+
- **GitHub**: Source control, authentication, and OCI registry for Helm charts
267+
- **SOPS/age**: Secret encryption (requires PGP and/or age key setup)
138268
- **Task**: Task runner (must be installed locally)
269+
- **Helmfile**: Used for bootstrap process
270+
- **Let's Encrypt**: Certificate generation for secure communication
271+
- **NextDNS**: Malware protection and ad-blocking
272+
- **UptimeRobot**: Service monitoring
273+
274+
## Troubleshooting with Flux MCP
275+
276+
This repository includes Cursor rules for troubleshooting Flux resources using the `flux-operator-mcp` tools. Key troubleshooting workflows:
277+
278+
### Analyzing HelmReleases
279+
1. Check helm-controller status with `get_flux_instance`
280+
2. Get HelmRelease resource and analyze spec, status, inventory, events
281+
3. Check `valuesFrom` ConfigMaps and Secrets
282+
4. Verify source (OCIRepository) status
283+
5. Analyze managed resources from inventory
284+
6. Check logs if resources are failing
285+
286+
### Analyzing Kustomizations
287+
1. Check kustomize-controller status with `get_flux_instance`
288+
2. Get Kustomization resource and analyze spec, status, inventory, events
289+
3. Check `substituteFrom` ConfigMaps and Secrets
290+
4. Verify source (GitRepository/OCIRepository) status
291+
5. Analyze managed resources from inventory
292+
293+
### Comparing Resources Across Clusters
294+
Use `get_kubernetes_contexts` and `set_kubernetes_context` to switch between clusters, then compare resource specs and status.

README.md

Lines changed: 10 additions & 7 deletions
@@ -52,13 +52,16 @@ This repository leverages a range of cutting-edge open-source tools and platform
5252

5353
## 🔧 Hardware
5454

55-
| Device | Description | Quantity | CPU | RAM | Architecture | Operating System | Notes |
56-
| -------------------------------------------------------------------------------------- | ------------------------ | -------- | ------- | -------- | ------------ | ------------------------------------- | ----- |
57-
| [Protectli FW6E](https://protectli.com/product/fw6e/) | Router | 1 | 4 Cores | 16GB RAM | AMD64 | [VyOs](https://vyos.io/) | |
58-
| [Protectli VP2410](https://protectli.com/product/vp2410/) | Kubernetes Control Plane | 3 | 4 Cores | 8GB RAM | AMD64 | [Talos Linux](https://www.talos.dev/) | |
59-
| [Protectli FW2B](https://protectli.com/product/fw2b/) | Kubernetes Node(s) | 3 | 2 Cores | 8GB RAM | AMD64 | [Talos Linux](https://www.talos.dev/) | |
60-
| [Raspberry Pi 4 Model B](https://www.raspberrypi.org/products/raspberry-pi-4-model-b/) | Kubernetes Node(s) | 4 | 4 Cores | 8GB RAM | ARM64 | [Talos Linux](https://www.talos.dev/) | Decommisioned |
61-
| [Rock Pi 4 Model C](https://rockpi.org/rockpi4#) | Kubernetes Node(s) | 6 | 4 Cores | 4GB RAM | ARM64 | [Talos Linux](https://www.talos.dev/) | Decommisioned |
55+
| Device | Description | Quantity | CPU | RAM | Storage | Architecture | Operating System | Notes |
56+
| -------------------------------------------------------------------------------------- | ------------------ | -------- | -------- | ----- | --------------------------- | ------------ | ------------------------------------- | -------------- |
57+
| [Ubiquiti UDM-Pro-Max](https://ui.com/us/en/cloud-gateways/dream-machine-pro-max) | Router/Gateway | 1 | - | - | 8TB | - | UniFi OS | |
58+
| [Ubiquiti USW-Pro-Max-48-PoE](https://ui.com/switching/pro-max-48-poe) | Network Switch | 1 | - | - | - | - | UniFi OS | 48-port PoE |
59+
| [Asus NUC 14 Pro](https://www.asus.com/displays-desktops/nucs/nuc-mini-pcs/asus-nuc-14-pro/) | Kubernetes Cluster | 3 | 14 Cores | 48GB | 1TB NVMe + 1TB SSD | AMD64 | [Talos Linux](https://www.talos.dev/) | Ultra 5-125H |
60+
| [Protectli FW6E](https://protectli.com/product/fw6e/) | Router | 1 | 4 Cores | 16GB | - | AMD64 | [VyOS](https://vyos.io/) | Decommissioned |
61+
| [Protectli VP2410](https://protectli.com/product/vp2410/) | Kubernetes Node(s) | 3 | 4 Cores | 8GB | - | AMD64 | [Talos Linux](https://www.talos.dev/) | Decommissioned |
62+
| [Protectli FW2B](https://protectli.com/product/fw2b/) | Kubernetes Node(s) | 3 | 2 Cores | 8GB | - | AMD64 | [Talos Linux](https://www.talos.dev/) | Decommissioned |
63+
| [Raspberry Pi 4 Model B](https://www.raspberrypi.org/products/raspberry-pi-4-model-b/) | Kubernetes Node(s) | 4 | 4 Cores | 8GB | - | ARM64 | [Talos Linux](https://www.talos.dev/) | Decommissioned |
64+
| [Rock Pi 4 Model C](https://rockpi.org/rockpi4#) | Kubernetes Node(s) | 6 | 4 Cores | 4GB | - | ARM64 | [Talos Linux](https://www.talos.dev/) | Decommissioned |
6265

6366
## ☁️ Cloud Services
6467
