fix: prevent StatefulSet infinite reconcile restart loop #132

Open
brightsparc wants to merge 2 commits into ClickHouse:main from introspection-org:julian/fix-desired-state

@brightsparc (Contributor) commented Mar 14, 2026

Fixes #131

Why

The KeeperCluster and ClickHouseCluster reconcilers enter an infinite restart loop because the desired StatefulSet spec omits fields that the Kubernetes API server fills with defaults. On each reconcile, DeepHashObject sees a diff between the desired state (nil/zero values) and the actual state (K8s-defaulted values), concludes that the config has changed, and force-restarts the pod via the restartedAt annotation. The annotation change itself creates another diff on the next reconcile, producing a self-reinforcing restart loop (roughly every 30s) that prevents the keeper from ever stabilising.
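
The drift described above can be reproduced in isolation. The following is a minimal sketch, not the operator's code: `podSpec`, `hashSpec`, and `applyDefaults` are hypothetical stand-ins (the real `DeepHashObject` implementation is not shown here), and the hash is simply SHA-256 over the JSON encoding.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// podSpec is a hypothetical, trimmed-down stand-in for the Kubernetes PodSpec.
type podSpec struct {
	TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`
	SchedulerName                 string `json:"schedulerName,omitempty"`
}

// hashSpec is an illustrative content hash; it is NOT the operator's DeepHashObject.
func hashSpec(s podSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

// applyDefaults mimics the API server filling in fields the client omitted.
func applyDefaults(s podSpec) podSpec {
	if s.TerminationGracePeriodSeconds == nil {
		v := int64(30)
		s.TerminationGracePeriodSeconds = &v
	}
	if s.SchedulerName == "" {
		s.SchedulerName = "default-scheduler"
	}
	return s
}

func main() {
	// Before the fix: the desired spec leaves defaulted fields unset.
	desired := podSpec{}
	actual := applyDefaults(desired) // what the API server would store
	fmt.Println("drift before fix:", hashSpec(desired) != hashSpec(actual))

	// After the fix: the desired spec already carries the defaults.
	fixed := applyDefaults(podSpec{})
	fmt.Println("drift after fix:", hashSpec(fixed) != hashSpec(actual))
}
```

Once the desired spec sets the same values the server would default, the two hashes agree and the reconciler has no reason to trigger a restart.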

What

Explicitly set K8s-defaulted fields in the desired StatefulSet and PodSpec so they match what the API server returns:

  • PodSpec: default terminationGracePeriodSeconds to 30, schedulerName to "default-scheduler", and securityContext to &PodSecurityContext{} when not specified by the user
  • StatefulSet: set rollingUpdate.partition to 0, rollingUpdate.maxUnavailable to 1, and persistentVolumeClaimRetentionPolicy to Retain for both whenDeleted and whenScaled
  • Probes: set successThreshold: 1 on DefaultLivenessProbeSettings. This is the K8s default; the field was previously zero-valued, causing a diff against the API server's defaulted 1
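
As a rough illustration of the StatefulSet and probe defaults listed above, here is a minimal sketch using hypothetical stand-in types rather than the real Kubernetes API structs. The name `withServerDefaults` is an assumption, and `MaxUnavailable` is simplified to a plain integer. Only unset fields are filled, so user-provided values are preserved.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Kubernetes API types; field names follow the list above.
type rollingUpdate struct {
	Partition      *int32
	MaxUnavailable *int32 // simplified: assumed to be a plain integer here
}

type pvcRetentionPolicy struct {
	WhenDeleted string
	WhenScaled  string
}

type probe struct {
	SuccessThreshold int32
}

type statefulSetSpec struct {
	RollingUpdate   *rollingUpdate
	RetentionPolicy *pvcRetentionPolicy
	LivenessProbe   probe
}

func int32Ptr(v int32) *int32 { return &v }

// withServerDefaults fills only unset fields so that user values survive.
func withServerDefaults(s statefulSetSpec) statefulSetSpec {
	// Copy the rolling update settings to avoid mutating the caller's struct.
	ru := rollingUpdate{}
	if s.RollingUpdate != nil {
		ru = *s.RollingUpdate
	}

	if ru.Partition == nil {
		ru.Partition = int32Ptr(0)
	}

	if ru.MaxUnavailable == nil {
		ru.MaxUnavailable = int32Ptr(1)
	}

	s.RollingUpdate = &ru

	if s.RetentionPolicy == nil {
		s.RetentionPolicy = &pvcRetentionPolicy{WhenDeleted: "Retain", WhenScaled: "Retain"}
	}

	if s.LivenessProbe.SuccessThreshold == 0 {
		s.LivenessProbe.SuccessThreshold = 1
	}

	return s
}

func main() {
	got := withServerDefaults(statefulSetSpec{})
	fmt.Println(*got.RollingUpdate.Partition, *got.RollingUpdate.MaxUnavailable,
		*got.RetentionPolicy, got.LivenessProbe.SuccessThreshold)
}
```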

Applied to both keeper and clickhouse template generators. Added 8 unit tests covering each defaulted field.
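
The PodSpec defaulting and the style of per-field check could look roughly like the sketch below. It is not the PR's actual test code: the type, function, and constant names are hypothetical (the PR only states that the value 30 was extracted to a named constant), and the checks are written as a plain table-driven function rather than with the `testing` package so the example runs on its own.

```go
package main

import "fmt"

// Hypothetical constant names; only the values come from the PR description.
const (
	defaultTerminationGracePeriodSeconds int64 = 30
	defaultSchedulerName                       = "default-scheduler"
)

// Hypothetical stand-ins for the Kubernetes pod types.
type podSecurityContext struct{}

type podTemplate struct {
	TerminationGracePeriodSeconds *int64
	SchedulerName                 string
	SecurityContext               *podSecurityContext
}

// defaultPodTemplate sets the K8s-defaulted fields when the user left them unset.
func defaultPodTemplate(p podTemplate) podTemplate {
	if p.TerminationGracePeriodSeconds == nil {
		v := defaultTerminationGracePeriodSeconds
		p.TerminationGracePeriodSeconds = &v
	}

	if p.SchedulerName == "" {
		p.SchedulerName = defaultSchedulerName
	}

	if p.SecurityContext == nil {
		p.SecurityContext = &podSecurityContext{}
	}

	return p
}

type testCase struct {
	name          string
	in            podTemplate
	wantGrace     int64
	wantScheduler string
}

// sampleCases covers one defaulted input and one with user-provided values.
func sampleCases() []testCase {
	custom := int64(5)
	return []testCase{
		{name: "all unset", in: podTemplate{}, wantGrace: 30, wantScheduler: "default-scheduler"},
		{name: "user values kept", in: podTemplate{TerminationGracePeriodSeconds: &custom, SchedulerName: "my-scheduler"}, wantGrace: 5, wantScheduler: "my-scheduler"},
	}
}

// checkCases returns one message per failed expectation.
func checkCases(cases []testCase) []string {
	var failures []string
	for _, c := range cases {
		got := defaultPodTemplate(c.in)
		if *got.TerminationGracePeriodSeconds != c.wantGrace {
			failures = append(failures, fmt.Sprintf("%s: grace period = %d", c.name, *got.TerminationGracePeriodSeconds))
		}
		if got.SchedulerName != c.wantScheduler {
			failures = append(failures, fmt.Sprintf("%s: scheduler = %q", c.name, got.SchedulerName))
		}
		if got.SecurityContext == nil {
			failures = append(failures, fmt.Sprintf("%s: securityContext is nil", c.name))
		}
	}
	return failures
}

func main() {
	fmt.Println("failures:", len(checkCases(sampleCases())))
}
```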

brightsparc and others added 2 commits March 13, 2026 19:36
Extract terminationGracePeriodSeconds default (30) to a named constant
in each package's constants.go. Add blank lines between if-blocks and
subsequent assignments to satisfy wsl_v5.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@GrigoryPervakov self-assigned this Mar 16, 2026

Development

Successfully merging this pull request may close these issues.

KeeperCluster StatefulSet infinite restart loop due to K8s API server default field drift

2 participants