
@saryani saryani commented Dec 17, 2025

Add optional per-pod rollout delay for StatefulSets

Summary

This PR adds an optional, per-StatefulSet rollout delay that slows down pod restarts within a single StatefulSet, without changing maxUnavailable semantics or the existing rollout safety guarantees.

Background / Problem

This feature is motivated by issues we’ve seen running rollout-operator in a large, multi‑AZ Mimir deployment, where Memcached is affected, but the problem is not specific to Mimir or Memcached. Any workload that depends heavily on a cache (or another warm‑up–sensitive service) can suffer when many pods restart too quickly, even if maxUnavailable/PDB settings are respected.

Memcached pods run with a sidecar that scrapes Memcached metrics, and that sidecar is sometimes updated as part of broader changes to the Mimir/Loki/Traces stack. When the sidecar (or another shared component) is updated, the Memcached StatefulSet is rolled and its pods restart.

Each restart wipes in‑memory cache state, so pods come back “cold”. If several Memcached pods restart in rapid succession, we see a prolonged period of low cache hit rates and significant slowdowns in every component that relies on Memcached while the cache warms up.

rollout-operator already ensures we stay within a safe number of unavailable pods, but it doesn’t control how quickly pods within that budget are recycled. For cache-heavy workloads, a configurable delay between pod restarts lets the remaining pods keep serving traffic while newly restarted ones warm up before the next disruption, smoothing out the impact of the rollout.

Usage

To enable the feature for a particular StatefulSet (e.g. Memcached):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: memcached
  labels:
    rollout-group: mimir
    rollout-delay: "30s"   # new label

Expected behaviour

  • Without the label (or with an invalid value), rollouts behave exactly as before.
  • With the label, the operator will:
    • Still respect maxUnavailable and any other existing rollout constraints and strategies.
    • Wait for the configured duration (30s in the example above) between pod terminations in the same StatefulSet.



@56quarters 56quarters left a comment


I don't think this should be added to the rollout-operator. On Mimir, we've solved this with the minReadySeconds setting, which does exactly this for Memcached StatefulSets, and we have been using it for a few months.

Jsonnet example: grafana/mimir#12938

Helm example: grafana/mimir#13495

Docs: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#minimum-ready-seconds
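
For readers who don't want to follow the links, here is a minimal sketch of what setting minReadySeconds on a Memcached StatefulSet might look like. It is not copied from the Jsonnet or Helm changes referenced above: the names, labels, replica count, image tag, and the 30-second value are all assumptions chosen for illustration. Per the Kubernetes docs, the field defaults to 0 and sets how long a newly created pod must be running and ready before it is considered available.

```yaml
# Illustrative sketch only: names, labels, replica count, image tag,
# and the 30-second value are assumptions, not taken from the linked PRs.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: memcached
spec:
  serviceName: memcached
  replicas: 3
  # A new pod must stay Ready for this many seconds before it counts as available.
  minReadySeconds: 30
  selector:
    matchLabels:
      app: memcached
  template:
    metadata:
      labels:
        app: memcached
    spec:
      containers:
        - name: memcached
          image: memcached:1.6
```

The value would need tuning per workload, since it should roughly cover the time a restarted cache pod needs to warm up.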

3 participants