Add optional per-pod rollout delay #354

saryani · 2025-12-17T14:30:39Z

Add optional per-pod rollout delay for StatefulSets

Summary

This PR adds an optional, per-StatefulSet rollout delay that slows down pod restarts within a single StatefulSet, without changing maxUnavailable semantics or the existing rollout safety guarantees.

Background / Problem

This feature is motivated by issues we’ve seen running rollout-operator in a large, multi-AZ Mimir deployment, specifically with its Memcached StatefulSets, but the problem is not specific to Mimir or Memcached. Any workload that depends heavily on a cache (or on another service that needs to warm up) can suffer when many pods restart in quick succession, even if maxUnavailable/PDB settings are respected.

Memcached pods run with a sidecar that scrapes Memcached metrics, and that sidecar is sometimes updated as part of broader changes to the Mimir/Loki/Traces stack. When that sidecar (or another shared component) is updated, the Memcached StatefulSet is rolled, causing its pods to restart.

Each restart wipes the pod's in-memory cache, so pods come back “cold”. If several Memcached pods restart in rapid succession, the cache-hit ratio stays low for a prolonged period, and every component that relies on Memcached slows down significantly while the cache warms up.

rollout-operator already ensures we stay within a safe number of unavailable pods, but it doesn’t control how quickly pods are recycled within that budget. For cache-heavy workloads, a configurable delay between pod restarts lets the remaining pods keep serving traffic while each restarted pod warms up before the next disruption, smoothing out the impact of the rollout.

Usage

To enable the feature for a particular StatefulSet (e.g. Memcached):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: memcached
  labels:
    rollout-group: mimir
    rollout-delay: "30s"   # new label
```

Expected behaviour

Without the label (or with an invalid value), rollouts behave exactly as before.
With the label, the operator will:
- Still respect maxUnavailable and any other existing constraints and strategies.
- Wait for the configured delay (30s in the example above) between consecutive pod terminations in the same StatefulSet.
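As a rough illustration of the cost, assume the delay d is enforced strictly between consecutive pod terminations within one StatefulSet of N pods (N is a hypothetical pod count, not something the PR specifies). The delay alone then puts a lower bound on that StatefulSet's rollout time, before adding each pod's own restart and readiness time:

$$T_{\text{rollout}} \geq (N - 1)\,d$$

For example, with N = 10 and the 30s value from the usage example, that is at least 9 × 30 s = 270 s (4.5 minutes) per StatefulSet.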

Add optional per-pod rollout delay. Updated README.md to explain the new feature.

CLAassistant · 2025-12-17T14:30:48Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

56quarters

I don't think this is something that should be added to the rollout-operator. We've solved this in Mimir by using the minReadySeconds setting, which does exactly this for Memcached StatefulSets, and we have been using it for a few months.

Jsonnet example: grafana/mimir#12938

Helm example: grafana/mimir#13495

Docs: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#minimum-ready-seconds
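For comparison, here is a minimal sketch of the minReadySeconds approach on a StatefulSet using the default RollingUpdate strategy. Only the relevant fields are shown, and the value 30 is chosen here to mirror the rollout-delay example in the PR description; it is not taken from the linked Mimir PRs. Because an updated pod only counts as available after it has stayed Ready for minReadySeconds, the StatefulSet controller waits at least that long before moving on to the next pod.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: memcached
spec:
  # An updated pod must stay Ready for 30s before it counts as available,
  # which spaces out pod restarts during a RollingUpdate.
  minReadySeconds: 30
  updateStrategy:
    type: RollingUpdate
  # serviceName, selector, replicas and template omitted for brevity.
```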

saryani and others added 4 commits December 15, 2025 14:33

- a43e83c Add optional per-pod rollout delay
- f5d8eb7 update README.md
- 47a7e9d drop unused comments
- 547295a Merge pull request #1 from saryani/feature/pod-rollout-delay

56quarters requested changes Dec 17, 2025

3 participants
