
Splunk Operator: CPU changes break total cluster capacity + rolling updates are too slow #1645

@ductrung-nguyen

Description

Please select the type of request

Enhancement

Tell us more

Describe the request

Here's the problem: when you change CPU requests on StatefulSet pods (indexers, search heads, etc.), the operator simply applies the change without accounting for what that does to your total cluster capacity.

Double the CPU per pod and you're now requesting 2x the total CPU; halve it and you've cut your capacity in half. Neither is usually what you wanted.

This is really painful for license-based deployments or when you're trying to optimize costs. Every time you want to resize pods, you have to manually recalculate the replica count. It's tedious and error-prone.

Also, rolling updates on large clusters are painfully slow since pods update one at a time. With 50+ replicas, you're sitting around forever waiting for updates to finish.

Expected behavior

Would be nice to have:

  1. CPU-aware scaling - the operator should auto-adjust the replica count when CPU per pod changes, so total CPU stays constant. For example, going from 10 pods @ 4 CPU to 8 CPU per pod should result in 5 pods @ 8 CPU (still 40 CPU total) instead of 10 pods @ 8 CPU (80 CPU total).

  2. Parallel pod updates - let me configure how many pods update at once, either as a percentage (25% at a time) or an absolute number (3 at a time). The default can stay at 1 for backward compatibility.
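The CPU-aware scaling math is simple. A minimal Go sketch (the helper name and signature are hypothetical, not actual operator code) of recomputing the replica count so total requested CPU is preserved, rounding up so capacity never silently drops:

```go
package main

import "fmt"

// replicasPreservingTotalCPU returns the replica count that keeps the total
// requested CPU (in millicores) at or just above the old total when the
// per-pod request changes. Hypothetical helper, not actual operator code.
func replicasPreservingTotalCPU(oldReplicas, oldMilliCPU, newMilliCPU int32) int32 {
	total := oldReplicas * oldMilliCPU
	// Ceiling division: round up so we never fall below the old total.
	return (total + newMilliCPU - 1) / newMilliCPU
}

func main() {
	// 10 pods @ 4 CPU, then raise the request to 8 CPU per pod:
	// total stays at 40 CPU with 5 pods instead of doubling to 80.
	fmt.Println(replicasPreservingTotalCPU(10, 4000, 8000)) // 5
}
```

Rounding up means a change to a per-pod size that doesn't divide the old total evenly (e.g. 4 CPU pods resized to 3 CPU) slightly overshoots rather than undershoots capacity, which seems like the safer default.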

Splunk setup on K8S

Happens with any StatefulSet-based component - indexers, search heads, cluster manager, etc.

Reproduction/Testing steps

For the CPU issue:

  1. Deploy indexer cluster with 10 replicas @ 4 CPU each (40 total CPU)
  2. Update CPU request to 8 per pod
  3. Watch as you now have 10 pods @ 8 CPU = 80 total CPU
  4. Your cloud bill just doubled

For slow updates:

  1. Deploy a large cluster (20+ replicas)
  2. Change image or any pod template config
  3. Go make coffee. Then another coffee. Then another one.
  4. Still updating one pod at a time...

K8s environment

Any K8s 1.19+. Happens everywhere.

Proposed changes (optional)

Could use annotations to opt-in:

  • Something like operator.splunk.com/preserve-total-cpu: "true" for CPU awareness
  • Something like operator.splunk.com/parallel-pod-updates: "0.25" for 25% at a time
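A sketch of how such an annotation value might be interpreted, assuming values below 1 mean a fraction of the replica count and values of 1 or more mean an absolute pod count (the annotation and this parsing rule are proposals, not existing operator behavior):

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parallelBatchSize interprets the proposed
// operator.splunk.com/parallel-pod-updates annotation value:
// values in (0, 1) are a fraction of replicas, values >= 1 are an
// absolute count. Empty or invalid input falls back to 1 (the
// current one-pod-at-a-time behavior).
func parallelBatchSize(value string, replicas int) int {
	f, err := strconv.ParseFloat(value, 64)
	if err != nil || f <= 0 {
		return 1
	}
	if f < 1 {
		n := int(math.Ceil(f * float64(replicas)))
		if n < 1 {
			n = 1
		}
		return n
	}
	n := int(f)
	if n > replicas {
		n = replicas
	}
	return n
}

func main() {
	fmt.Println(parallelBatchSize("0.25", 50)) // 13 pods at a time
	fmt.Println(parallelBatchSize("3", 50))    // 3 pods at a time
}
```

Falling back to a batch size of 1 on any parse error keeps the opt-in safe: a typo in the annotation degrades to today's behavior rather than a surprise mass update.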

Additional context

This would be super helpful for:

  • Cost optimization when resizing pods without changing overall footprint
  • Large cluster maintenance where you don't want to wait all day for rolling updates
