Robin stuck in infinite loop after purgeKeysOnRebalance scale-down #48

@DanielDorado

Description

There is a test to reproduce it:

  make test-chaos GINKGO_EXTRA_OPTS='--focus="recovers when Robin ConfigMap has stale primaries from failed scale-down"'

When a RedkeyCluster with purgeKeysOnRebalance=true is scaled down (e.g. from 8 to 3 primaries), Robin can enter a permanent loop trying to reach "ghost nodes" — pods that no longer exist because the StatefulSet was recreated with fewer replicas. The cluster never recovers without manual intervention.

Root cause: The operator does not update Robin's ConfigMap (redis-cluster-robin) when scaling down with purgeKeysOnRebalance=true. The ConfigMap retains the old primaries: 8 value while the CR specifies 3 primaries and the StatefulSet has only 3 replicas. Robin faithfully tries to reach 8 nodes because that's what its configuration says, but pods 3-7 don't exist.

Steps to Reproduce

  1. Create a RedkeyCluster with purgeKeysOnRebalance=true and e.g. 6 primaries
  2. Scale up to 8 primaries (StatefulSet is deleted and recreated with 8 replicas)
  3. While scaling/integrity-check is still in progress, delete some pods and scale down to 3 primaries
  4. The StatefulSet is recreated with 3 replicas (pods 0-2 exist, pods 3-7 do not)

Root Cause: Robin ConfigMap Not Updated

The operator fails to update the Robin ConfigMap before recreating the StatefulSet. Evidence from the stuck cluster:

# kubectl get configmaps redis-cluster-robin -o json | jq -r '.data["application-configmap.yml"]'
metadata:
    namespace: chaos-5-bc6m5
redis:
    standalone: false
    reconciler:
        interval_seconds: 30
        operation_cleanup_interval_seconds: 30
    cluster:
        namespace: chaos-5-bc6m5
        name: redis-cluster
        primaries: 8           # <-- CR says 3, StatefulSet has 3 replicas
        replicas_per_primary: 0
        status: ScalingUp      # <-- operator never updated this either
        ephemeral: true
        health_probe_interval_seconds: 60
        healing_time_seconds: 60
        max_retries: 10
        back_off: 10s
    metrics:
        interval_seconds: 60
        redis_info_keys: []

The CR specifies primaries: 3 and the StatefulSet was recreated with 3 replicas, but the Robin ConfigMap still shows primaries: 8 and status: ScalingUp. Robin is doing exactly what it was told — trying to bring up 8 nodes — but 5 of them don't exist.

The operator likely updates the ConfigMap in a code path that is skipped or fails silently during the purgeKeysOnRebalance=true scale-down flow, particularly when a conflicting Robin operation (e.g. CheckIntegrity) is in progress at the time of the scale request.
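To make the required ordering concrete, here is a minimal Go sketch of the fix: render the Robin config for the *new* primary count and write it before the StatefulSet is recreated. The function name, signature, and the exact YAML shape are illustrative (modeled on the stale application-configmap.yml shown above), not the operator's actual API.

```go
package main

import "fmt"

// renderRobinConfig builds the ConfigMap payload Robin should see after a
// scale operation. Hypothetical helper: field names mirror the
// application-configmap.yml dump above, but this is a sketch, not the
// operator's real template.
func renderRobinConfig(primaries, replicasPerPrimary int, status string) string {
	return fmt.Sprintf(
		"redis:\n"+
			"    cluster:\n"+
			"        primaries: %d\n"+
			"        replicas_per_primary: %d\n"+
			"        status: %s\n",
		primaries, replicasPerPrimary, status)
}

func main() {
	// On the failing 8 -> 3 scale-down, this update must land before the
	// StatefulSet recreation (and must not be skipped when a Robin
	// operation such as CheckIntegrity is in flight), so Robin never
	// targets pods 3-7 that no longer exist.
	fmt.Print(renderRobinConfig(3, 0, "ScalingDown"))
}
```

The key point is the ordering, not the rendering: whatever code path writes this ConfigMap today must also run, and be retried on conflict, in the purgeKeysOnRebalance=true scale-down flow.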

Observed Behavior

Robin retains redis-cluster-3 through redis-cluster-7 in its target node list (because its ConfigMap says primaries: 8) despite only 3 pods existing. It enters an infinite cycle:

  1. Robin tries to initialize each ghost node sequentially, each failing with failed to connect after 10 retries
  2. Robin attempts CLUSTER MEET with ghost nodes, which fails with ERR Invalid node address specified: :6379 (empty IP since pod doesn't exist)
  3. Robin's status alternates between ScalingUp, ScalingUpError, and CheckingIntegrity
  4. The operator is stuck in ScalingDown / EndingFastScaling, polling Robin every 30 seconds
  5. The 3 running Redis pods have cluster_slots_assigned:0 — slots were never distributed after recreation

Additionally, when the operator asks Robin to recreate the cluster during an ongoing CheckIntegrity operation, Robin responds: Cluster cannot be recreated right now due to conflicting operation (operation=CheckIntegrity, status=Running).

The CR status.nodes map still references old node IDs and IPs from before the StatefulSet recreation, while the actual running nodes have new IDs. Robin detected the ID/IP changes but was unable to complete the reconciliation.

This loop continues indefinitely (observed running 3+ hours with no recovery).

Expected Behavior

The operator must update the Robin ConfigMap with the correct primaries count before (or at the same time as) recreating the StatefulSet during a purgeKeysOnRebalance=true scale-down. This ensures Robin targets the correct number of nodes.

Robin should also be resilient to stale configuration: when pods in its target list don't exist, it should detect the mismatch and either reload its configuration or report a clear error rather than retrying indefinitely.
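The resilience check suggested above can be sketched as a pure function: given the primary count from Robin's config and the pods that actually exist, return the configured targets with no backing pod. Pod names follow the redis-cluster-<ordinal> convention seen in the logs; the function itself is hypothetical, not Robin's real code.

```go
package main

import "fmt"

// ghostNodes returns the configured primary targets that have no backing
// pod. Hypothetical sketch: Robin could run this before each retry cycle
// and, on a non-empty result, reload its configuration or fail loudly
// instead of looping on CLUSTER MEET against empty addresses.
func ghostNodes(configuredPrimaries int, existingPods map[string]bool) []string {
	var ghosts []string
	for i := 0; i < configuredPrimaries; i++ {
		name := fmt.Sprintf("redis-cluster-%d", i)
		if !existingPods[name] {
			ghosts = append(ghosts, name)
		}
	}
	return ghosts
}

func main() {
	// Stale config says 8 primaries, but only pods 0-2 survived the
	// scale-down: the 5 ghosts (redis-cluster-3 .. redis-cluster-7)
	// should surface as a config/reality mismatch, not infinite retries.
	existing := map[string]bool{
		"redis-cluster-0": true,
		"redis-cluster-1": true,
		"redis-cluster-2": true,
	}
	fmt.Println(ghostNodes(8, existing))
}
```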

Environment

  • Discovered during chaos testing with purgeKeysOnRebalance=true
  • Namespace: chaos-5-bc6m5 (chaos test run)
  • Chaos test configuration: 10 iterations, scaling between 3-8 primaries with concurrent pod deletions and operator restarts
  • Failure occurred at iteration 4 after scaling 8→3 primaries while pods were being deleted

Relevant Logs

Robin ConfigMap (stale)

primaries: 8    # should be 3
status: ScalingUp

Robin logs (ghost node connection failures)

Error initializing node redis-cluster-7: "failed to connect after 10 retries"
Error initializing node redis-cluster-6: "failed to connect after 10 retries"
Error initializing node redis-cluster-5: "failed to connect after 10 retries"
Error initializing node redis-cluster-4: "failed to connect after 10 retries"
Error initializing node redis-cluster-3: "failed to connect after 10 retries"

Robin logs (CLUSTER MEET with empty IP)

ERR Invalid node address specified: :6379

Operator logs (stuck polling loop)

Finishing fast scaling {"redkey-cluster": "chaos-5-bc6m5/redis-cluster"}
Waiting for cluster to be Ready in Robin {"redkey-cluster": "chaos-5-bc6m5/redis-cluster"}

Redis CLUSTER INFO (0 slots assigned)

cluster_slots_assigned:0
cluster_slots_ok:0
cluster_known_nodes:3
cluster_size:0
