Description
Which component are you using?:
/area cluster-autoscaler
What version of the component are you using?:
Component version: v1.33.0
What k8s version are you using (kubectl version)?: 1.34.0
What environment is this in?: Cluster Autoscaler runs in the management cluster alongside the Cluster API + CAPA objects; the workload cluster is set up by CAPI + CAPA on EC2 instances (not EKS)
What did you expect to happen?:
The unneeded node to be scaled down and removed, with the MachineDeployment replicas going back to 0.
What happened instead?:
While Cluster Autoscaler was able to scale up, and also marked the appropriate node for scale-down, the node was never removed and the MachineDeployment replica count remained 1 (it should have gone to 0, since there are no pods on the node).
How to reproduce it (as minimally and precisely as possible):
- create a Management Cluster (on Kind)
- create a workload cluster on EC2 instances (not EKS) using Cluster API
- create MachineDeployments as follows for all 3 AZs (a quick check of the annotations is sketched after the manifest):
  apiVersion: cluster.x-k8s.io/v1beta1
  kind: MachineDeployment
  metadata:
    name: tpfm-k0-md-storage-ap-south-1a
    namespace: default
    annotations:
      # CA discovery bounds (required)
      cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
      cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
      # Scale-from-zero capacity (set to *allocatable* for t3a.medium)
      capacity.cluster-autoscaler.kubernetes.io/cpu: "2"
      capacity.cluster-autoscaler.kubernetes.io/memory: "3748Mi"
      # Scale-from-zero scheduling predicates (so CA knows a new node matches your pods)
      capacity.cluster-autoscaler.kubernetes.io/labels: "node-role.tpfm.io=storage"
      capacity.cluster-autoscaler.kubernetes.io/taints: "node-role.tpfm.io=storage:NoSchedule"
  spec:
    clusterName: tpfm-k0
    replicas: 0
    selector:
      matchLabels: null
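To rule out a mismatch between what CA discovers and what is actually on the object, the annotations can be double-checked from the management cluster. A minimal sketch (names come from the manifest above and the node name from the logs further down; a local copy of the workload kubeconfig named tpfm-k0.kubeconfig is assumed):

  # Confirm the min/max and scale-from-zero annotations are really on the MachineDeployment.
  kubectl -n default get machinedeployment tpfm-k0-md-storage-ap-south-1a \
    -o jsonpath='{.metadata.annotations}'

  # Compare the capacity annotations with the allocatable resources of a live t3a.medium node.
  kubectl --kubeconfig tpfm-k0.kubeconfig get node ip-10-0-233-80.ap-south-1.compute.internal \
    -o jsonpath='{.status.allocatable}'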
- install Cluster Autoscaler with the Helm chart using the following values (an example install command is sketched after the values):
  cloudProvider: clusterapi
  # Choose a tag matching your workload Kubernetes minor when available
  image:
    tag: v1.33.0
  # clusterapi-specific
  clusterAPIMode: kubeconfig-incluster
  clusterAPIKubeconfigSecret: ca-workload-kubeconfig
  clusterAPIWorkloadKubeconfigPath: /etc/kubernetes/workload
  clusterAPIConfigMapsNamespace: kube-system
  # Autodiscovery: search CAPI objects in mgmt cluster "default" ns, by labels
  autoDiscovery:
    namespace: default
    labels:
      - cluster.x-k8s.io/cluster-name: tpfm-k0
  extraArgs:
    v: 7
    kubeconfig: /etc/kubernetes/workload/tpfm-k0.kubeconfig
    clusterapi-cloud-config-authoritative: "true"
    balance-similar-node-groups: "true"
    balancing-label_1: node-role.tpfm.io
    # balancing-ignore-label_1: "topology.kubernetes.io/zone"
    # balancing-ignore-label_2: "failure-domain.beta.kubernetes.io/zone"
    expander: "least-waste"
    # Scale UP responsiveness
    new-pod-scale-up-delay: "0s"
    max-node-provision-time: "5m"
    # Scale DOWN responsiveness
    scale-down-unneeded-time: "2m"          # default ~10m; how long a node must be unneeded before it is eligible for scale-down
    scale-down-delay-after-add: "2m"        # default ~10m; how long after a scale-up before scale-down evaluation resumes
    # scale-down-delay-after-delete: "30s"  # how long after a node deletion before scale-down evaluation resumes; defaults to scanInterval
    scale-down-delay-after-failure: "2m"    # default ~3m
    scale-down-utilization-threshold: "0.6" # default 0.5 → marks nodes underutilized sooner
    max-scale-down-parallelism: "20"        # delete more empty nodes per loop (safe if churn is OK)
  rbac:
    additionalRules:
      - apiGroups:
          - infrastructure.cluster.x-k8s.io
        resources:
          - awsmachinetemplates
        verbs:
          - get
          - list
          - watch
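For reference, the values above would be applied roughly like this (a sketch; the release name, namespace, and values file name are assumptions, and the chart repo is the upstream kubernetes/autoscaler one):

  # Install/upgrade Cluster Autoscaler in the management cluster with the values above.
  helm repo add autoscaler https://kubernetes.github.io/autoscaler
  helm repo update
  helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
    --namespace kube-system \
    --values ca-values.yaml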
Anything else we need to know?:
- Cluster Autoscaler logs show that the correct node is marked for scale-down:
  I0915 14:46:25.853378       1 actuator.go:175] Scale-down: removing empty node "ip-10-0-233-80.ap-south-1.compute.internal"
  I0915 14:46:25.854400       1 clusterapi_controller.go:790] node "ip-10-0-233-80.ap-south-1.compute.internal" is in nodegroup "MachineDeployment/default/tpfm-k0-md-storage-ap-south-1c"
  I0915 14:46:25.854984       1 actuator.go:295] Scale-down: waiting 5s before trying to delete nodes
  I0915 14:46:25.906719       1 round_trippers.go:632] "Response" status="200 OK" milliseconds=53
  I0915 14:46:25.906959       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"82103712-160b-4c59-939f-2a9964446011", APIVersion:"v1", ResourceVersion:"93959", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node "ip-10-0-233-80.ap-south-1.compute.internal"
  I0915 14:46:25.907027       1 round_trippers.go:527] "Request" verb="PATCH" url="https://default-tpfm-k0-apiserver.com:6443/api/v1/namespaces/kube-system/events/cluster-autoscaler-status.1865706f86b05b8d" headers=<
    Accept: application/vnd.kubernetes.protobuf, */*
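After this point, the corresponding CAPI objects in the management cluster can be checked to see whether the scale-down ever reached them (a sketch; the namespace and names are the ones from above):

  # Did the MachineDeployment/MachineSet replica counts ever change, and is the Machine still there?
  kubectl -n default get machinedeployments,machinesets,machines -o wide
  kubectl -n default describe machinedeployment tpfm-k0-md-storage-ap-south-1c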
- the Machine object is marked with
    annotations:
      cluster.x-k8s.io/delete-machine: 2025-09-15 14:16:56.862721311 +0000 UTC m=+11157.135416490
  but the replica count on the MachineDeployment remains 1
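Since the annotation is set but the replica count never drops, it may help to look for the (failed or missing) replica patch on both sides. A sketch, assuming a Helm release called cluster-autoscaler in kube-system and a standard clusterctl-installed capi-controller-manager in capi-system:

  # Look for scale-down / MachineDeployment patch activity or RBAC denials in CA logs.
  kubectl -n kube-system logs deploy/cluster-autoscaler | grep -iE 'scale.?down|machinedeployment|forbidden'

  # Check whether the CAPI controller sees the delete-machine annotation or a replica change.
  kubectl -n capi-system logs deploy/capi-controller-manager | grep -i tpfm-k0-md-storage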
- in the status ConfigMap of the Cluster Autoscaler in the workload cluster:
    scaleDown:
      status: CandidatesPresent
      candidates: 1
      lastProbeTime: "2025-09-15T14:50:26.294111768Z"
      lastTransitionTime: "2025-09-15T11:19:24.712973162Z"
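The full scale-down section can be dumped from the workload cluster like this (the ConfigMap name comes from the events above; the kubeconfig path mirrors the Helm values and will differ outside the CA pod):

  # Dump the autoscaler status ConfigMap from the workload cluster.
  kubectl --kubeconfig /etc/kubernetes/workload/tpfm-k0.kubeconfig \
    -n kube-system get configmap cluster-autoscaler-status -o yaml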
- since Cluster Autoscaler was actually able to add the node, I believe RBAC is not the problem
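That said, scale-up and scale-down exercise different verbs, so an explicit permission check is cheap (a sketch; the ServiceAccount name assumes the Helm release is called cluster-autoscaler in kube-system):

  # Verify the CA ServiceAccount can patch MachineDeployments and delete Machines in the mgmt cluster.
  kubectl auth can-i patch machinedeployments.cluster.x-k8s.io -n default \
    --as=system:serviceaccount:kube-system:cluster-autoscaler
  kubectl auth can-i update machinedeployments.cluster.x-k8s.io/scale -n default \
    --as=system:serviceaccount:kube-system:cluster-autoscaler
  kubectl auth can-i delete machines.cluster.x-k8s.io -n default \
    --as=system:serviceaccount:kube-system:cluster-autoscaler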
Any help is appreciated, thanks!