Enable kubernetes_node_scale benchmark (up to 5k nodes) on AWS EKS with Karpenter #6512
kiryl-filatau wants to merge 11 commits into GoogleCloudPlatform:master
Conversation
```python
# Output can be quite large, so we'll conditionally suppress it.
['get', resource_type, '-o', 'json'],
timeout=60 * 5,  # 5 minutes for large clusters (e.g. 1000 pods)
suppress_logging=NUM_PODS.value > 20,
```
```python
def _PostCreate(self):
  """Performs post-creation steps for the cluster."""
  super()._PostCreate()
  # Karpenter controller resources: default 1/1Gi; scale up when node_scale target is set.
```
Can we just not specify anything & let Karpenter decide? Or is this indeed necessary? It seems clever but a little annoying / bad user experience by Karpenter.
These are the resources for the Karpenter controller pod (the node where Karpenter itself runs). Karpenter doesn’t manage that node, so it can’t “decide” these values; we have to set them ourselves. For runs with ~10 nodes, 1/1Gi is sufficient; we only increase the values when node_scale is 500+ or 1000+.
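To make the tiered sizing described above concrete, here is a minimal sketch of picking controller pod resources from the node-scale target. The helper name and the 500+/1000+ resource values are illustrative assumptions, not the PR's actual code; only the thresholds and the 1/1Gi default come from the discussion.

```python
# Hypothetical sketch: choose Karpenter controller pod resources from the
# benchmark's node-scale target. The 2/4Gi and 4/8Gi tiers are assumed
# values for illustration; only the 1/1Gi default and the 500/1000
# thresholds come from the discussion above.
def ControllerResources(node_scale: int) -> dict:
  """Returns cpu/memory requests for the Karpenter controller pod."""
  if node_scale >= 1000:
    cpu, memory = '4', '8Gi'
  elif node_scale >= 500:
    cpu, memory = '2', '4Gi'
  else:
    cpu, memory = '1', '1Gi'  # default is sufficient for ~10-node runs
  return {'cpu': cpu, 'memory': memory}
```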
```python
'v'
+ full_version.strip().strip('"').split(f'{self.cluster_version}-v')[1]
)

# NodePool CPU limit: scale with benchmark target (nodes * 2 + 5%), min 1000.
```
Does the machine type matter here as well? If I am using a larger machine type, do I need to also set a larger CPU limit? This again seems a little annoying to have to set manually (but maybe makes sense given Karpenter can be machine-type agnostic).
Makes sense to include machine type adjustment, I’ll think about how to cover it.
Thanks.
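The CPU-limit formula under discussion can be sketched directly from the comment above; this is an illustrative standalone function, not the PR's code, and it does not model the machine-type adjustment raised in the review.

```python
import math

# Sketch of the NodePool CPU-limit formula from the diff comment:
# max(1000, ceil(nodes * 2 * 1.05)). Machine-type-aware scaling (raised in
# the review) is intentionally not modeled here.
def NodePoolCpuLimit(num_nodes: int) -> int:
  """CPU limit for the Karpenter NodePool, scaled with the node target."""
  return max(1000, math.ceil(num_nodes * 2 * 1.05))
```

With this formula, small runs (10 nodes) stay at the floor of 1000, while a 5k-node run gets a limit of 10500, matching the examples in the PR summary.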
```python
suppress_failure=lambda stdout, stderr, retcode: (
    'no matching resources found' in stderr.lower()
    or 'timed out' in stderr.lower()
    or 'context deadline exceeded' in stderr.lower()
```
These look very similar to the RETRYABLE_KUBECTL_ERRORS list:
Just use kubectl.RunRetryableKubectlCommand instead and get these for free. If that list is missing some of these (like 'timed out'), consider adding them. It looks like suppress_failure is supported too, so you can mix both; that would probably be good for 'no matching resources found', since that sounds like a wait-/command-specific error message to ignore.
@hubatish
Updated: EKS cleanup now uses RunRetryableKubectlCommand, with suppress_failure only for "no resources found"-style messages; the retryable list has been extended and matching is now case-insensitive. Please check.
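The split described above, retrying transient errors while suppressing only cleanup-specific "nothing to delete" failures, can be sketched like this. The `RETRYABLE_ERRORS` tuple and both helpers are illustrative; the real code delegates to kubectl.RunRetryableKubectlCommand and its own error list.

```python
# Illustrative sketch of the cleanup error handling: retry on known
# transient kubectl errors, suppress only "no matching resources found".
# This list and these helper names are assumptions, not PKB's actual code.
RETRYABLE_ERRORS = (
    'timed out',
    'context deadline exceeded',
    'connection refused',
)

def IsRetryable(stderr: str) -> bool:
  """True if the kubectl error is transient and worth retrying."""
  err = stderr.lower()  # matching is case-insensitive
  return any(msg in err for msg in RETRYABLE_ERRORS)

def SuppressFailure(stdout: str, stderr: str, retcode: int) -> bool:
  """Command-specific: missing resources are fine during cleanup."""
  return 'no matching resources found' in stderr.lower()
```

Keeping the two predicates separate mirrors the review suggestion: transient errors belong in the shared retryable list, while "no matching resources found" stays a per-command suppression.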
```python
    ),
)
max_retries = 5
backoff_seconds = 10
```
While this backoff logic looks pretty reasonable, prefer reusing the backoff logic in vm_util.Retry, which means moving this code to a subfunction and adding said decorator.
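The refactor suggested above, moving inline retry loops into a decorated subfunction, can be sketched with a generic decorator. This is in the spirit of vm_util.Retry, but PKB's actual decorator has a different signature and semantics; the names and linear backoff below are illustrative.

```python
import functools
import time

# Generic retry-with-backoff decorator, sketched in the spirit of
# vm_util.Retry (the real PKB decorator differs). Retries the wrapped
# function on the given exception types, sleeping between attempts.
def Retry(max_retries=5, backoff_seconds=10, retryable=(Exception,)):
  def decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
      for attempt in range(max_retries):
        try:
          return func(*args, **kwargs)
        except retryable:
          if attempt == max_retries - 1:
            raise  # out of retries: surface the last error
          time.sleep(backoff_seconds * (attempt + 1))  # linear backoff
    return wrapper
  return decorator
```

The body that deleted orphan ENIs would then become a small decorated function, with `RequestLimitExceeded`-style throttle errors mapped to the retryable exception types.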
```python
"""Stop watching the cluster for node add/remove events."""
polled_events = self._cluster.GetEvents()

# Resolve machine type only for current nodes; use "unknown" for the rest.
```
O this makes sense. Was this causing the cluster to take a long time querying everything?
Yep, it was the main reason.
```python
if name in _current_node_names:
  machine_type = _GetMachineTypeFromNodeName(self._cluster, name)
else:
  machine_type = "unknown"
```
Something around here is probably what is causing the TypeError.
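A minimal sketch of the single-pass resolution discussed here: fetch the current node names once, then only look up machine types for nodes that still exist. All names in this sketch are hypothetical; the point is that a plain string fallback (rather than None) avoids the kind of downstream TypeError flagged above when the value is later concatenated or sorted.

```python
# Illustrative sketch (names are not the PR's): resolve machine types in one
# pass. Departed nodes get the string "unknown" instead of None, so later
# string operations on the value cannot raise a TypeError.
def ResolveMachineTypes(event_node_names, current_node_names, lookup):
  """Maps node name -> machine type, 'unknown' for departed nodes."""
  current = set(current_node_names)  # O(1) membership checks
  return {
      name: lookup(name) if name in current else 'unknown'
      for name in event_node_names
  }
```

On a 5k-node run this replaces thousands of per-node kubectl calls with a single `get nodes` pass plus dictionary lookups.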
Summary
Enables running the kubernetes_node_scale benchmark (0→5k→0→5k nodes) on AWS EKS with Karpenter. The benchmark scales a deployment with pod anti-affinity, measures scale-up, scale-down, and a second scale-up, then tears down the cluster.
Main changes
- **kubernetes_node_scale benchmark** — Template and scaling logic (scale up, scale down, phases), metrics collection, and timeouts tuned for large runs.
- **EKS + Karpenter** — NodePool template (instance types including `t`, CPU limit derived from the scale target), EKS/Karpenter cluster lifecycle and cleanup.
- **Karpenter scaling by node count** — NodePool CPU limit is computed from `kubernetes_scale_num_nodes`: `max(1000, ceil(nodes × 2 × 1.05))` (e.g. 10 nodes → 1000, 5k → 10500). Controller pod resources scale with the same flag. One configuration works for both small and 5k-node runs.
- **Teardown robustness** — Orphan ENI deletion in `_CleanupKarpenter`: retry with backoff on AWS throttling (`RequestLimitExceeded`), treat "ENI not found" as success; uses `suppress_failure` for these cases.
- **Tracker** — Single `get nodes` pass in `_StopWatchingForNodeChanges`; resolve machine type only for current nodes and use `"unknown"` for the others, to avoid thousands of kubectl calls on 5k-node runs.
- **Tests** — `kubernetes_scale_benchmark_test` mocks updated to return valid kubectl `-o json` output (`{"items": [...]}`) so tests pass after `GetStatusConditionsForResourceType` was switched from jsonpath to full JSON.
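The test fix in the last bullet can be sketched as a small fake: once the production code parses full `-o json` output instead of a jsonpath expression, the mocked kubectl stdout must be a complete JSON document with an `"items"` list. The helper name and field shapes below are illustrative, not the actual test code.

```python
import json

# Illustrative fake kubectl `get ... -o json` output for the unit tests.
# The helper name and condition fields are assumptions; the essential part
# is the top-level {"items": [...]} document the parser now expects.
def FakeKubectlGetJson(num_pods: int) -> str:
  items = [
      {'metadata': {'name': f'pod-{i}'},
       'status': {'conditions': [{'type': 'Ready', 'status': 'True'}]}}
      for i in range(num_pods)
  ]
  return json.dumps({'items': items})
```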