MachineHealthCheck remediation for percentage value in remediation.triggerIf.unhealthyLessThanOrEqualTo (previous name maxUnhealthy) should allow remediating at least one machine #12904

@AndiDog


CAPI currently rounds down (math.Floor) the number of machines calculated from a percentage value. Rounding down is selected by the last argument (roundUp = false) passed to intstr.GetScaledValueFromIntOrPercent:

func getMaxUnhealthy(mhc *clusterv1.MachineHealthCheck) (int, error) {
	maxUnhealthy, err := intstr.GetScaledValueFromIntOrPercent(ptr.To(ptr.Deref(mhc.Spec.Remediation.TriggerIf.UnhealthyLessThanOrEqualTo, defaultMaxUnhealthy)), int(ptr.Deref[int32](mhc.Status.ExpectedMachines, 0)), false)
	if err != nil {
		return 0, err
	}
	return maxUnhealthy, nil
}

So if I set 20% and have 3 machines, the controller determines "maxUnhealthy = floor(3 * 20%) = 0 machines" and therefore won't allow any remediation, as shown in the conditions:

  v1beta2:
    conditions:
    - lastTransitionTime: "2025-10-28T11:18:13Z"
      message: 'Remediation is not allowed, the number of not started or unhealthy
        machines exceeds maxUnhealthy (total: 3, unhealthy: 1, maxUnhealthy: 20%)'
      observedGeneration: 2
      reason: TooManyUnhealthy
      status: "False"
      type: RemediationAllowed
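
The arithmetic behind this condition can be reproduced without the Kubernetes libraries. The following is a minimal standalone sketch, not the CAPI code: `scaledFloor` is a hypothetical helper that only mimics the percentage case with rounding down, assuming non-negative inputs.

```go
package main

import "fmt"

// scaledFloor is a hypothetical stand-in for the percentage case with
// rounding down: floor(total * percent / 100).
// Integer division already floors for non-negative inputs.
func scaledFloor(percent, total int) int {
	return percent * total / 100
}

func main() {
	// Show the computed limit for small machine counts at 20%.
	for total := 1; total <= 6; total++ {
		fmt.Printf("machines=%d maxUnhealthy=%d\n", total, scaledFloor(20, total))
	}
}
```

At 20%, the computed limit stays at 0 until there are 5 machines, so a single unhealthy machine already exceeds it in any smaller group.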

I'd argue that by setting a positive percentage value here, as a user, I still intend to see remediation happening even if there are fewer than 5 machines (with 20%, the rounded-down value only reaches 1 at 5 machines). Particularly in a MachinePool scenario (see remediation support PR), where the number of machines may go up and down, I can't really set a fixed number (compared to a percentage). Being able to choose one percentage value as a reasonable default would be nicer, but only if remediation still happens for small machine counts.

Rounding up solves this, or one could always allow a minimum of 1 machine if a positive percentage/numeric value is configured.
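
To make the two options concrete, here is a rough sketch of both, again as standalone Go rather than a patch against CAPI. The helper names `scaledCeil` and `scaledFloorMinOne` are made up for illustration, and only the percentage case with non-negative inputs is covered. In the real code, rounding up would presumably amount to passing true as the last argument of GetScaledValueFromIntOrPercent.

```go
package main

import "fmt"

// scaledCeil (hypothetical) rounds up: ceil(total * percent / 100),
// computed with integer arithmetic for non-negative inputs.
func scaledCeil(percent, total int) int {
	return (percent*total + 99) / 100
}

// scaledFloorMinOne (hypothetical) keeps rounding down, but allows at
// least one machine when both the percentage and the machine count are positive.
func scaledFloorMinOne(percent, total int) int {
	v := percent * total / 100
	if v == 0 && percent > 0 && total > 0 {
		return 1
	}
	return v
}

func main() {
	// Compare both options at 20% for a few machine counts.
	for _, total := range []int{0, 3, 5, 6, 10} {
		fmt.Printf("machines=%d roundUp=%d floorMinOne=%d\n",
			total, scaledCeil(20, total), scaledFloorMinOne(20, total))
	}
}
```

Both variants allow one remediation for 3 machines at 20%. They differ for larger groups: with 6 machines, rounding up yields 2, while the minimum-of-one variant yields 1, so the second option changes behavior only where the current result would be 0.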

/area machinepool

Labels: area/machinehealthcheck, area/machinepool, needs-kind, needs-priority, needs-triage
