Skip to content

Conversation

@adamancini
Copy link
Member

Summary

This PR adds a support bundle analyzer that detects PodDisruptionBudget-related pod eviction failures in k0scontroller logs. These failures cause autopilot upgrades to hang, making the cluster unable to complete upgrades.

Problem

When customers deploy PodDisruptionBudgets with restrictive settings (e.g., minAvailable too high or maxUnavailable too low), the k0s autopilot cannot evict pods during node draining. This causes cluster upgrades to hang indefinitely. Previously, identifying this issue required manually grepping through k0scontroller logs.

Solution

The new analyzer:

  • Scans k0scontroller logs for two specific error patterns:
    • error when evicting pods
    • Cannot evict pod as it would violate the pod's disruption budget
  • Triggers a fail severity when detected
  • Provides comprehensive remediation guidance including:
    • kubectl commands to identify problematic PDBs
    • Explanation of minAvailable/maxUnavailable settings
    • Step-by-step instructions to temporarily adjust or remove PDBs
    • Warnings about availability implications

Testing

  • Verify analyzer syntax in host-support-bundle.tmpl.yaml
  • Test with support bundle containing PDB eviction errors
  • Confirm failure message displays correctly
  • Verify pass condition when no errors present

Impact

This change significantly reduces time-to-resolution for PDB-blocked upgrade issues by surfacing the problem automatically in support bundle analysis.

Related Issues

Related to recent troubleshoot repo work on partial discovery failures with Kubernetes API in cluster-resources collector.

Add analyzer to detect PodDisruptionBudget-related pod eviction failures
in k0scontroller logs. This helps identify when PDBs are blocking autopilot
upgrades, causing cluster upgrades to hang.

The analyzer:
- Scans k0scontroller logs for eviction error patterns
- Detects "error when evicting pods" and PDB violation messages
- Provides comprehensive remediation guidance
- Includes kubectl commands and step-by-step resolution steps

This makes troubleshooting PDB-blocked upgrades significantly faster for
support engineers and customers.
@github-actions
Copy link

github-actions bot commented Dec 8, 2025

This PR has been released (on staging) and is available for download with a embedded-cluster-smoke-test-staging-app license ID.

Online Installer:

curl "https://staging.replicated.app/embedded/embedded-cluster-smoke-test-staging-app/ci/appver-dev-442d2e2" -H "Authorization: $EC_SMOKE_TEST_LICENSE_ID" -o embedded-cluster-smoke-test-staging-app-ci.tgz

Airgap Installer (may take a few minutes before the airgap bundle is built):

curl "https://staging.replicated.app/embedded/embedded-cluster-smoke-test-staging-app/ci-airgap/appver-dev-442d2e2?airgap=true" -H "Authorization: $EC_SMOKE_TEST_LICENSE_ID" -o embedded-cluster-smoke-test-staging-app-ci.tgz

Happy debugging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants