OCPBUGS-62626: only report Progressing=True when progressing towards new configuration #1264
base: main
|
@flavianmissi: This pull request references Jira Issue OCPBUGS-62626, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: flavianmissi. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/payload-job periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade |
|
@flavianmissi: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0ac02f00-b32e-11f0-8ae7-8e4f57a461ba-0 |
|
payload job failed during setup. /payload-job periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade |
|
@flavianmissi: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c3b27b70-b3fc-11f0-8a25-dce41c0f7de4-0 |
|
Looks like the tests covering the Progressing=True issue were merged yesterday. I think that's why they didn't show up in my latest payload run, so I'll have to try again. /payload-job periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade |
|
@flavianmissi: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/19d72d10-b49e-11f0-8428-5cc249faa40f-0 |
|
Let us try this: /payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade openshift/origin#30438 |
|
@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/11b2ee60-b522-11f0-9eef-fe936f49c238-0 |
|
The result from the job #1264 (comment) is looking good:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/openshift-origin-30438-openshift-cluster-image-registry-operator-1264-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade/1983682929573761024/artifacts/e2e-gcp-ovn-rt-upgrade/openshift-e2e-test/artifacts/junit/e2e-monitor-tests__20251030-013050.xml | rg 'clusteroperator/image-registry should stay Progressing=False' -A1 -B1
<testcase name="[Monitor:legacy-cvo-invariants][bz-Etcd] clusteroperator/etcd should stay Progressing=False while MCO is Progressing=True" time="0"></testcase>
<testcase name="[Monitor:legacy-cvo-invariants][bz-Image Registry] clusteroperator/image-registry should stay Progressing=False while MCO is Progressing=True" time="0"></testcase>
<testcase name="[Monitor:legacy-cvo-invariants][bz-Routing] clusteroperator/ingress should stay Progressing=False while MCO is Progressing=True" time="4153.084">
|
|
/retest |
|
/retitle OCPBUGS-62626: only report Progressing=True when progressing towards new configuration |
|
/jira refresh |
|
@flavianmissi: This pull request references Jira Issue OCPBUGS-62626, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (xiuwang+1@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
e869359 to 5aeceae
…duling

The OperatorProgressing condition API definition states that operators must not report Progressing when reconciling to previously known state, such as when nodes are rebooted and pods are restarted, or when daemonsets adjust to node reboot or cluster scale-up events.

The NodeCADaemonController was violating this API by reporting Progressing=True with reason "Unavailable" whenever ds.Status.NumberUnavailable > 0, which occurs during normal pod rescheduling operations (node reboots, cluster scale-up, etc.). This caused the IR operator to switch between Progressing=True and Progressing=False during machine-config upgrade windows, generating several unexpected state transitions in CI.

This commit removes the logic that reports Progressing=True based on NumberUnavailable. Now the controller only reports Progressing=True when Generation != ObservedGeneration, which indicates that an actual daemonset update is in progress, not just pod rescheduling.

Co-Authored-By: Claude <noreply@anthropic.com>
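The rule described in this commit message can be sketched in a few lines of Go. This is a minimal illustration, not the operator's actual code: `daemonSetView` and `nodeCAProgressing` are hypothetical names, and the struct is a stand-in carrying only the three DaemonSet fields the message mentions (metadata Generation, status ObservedGeneration, status NumberUnavailable).

```go
package main

import "fmt"

// daemonSetView is a hypothetical stand-in for the DaemonSet fields
// discussed in the commit message. It is not the Kubernetes API type.
type daemonSetView struct {
	Generation         int64 // bumped when the DaemonSet spec changes
	ObservedGeneration int64 // last generation seen by the DaemonSet controller
	NumberUnavailable  int32 // pods not yet available, e.g. during node reboots
}

// nodeCAProgressing reports Progressing=True only while a spec change has
// not yet been observed. NumberUnavailable is deliberately ignored, because
// unavailable pods alone indicate rescheduling towards a known state.
func nodeCAProgressing(ds daemonSetView) bool {
	return ds.Generation != ds.ObservedGeneration
}

func main() {
	// Node reboot: pods are unavailable, but the spec has not changed.
	rescheduling := daemonSetView{Generation: 3, ObservedGeneration: 3, NumberUnavailable: 2}
	// Spec update: generation was bumped and has not been observed yet.
	updating := daemonSetView{Generation: 4, ObservedGeneration: 3, NumberUnavailable: 0}

	fmt.Println("rescheduling progressing:", nodeCAProgressing(rescheduling))
	fmt.Println("updating progressing:", nodeCAProgressing(updating))
}
```

Under these assumptions, a node reboot that leaves pods unavailable no longer flips the condition, while a genuine spec change still does.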
The OperatorProgressing condition API definition states that operators must not report Progressing when reconciling to previously known state, such as when nodes are rebooted and pods are restarted, or when daemonsets adjust to node reboot or cluster scale-up events.

The IR operator was violating the OperatorProgressing condition semantics by reporting Progressing=True whenever the image registry Deployment was not complete, even when just reconciling to a previously known state.

This commit adds a check for deploy.Generation != deploy.Status.ObservedGeneration before reporting DeploymentNotCompleted. This ensures we only report Progressing=True during actual Deployment updates (when Generation has been bumped but not yet observed), not during normal reconciliation events like:

* pod rescheduling after node reboots
* pods restarting after crashes
* replicas scaling up/down to match existing desired count

Co-Authored-By: Claude <noreply@anthropic.com>
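The Deployment check described above could look roughly like the following Go sketch. It is an illustration under stated assumptions, not the operator's implementation: `deploymentView`, `deploymentComplete`, and `registryProgressing` are hypothetical names, the struct only mimics a handful of Deployment fields, and the definition of "complete" (all desired replicas updated and available) is an assumption made for the example. Only the reason string `DeploymentNotCompleted` and the generation comparison come from the commit message.

```go
package main

import "fmt"

// deploymentView is a hypothetical stand-in for the Deployment fields used
// in this sketch. It is not the Kubernetes API type.
type deploymentView struct {
	Generation         int64 // bumped when the Deployment spec changes
	ObservedGeneration int64 // last generation seen by the Deployment controller
	DesiredReplicas    int32
	UpdatedReplicas    int32
	AvailableReplicas  int32
}

// deploymentComplete is an assumed completeness test for illustration:
// every desired replica is both updated and available.
func deploymentComplete(d deploymentView) bool {
	return d.UpdatedReplicas == d.DesiredReplicas &&
		d.AvailableReplicas == d.DesiredReplicas
}

// registryProgressing returns Progressing=True with reason
// "DeploymentNotCompleted" only when a spec change is still unobserved.
// An incomplete Deployment with matching generations is treated as
// reconciliation towards a previously known state.
func registryProgressing(d deploymentView) (bool, string) {
	if d.Generation != d.ObservedGeneration && !deploymentComplete(d) {
		return true, "DeploymentNotCompleted"
	}
	return false, ""
}

func main() {
	// A pod was rescheduled after a node reboot: incomplete, same generation.
	rebooting := deploymentView{Generation: 2, ObservedGeneration: 2,
		DesiredReplicas: 2, UpdatedReplicas: 2, AvailableReplicas: 1}
	// A spec update is rolling out: generation bumped, not yet observed.
	rollingOut := deploymentView{Generation: 3, ObservedGeneration: 2,
		DesiredReplicas: 2, UpdatedReplicas: 1, AvailableReplicas: 1}

	for name, d := range map[string]deploymentView{"rebooting": rebooting, "rollingOut": rollingOut} {
		progressing, reason := registryProgressing(d)
		fmt.Printf("%s: progressing=%t reason=%q\n", name, progressing, reason)
	}
}
```

The empty reason returned in the non-progressing case is a placeholder chosen for this sketch; the real operator may set a different reason.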
5aeceae to 30eb3f3
|
@flavianmissi: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |

No description provided.