-
Notifications
You must be signed in to change notification settings - Fork 19
Correct compliance on no-op dryrun updates #404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct compliance on no-op dryrun updates #404
Conversation
Previously, the policy status would incorrectly have a noncompliant event followed immediately by a compliant event when a dryrun update would not make any changes on that object. This would cause the policy status to repeatedly update if the policy was reconciling often - for example if any of its watched objects were updating. Now only the correct "compliant" event should be emitted in that case. Refs: - https://issues.redhat.com/browse/ACM-25694 Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
|
|
||
| history, _, err := unstructured.NestedSlice(managedPlc.Object, "status", "history") | ||
| g.Expect(err).NotTo(HaveOccurred()) | ||
| g.Expect(history).To(HaveLen(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the change, the history for this policy would have 2 entries.
|
This flake(?) happened: That's at least twice now, in a relatively new test, so it will be worth investigating soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is changing code logic for an enforce policy, but as far as I can tell the policy in the issue is in inform, so I'm not actually sure this would resolve it? If the object is constantly being updated, I'd expect the policy to churn, right? Or am I missing something here?
I don't think this code change is just for The policy "churn" isn't that it is constantly sending actual updates to a managed object, it's only that the policy's status is flipping to noncompliant and then compliant whenever it reconciles. The bug is that the noncompliant status is incorrect: the log notes that the dry-run request shows no changes are needed, but it's reporting as "found but not as specified" anyway. It should only be reporting that the resource is compliant. In a policy for an object that doesn't have other updates streaming in, it doesn't really churn: for example in the test, it was just emitting those two events. But if something irrelevant updates in that managed object, like an annotation or the status, that will trigger a reconcile on the policy. So previously in that case, it was re-emitting both the (incorrect) noncompliant event, and the compliant event. It should only be trying to emit the compliant event, and then stop itself from doing that, because that exactly matches the previous/current status. Does that make sense? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense--sorry, I got lost in the conditionals. So the updated block is happening regardless of remediationAction. I'm just really baffled on why this flow wouldn't be working and needs to be overridden. And I'm also concerned we might be covering up some edge case perhaps around a policy with a status check or changing existing behavior by overriding the boolean especially since this would be backported to 2.14. But also things seem to be working as expected and are passing the tests which also includes status checking, so I'm not sure I need to be overthinking it any more. 😅
I also noticed this commit, which might explain why they have annotations: {} in there? stolostron/config-policy-controller@34c02b8
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dhaiducek, JustinKuli The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
3fd6da7
into
open-cluster-management-io:main
|
You're right to question it! And those conditions get hairy, especially in GitHub, and with my other small changes cluttering things up... The main change is: - // Assume object is compliant by inverting the value of throwSpecViolation
+ // treat the object as compliant, with no updates needed
- r.setEvaluatedObject(obj.policy, obj.existingObj, !throwSpecViolation, "")
+ r.setEvaluatedObject(obj.policy, obj.existingObj, true, "")
- matchesAfterDryRun = true
- return throwSpecViolation, "", diff, updateNeeded, updatedObj, matchesAfterDryRun
+ return false, "", diff, false, updatedObj, trueAnd I suspect the old code was incorrectly copied or trying to be clever, and we didn't notice in our specific cases because there would be another event emitted matching the status we were expecting. In particular, continuing to work with |
|
Right, I eventually noticed the rest were no-ops since the value for For a future PR, it looks like |
Previously, the policy status would incorrectly have a noncompliant event followed immediately by a compliant event when a dryrun update would not make any changes on that object. This would cause the policy status to repeatedly update if the policy was reconciling often - for example if any of its watched objects were updating. Now only the correct "compliant" event should be emitted in that case. NOTE: This is a backport, but not just a cherry-pick. This incorporates some of the changes from these upstream PRs, with updates for tests: - open-cluster-management-io/config-policy-controller#404 - open-cluster-management-io/config-policy-controller#377 Refs: - https://issues.redhat.com/browse/ACM-25742 Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
Previously, the policy status would incorrectly have a noncompliant event followed immediately by a compliant event when a dryrun update would not make any changes on that object. This would cause the policy status to repeatedly update if the policy was reconciling often - for example if any of its watched objects were updating.
Now only the correct "compliant" event should be emitted in that case.
Refs: