SECURESIGN-3167 CTLog recovery #1406
Reviewer's Guide
This PR enhances the CTLog operator to self-heal its configuration secret by validating and recreating it when missing or invalid, adds utility and parsing helpers for managing CTLog config secrets in tests and controller logic, and introduces end-to-end integration tests that simulate secret loss and verify operator recovery behavior and TreeID preservation.
Sequence diagram for CTLog config secret validation and recovery
sequenceDiagram
participant Operator
participant Kubernetes
participant Secret
Operator->>Kubernetes: Get CTLog config secret
alt Secret not found
Operator->>Operator: Log info (missing secret)
Operator->>Operator: Record event (CTLogConfigMissing)
Operator->>Kubernetes: Recreate config secret
else Error accessing secret
Operator->>Operator: Log error
Operator->>Operator: Record event (CTLogConfigError)
Operator->>Kubernetes: Recreate config secret
else Secret exists
Operator->>Secret: Validate secret data
alt Secret valid
Operator->>Operator: Continue (no action)
else Secret invalid
Operator->>Operator: Log info (invalid secret)
Operator->>Operator: Record event (CTLogConfigInvalid)
Operator->>Kubernetes: Recreate config secret
end
end
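The branches of the sequence diagram can be sketched as a small decision function. This is only an illustration of the flow as drawn, not the operator's code: `errNotFound`, `errForbidden`, the `decision` type and `decide` are hypothetical names, and the sentinel errors stand in for real Kubernetes API errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for Kubernetes API errors.
var (
	errNotFound  = errors.New("secret not found")
	errForbidden = errors.New("access forbidden")
)

// decision is a hypothetical summary of what the operator does next.
type decision struct {
	Recreate bool   // true when the config secret is regenerated
	Event    string // event reason to record; empty means no event
}

// decide follows the branches of the sequence diagram: missing secret,
// error accessing the secret, or an existing secret that is valid or invalid.
func decide(getErr error, valid bool) decision {
	switch {
	case errors.Is(getErr, errNotFound):
		return decision{Recreate: true, Event: "CTLogConfigMissing"}
	case getErr != nil:
		return decision{Recreate: true, Event: "CTLogConfigError"}
	case !valid:
		return decision{Recreate: true, Event: "CTLogConfigInvalid"}
	default:
		return decision{} // secret exists and is valid: continue
	}
}

func main() {
	fmt.Printf("%+v\n", decide(errNotFound, false))
	fmt.Printf("%+v\n", decide(errForbidden, false))
	fmt.Printf("%+v\n", decide(nil, false))
	fmt.Printf("%+v\n", decide(nil, true))
}
```

Keeping the decision separate from the side effects (logging, events, recreation) makes each branch of the diagram easy to unit test.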
Class diagram for CTLog config secret validation helper
classDiagram
class ctlogUtils {
+IsSecretDataValid(secretData map[string][]byte, expectedTrillianAddr string) bool
}
ctlogUtils : IsSecretDataValid checks secret data for valid Trillian address
Hey there - I've reviewed your changes - here's some feedback:
- The IsSecretDataValid helper relies on naive string matching of the protobuf text output, which can be brittle—consider using a TextUnmarshaler or protobuf parser to robustly extract and validate the backend_spec field.
- CanHandle always returning true forces reconciliation even when nothing has changed; refining its logic (e.g., comparing observed generation or secret checksums) could reduce unnecessary reconcile loops.
	return true
case instance.Spec.ServerConfigRef != nil:
	return !equality.Semantic.DeepEqual(instance.Spec.ServerConfigRef, instance.Status.ServerConfigRef)
case c.ObservedGeneration != instance.Generation:
If custom ServerConfigRef is unchanged, generation changes are ignored and periodic validation doesn't run.
default:
	return instance.Generation != c.ObservedGeneration
	// Always run Handle() to validate the secret: exists and is valid
	return true
When the default case returns true, some of the preceding checks are no longer needed.
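Both comments on this switch can be seen in a simplified model of its case order. This is a sketch only: the `state` type and its field names are hypothetical, the custom reference is reduced to a string, and the first case (whose condition is not visible in the diff) is left out.

```go
package main

import "fmt"

// state is a hypothetical, flattened view of the fields the switch reads.
type state struct {
	SpecRef            string // custom ServerConfigRef name; empty when unset
	StatusRef          string // ServerConfigRef name recorded in status
	Generation         int64
	ObservedGeneration int64
}

// canHandle models the case order shown in the diff.
func canHandle(s state) bool {
	switch {
	case s.SpecRef != "":
		// A custom ref short-circuits every case below it.
		return s.SpecRef != s.StatusRef
	case s.ObservedGeneration != s.Generation:
		return true
	default:
		// Always validate the secret.
		return true
	}
}

func main() {
	// Custom ref unchanged, generation changed: the generation case is never reached.
	unchangedCustom := state{SpecRef: "custom", StatusRef: "custom", Generation: 2, ObservedGeneration: 1}
	fmt.Println(canHandle(unchangedCustom))

	// No custom ref: the generation case and the default both return true.
	fmt.Println(canHandle(state{Generation: 1, ObservedGeneration: 1}))
}
```

In this model the generation case and the default return the same value, so the generation case adds nothing, while an unchanged custom reference skips validation entirely.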
if apierrors.IsNotFound(err) {
	i.Logger.Info("Server config secret is missing, will recreate",
		"secret", instance.Status.ServerConfigRef.Name)
	i.Recorder.Event(instance, corev1.EventTypeWarning, "CTLogConfigMissing",
Please include the name of the secret in all logs and events. The name is generated.
The problem occurs in multiple places, so please fix all of them.
trillianUrl := fmt.Sprintf("%s:%d", instance.Spec.Trillian.Address, *instance.Spec.Trillian.Port)

// Validate existing secret before attempting recreation
Extract the whole block that validates the existing secret into a new function that returns an error describing the actual state. The Handle function can then simply call it: if no error is returned, return i.Continue(); otherwise use the information from the error to create the event and log message.
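One way to read this suggestion is a validation function that returns a typed error carrying the event reason and the secret name, which the caller then unpacks. The sketch below is an assumption about the shape, not the operator's code: `configError`, `secretState`, `validate` and `handle` are hypothetical, and `handle` returns a string where the real action would log, record an event and recreate the secret.

```go
package main

import (
	"errors"
	"fmt"
)

// configError is a hypothetical error type describing why a secret is unusable.
type configError struct {
	Reason string // event reason, e.g. "CTLogConfigMissing"
	Secret string // name of the (generated) secret
	Detail string // human-readable message for logs and events
}

func (e *configError) Error() string {
	return fmt.Sprintf("%s: secret %q: %s", e.Reason, e.Secret, e.Detail)
}

// secretState is a hypothetical summary of what was observed in the cluster.
type secretState struct {
	Name   string
	Exists bool
	Valid  bool
}

// validate returns nil when the secret is usable, or a *configError otherwise.
func validate(s secretState) error {
	switch {
	case !s.Exists:
		return &configError{Reason: "CTLogConfigMissing", Secret: s.Name, Detail: "config secret is missing"}
	case !s.Valid:
		return &configError{Reason: "CTLogConfigInvalid", Secret: s.Name, Detail: "config secret has invalid Trillian configuration"}
	}
	return nil
}

// handle shows the caller side: continue on success, otherwise use the
// information carried by the error.
func handle(s secretState) string {
	err := validate(s)
	if err == nil {
		return "continue"
	}
	var ce *configError
	if errors.As(err, &ce) {
		return fmt.Sprintf("recreate: %s (%s)", ce.Reason, ce.Secret)
	}
	return "fail: " + err.Error()
}

func main() {
	fmt.Println(handle(secretState{Name: "ctlog-config-abc", Exists: true, Valid: true}))
	fmt.Println(handle(secretState{Name: "ctlog-config-abc", Exists: false}))
	fmt.Println(handle(secretState{Name: "ctlog-config-abc", Exists: true, Valid: false}))
}
```

Because the error carries the secret name, every log line and event built from it names the secret, which also addresses the earlier comment about generated names.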
		"secret", instance.Status.ServerConfigRef.Name)
	i.Recorder.Event(instance, corev1.EventTypeWarning, "CTLogConfigMissing",
		"Config secret is missing, will recreate")
} else {
For Kubernetes API errors other than not found, it would be better to fail the reconciliation. There can be many different API errors, such as access rejection, that creating a new object will not solve and that could cause other issues.
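The distinction the reviewer asks for can be sketched as a three-way classification of the error returned when fetching the secret. The sentinel errors below are hypothetical stand-ins for Kubernetes API errors (the diff itself uses `apierrors.IsNotFound`), and `outcome` and `classify` are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for Kubernetes API errors.
var (
	errNotFound  = errors.New("not found")
	errForbidden = errors.New("forbidden")
)

// outcome is a hypothetical result of inspecting the fetch error.
type outcome int

const (
	outcomeValidate outcome = iota // secret fetched: go on to validate it
	outcomeRecreate                // secret is missing: recreate it
	outcomeFail                    // any other API error: fail the reconcile
)

// classify recreates the secret only when it is missing; every other error
// is surfaced so the reconcile fails instead of creating a new object.
func classify(getErr error) outcome {
	switch {
	case getErr == nil:
		return outcomeValidate
	case errors.Is(getErr, errNotFound):
		return outcomeRecreate
	default:
		return outcomeFail
	}
}

func main() {
	wrapped := fmt.Errorf("get config secret: %w", errNotFound)
	fmt.Println(classify(wrapped) == outcomeRecreate)
	fmt.Println(classify(errForbidden) == outcomeFail)
	fmt.Println(classify(nil) == outcomeValidate)
}
```

Failing on unknown errors lets the controller retry later, whereas recreating the secret after, say, a permission error would most likely fail the same way.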
		"reason", "Trillian configuration mismatch")
	i.Recorder.Event(instance, corev1.EventTypeWarning, "CTLogConfigInvalid",
		"Config secret has invalid Trillian configuration, will recreate")
} else {
Why is the check of the root certificate list not handled by IsSecretDataValid? Checking only the size of the list is insufficient; it needs to verify that exactly the certificates referenced in the status are used.
This can be solved by adding an annotation that records which data the secret was generated from. We do this, for example, in Fulcio's generate_cert action, where the annotations are used to check whether the data was generated from the same spec.
For that you can use the custom resource's generation or compare the spec exactly.
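The annotation approach could look roughly like the following: hash the inputs the secret is generated from, store the hash on the secret, and compare it on each reconcile. Everything here is an assumption for illustration, including the annotation key `example.com/ctlog-config-hash`, the `configInputs` fields and the helper names; it does not reproduce what Fulcio's generate_cert action does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// specHashAnnotation is a hypothetical annotation key.
const specHashAnnotation = "example.com/ctlog-config-hash"

// configInputs is a hypothetical set of values the config secret is built from.
type configInputs struct {
	TrillianAddr string   `json:"trillianAddr"`
	RootCerts    []string `json:"rootCerts"`
}

// hashInputs returns a hex-encoded SHA-256 digest of the JSON-encoded inputs.
func hashInputs(in configInputs) (string, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// isUpToDate reports whether the annotation matches the hash of the current inputs.
func isUpToDate(annotations map[string]string, in configInputs) (bool, error) {
	want, err := hashInputs(in)
	if err != nil {
		return false, err
	}
	return annotations[specHashAnnotation] == want, nil
}

func main() {
	in := configInputs{TrillianAddr: "trillian:8091", RootCerts: []string{"fulcio-root"}}
	h, err := hashInputs(in)
	if err != nil {
		panic(err)
	}
	annotations := map[string]string{specHashAnnotation: h}

	ok, _ := isUpToDate(annotations, in)
	fmt.Println("unchanged inputs up to date:", ok)

	in.RootCerts = []string{"other-root"} // same list size, different certificate
	ok, _ = isUpToDate(annotations, in)
	fmt.Println("changed inputs up to date:", ok)
}
```

Unlike a check on the size of the certificate list, the hash changes when a certificate is swapped for another one, so the secret is regenerated in that case too.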
User description
https://issues.redhat.com/browse/SECURESIGN-3167
Summary by Sourcery
Enable CTLog operator to self-heal missing or invalid server configuration secrets by validating and regenerating them, supported by a new validation utility and comprehensive e2e recovery tests.
New Features:
Enhancements:
Tests:
PR Type
Enhancement, Tests
Description
Add CTLog config secret validation and self-healing recovery mechanism
Implement automatic detection and recreation of missing or invalid config secrets
Add integration tests for CTLog recovery scenarios with Trillian address validation
Introduce helper functions for config secret management and Trillian address extraction
Diagram Walkthrough
File Walkthrough
server_config.go
Add config secret validation and recovery logic
internal/controller/ctlog/actions/server_config.go
CanHandle() to always validate secret existence and validity instead of checking generation
for missing or invalid configurations
and validates Trillian address configuration
config secrets with proper logging and events
ctlog_config.go
Add secret validation helper function
internal/controller/ctlog/utils/ctlog_config.go
IsSecretDataValid() function to validate CTLog config secrets contain correct Trillian backend address
non-empty configuration
config data
configurations
ctlog_recovery_test.go
Add CTLog recovery integration tests
test/e2e/ctlog_recovery_test.go
scenarios
configuration
automatic recreation
with correct configuration
recovery
ctlog.go
Add config secret test helper functions
test/e2e/support/tas/ctlog/ctlog.go
GetConfigSecret() helper to retrieve config secrets by name
DeleteConfigSecret() helper to delete config secrets for testing
GetTrillianAddressFromSecret() helper to extract Trillian address from config secret data