Skip to content

Backup SCC of Pod#376

Open
msfrucht wants to merge 7 commits intoopenshift:oadp-devfrom
msfrucht:backup_scc_of_pod
Open

Backup SCC of Pod#376
msfrucht wants to merge 7 commits intoopenshift:oadp-devfrom
msfrucht:backup_scc_of_pod

Conversation

@msfrucht
Copy link

@msfrucht msfrucht commented Mar 3, 2026

Description

Most SCCs are referenced via their Pod annotation which was not included as a related item.

Custom SCCs are typically included via ClusterRole and ClusterRolebinding with a use verb. These cannot be found by simply going through the list of SCCs .users fields.

This change helps resolve part of this by including the SCCs in the backup of Pods in-use by the application.

SCCs that may be needed temporarily in Jobs/CronJobs are more difficult to determine and should be handled by explicitly including SCC in the backup types.


Other Changes

Moves security client into clients like the rest and allows for unit test overriding.

Add one new entry in go.mod for the new unit tests.


Questions

  1. Does Velero handle duplicate objects?

If not, this will need to be extensively revised as the same SCC can be referenced many times.

  1. Should backup continue when SCC is not found?

SCC is evaluated as scheduling. Removing an SCC after a Pod starts will not cause Pod failure. I have wrote the changes such that SCC not found will not cause an error to handle that.

Summary by CodeRabbit

  • New Features

    • Backups now capture OpenShift Security Context Constraints (SCC) referenced by pod annotations so related security config is preserved.
    • Exposed a constant for the SCC pod annotation key.
  • Tests

    • Added comprehensive tests for SCC-related pod backup scenarios (exists, missing, access errors, client init failures).
  • Chores

    • Updated dependency references and centralized security client usage for consistency.

msfrucht added 2 commits March 3, 2026 13:08
Signed-off-by: MICHAEL FRUCHTMAN <msfrucht@us.ibm.com>
Signed-off-by: Michael Fruchtman <msfrucht@us.ibm.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Centralized OpenShift SecurityV1 client added; pods with openshift.io/scc annotation are resolved to SecurityContextConstraints and included as additional backup items; serviceaccount plugin now uses centralized client; SCC annotation constant and unit tests for pod SCC handling were added.

Changes

Cohort / File(s) Summary
Module file
go.mod
Added an indirect dependency entry for github.com/evanphx/json-patch v5.6.0+incompatible (duplicate require introduced).
Clients infrastructure
velero-plugins/clients/clients.go
Added exported SecurityClient hook and package-level cached securityClient/securityClientError plus getSecurityClient()/newSecurityClient() to centralize creation/access of the OpenShift SecurityV1 client.
Pod backup & tests
velero-plugins/pod/backup.go, velero-plugins/pod/backup_test.go
Pod backup now reads openshift.io/scc annotation, resolves the referenced SCC via the centralized security client (new addSCC helper) and appends a ResourceIdentifier for the SCC to additional items. Added table-driven tests and a fake security client helper covering found, not-found, error, and client-init-failure scenarios.
Constants
velero-plugins/common/types.go
Added exported SCCPodAnnotation constant set to "openshift.io/scc".
ServiceAccount plugin
velero-plugins/serviceaccount/backup.go
Removed the file-local SecurityClient()/newSecurityClient() and cached client variables; UpdateSCCMap now calls clients.SecurityClient() directly.

Sequence Diagram(s)

sequenceDiagram
    participant PodPlugin as Pod Backup Plugin
    participant ClientsPkg as Clients Package
    participant SCCAPI as OpenShift SCC API
    participant Velero as Velero

    PodPlugin->>PodPlugin: Read Pod metadata
    PodPlugin->>PodPlugin: Check `openshift.io/scc` annotation
    alt annotation present
        PodPlugin->>ClientsPkg: SecurityClient()
        ClientsPkg->>SCCAPI: Get(SecurityContextConstraints name)
        alt SCC found
            SCCAPI-->>ClientsPkg: SCC object
            ClientsPkg-->>PodPlugin: SCC object
            PodPlugin->>Velero: Add SCC ResourceIdentifier to additional items
        else SCC not found
            SCCAPI-->>ClientsPkg: NotFound
            ClientsPkg-->>PodPlugin: NotFound -> log warning, continue
        else other error
            SCCAPI-->>ClientsPkg: Error
            ClientsPkg-->>PodPlugin: Return error
        end
    else no annotation
        PodPlugin->>Velero: Continue without SCC
    end
    PodPlugin-->>Velero: Return pod + additional items (or error)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Test assertions lack meaningful failure messages and don't use t.Run() subtest pattern, violating established codebase conventions. Add failure messages to assertions and refactor to use t.Run() subtests following the pattern used throughout the repository.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Backup SCC of Pod' directly and concisely describes the main change: adding SecurityContextConstraints (SCC) backup support for pods.
Stable And Deterministic Test Names ✅ Passed All test names in the new test file are stable and deterministic with no generated suffixes, timestamps, UUIDs, or random values.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 3, 2026
@openshift-ci
Copy link

openshift-ci bot commented Mar 3, 2026

Hi @msfrucht. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@velero-plugins/pod/backup_test.go`:
- Around line 89-172: The test currently only checks the length of items
returned from BackupPlugin.Execute but ignores validating the actual resource
identifiers; update the assertions to compare the returned items'
velero.ResourceIdentifier values against the test.expectedIdentifiers for each
case (including Name and GroupResource fields) instead of only asserting Len;
locate the items variable in the loop after calling backupPlugin.Execute and
convert/inspect the items' identifiers (the same shape as expectedIdentifiers)
and use assert.Equal or equivalent to assert deep equality with
test.expectedIdentifiers, ensuring cases like "SCC exists and is added to
additional items" and error cases validate exact Name and GroupResource values.
- Around line 151-160: Save the current clients.SecurityClient before you
override it in the test loop and restore it after the iteration to avoid leaking
global state: capture original := clients.SecurityClient, then set
clients.SecurityClient = getNewSecurityClient(...), and register a cleanup to
restore it with t.Cleanup(func() { clients.SecurityClient = original }) (or
manually assign it back at the end of the iteration) so the global
clients.SecurityClient is reset between tests.

In `@velero-plugins/pod/backup.go`:
- Around line 49-53: The code reads pod.Annotations[common.SCCPodAnnotation]
into sccName and calls p.addSCC(sccName) even when the annotation exists but is
an empty string; change the flow in the block that handles the SCC annotation so
you first trim/check sccName for empty (or whitespace-only) value and if empty
skip calling p.addSCC and log/Infof that the annotation was present but empty
for pod.Name, otherwise proceed to call p.addSCC(sccName) as before; update
references in this block (pod.Annotations, common.SCCPodAnnotation, sccName,
p.addSCC) accordingly.
- Around line 85-89: The returned velero.ResourceIdentifier is using
scc.GroupVersionKind().Group which relies on TypeMeta; replace that call with
the fixed group string "security.openshift.io" so GroupResource.Group becomes
"security.openshift.io" (leave Resource: "securitycontextconstraints" unchanged)
— update the code surrounding the ResourceIdentifier construction (the scc
usage) accordingly to match the pattern used in
velero-plugins/serviceaccount/backup.go.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between b1a2756 and 6ace8d8.

📒 Files selected for processing (6)
  • go.mod
  • velero-plugins/clients/clients.go
  • velero-plugins/common/types.go
  • velero-plugins/pod/backup.go
  • velero-plugins/pod/backup_test.go
  • velero-plugins/serviceaccount/backup.go

Comment on lines +49 to +53
if sccName, exists := pod.Annotations[common.SCCPodAnnotation]; exists {
p.Log.Infof("[pod-backup] Pod %s has SCC annotation with value: %s", pod.Name, sccName)

sccIdentifier, localerr := p.addSCC(sccName)
if localerr == nil {
Copy link

@coderabbitai coderabbitai bot Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Handle empty SCC annotation values before lookup.

If the annotation key exists but value is empty, Line 52 will call the API with an invalid name and can fail the item backup unnecessarily.

Proposed fix
-	if sccName, exists := pod.Annotations[common.SCCPodAnnotation]; exists {
+	if sccName, exists := pod.Annotations[common.SCCPodAnnotation]; exists {
+		if strings.TrimSpace(sccName) == "" {
+			p.Log.Warnf("[pod-backup] Pod %s has empty SCC annotation value", pod.Name)
+		} else {
 		p.Log.Infof("[pod-backup] Pod %s has SCC annotation with value: %s", pod.Name, sccName)
 
 		sccIdentifier, localerr := p.addSCC(sccName)
 		if localerr == nil {
 			additionalItems = append(additionalItems, sccIdentifier)
 		} else {
 			// continuing backup despite error, SCCs can be deleted after Pod creation
 			if errors.IsNotFound(localerr) {
 				p.Log.Warnf("[pod-backup] SCC %s for pod %s not found: %s", sccName, pod.Name, localerr.Error())
 			} else {
 				p.Log.Errorf("[pod-backup] Error adding SCC %s for pod %s: %s", sccName, pod.Name, localerr.Error())
 				// error will go out of scope otherwise
 				err = localerr
 			}
 		}
+		}
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if sccName, exists := pod.Annotations[common.SCCPodAnnotation]; exists {
p.Log.Infof("[pod-backup] Pod %s has SCC annotation with value: %s", pod.Name, sccName)
sccIdentifier, localerr := p.addSCC(sccName)
if localerr == nil {
if sccName, exists := pod.Annotations[common.SCCPodAnnotation]; exists {
if strings.TrimSpace(sccName) == "" {
p.Log.Warnf("[pod-backup] Pod %s has empty SCC annotation value", pod.Name)
} else {
p.Log.Infof("[pod-backup] Pod %s has SCC annotation with value: %s", pod.Name, sccName)
sccIdentifier, localerr := p.addSCC(sccName)
if localerr == nil {
additionalItems = append(additionalItems, sccIdentifier)
} else {
// continuing backup despite error, SCCs can be deleted after Pod creation
if errors.IsNotFound(localerr) {
p.Log.Warnf("[pod-backup] SCC %s for pod %s not found: %s", sccName, pod.Name, localerr.Error())
} else {
p.Log.Errorf("[pod-backup] Error adding SCC %s for pod %s: %s", sccName, pod.Name, localerr.Error())
// error will go out of scope otherwise
err = localerr
}
}
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velero-plugins/pod/backup.go` around lines 49 - 53, The code reads
pod.Annotations[common.SCCPodAnnotation] into sccName and calls
p.addSCC(sccName) even when the annotation exists but is an empty string; change
the flow in the block that handles the SCC annotation so you first trim/check
sccName for empty (or whitespace-only) value and if empty skip calling p.addSCC
and log/Infof that the annotation was present but empty for pod.Name, otherwise
proceed to call p.addSCC(sccName) as before; update references in this block
(pod.Annotations, common.SCCPodAnnotation, sccName, p.addSCC) accordingly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the SCC isn't found the operation continues with a warning.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msfrucht, understood — the IsNotFound path does log a warning and continue gracefully. The subtle distinction in my suggestion is that an empty annotation value ("") passed to the Kubernetes API Get() call may produce a validation / bad-request error rather than a 404 NotFound, so errors.IsNotFound() would return false and the error would propagate upward via err = localerr. That said, this is a minor edge case (OpenShift itself sets the annotation, so an empty value would be unusual), and I'll defer to your judgment on whether it warrants the extra guard.

@sseago
Copy link
Contributor

sseago commented Mar 3, 2026

Does Velero handle duplicate objects?
If not, this will need to be extensively revised as the same SCC can be referenced many times.

Yes. This is fine. Velero already keeps a synchronized map that's updated each time an item is backed up. If BackupItem is called a second time for the item, it'sa no-op.

Should backup continue when SCC is not found?
SCC is evaluated as scheduling. Removing an SCC after a Pod starts will not cause Pod failure. I have wrote the changes such that SCC not found will not cause an error to handle that.

Velero never bails on a backup if something isn't found. If a referenced SCC is found, you have two choices:

  1. Just log a warning and don't add the SCC as an additional item (or even an info log, if we don't think users will care)
  2. Error out on the pod backup, but other items will continue to be backed up, and backup will be partially failed.

Sounds like 1) is what we want. It's not a fatal error, it's perhaps a warning at most.

Signed-off-by: Michael Fruchtman <msfrucht@us.ibm.com>
@msfrucht
Copy link
Author

msfrucht commented Mar 3, 2026

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
velero-plugins/pod/backup_test.go (3)

25-31: Consider unexporting test-only types.

ErrorType and its constants (NotFoundError, AccessDenied, NoError) are exported but only used within this test file. Unexporting them (errorType, notFoundError, etc.) would avoid polluting the package's public API.

Proposed fix
-type ErrorType string
+type errorType string
 
 const (
-	NotFoundError ErrorType = "NotFound"
-	AccessDenied  ErrorType = "AccessDenied"
-	NoError       ErrorType = "NoError"
+	notFoundError errorType = "NotFound"
+	accessDenied  errorType = "AccessDenied"
+	noError       errorType = "NoError"
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velero-plugins/pod/backup_test.go` around lines 25 - 31, The test defines
exported identifiers ErrorType, NotFoundError, AccessDenied and NoError but
they’re only used in backup_test.go; rename the type to unexported errorType and
the constants to notFoundError, accessDenied and noError, and update all
references in the test file from ErrorType/NotFoundError/AccessDenied/NoError to
the new unexported names so the package public API is not polluted.

155-177: Use t.Run for test isolation and clearer output.

Running subtests with t.Run(test.name, ...) provides better test isolation, clearer failure messages, and the ability to run individual test cases.

Proposed fix
 	for _, test := range tests {
-		pod.Annotations = test.annotations
-
-		objs := []*securityv1.SecurityContextConstraints{scc}
-
-		podData, err := runtime.DefaultUnstructuredConverter.ToUnstructured(&pod)
-		assert.NoError(t, err)
-		unstructuredPod := unstructured.Unstructured{Object: podData}
-		clients.SecurityClient = getNewSecurityClient(test.inducedErrorType, test.securityClientError, objs...)
-
-		backupPlugin := &BackupPlugin{
-			Log: logrus.WithField("plugin", "pod-backup-test"),
-		}
-
-		_, items, err := backupPlugin.Execute(&unstructuredPod, &velerov1.Backup{})
-		if test.shouldErr {
-			assert.Error(t, err, "Test %s errored when should not %v", test.name, err)
-		} else {
-			assert.NoError(t, err, "Test %s errored when should not %v", test.name, err)
-		}
-		assert.Len(t, items, test.expectedSccCount, "Test %s Expected %d additional items for the SCC", test.name, test.expectedSccCount)
-		assert.ElementsMatch(t, items, test.expectedIdentifiers, "Test %s expected additional items to match expected identifiers", test.name)
+		t.Run(test.name, func(t *testing.T) {
+			testPod := pod.DeepCopy()
+			testPod.Annotations = test.annotations
+
+			objs := []*securityv1.SecurityContextConstraints{scc}
+
+			podData, err := runtime.DefaultUnstructuredConverter.ToUnstructured(testPod)
+			assert.NoError(t, err)
+			unstructuredPod := unstructured.Unstructured{Object: podData}
+			clients.SecurityClient = getNewSecurityClient(test.inducedErrorType, test.securityClientError, objs...)
+
+			backupPlugin := &BackupPlugin{
+				Log: logrus.WithField("plugin", "pod-backup-test"),
+			}
+
+			_, items, err := backupPlugin.Execute(&unstructuredPod, &velerov1.Backup{})
+			if test.shouldErr {
+				assert.Error(t, err)
+			} else {
+				assert.NoError(t, err)
+			}
+			assert.Len(t, items, test.expectedSccCount)
+			assert.ElementsMatch(t, items, test.expectedIdentifiers)
+		})
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velero-plugins/pod/backup_test.go` around lines 155 - 177, Wrap each
iteration of the tests loop in a subtest using t.Run(test.name, func(t
*testing.T) { ... }) to isolate test cases: move the body that sets
pod.Annotations, converts to unstructured, sets clients.SecurityClient, creates
BackupPlugin, calls backupPlugin.Execute and all assertions into the t.Run
closure and use the closure's t for assertions; this ensures clearer output and
isolated execution for the tests that exercise BackupPlugin.Execute and the
assertions about items and errors.

53-62: Objects are re-created on each SecurityClient() invocation.

The returned function creates SCC objects every time it's called. If clients.SecurityClient() is invoked multiple times during a single test execution, this will attempt to recreate the same objects, potentially causing conflicts or unexpected behavior.

Consider creating objects once outside the returned function:

Proposed fix
 func getNewSecurityClient(errorType ErrorType, withSecurityClientError error, objects ...*securityv1.SecurityContextConstraints) func() (security.SecurityV1Interface, error) {
 	cs := fakeSecurityClient.NewSimpleClientset()
 
 	// add error if enabled
 	switch errorType {
 	case NotFoundError:
 		cs.Fake.PrependReactor("get", "securitycontextconstraints", func(action ktesting.Action) (handled bool, ret runtime.Object, err error) {
 			return true, nil, errors.NewNotFound(schema.GroupResource{Group: "security.openshift.io", Resource: "securitycontextconstraints"}, "test-scc")
 		})
 	case AccessDenied:
 		cs.Fake.PrependReactor("get", "securitycontextconstraints", func(action ktesting.Action) (handled bool, ret runtime.Object, err error) {
 			return true, nil, errors.NewForbidden(schema.GroupResource{Group: "security.openshift.io", Resource: "securitycontextconstraints"}, "test-scc", nil)
 		})
 	case NoError:
 		// do nothing, client will return objects as normal
 	}
 
+	// Create objects once during setup
+	client := cs.SecurityV1()
+	for _, object := range objects {
+		_, localError := client.SecurityContextConstraints().Create(context.Background(), object, metav1.CreateOptions{})
+		if localError != nil {
+			return func() (security.SecurityV1Interface, error) {
+				return nil, localError
+			}
+		}
+	}
+
 	return func() (security.SecurityV1Interface, error) {
-		client := cs.SecurityV1()
-		for _, object := range objects {
-			_, localError := client.SecurityContextConstraints().Create(context.Background(), object, metav1.CreateOptions{})
-			if localError != nil {
-				return nil, localError
-			}
-		}
 		return client, withSecurityClientError
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velero-plugins/pod/backup_test.go` around lines 53 - 62, The current closure
returned by SecurityClient() recreates SCC objects on every invocation; move the
creation loop out of the returned func so objects are created once.
Specifically, call client := cs.SecurityV1() and iterate over objects calling
client.SecurityContextConstraints().Create(...) (capturing any create error into
a variable like withSecurityClientError) before returning the func; then have
the returned closure simply return the previously-built client and the captured
error (instead of re-running client.SecurityContextConstraints().Create inside
the closure).
velero-plugins/pod/backup.go (1)

35-36: Simplify error declaration.

var err error = nil can be simplified to var err error since the zero value of error is already nil.

Proposed fix
 	var additionalItems []velero.ResourceIdentifier
-	var err error = nil
+	var err error
 	pod := corev1API.Pod{}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velero-plugins/pod/backup.go` around lines 35 - 36, The variable declaration
uses an unnecessary explicit nil assignment; update the declaration of err from
"var err error = nil" to simply "var err error" (near the additionalItems/err
declarations) to rely on the error zero value and simplify the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@velero-plugins/pod/backup_test.go`:
- Around line 170-174: The assertion messages are inverted: in the
test.shouldErr branch (where you call assert.Error) update the message to
indicate the test expected an error (e.g., "Test %s expected error but got none:
%v" or "Test %s expected an error, got: %v") and in the else branch
(assert.NoError) change the message to indicate an unexpected error (e.g., "Test
%s unexpected error: %v"); modify the assert.Error and assert.NoError calls that
reference test.shouldErr, test.name, t, and err to use these corrected messages
so the output matches the test intent.

---

Nitpick comments:
In `@velero-plugins/pod/backup_test.go`:
- Around line 25-31: The test defines exported identifiers ErrorType,
NotFoundError, AccessDenied and NoError but they’re only used in backup_test.go;
rename the type to unexported errorType and the constants to notFoundError,
accessDenied and noError, and update all references in the test file from
ErrorType/NotFoundError/AccessDenied/NoError to the new unexported names so the
package public API is not polluted.
- Around line 155-177: Wrap each iteration of the tests loop in a subtest using
t.Run(test.name, func(t *testing.T) { ... }) to isolate test cases: move the
body that sets pod.Annotations, converts to unstructured, sets
clients.SecurityClient, creates BackupPlugin, calls backupPlugin.Execute and all
assertions into the t.Run closure and use the closure's t for assertions; this
ensures clearer output and isolated execution for the tests that exercise
BackupPlugin.Execute and the assertions about items and errors.
- Around line 53-62: The current closure returned by SecurityClient() recreates
SCC objects on every invocation; move the creation loop out of the returned func
so objects are created once. Specifically, call client := cs.SecurityV1() and
iterate over objects calling client.SecurityContextConstraints().Create(...)
(capturing any create error into a variable like withSecurityClientError) before
returning the func; then have the returned closure simply return the
previously-built client and the captured error (instead of re-running
client.SecurityContextConstraints().Create inside the closure).

In `@velero-plugins/pod/backup.go`:
- Around line 35-36: The variable declaration uses an unnecessary explicit nil
assignment; update the declaration of err from "var err error = nil" to simply
"var err error" (near the additionalItems/err declarations) to rely on the error
zero value and simplify the code.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 6ace8d8 and 99dde3c.

📒 Files selected for processing (2)
  • velero-plugins/pod/backup.go
  • velero-plugins/pod/backup_test.go

Signed-off-by: Michael Fruchtman <msfrucht@us.ibm.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
velero-plugins/pod/backup.go (2)

74-83: Consider adding a context timeout for the API call.

Using context.Background() without a timeout means the backup could hang indefinitely if the API server is unresponsive. Consider using a context with a deadline for resilience.

Suggested improvement
+import "time"
+
 func (p *BackupPlugin) addSCC(sccName string) (velero.ResourceIdentifier, error) {
 	securityClient, err := clients.SecurityClient()
 	if err != nil {
 		return velero.ResourceIdentifier{}, err
 	}
 
-	scc, err := securityClient.SecurityContextConstraints().Get(context.Background(), sccName, metav1.GetOptions{})
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer cancel()
+	scc, err := securityClient.SecurityContextConstraints().Get(ctx, sccName, metav1.GetOptions{})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velero-plugins/pod/backup.go` around lines 74 - 83, The call to
securityClient.SecurityContextConstraints().Get in addSCC currently uses
context.Background() and can hang; change it to use a context with timeout
(e.g., context.WithTimeout) and a cancel() to enforce a deadline (choose a
sensible default like 10s or use a configurable value), pass the resulting ctx
into Get, and ensure defer cancel() is called immediately after creating the
context to avoid leaks.

36-36: Redundant nil initialization.

In Go, var err error already defaults to nil. The explicit assignment is unnecessary.

Suggested simplification
-	var err error = nil
+	var err error
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velero-plugins/pod/backup.go` at line 36, Remove the redundant explicit nil
assignment for the error variable: replace the declaration "var err error = nil"
with a simple "var err error" (or use short declaration where appropriate) so
the zero-value nil is relied upon; locate the declaration of err in the relevant
function in backup.go and update it accordingly (symbol: err).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@velero-plugins/pod/backup.go`:
- Around line 74-83: The call to securityClient.SecurityContextConstraints().Get
in addSCC currently uses context.Background() and can hang; change it to use a
context with timeout (e.g., context.WithTimeout) and a cancel() to enforce a
deadline (choose a sensible default like 10s or use a configurable value), pass
the resulting ctx into Get, and ensure defer cancel() is called immediately
after creating the context to avoid leaks.
- Line 36: Remove the redundant explicit nil assignment for the error variable:
replace the declaration "var err error = nil" with a simple "var err error" (or
use short declaration where appropriate) so the zero-value nil is relied upon;
locate the declaration of err in the relevant function in backup.go and update
it accordingly (symbol: err).

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 99dde3c and 0fc4d62.

📒 Files selected for processing (1)
  • velero-plugins/pod/backup.go

Signed-off-by: Michael Fruchtman <msfrucht@us.ibm.com>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these unused ? Lets remove them

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Sorry, missed it in normalizing the security client.

p.Log.Info("[pod-backup] Entering Pod backup plugin")

var additionalItems []velero.ResourceIdentifier
var err error = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: var err error

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved. Sorry, sometimes my python sneaks in. There is no such thing as default values in python, all values must be declared.

Signed-off-by: Michael Fruchtman <msfrucht@us.ibm.com>
Signed-off-by: Michael Fruchtman <msfrucht@us.ibm.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
velero-plugins/pod/backup_test.go (1)

155-177: Consider using t.Run for subtests.

The current approach works but using t.Run(test.name, func(t *testing.T) {...}) would provide:

  • Clearer test output with each subtest named individually
  • Independent failure handling per subtest
  • Ability to run specific subtests via -run flag
♻️ Proposed refactor using t.Run
 	for _, test := range tests {
-		pod.Annotations = test.annotations
-
-		objs := []*securityv1.SecurityContextConstraints{scc}
-
-		podData, err := runtime.DefaultUnstructuredConverter.ToUnstructured(&pod)
-		assert.NoError(t, err)
-		unstructuredPod := unstructured.Unstructured{Object: podData}
-		clients.SecurityClient = getNewSecurityClient(test.inducedErrorType, test.securityClientError, objs...)
-
-		backupPlugin := &BackupPlugin{
-			Log: logrus.WithField("plugin", "pod-backup-test"),
-		}
-
-		_, items, err := backupPlugin.Execute(&unstructuredPod, &velerov1.Backup{})
-		if test.shouldErr {
-			assert.Error(t, err, "Test %s NOT errored when should %v", test.name, err)
-		} else {
-			assert.NoError(t, err, "Test %s errored when should succeed %v", test.name, err)
-		}
-		assert.Len(t, items, test.expectedSccCount, "Test %s Expected %d additional items for the SCC", test.name, test.expectedSccCount)
-		assert.ElementsMatch(t, items, test.expectedIdentifiers, "Test %s expected additional items to match expected identifiers", test.name)
+		t.Run(test.name, func(t *testing.T) {
+			testPod := pod
+			testPod.Annotations = test.annotations
+
+			objs := []*securityv1.SecurityContextConstraints{scc}
+
+			podData, err := runtime.DefaultUnstructuredConverter.ToUnstructured(&testPod)
+			assert.NoError(t, err)
+			unstructuredPod := unstructured.Unstructured{Object: podData}
+			clients.SecurityClient = getNewSecurityClient(test.inducedErrorType, test.securityClientError, objs...)
+
+			backupPlugin := &BackupPlugin{
+				Log: logrus.WithField("plugin", "pod-backup-test"),
+			}
+
+			_, items, err := backupPlugin.Execute(&unstructuredPod, &velerov1.Backup{})
+			if test.shouldErr {
+				assert.Error(t, err)
+			} else {
+				assert.NoError(t, err)
+			}
+			assert.Len(t, items, test.expectedSccCount)
+			assert.ElementsMatch(t, items, test.expectedIdentifiers)
+		})
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velero-plugins/pod/backup_test.go` around lines 155 - 177, Wrap each
table-driven iteration in t.Run(test.name, func(t *testing.T) { ... }) to create
named subtests; move the body of the for loop into the closure, ensure you
shadow the loop variable (e.g., tc := test) before using it inside the closure
to avoid capture issues, and call BackupPlugin.Execute and the assertions
(assert.NoError/assert.Error/assert.Len/assert.ElementsMatch) using the
closure's t. Keep the setup steps (pod.Annotations,
runtime.DefaultUnstructuredConverter, clients.SecurityClient =
getNewSecurityClient(...), and backupPlugin := &BackupPlugin{...}) inside the
t.Run block so each subtest runs independently and reports its own result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@velero-plugins/pod/backup_test.go`:
- Around line 155-177: Wrap each table-driven iteration in t.Run(test.name,
func(t *testing.T) { ... }) to create named subtests; move the body of the for
loop into the closure, ensure you shadow the loop variable (e.g., tc := test)
before using it inside the closure to avoid capture issues, and call
BackupPlugin.Execute and the assertions
(assert.NoError/assert.Error/assert.Len/assert.ElementsMatch) using the
closure's t. Keep the setup steps (pod.Annotations,
runtime.DefaultUnstructuredConverter, clients.SecurityClient =
getNewSecurityClient(...), and backupPlugin := &BackupPlugin{...}) inside the
t.Run block so each subtest runs independently and reports its own result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4e90d2be-61a3-4785-afaa-2ea5ba782b9f

📥 Commits

Reviewing files that changed from the base of the PR and between 0fc4d62 and c8e73ec.

📒 Files selected for processing (3)
  • velero-plugins/pod/backup.go
  • velero-plugins/pod/backup_test.go
  • velero-plugins/serviceaccount/backup.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • velero-plugins/pod/backup.go
  • velero-plugins/serviceaccount/backup.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
velero-plugins/clients/clients.go (1)

253-254: The exported function variable pattern is working as intended for test overrides, but refactoring to an exported wrapper function would improve API clarity.

This is an intentional test override mechanism used only in velero-plugins/pod/backup_test.go. Production code (backup.go, backup.go) safely calls it as a function. However, the current pattern is unconventional—moving to an exported function wrapper makes the API boundary more explicit and reduces the surface area for accidental misuse:

♻️ Clearer API shape
-// this setup allows for fake client overrides in unit tests
-var SecurityClient = getSecurityClient
+var securityClientProvider = getSecurityClient
+
+// SecurityClient returns an openshift SecurityV1Client.
+func SecurityClient() (securityv1.SecurityV1Interface, error) {
+	return securityClientProvider()
+}

Update test assignments to use clients.securityClientProvider instead of clients.SecurityClient.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velero-plugins/clients/clients.go` around lines 253 - 254, Replace the
exported function-variable pattern by introducing an internal provider variable
and an exported wrapper function: rename the current var SecurityClient =
getSecurityClient to securityClientProvider = getSecurityClient (or similar
unexported identifier) and add an exported function SecurityClient() that simply
returns securityClientProvider(); update unit tests to assign the fake provider
to clients.securityClientProvider instead of reassigning clients.SecurityClient,
and ensure all production call sites continue invoking SecurityClient() as a
function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@velero-plugins/clients/clients.go`:
- Around line 257-261: The lazy-initialized client getters (e.g.,
getSecurityClient, getCoreClient, getImageClient, getDiscoveryClient,
getRouteClient, getBuildClient, getAppsClient, getOCPAppsClient,
getOCPConfigClient, getOCPImageRegistryConfigClient) use an unsynchronized
nil-check-then-init pattern that races under concurrent calls; replace that
pattern by introducing a sync.Once per client (e.g., securityClientOnce) and
move the newXxxClient() call into the Once.Do closure so initialization runs
exactly once; add the "sync" import and ensure each getter returns the stored
client and error after the Once.Do to preserve behavior.

---

Nitpick comments:
In `@velero-plugins/clients/clients.go`:
- Around line 253-254: Replace the exported function-variable pattern by
introducing an internal provider variable and an exported wrapper function:
rename the current var SecurityClient = getSecurityClient to
securityClientProvider = getSecurityClient (or similar unexported identifier)
and add an exported function SecurityClient() that simply returns
securityClientProvider(); update unit tests to assign the fake provider to
clients.securityClientProvider instead of reassigning clients.SecurityClient,
and ensure all production call sites continue invoking SecurityClient() as a
function.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 71592b3b-2eff-4ba4-bee8-b119ad91181c

📥 Commits

Reviewing files that changed from the base of the PR and between c8e73ec and d01f396.

📒 Files selected for processing (1)
  • velero-plugins/clients/clients.go

Comment on lines +257 to +261
func getSecurityClient() (securityv1.SecurityV1Interface, error) {
if securityClient == nil && securityClientError == nil {
securityClient, securityClientError = newSecurityClient()
}
return securityClient, securityClientError
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file structure and the specific lines mentioned
fd -t f -p "clients\.go" velero-plugins/clients/ | head -5

Repository: openshift/openshift-velero-plugin

Length of output: 110


🏁 Script executed:

# Check the actual content around lines 257-261
head -n 270 velero-plugins/clients/clients.go | tail -n 30

Repository: openshift/openshift-velero-plugin

Length of output: 949


🏁 Script executed:

# Search for unsynchronized lazy init patterns
rg -n -C2 'if .*== nil && .*== nil \{' velero-plugins/clients/clients.go

Repository: openshift/openshift-velero-plugin

Length of output: 2448


🏁 Script executed:

# Check for sync package imports and usage
rg -n 'sync\.Once|sync\.Mutex|"sync"' velero-plugins/clients/clients.go

Repository: openshift/openshift-velero-plugin

Length of output: 59


🏁 Script executed:

# Find all client getter functions to check consistency
rg -n 'func get.*Client\(\)' velero-plugins/clients/clients.go

Repository: openshift/openshift-velero-plugin

Length of output: 147


Fix race condition in lazy-initialized client getters across the file.

Lines 258-259 show an unsynchronized check-then-initialize pattern that is vulnerable to concurrent access races. The same issue exists in at least 9 other client getter functions (CoreClient, ImageClient, DiscoveryClient, RouteClient, BuildClient, AppsClient, OCPAppsClient, OCPConfigClient, OCPImageRegistryConfigClient). When multiple goroutines concurrently call these functions, they can race between the nil check and the initialization assignment, resulting in nondeterministic state.

Use sync.Once to synchronize initialization across all client getter functions. The sync package is not currently imported and must be added.

🔧 Proposed fix (use sync.Once for thread-safe one-time init)
 import (
+	"sync"
 	ocpappsv1 "github.com/openshift/client-go/apps/clientset/versioned/typed/apps/v1"
 	buildv1 "github.com/openshift/client-go/build/clientset/versioned/typed/build/v1"
 	ocpconfigv1 "github.com/openshift/client-go/config/clientset/versioned/typed/config/v1"
@@
 // SCC client
 var securityClient securityv1.SecurityV1Interface
 var securityClientError error
+var securityClientOnce sync.Once
@@
 func getSecurityClient() (securityv1.SecurityV1Interface, error) {
-	if securityClient == nil && securityClientError == nil {
-		securityClient, securityClientError = newSecurityClient()
-	}
+	securityClientOnce.Do(func() {
+		securityClient, securityClientError = newSecurityClient()
+	})
 	return securityClient, securityClientError
 }
@@
 func init() {
@@
 	securityClient, securityClientError = nil, nil
+	securityClientOnce = sync.Once{}
 }

Apply the same pattern systematically to all 9+ remaining client getters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velero-plugins/clients/clients.go` around lines 257 - 261, The
lazy-initialized client getters (e.g., getSecurityClient, getCoreClient,
getImageClient, getDiscoveryClient, getRouteClient, getBuildClient,
getAppsClient, getOCPAppsClient, getOCPConfigClient,
getOCPImageRegistryConfigClient) use an unsynchronized nil-check-then-init
pattern that races under concurrent calls; replace that pattern by introducing a
sync.Once per client (e.g., securityClientOnce) and move the newXxxClient() call
into the Once.Do closure so initialization runs exactly once; add the "sync"
import and ensure each getter returns the stored client and error after the
Once.Do to preserve behavior.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2026
@openshift-ci
Copy link

openshift-ci bot commented Mar 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msfrucht, shubham-pampattiwar, sseago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [shubham-pampattiwar,sseago]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sseago
Copy link
Contributor

sseago commented Mar 4, 2026

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 4, 2026
@openshift-ci
Copy link

openshift-ci bot commented Mar 4, 2026

@msfrucht: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/unit-test d01f396 link true /test unit-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants