Update todolist sample app for a more robust todo2 container and other stability improvements#2112
Conversation
Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml (1)
98-98: Replace:latesttag with a pinned digest for reproducible test runs.The
:latesttag is mutable and can introduce test flakes when the image is updated. This issue appears across multiple e2e sample application manifests; consider pinning to a specific@sha256:<digest>instead.♻️ Proposed fix
- image: quay.io/migtools/oadp-ci-todo2-go-testing-mariadb:latest + image: quay.io/migtools/oadp-ci-todo2-go-testing-mariadb@sha256:<pinned-digest>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml` at line 98, The manifest currently pins the container image with a mutable tag "quay.io/migtools/oadp-ci-todo2-go-testing-mariadb:latest"; replace that tag with an immutable digest (e.g., "quay.io/migtools/oadp-ci-todo2-go-testing-mariadb@sha256:<digest>") by resolving the correct sha256 via the registry (quay or `skopeo/ctr/podman` inspect) and update this image reference; also scan other e2e sample manifests for the same pattern and pin those images similarly to ensure reproducible test runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/lib/apps.go`:
- Around line 496-516: The readyz loop treats err == nil as success but
MakeRequest (used with getRequestParameters) doesn't enforce HTTP success codes,
so HTTP 5xx/500 responses can falsely mark ready; modify MakeRequest to return
the HTTP status code (e.g., add a statusCode int return) or otherwise expose it,
then in the readyz check loop (the block using getRequestParameters and
MakeRequest) require statusCode == 200 (and optionally validate a known body
like "ok"/"ready" if the app defines one) before breaking; also update any tests
that call MakeRequest (and the proxy curl invocations in common_helpers.go) to
ensure curl uses --fail or that MakeRequest treats non-2xx responses as errors
so the readyz check only passes on a real healthy response.
---
Nitpick comments:
In `@tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml`:
- Line 98: The manifest currently pins the container image with a mutable tag
"quay.io/migtools/oadp-ci-todo2-go-testing-mariadb:latest"; replace that tag
with an immutable digest (e.g.,
"quay.io/migtools/oadp-ci-todo2-go-testing-mariadb@sha256:<digest>") by
resolving the correct sha256 via the registry (quay or `skopeo/ctr/podman`
inspect) and update this image reference; also scan other e2e sample manifests
for the same pattern and pin those images similarly to ensure reproducible test
runs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e6a2acaa-fb4c-4049-afaf-5cdd83eafeb4
📒 Files selected for processing (6)
tests/e2e/lib/apps.gotests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yamltests/e2e/sample-applications/mongo-persistent/mongo-persistent.yamltests/e2e/sample-applications/mysql-persistent/mysql-persistent-csi.yamltests/e2e/sample-applications/mysql-persistent/mysql-persistent-twovol-csi.yamltests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml
Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml (1)
36-38:allowedCapabilitiesis good, but SCC is still effectively privileged.Line 36 adds a capability allowlist, but with
allowPrivilegedContainer: trueandallowPrivilegeEscalation: truein the same SCC, this does not materially enforce least privilege. Consider tightening those flags if this manifest is intended to model secure defaults.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml` around lines 36 - 38, The SCC currently lists allowedCapabilities but still permits privileged behavior; update the SecurityContextConstraints/SCC by setting allowPrivilegedContainer: false and allowPrivilegeEscalation: false (or remove these true settings) so the capability allowlist is effective; locate the SCC resource containing allowedCapabilities, allowPrivilegedContainer, and allowPrivilegeEscalation and tighten those flags to false to enforce least privilege.tests/e2e/backup_restore_cli_suite_test.go (1)
178-191: Extract duplicated KOPIA pod-restart logic into a shared helper.This block is duplicated with
tests/e2e/backup_restore_suite_test.go(Line 269-282). A shared helper in packagee2e_testwill reduce drift between CLI and non-CLI restore paths.Refactor sketch
- if brCase.BackupRestoreType == lib.KOPIA { - log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace) - err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection( - context.Background(), - metav1.DeleteOptions{}, - metav1.ListOptions{LabelSelector: "e2e-app=true"}, - ) - gomega.Expect(err).ToNot(gomega.HaveOccurred()) - } + restartRestoredAppPodsIfNeeded(brCase.BackupRestoreType, brCase.Namespace)// package e2e_test (shared by both *_suite_test.go files) func restartRestoredAppPodsIfNeeded(backupRestoreType lib.BackupRestoreType, appNamespace string) { if backupRestoreType != lib.KOPIA { return } log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", appNamespace) err := kubernetesClientForSuiteRun.CoreV1().Pods(appNamespace).DeleteCollection( context.Background(), metav1.DeleteOptions{}, metav1.ListOptions{LabelSelector: "e2e-app=true"}, ) gomega.Expect(err).ToNot(gomega.HaveOccurred()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/backup_restore_cli_suite_test.go` around lines 178 - 191, Extract the duplicated KOPIA pod-restart logic into a shared helper named restartRestoredAppPodsIfNeeded(backupRestoreType lib.BackupRestoreType, appNamespace string) in package e2e_test; implement it to return early if backupRestoreType != lib.KOPIA, otherwise log the restart message and call kubernetesClientForSuiteRun.CoreV1().Pods(appNamespace).DeleteCollection(context.Background(), metav1.DeleteOptions{}, metav1.ListOptions{LabelSelector: "e2e-app=true"}) and assert no error with gomega.Expect(err).ToNot(gomega.HaveOccurred()); then replace the duplicated blocks in both backup_restore_cli_suite_test.go and backup_restore_suite_test.go with a single call to restartRestoredAppPodsIfNeeded(brCase.BackupRestoreType, brCase.Namespace).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/backup_restore_suite_test.go`:
- Around line 269-282: The DeleteCollection call used when
brCase.BackupRestoreType == lib.KOPIA
(kubernetesClientForSuiteRun.CoreV1().Pods(...).DeleteCollection) currently uses
context.Background() and can hang tests; replace it with a context created via
context.WithTimeout (choose a reasonable timeout like 30s), pass that context to
DeleteCollection, and defer the cancel() to ensure the call is bounded; make the
identical change in tests/e2e/backup_restore_cli_suite_test.go for the
corresponding DeleteCollection call as well.
In `@tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml`:
- Line 101: Replace the mutable image tag used in the manifest: update the image
reference quay.io/migtools/oadp-ci-todo2-go-testing-mariadb:latest to an
immutable tag or digest (e.g.,
quay.io/migtools/oadp-ci-todo2-go-testing-mariadb:<specific-version> or
quay.io/migtools/oadp-ci-todo2-go-testing-mariadb@sha256:<digest>) so the e2e
manifest in mysql-persistent.yaml references a fixed artifact and tests become
deterministic.
---
Nitpick comments:
In `@tests/e2e/backup_restore_cli_suite_test.go`:
- Around line 178-191: Extract the duplicated KOPIA pod-restart logic into a
shared helper named restartRestoredAppPodsIfNeeded(backupRestoreType
lib.BackupRestoreType, appNamespace string) in package e2e_test; implement it to
return early if backupRestoreType != lib.KOPIA, otherwise log the restart
message and call
kubernetesClientForSuiteRun.CoreV1().Pods(appNamespace).DeleteCollection(context.Background(),
metav1.DeleteOptions{}, metav1.ListOptions{LabelSelector: "e2e-app=true"}) and
assert no error with gomega.Expect(err).ToNot(gomega.HaveOccurred()); then
replace the duplicated blocks in both backup_restore_cli_suite_test.go and
backup_restore_suite_test.go with a single call to
restartRestoredAppPodsIfNeeded(brCase.BackupRestoreType, brCase.Namespace).
In `@tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml`:
- Around line 36-38: The SCC currently lists allowedCapabilities but still
permits privileged behavior; update the SecurityContextConstraints/SCC by
setting allowPrivilegedContainer: false and allowPrivilegeEscalation: false (or
remove these true settings) so the capability allowlist is effective; locate the
SCC resource containing allowedCapabilities, allowPrivilegedContainer, and
allowPrivilegeEscalation and tighten those flags to false to enforce least
privilege.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 060dfeb3-febd-4e79-98f4-b425fd3bddcb
📒 Files selected for processing (3)
tests/e2e/backup_restore_cli_suite_test.gotests/e2e/backup_restore_suite_test.gotests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml
Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yaml (1)
80-100: Recommend matching readiness behavior with the other Mongo restore checks.For consistency with the updated block variant and restore readiness flow, consider using
/readyzfor readiness in this variant as well.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yaml` around lines 80 - 100, The readiness behavior in this Mongo variant should match the other Mongo restore checks by using the /readyz endpoint; update the container's readinessProbe (the container with image quay.io/migtools/oadp-ci-todo2-go-testing-mongodb:latest) to probe /readyz instead of the current path (or add a readinessProbe if missing) so the restore readiness flow is consistent with the updated block variant.tests/e2e/sample-applications/mongo-persistent/mongo-persistent.yaml (1)
93-107: Consider aligning readiness endpoint with/readyzacross Mongo variants.This file still uses
/healthzprobes while the block variant moved readiness to/readyz. Keeping endpoint semantics uniform across variants reduces e2e drift.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/sample-applications/mongo-persistent/mongo-persistent.yaml` around lines 93 - 107, The readiness endpoint in this Kubernetes container spec uses /healthz while other Mongo variants use /readyz; update the readinessProbe configuration (look for the readinessProbe.path entry in this container spec) to use "/readyz" instead of "/healthz" so the readinessProbe aligns with other Mongo variants (leave other fields like livenessProbe, ports, env, volumeMounts unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml`:
- Around line 97-100: The init-container currently installs e2fsprogs at runtime
using apt-get; instead, update the container image (used by this init-container)
to include e2fsprogs baked in and change the init logic to validate its presence
(e.g., check for mkfs.ext4 or e2fsck) rather than running apt-get. Locate the
init script that sets DEVICE and MOUNT_POINT and replace the apt-get install
lines with a runtime check that exits with a clear error if the required
binaries are missing, ensuring the test image is rebuilt with e2fsprogs
included.
- Around line 141-145: The startup snippet should fail fast and safely quote
variables: add "set -e" at the top of the script and change unquoted variable
uses to quoted ones (use "$DEVICE" and "$MOUNT_POINT" wherever referenced, e.g.,
in mkdir and mount calls) so mount failures abort the script instead of falling
through to exec /opt/todolist/entrypoint.sh; ensure the mount command is run as
mount "$DEVICE" "$MOUNT_POINT" (or followed by an explicit check/exit on
failure) before exec-ing entrypoint.sh.
---
Nitpick comments:
In `@tests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yaml`:
- Around line 80-100: The readiness behavior in this Mongo variant should match
the other Mongo restore checks by using the /readyz endpoint; update the
container's readinessProbe (the container with image
quay.io/migtools/oadp-ci-todo2-go-testing-mongodb:latest) to probe /readyz
instead of the current path (or add a readinessProbe if missing) so the restore
readiness flow is consistent with the updated block variant.
In `@tests/e2e/sample-applications/mongo-persistent/mongo-persistent.yaml`:
- Around line 93-107: The readiness endpoint in this Kubernetes container spec
uses /healthz while other Mongo variants use /readyz; update the readinessProbe
configuration (look for the readinessProbe.path entry in this container spec) to
use "/readyz" instead of "/healthz" so the readinessProbe aligns with other
Mongo variants (leave other fields like livenessProbe, ports, env, volumeMounts
unchanged).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e9cf6dde-fd7b-4214-990e-405b4c7c23af
📒 Files selected for processing (3)
tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yamltests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yamltests/e2e/sample-applications/mongo-persistent/mongo-persistent.yaml
tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml
Show resolved
Hide resolved
tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/e2e/lib/apps.go (1)
498-529:⚠️ Potential issue | 🔴 Critical
skipReadyzcurrently skips restore data verification for mysql/mongo.Line 498 gates both the
/readyzprobe and the/todo-incompletefetch. WhenskipReadyzistrue,respDatais never populated formysql-persistent/mongo-persistent, so the comparison at Lines 550-553 can fail with empty response data. This breaks the intended “skip healthz only” behavior.Proposed fix
- if !shouldSkipReadyz && (namespace == "mysql-persistent" || namespace == "mongo-persistent") { - // ensure that the application endpoint is reachable - requestParams := getRequestParameters(appEndpointURL+"/readyz", proxyPodParams, GET, nil) - const maxReadyzAttempts = 5 - for attempt := 1; attempt <= maxReadyzAttempts; attempt++ { - log.Printf("readyz check attempt %d/%d: GET %s/readyz\n", attempt, maxReadyzAttempts, appEndpointURL) - respData, errResp, err = MakeRequest(*requestParams) - if err == nil { - log.Printf("readyz endpoint is alive (attempt %d/%d): %s\n", attempt, maxReadyzAttempts, respData) - break - } - if errResp != "" { - log.Printf("Request response error msg: %s\n", errResp) - } - if attempt == maxReadyzAttempts { - log.Printf("readyz endpoint did not become alive after %d attempts: %v\n", maxReadyzAttempts, err) - return err - } - backoff := time.Duration(attempt) * 5 * time.Second - log.Printf("readyz attempt %d/%d failed, retrying in %s: %v\n", attempt, maxReadyzAttempts, backoff, err) - time.Sleep(backoff) - } - // Make request to the "todo-incomplete" endpoint - requestParamsTodoIncomplete := getRequestParameters(appEndpointURL+"/todo-incomplete", proxyPodParams, GET, nil) - respData, errResp, err = MakeRequest(*requestParamsTodoIncomplete) - if err != nil { - if errResp != "" { - log.Printf("Request response error msg: %s\n", errResp) - } - return err - } - } + if namespace == "mysql-persistent" || namespace == "mongo-persistent" { + if !shouldSkipReadyz { + // ensure that the application endpoint is reachable + requestParams := getRequestParameters(appEndpointURL+"/readyz", proxyPodParams, GET, nil) + const maxReadyzAttempts = 5 + for attempt := 1; attempt <= maxReadyzAttempts; attempt++ { + log.Printf("readyz check attempt %d/%d: GET %s/readyz\n", attempt, maxReadyzAttempts, appEndpointURL) + respData, errResp, err = MakeRequest(*requestParams) + if err == nil { + log.Printf("readyz endpoint is alive (attempt %d/%d): %s\n", attempt, maxReadyzAttempts, respData) + break + } + if errResp != "" { + log.Printf("Request response error msg: %s\n", errResp) + } + if attempt == maxReadyzAttempts { + log.Printf("readyz endpoint did not become alive after %d attempts: %v\n", maxReadyzAttempts, err) + return err + } + backoff := time.Duration(attempt) * 5 * time.Second + log.Printf("readyz attempt %d/%d failed, retrying in %s: %v\n", attempt, maxReadyzAttempts, backoff, err) + time.Sleep(backoff) + } + } + // always fetch restore data, even when readyz is skipped + requestParamsTodoIncomplete := getRequestParameters(appEndpointURL+"/todo-incomplete", proxyPodParams, GET, nil) + respData, errResp, err = MakeRequest(*requestParamsTodoIncomplete) + if err != nil { + if errResp != "" { + log.Printf("Request response error msg: %s\n", errResp) + } + return err + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/lib/apps.go` around lines 498 - 529, The current if-block around shouldSkipReadyz also skips the "/todo-incomplete" verification for mysql-persistent/mongo-persistent; change logic so only the "/readyz" probe is skipped when shouldSkipReadyz is true but the "/todo-incomplete" request always runs. Concretely, keep the readyz retry loop guarded by shouldSkipReadyz (the block using appEndpointURL+"/readyz", maxReadyzAttempts, backoff and MakeRequest), then move or duplicate the getRequestParameters(appEndpointURL+"/todo-incomplete", proxyPodParams, GET, nil) + respData, errResp, err = MakeRequest(*requestParamsTodoIncomplete) handling outside that conditional so respData is populated for subsequent comparisons even when shouldSkipReadyz is true.
♻️ Duplicate comments (1)
tests/e2e/lib/apps.go (1)
505-507:⚠️ Potential issue | 🟠 Major
/readyzprobe can still false-pass on HTTP error responses.Line 505 treats
err == nilas readiness, but this was already flagged: transport success is not equivalent to health success. Please enforce HTTP success semantics (or a known ready payload) before breaking the retry loop.#!/bin/bash # Verify current readyz success criterion and request execution behavior (read-only). set -euo pipefail echo "== Verify readyz loop in tests/e2e/lib/apps.go ==" rg -n -C3 'readyz|err == nil|MakeRequest\(' tests/e2e/lib/apps.go echo echo "== Verify MakeRequest proxy curl flags in tests/e2e/lib/common_helpers.go ==" rg -n -C3 'curl|--silent|--show-error|--fail|MakeRequest\(' tests/e2e/lib/common_helpers.go🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/lib/apps.go` around lines 505 - 507, The readyz loop currently treats transport success (err == nil) as readiness; update the loop that calls MakeRequest for the readyz endpoint so it also validates HTTP success semantics or a known success payload before breaking: use the status code returned by MakeRequest (or the response object) and only break when the status is in the 2xx range (or respData contains the expected ready payload), otherwise log the non-2xx status/body and continue retrying; adjust the condition around err == nil in the readyz check (variables: MakeRequest, respData, attempt, maxReadyzAttempts) to require both no error and an explicit HTTP success/payload match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tests/e2e/lib/apps.go`:
- Around line 498-529: The current if-block around shouldSkipReadyz also skips
the "/todo-incomplete" verification for mysql-persistent/mongo-persistent;
change logic so only the "/readyz" probe is skipped when shouldSkipReadyz is
true but the "/todo-incomplete" request always runs. Concretely, keep the readyz
retry loop guarded by shouldSkipReadyz (the block using
appEndpointURL+"/readyz", maxReadyzAttempts, backoff and MakeRequest), then move
or duplicate the getRequestParameters(appEndpointURL+"/todo-incomplete",
proxyPodParams, GET, nil) + respData, errResp, err =
MakeRequest(*requestParamsTodoIncomplete) handling outside that conditional so
respData is populated for subsequent comparisons even when shouldSkipReadyz is
true.
---
Duplicate comments:
In `@tests/e2e/lib/apps.go`:
- Around line 505-507: The readyz loop currently treats transport success (err
== nil) as readiness; update the loop that calls MakeRequest for the readyz
endpoint so it also validates HTTP success semantics or a known success payload
before breaking: use the status code returned by MakeRequest (or the response
object) and only break when the status is in the 2xx range (or respData contains
the expected ready payload), otherwise log the non-2xx status/body and continue
retrying; adjust the condition around err == nil in the readyz check (variables:
MakeRequest, respData, attempt, maxReadyzAttempts) to require both no error and
an explicit HTTP success/payload match.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7ad1748a-5ff3-4467-b6de-fd5de3d461a2
📒 Files selected for processing (2)
tests/e2e/lib/apps.gotests/e2e/virt_backup_restore_suite_test.go
There was a problem hiding this comment.
♻️ Duplicate comments (1)
tests/e2e/lib/apps.go (1)
502-508:⚠️ Potential issue | 🟠 MajorAdd response body validation to the readyz check.
The check at line 505 treats
err == nilas success, butMakeRequestuses curl with--silent --show-errorwithout--fail. HTTP error responses (5xx) will still return exit code 0, causing the readyz check to falsely pass when the endpoint is unhealthy.Consider validating the response body (e.g., checking for
"ok"or expected content):Proposed fix
for attempt := 1; attempt <= maxReadyzAttempts; attempt++ { log.Printf("readyz check attempt %d/%d: GET %s/readyz\n", attempt, maxReadyzAttempts, appEndpointURL) respData, errResp, err = MakeRequest(*requestParams) - if err == nil { + if err == nil && strings.TrimSpace(respData) == "ok" { log.Printf("readyz endpoint is alive (attempt %d/%d): %s\n", attempt, maxReadyzAttempts, respData) break },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/lib/apps.go` around lines 502 - 508, The readyz loop currently treats err == nil from MakeRequest as success, but MakeRequest (used in the loop around appEndpointURL) can return exit code 0 for HTTP 5xx; update the success check in the readyz check to validate the response body and/or HTTP status: after calling MakeRequest(*requestParams) inspect respData and errResp (or parsed status in errResp) and only break when respData contains the expected OK payload (e.g., "ok" or the service-specific healthy string) and there is no HTTP error status; otherwise log the failure and continue retrying. Ensure you update the check that uses respData, errResp, err and keep the existing logging around MakeRequest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/e2e/lib/apps.go`:
- Around line 502-508: The readyz loop currently treats err == nil from
MakeRequest as success, but MakeRequest (used in the loop around appEndpointURL)
can return exit code 0 for HTTP 5xx; update the success check in the readyz
check to validate the response body and/or HTTP status: after calling
MakeRequest(*requestParams) inspect respData and errResp (or parsed status in
errResp) and only break when respData contains the expected OK payload (e.g.,
"ok" or the service-specific healthy string) and there is no HTTP error status;
otherwise log the failure and continue retrying. Ensure you update the check
that uses respData, errResp, err and keep the existing logging around
MakeRequest.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a48a2d6f-ac68-4df0-b523-b2cb65761ad9
📒 Files selected for processing (1)
tests/e2e/lib/apps.go
Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
|
wooot.. let's go. https://prow.ci.openshift.org/pr-history/?org=openshift&repo=oadp-operator&pr=2112 |
|
/test 4.21-e2e-test-aws |
mpryc
left a comment
There was a problem hiding this comment.
/lgtm with small room to improve around apt-get
tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml
Show resolved
Hide resolved
tests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yaml
Show resolved
Hide resolved
Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
WalkthroughThis PR adds post-restore pod recovery for file-system backups (KOPIA/restic) to trigger fresh pod creation with proper networking, enhances restore verification logic with healthz checks and polling, updates function signatures to support optional readiness checks, and updates sample application manifests with new container images and environment variable configurations. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Comment |
There was a problem hiding this comment.
♻️ Duplicate comments (4)
tests/e2e/backup_restore_suite_test.go (1)
276-281:⚠️ Potential issue | 🟠 MajorUse a timeout-bounded context for pod DeleteCollection.
Line 277 uses
context.Background()for a network API call; this can hang the test flow under API/server stalls.🔧 Suggested fix
if brCase.BackupRestoreType == lib.KOPIA { log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace) + ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute) + defer cancel() err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection( - context.Background(), + ctx, metav1.DeleteOptions{}, metav1.ListOptions{LabelSelector: "e2e-app=true"}, ) gomega.Expect(err).ToNot(gomega.HaveOccurred()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/backup_restore_suite_test.go` around lines 276 - 281, Replace the unbounded context in the DeleteCollection call with a timeout-bounded context: create a context with a reasonable timeout using context.WithTimeout (and defer cancel()), then pass that ctx into kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(...). Ensure you cancel the context to release resources and keep the DeleteCollection invocation and subsequent gomega.Expect(err).ToNot(gomega.HaveOccurred()) behavior unchanged.tests/e2e/backup_restore_cli_suite_test.go (1)
185-190:⚠️ Potential issue | 🟠 MajorBound the pod deletion call with a timeout context.
Line 186 uses
context.Background()forDeleteCollection; if the API call stalls, this can hang the test path.🔧 Suggested fix
if brCase.BackupRestoreType == lib.KOPIA { log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace) + ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute) + defer cancel() err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection( - context.Background(), + ctx, metav1.DeleteOptions{}, metav1.ListOptions{LabelSelector: "e2e-app=true"}, ) gomega.Expect(err).ToNot(gomega.HaveOccurred()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/backup_restore_cli_suite_test.go` around lines 185 - 190, The DeleteCollection call uses context.Background() which can hang; change it to a bounded context by creating a context with timeout (e.g., via context.WithTimeout) before calling kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection and ensure you defer the cancel() so the timeout is enforced and resources are cleaned up; update the call site that currently passes context.Background() to pass the new timed context instead.tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml (1)
141-145:⚠️ Potential issue | 🟠 MajorMake the block-device startup script fail fast and quote variables.
Lines 143-144 run mount flow without strict mode and unquoted vars; mount failures can lead to misleading startup behavior.
🔧 Suggested fix
- | + set -eu DEVICE="/dev/block-pv" MOUNT_POINT="/var/lib/mongodb" - mkdir -p $MOUNT_POINT - mount $DEVICE $MOUNT_POINT + mkdir -p "$MOUNT_POINT" + mount "$DEVICE" "$MOUNT_POINT" exec /opt/todolist/entrypoint.sh🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml` around lines 141 - 145, The startup script should fail fast and safely quote variables: enable strict mode (set -euo pipefail) at the top, quote DEVICE and MOUNT_POINT when creating the directory and when calling mount (use "$DEVICE" and "$MOUNT_POINT"), check the return value of mount and exit with a non‑zero status and an error log if mount fails, and only exec /opt/todolist/entrypoint.sh after a successful mount; update the block that references DEVICE, MOUNT_POINT and the final exec to follow these changes.tests/e2e/lib/apps.go (1)
519-527:⚠️ Potential issue | 🟠 MajorHTTP error responses can still be treated as healthy/ready in proxy mode.
Line 526 and Line 567 assume
err == nilreliably means HTTP success, but proxy requests can still returnerr == nilon HTTP 4xx/5xx when curl doesn’t fail on status. That leaves this block vulnerable to false readiness/data-availability decisions.🔧 Suggested hardening (root cause in request helper)
--- a/tests/e2e/lib/common_helpers.go +++ b/tests/e2e/lib/common_helpers.go @@ - curlInProxyCmd = "curl -X GET --silent --show-error " + params.URL + curlInProxyCmd = "curl -X GET --fail-with-body --silent --show-error " + params.URL @@ - curlInProxyCmd = fmt.Sprintf("curl -X POST -d %s --silent --show-error %s", body, params.URL) + curlInProxyCmd = fmt.Sprintf("curl -X POST -d %s --fail-with-body --silent --show-error %s", body, params.URL) @@ - curlInProxyCmd = fmt.Sprintf("curl -X POST --silent --show-error %s", params.URL) + curlInProxyCmd = fmt.Sprintf("curl -X POST --fail-with-body --silent --show-error %s", params.URL)Use this to verify the concern and affected paths:
#!/bin/bash # Verify curl proxy commands do not currently fail on HTTP status codes. rg -n -C2 --type go 'curl -X (GET|POST).*--silent --show-error' tests/e2e/lib/common_helpers.go # Verify readiness/data checks that currently depend on err/body heuristics. rg -n -C3 --type go 'healthz check attempt|isHealthzAlive|maxTodoAttempts == 1|todo-incomplete|errRespIndicatesHTTPError' tests/e2e/lib/apps.goExpected result: first command shows proxy curl invocations without
--fail/--fail-with-body; second command shows call sites in this file that rely on that behavior.Also applies to: 567-568, 632-650
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/lib/apps.go` around lines 519 - 527, The readiness check incorrectly treats err == nil as success for proxy-mode requests; update either the request helper or this caller so HTTP error statuses are not considered healthy: modify MakeRequest (used here) to make proxy curl fail on HTTP errors (add --fail or --fail-with-body to the curl args in getRequestParameters/MakeRequest) so it returns a non-nil error for 4xx/5xx, or, if you prefer caller-side hardening, explicitly inspect errResp for HTTP status >= 400 before accepting success and only call isHealthzAlive when status is 2xx; reference MakeRequest, getRequestParameters, isHealthzAlive and proxyPodParams to locate the code to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/e2e/backup_restore_cli_suite_test.go`:
- Around line 185-190: The DeleteCollection call uses context.Background() which
can hang; change it to a bounded context by creating a context with timeout
(e.g., via context.WithTimeout) before calling
kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection and
ensure you defer the cancel() so the timeout is enforced and resources are
cleaned up; update the call site that currently passes context.Background() to
pass the new timed context instead.
In `@tests/e2e/backup_restore_suite_test.go`:
- Around line 276-281: Replace the unbounded context in the DeleteCollection
call with a timeout-bounded context: create a context with a reasonable timeout
using context.WithTimeout (and defer cancel()), then pass that ctx into
kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(...).
Ensure you cancel the context to release resources and keep the DeleteCollection
invocation and subsequent gomega.Expect(err).ToNot(gomega.HaveOccurred())
behavior unchanged.
In `@tests/e2e/lib/apps.go`:
- Around line 519-527: The readiness check incorrectly treats err == nil as
success for proxy-mode requests; update either the request helper or this caller
so HTTP error statuses are not considered healthy: modify MakeRequest (used
here) to make proxy curl fail on HTTP errors (add --fail or --fail-with-body to
the curl args in getRequestParameters/MakeRequest) so it returns a non-nil error
for 4xx/5xx, or, if you prefer caller-side hardening, explicitly inspect errResp
for HTTP status >= 400 before accepting success and only call isHealthzAlive
when status is 2xx; reference MakeRequest, getRequestParameters, isHealthzAlive
and proxyPodParams to locate the code to change.
In `@tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml`:
- Around line 141-145: The startup script should fail fast and safely quote
variables: enable strict mode (set -euo pipefail) at the top, quote DEVICE and
MOUNT_POINT when creating the directory and when calling mount (use "$DEVICE"
and "$MOUNT_POINT"), check the return value of mount and exit with a non‑zero
status and an error log if mount fails, and only exec
/opt/todolist/entrypoint.sh after a successful mount; update the block that
references DEVICE, MOUNT_POINT and the final exec to follow these changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f1f76b6a-4093-458a-bf4e-7a6b9f2a018a
📒 Files selected for processing (10)
tests/e2e/backup_restore_cli_suite_test.gotests/e2e/backup_restore_suite_test.gotests/e2e/lib/apps.gotests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yamltests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yamltests/e2e/sample-applications/mongo-persistent/mongo-persistent.yamltests/e2e/sample-applications/mysql-persistent/mysql-persistent-csi.yamltests/e2e/sample-applications/mysql-persistent/mysql-persistent-twovol-csi.yamltests/e2e/sample-applications/mysql-persistent/mysql-persistent.yamltests/e2e/virt_backup_restore_suite_test.go
|
/lgtm |
|
@weshayutin: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| // For file-system backup restores (KOPIA/restic), the restored pods may have | ||
| // broken networking because OVN-Kubernetes doesn't fully wire the network | ||
| // namespace for pods recreated by Velero with a restore-wait init container. | ||
| // Deleting the pods lets the deployment controller create fresh ones with | ||
| // proper networking while preserving the restored PVC data. |
There was a problem hiding this comment.
um... ur the maintainer :)
There was a problem hiding this comment.
@kaovilai perhaps vmware-tanzu/velero#7457 ?
There was a problem hiding this comment.
@kaovilai I can easily recreate the issue now, perhaps we need a new clean upstream velero bug and doc :)
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kaovilai, mpryc, weshayutin The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |

Why the changes were made
Update todolist maria and todolist mongo to use new version
migtools/mig-demo-apps#9
Summary
oadp-ci-todolist-*container images to the new unifiedoadp-ci-todo2-go-testing-*images, updating environment variables (DB_BACKEND,MONGO_DATABASE), mount paths (/var/lib/mongodb), and device paths (/dev/block-pv) to match the new image contracts./healthzpolling gate toVerifyBackupRestoreDataso that post-restore data verification waits for the app to become alive before comparing todo data, with configurable retry and linear backoff.skipReadyzoption toVerifyBackupRestoreDatafor VM-based (virt) tests where the app route is not directly reachable until the VM completes cloud-init; in this path,/todo-incompleteis polled with up to 10 retries.Details
Image migration
quay.io/migtools/oadp-ci-todolist-mongo-go-testing:latestquay.io/migtools/oadp-ci-todo2-go-testing-mongodb:latestquay.io/migtools/oadp-ci-todolist-mariadb-go-testing:testingquay.io/migtools/oadp-ci-todo2-go-testing-mariadb:latestPost-restore verification improvements (
lib/apps.go)VerifyBackupRestoreDatanow accepts an optionalskipReadyz ...boolvariadic parameter./healthzup to 5 times (5s linear backoff) before fetching/todo-incomplete.skipReadyz=true): skips the healthz gate and instead polls/todo-incompleteup to 10 times (10s linear backoff) to wait for the VM to finish booting.isHealthzAlive()anderrRespIndicatesHTTPError()for cleaner response validation.KOPIA restore pod restart
backup_restore_suite_test.goandbackup_restore_cli_suite_test.go, after a KOPIA restore completes, pods with labele2e-app=trueare deleted so the deployment controller creates fresh pods with proper OVN-Kubernetes networking.Manifest updates
shtobash, installse2fsprogsat runtime for block device formatting, updated device path to/dev/block-pv.CHOWNandFOWNERcapabilities to the SCC and container security context to support the new image.DB_BACKENDenvironment variable, updated readiness probe to/readyzwhere applicable.Test Plan
mongo-persistent(CSI, file-system, block)mysql-persistent(CSI, file-system, two-vol)skipReadyzpolling pathSummary by CodeRabbit
New Features
Tests