Skip to content

STOR-2770: Generate Azure Disk and File CSI Driver Operator metrics serving certs by HCP controller#7970

Open
mpatlasov wants to merge 1 commit intoopenshift:mainfrom
mpatlasov:STOR-2770-Reconcile-Azure-Disk-and-File-Metrics-Serving-Certs
Open

STOR-2770: Generate Azure Disk and File CSI Driver Operator metrics serving certs by HCP controller#7970
mpatlasov wants to merge 1 commit intoopenshift:mainfrom
mpatlasov:STOR-2770-Reconcile-Azure-Disk-and-File-Metrics-Serving-Certs

Conversation

@mpatlasov
Copy link
Contributor

@mpatlasov mpatlasov commented Mar 16, 2026

service-ca operator might not be running in some HCP environments. Hence, we need CPO help to generate certs for us.

This PR must address hypershift-e2e-aks failures on AKS platform (see also openshift/cluster-storage-operator#662 ).

Summary by CodeRabbit

  • New Features

    • Added metrics services and serving-certificate support for Azure Disk and Azure File CSI driver operators.
  • Improvements

    • Extended PKI reconciliation to cover Azure-specific CSI components.
    • Improved error handling: certificate creation/update failures now surface; non-fatal cleanup failures are logged for troubleshooting.

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot
Copy link

@mpatlasov: An error was encountered searching for bug STOR-2770 on the Jira server at https://issues.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. request failed. Please analyze the request body for more details. Status code: 403:

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

Details

In response to this:

service-ca operator might not be running in some HCP environments. Hence, we need CPO help to generate certs for us.

This PR must address hypershift-e2e-aks failures on AKS platform (see also openshift/cluster-storage-operator#662 ).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 6aac8c14-ce88-46e4-b6e5-7c5291f5b000

📥 Commits

Reviewing files that changed from the base of the PR and between eb28188 and 5750392.

📒 Files selected for processing (6)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_file_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/pki.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_file_csi_driver_operator.go
✅ Files skipped from review due to trivial changes (1)
  • control-plane-operator/controllers/hostedcontrolplane/manifests/pki.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_file_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_file_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go

📝 Walkthrough

Walkthrough

Adds Azure-specific PKI reconciliation to the HostedControlPlane controller: it removes service CA annotations (logged on failure), deletes old serving-cert secrets, creates headless Service objects for Azure Disk and Azure File CSI driver operators, and creates/updates serving-cert Secrets by calling two new pki reconcilers that build TLS serving certificates with DNS SANs. Also adds manifest helpers for the two Services and helpers for the corresponding Secret names. create/update failures are returned as errors from reconciliation.

Sequence Diagram(s)

sequenceDiagram
    participant Controller as HostedControlPlane Controller
    participant Manifests as Manifest Functions
    participant PKI as PKI Reconciler
    participant K8s as Kubernetes API

    Controller->>Controller: Start PKI reconciliation (Azure)
    Controller->>K8s: remove service CA annotation & delete old secret (log failures)
    K8s-->>Controller: Acknowledged

    Controller->>Manifests: AzureDiskCSIDriverOperatorMetricsService(ns)
    Manifests-->>Controller: Service object
    Controller->>K8s: Create/Update Service
    K8s-->>Controller: Service created/updated

    Controller->>Manifests: AzureDiskCSIDriverOperatorServingCertSecret(ns)
    Manifests-->>Controller: Secret object
    Controller->>PKI: ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(secret, ca, owner)
    PKI->>PKI: Build DNS SANs & sign cert
    PKI-->>Controller: Signed cert secret
    Controller->>K8s: Create/Update Secret (errors returned)

    Note over Controller,PKI: Repeat analogous flow for Azure File CSI driver
Loading
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can scan for known vulnerabilities in your dependencies using OSV Scanner.

OSV Scanner will automatically detect and report security vulnerabilities in your project's dependencies. No additional configuration is required.

@openshift-ci openshift-ci bot requested review from bryan-cox and csrwng March 16, 2026 07:16
@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/platform/azure PR/issue for Azure (AzurePlatform) platform and removed do-not-merge/needs-area labels Mar 16, 2026
@openshift-ci-robot
Copy link

@mpatlasov: An error was encountered searching for bug STOR-2770 on the Jira server at https://issues.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. request failed. Please analyze the request body for more details. Status code: 403:

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

Details

In response to this:

service-ca operator might not be running in some HCP environments. Hence, we need CPO help to generate certs for us.

This PR must address hypershift-e2e-aks failures on AKS platform (see also openshift/cluster-storage-operator#662 ).

Summary by CodeRabbit

  • New Features

  • Added certificate reconciliation support for Azure Disk CSI Driver Operator, including dedicated metrics service.

  • Added certificate reconciliation support for Azure File CSI Driver Operator, including dedicated metrics service.

  • Improvements

  • Enhanced public key infrastructure management for Azure hosted control planes with improved certificate lifecycle handling.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`:
- Around line 1771-1789: Update the Azure File CSI Driver Operator block by
replacing the misleading comment that mentions "Cluster Node Tuning Operator"
with an accurate comment referencing the Azure File CSI Driver Operator, and
remove all temporary debug log statements that start with "ZZZZZ" (r.Log.Info
calls in this block). Keep the functional calls intact
(manifests.AzureFileCSIDriverOperatorServingCertSecret,
manifests.AzureFileCSIDriverOperatorMetricsService,
removeServiceCAAnnotationAndSecret, createOrUpdate, and
pki.ReconcileAzureFileCsiDriverOperatorMetricsServingCertSecret) and ensure only
meaningful, production-level logging remains (e.g., preserve error logging but
without the debug prefixes).
- Around line 1729-1747: Update the incorrect comment above the
AzureDiskCsiDriverOperatorServingCert block to reference the "Azure Disk CSI
Driver Operator metrics Serving Cert" instead of the Cluster Node Tuning
Operator, and remove all temporary debug log statements that use the "ZZZZZ"
prefix (calls to r.Log.Info around the calls to
removeServiceCAAnnotationAndSecret, createOrUpdate, and
pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret). Keep the
functional calls and error handling (removeServiceCAAnnotationAndSecret,
createOrUpdate, pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret,
AzureDiskCsiDriverOperatorServingCert, AzureDiskCsiDriverOperatorService,
rootCASecret, p.OwnerRef) intact, only delete the debug logs and fix the comment
text.
- Line 1697: Remove the development debug log call r.Log.Info("ZZZZZ: -1-
Platform specific certs ...") from the HostedControlPlaneReconciler
reconciliation flow (the r.Log.Info invocation in
hostedcontrolplane_controller.go), ensuring no other behavior or logging is
changed; simply delete the stray debug statement so normal logging remains.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: e5160d2d-64b4-4f86-a510-3e78e4abc30f

📥 Commits

Reviewing files that changed from the base of the PR and between f6d626c and 499336e.

📒 Files selected for processing (6)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_file_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/pki.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_file_csi_driver_operator.go

}

// Platform specific certs
r.Log.Info("ZZZZZ: -1- Platform specific certs ...")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove debug logging statement.

This debug log statement with the "ZZZZZ" prefix appears to be a development artifact that should be removed before merging.

Proposed fix
-	r.Log.Info("ZZZZZ: -1- Platform specific certs ...")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
r.Log.Info("ZZZZZ: -1- Platform specific certs ...")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
at line 1697, Remove the development debug log call r.Log.Info("ZZZZZ: -1-
Platform specific certs ...") from the HostedControlPlaneReconciler
reconciliation flow (the r.Log.Info invocation in
hostedcontrolplane_controller.go), ensuring no other behavior or logging is
changed; simply delete the stray debug statement so normal logging remains.

Comment on lines +1729 to +1747
r.Log.Info("ZZZZZ: -2- Platform specific certs: case hyperv1.AzurePlatform ...")
// Cluster Node Tuning Operator metrics Serving Cert
AzureDiskCsiDriverOperatorServingCert := manifests.AzureDiskCSIDriverOperatorServingCertSecret(hcp.Namespace)
AzureDiskCsiDriverOperatorService := manifests.AzureDiskCSIDriverOperatorMetricsService(hcp.Namespace)
err := removeServiceCAAnnotationAndSecret(ctx, r.Client, AzureDiskCsiDriverOperatorService, AzureDiskCsiDriverOperatorServingCert)
if err != nil {
r.Log.Error(err, "failed to remove service ca annotation and secret: %w")
}
r.Log.Info("ZZZZZ: -3- After removeServiceCAAnnotationAndSecret(); createOrUpdate() ...")
if _, err = createOrUpdate(ctx, r, AzureDiskCsiDriverOperatorServingCert, func() error {
r.Log.Info("ZZZZZ: -4- After createOrUpdate(); pki.Reconcile ...")
z := pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
r.Log.Info("ZZZZZ: -5- pki.Reconcile returned: ", z)
return z
}); err != nil {
r.Log.Info("ZZZZZ: -6- err: ", err)
return fmt.Errorf("failed to reconcile azure-disk csi driver operator serving cert: %w", err)
}
r.Log.Info("ZZZZZ: -7- azure-disk csi driver operator: serving cert secret DONE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove debug logging and fix misleading comment.

  1. The comment on line 1730-1731 incorrectly states "Cluster Node Tuning Operator metrics Serving Cert" - this appears to be a copy-paste error from the Node Tuning Operator section. It should reference Azure Disk CSI Driver Operator.

  2. Multiple debug logging statements with "ZZZZZ" prefix should be removed before merging.

Proposed fix
 	case hyperv1.AzurePlatform:
-		r.Log.Info("ZZZZZ: -2- Platform specific certs: case hyperv1.AzurePlatform ...")
-		// Cluster Node Tuning Operator metrics Serving Cert
+		// Azure Disk CSI Driver Operator metrics Serving Cert
 		AzureDiskCsiDriverOperatorServingCert := manifests.AzureDiskCSIDriverOperatorServingCertSecret(hcp.Namespace)
 		AzureDiskCsiDriverOperatorService := manifests.AzureDiskCSIDriverOperatorMetricsService(hcp.Namespace)
 		err := removeServiceCAAnnotationAndSecret(ctx, r.Client, AzureDiskCsiDriverOperatorService, AzureDiskCsiDriverOperatorServingCert)
 		if err != nil {
 			r.Log.Error(err, "failed to remove service ca annotation and secret: %w")
 		}
-		r.Log.Info("ZZZZZ: -3- After removeServiceCAAnnotationAndSecret(); createOrUpdate() ...")
 		if _, err = createOrUpdate(ctx, r, AzureDiskCsiDriverOperatorServingCert, func() error {
-			r.Log.Info("ZZZZZ: -4- After createOrUpdate(); pki.Reconcile ...")
-			z := pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
-			r.Log.Info("ZZZZZ: -5- pki.Reconcile returned: ", z)
-			return z
+			return pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
 		}); err != nil {
-			r.Log.Info("ZZZZZ: -6- err: ", err)
 			return fmt.Errorf("failed to reconcile azure-disk csi driver operator serving cert: %w", err)
 		}
-		r.Log.Info("ZZZZZ: -7- azure-disk csi driver operator: serving cert secret DONE")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
r.Log.Info("ZZZZZ: -2- Platform specific certs: case hyperv1.AzurePlatform ...")
// Cluster Node Tuning Operator metrics Serving Cert
AzureDiskCsiDriverOperatorServingCert := manifests.AzureDiskCSIDriverOperatorServingCertSecret(hcp.Namespace)
AzureDiskCsiDriverOperatorService := manifests.AzureDiskCSIDriverOperatorMetricsService(hcp.Namespace)
err := removeServiceCAAnnotationAndSecret(ctx, r.Client, AzureDiskCsiDriverOperatorService, AzureDiskCsiDriverOperatorServingCert)
if err != nil {
r.Log.Error(err, "failed to remove service ca annotation and secret: %w")
}
r.Log.Info("ZZZZZ: -3- After removeServiceCAAnnotationAndSecret(); createOrUpdate() ...")
if _, err = createOrUpdate(ctx, r, AzureDiskCsiDriverOperatorServingCert, func() error {
r.Log.Info("ZZZZZ: -4- After createOrUpdate(); pki.Reconcile ...")
z := pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
r.Log.Info("ZZZZZ: -5- pki.Reconcile returned: ", z)
return z
}); err != nil {
r.Log.Info("ZZZZZ: -6- err: ", err)
return fmt.Errorf("failed to reconcile azure-disk csi driver operator serving cert: %w", err)
}
r.Log.Info("ZZZZZ: -7- azure-disk csi driver operator: serving cert secret DONE")
// Azure Disk CSI Driver Operator metrics Serving Cert
AzureDiskCsiDriverOperatorServingCert := manifests.AzureDiskCSIDriverOperatorServingCertSecret(hcp.Namespace)
AzureDiskCsiDriverOperatorService := manifests.AzureDiskCSIDriverOperatorMetricsService(hcp.Namespace)
err := removeServiceCAAnnotationAndSecret(ctx, r.Client, AzureDiskCsiDriverOperatorService, AzureDiskCsiDriverOperatorServingCert)
if err != nil {
r.Log.Error(err, "failed to remove service ca annotation and secret: %w")
}
if _, err = createOrUpdate(ctx, r, AzureDiskCsiDriverOperatorServingCert, func() error {
return pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
}); err != nil {
return fmt.Errorf("failed to reconcile azure-disk csi driver operator serving cert: %w", err)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
around lines 1729 - 1747, Update the incorrect comment above the
AzureDiskCsiDriverOperatorServingCert block to reference the "Azure Disk CSI
Driver Operator metrics Serving Cert" instead of the Cluster Node Tuning
Operator, and remove all temporary debug log statements that use the "ZZZZZ"
prefix (calls to r.Log.Info around the calls to
removeServiceCAAnnotationAndSecret, createOrUpdate, and
pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret). Keep the
functional calls and error handling (removeServiceCAAnnotationAndSecret,
createOrUpdate, pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret,
AzureDiskCsiDriverOperatorServingCert, AzureDiskCsiDriverOperatorService,
rootCASecret, p.OwnerRef) intact, only delete the debug logs and fix the comment
text.

@mpatlasov mpatlasov force-pushed the STOR-2770-Reconcile-Azure-Disk-and-File-Metrics-Serving-Certs branch from 499336e to aeb8b6c Compare March 16, 2026 07:30
@openshift-ci-robot
Copy link

@mpatlasov: An error was encountered searching for bug STOR-2770 on the Jira server at https://issues.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. request failed. Please analyze the request body for more details. Status code: 403:

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

Details

In response to this:

service-ca operator might not be running in some HCP environments. Hence, we need CPO help to generate certs for us.

This PR must address hypershift-e2e-aks failures on AKS platform (see also openshift/cluster-storage-operator#662 ).

Summary by CodeRabbit

  • New Features

  • Added metrics services and serving certificate support for Azure Disk and Azure File CSI driver operators.

  • Improvements

  • Extended PKI reconciliation to cover Azure-specific CSI components.

  • Added additional logging to surface progress and debug platform-specific certificate handling.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (3)

1697-1697: ⚠️ Potential issue | 🟡 Minor

Remove debug logging statement.

This debug log statement with the "ZZZZZ" prefix appears to be a development artifact that should be removed before merging.

Proposed fix
-	r.Log.Info("ZZZZZ: -1- Platform specific certs ...")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
at line 1697, Remove the development debug log call r.Log.Info("ZZZZZ: -1-
Platform specific certs ...") from the reconciliation path in
hostedcontrolplane_controller.go; locate the r.Log.Info invocation and delete
the entire statement so no "ZZZZZ" debug output remains (do not add replacement
logging unless there is a meaningful message to record).

1771-1789: ⚠️ Potential issue | 🟡 Minor

Remove debug logging and fix misleading comment.

Same issues as the Azure Disk section:

  1. The comment on line 1772 incorrectly states "Azure-file CSI driver Operator" - while this is technically correct, the format is inconsistent with similar comments in this file.

  2. Debug logging statements with "ZZZZZ" prefix should be removed.

Proposed fix
-		r.Log.Info("ZZZZZ: -22- Platform specific certs: case hyperv1.AzurePlatform ...")
-		// Azure-file CSI driver Operator metrics Serving Cert
+		// Azure File CSI Driver Operator metrics Serving Cert
 		AzureFileCsiDriverOperatorServingCert := manifests.AzureFileCSIDriverOperatorServingCertSecret(hcp.Namespace)
 		AzureFileCsiDriverOperatorService := manifests.AzureFileCSIDriverOperatorMetricsService(hcp.Namespace)
 		err = removeServiceCAAnnotationAndSecret(ctx, r.Client, AzureFileCsiDriverOperatorService, AzureFileCsiDriverOperatorServingCert)
 		if err != nil {
 			r.Log.Error(err, "failed to remove service ca annotation and secret: %w")
 		}
-		r.Log.Info("ZZZZZ: -33- After removeServiceCAAnnotationAndSecret(); createOrUpdate() ...")
 		if _, err = createOrUpdate(ctx, r, AzureFileCsiDriverOperatorServingCert, func() error {
-			r.Log.Info("ZZZZZ: -44- After createOrUpdate(); pki.Reconcile ...")
-			z := pki.ReconcileAzureFileCsiDriverOperatorMetricsServingCertSecret(AzureFileCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
-			r.Log.Info("ZZZZZ: -55- pki.Reconcile returned: ", z)
-			return z
+			return pki.ReconcileAzureFileCsiDriverOperatorMetricsServingCertSecret(AzureFileCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
 		}); err != nil {
-			r.Log.Info("ZZZZZ: -66- err: ", err)
 			return fmt.Errorf("failed to reconcile azure-file csi driver operator serving cert: %w", err)
 		}
-		r.Log.Info("ZZZZZ: -77- azure-file csi driver operator: serving cert secret DONE")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
around lines 1771 - 1789, The block contains leftover debug logs and an
inconsistent comment: remove all "ZZZZZ" r.Log.Info(...) calls and replace the
misleading/inconsistent comment above the
AzureFileCsiDriverOperatorServingCertSecret/
AzureFileCsiDriverOperatorMetricsService creation with a concise, consistent
comment matching the file's style (e.g., "Azure-file CSI driver operator metrics
serving cert"). Ensure the logic around removeServiceCAAnnotationAndSecret,
createOrUpdate, and
pki.ReconcileAzureFileCsiDriverOperatorMetricsServingCertSecret remains
unchanged and keep error logging using r.Log.Error/r.Log.Info without debug
markers.

1729-1747: ⚠️ Potential issue | 🟡 Minor

Remove debug logging and fix misleading comment.

  1. The comment on line 1730-1731 incorrectly states "Azure-disk CSI driver Operator metrics Serving Cert" but appears to have been copied from elsewhere - the comment format is inconsistent with similar comments in this file (see line 1542 for the Node Tuning Operator pattern).

  2. Multiple debug logging statements with "ZZZZZ" prefix should be removed before merging.

Proposed fix
 	case hyperv1.AzurePlatform:
-		r.Log.Info("ZZZZZ: -2- Platform specific certs: case hyperv1.AzurePlatform ...")
-		// Azure-disk CSI driver Operator metrics Serving Cert
+		// Azure Disk CSI Driver Operator metrics Serving Cert
 		AzureDiskCsiDriverOperatorServingCert := manifests.AzureDiskCSIDriverOperatorServingCertSecret(hcp.Namespace)
 		AzureDiskCsiDriverOperatorService := manifests.AzureDiskCSIDriverOperatorMetricsService(hcp.Namespace)
 		err := removeServiceCAAnnotationAndSecret(ctx, r.Client, AzureDiskCsiDriverOperatorService, AzureDiskCsiDriverOperatorServingCert)
 		if err != nil {
 			r.Log.Error(err, "failed to remove service ca annotation and secret: %w")
 		}
-		r.Log.Info("ZZZZZ: -3- After removeServiceCAAnnotationAndSecret(); createOrUpdate() ...")
 		if _, err = createOrUpdate(ctx, r, AzureDiskCsiDriverOperatorServingCert, func() error {
-			r.Log.Info("ZZZZZ: -4- After createOrUpdate(); pki.Reconcile ...")
-			z := pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
-			r.Log.Info("ZZZZZ: -5- pki.Reconcile returned: ", z)
-			return z
+			return pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
 		}); err != nil {
-			r.Log.Info("ZZZZZ: -6- err: ", err)
 			return fmt.Errorf("failed to reconcile azure-disk csi driver operator serving cert: %w", err)
 		}
-		r.Log.Info("ZZZZZ: -7- azure-disk csi driver operator: serving cert secret DONE")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
around lines 1729 - 1747, Remove the temporary "ZZZZZ" debug logging and update
the misleading comment: change the comment that currently reads "Azure-disk CSI
driver Operator metrics Serving Cert" to match the file's comment style (e.g.,
"Azure Disk CSI driver operator metrics serving cert") and delete all r.Log.Info
calls that contain "ZZZZZ" around the AzureDiskCsiDriverOperatorServingCert
block; ensure you keep the calls to removeServiceCAAnnotationAndSecret,
createOrUpdate, and the
pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret invocation
unchanged so behavior is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`:
- Line 1697: Remove the development debug log call r.Log.Info("ZZZZZ: -1-
Platform specific certs ...") from the reconciliation path in
hostedcontrolplane_controller.go; locate the r.Log.Info invocation and delete
the entire statement so no "ZZZZZ" debug output remains (do not add replacement
logging unless there is a meaningful message to record).
- Around line 1771-1789: The block contains leftover debug logs and an
inconsistent comment: remove all "ZZZZZ" r.Log.Info(...) calls and replace the
misleading/inconsistent comment above the
AzureFileCsiDriverOperatorServingCertSecret/
AzureFileCsiDriverOperatorMetricsService creation with a concise, consistent
comment matching the file's style (e.g., "Azure-file CSI driver operator metrics
serving cert"). Ensure the logic around removeServiceCAAnnotationAndSecret,
createOrUpdate, and
pki.ReconcileAzureFileCsiDriverOperatorMetricsServingCertSecret remains
unchanged and keep error logging using r.Log.Error/r.Log.Info without debug
markers.
- Around line 1729-1747: Remove the temporary "ZZZZZ" debug logging and update
the misleading comment: change the comment that currently reads "Azure-disk CSI
driver Operator metrics Serving Cert" to match the file's comment style (e.g.,
"Azure Disk CSI driver operator metrics serving cert") and delete all r.Log.Info
calls that contain "ZZZZZ" around the AzureDiskCsiDriverOperatorServingCert
block; ensure you keep the calls to removeServiceCAAnnotationAndSecret,
createOrUpdate, and the
pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret invocation
unchanged so behavior is preserved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: d793c962-5329-4aef-a34d-9728c4d4ad05

📥 Commits

Reviewing files that changed from the base of the PR and between 499336e and aeb8b6c.

📒 Files selected for processing (6)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_file_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/pki.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_file_csi_driver_operator.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • control-plane-operator/controllers/hostedcontrolplane/manifests/pki.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_disk_csi_driver_operator.go

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2026
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mpatlasov mpatlasov force-pushed the STOR-2770-Reconcile-Azure-Disk-and-File-Metrics-Serving-Certs branch from aeb8b6c to eb28188 Compare March 18, 2026 22:39
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2026
@mpatlasov
Copy link
Contributor Author

Update: this force-push only rebased this PR to a newer head of main branch.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 18, 2026

@mpatlasov: This pull request references STOR-2770 which is a valid jira issue.

Details

In response to this:

service-ca operator might not be running in some HCP environments. Hence, we need CPO help to generate certs for us.

This PR must address hypershift-e2e-aks failures on AKS platform (see also openshift/cluster-storage-operator#662 ).

Summary by CodeRabbit

  • New Features

  • Added metrics services and serving-certificate support for Azure Disk and Azure File CSI driver operators.

  • Improvements

  • Extended PKI reconciliation to include Azure-specific CSI components.

  • Added additional debug logging to surface progress and aid troubleshooting of platform-specific certificate handling.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 18, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (1)

1696-1696: ⚠️ Potential issue | 🟡 Minor

Remove temporary ZZZZZ debug logs from Azure platform cert reconciliation.

These logs are development artifacts in a recurring reconcile path and add noisy, non-actionable output. The callback-local z temporary can also be inlined.

Proposed cleanup
-	r.Log.Info("ZZZZZ: -1- Platform specific certs ...")
 	switch hcp.Spec.Platform.Type {
@@
-		r.Log.Info("ZZZZZ: -2- Platform specific certs: case hyperv1.AzurePlatform ...")
 		// Azure-disk CSI driver Operator metrics Serving Cert
 		AzureDiskCsiDriverOperatorServingCert := manifests.AzureDiskCSIDriverOperatorServingCertSecret(hcp.Namespace)
 		AzureDiskCsiDriverOperatorService := manifests.AzureDiskCSIDriverOperatorMetricsService(hcp.Namespace)
 		err := removeServiceCAAnnotationAndSecret(ctx, r.Client, AzureDiskCsiDriverOperatorService, AzureDiskCsiDriverOperatorServingCert)
@@
-		r.Log.Info("ZZZZZ: -3- After removeServiceCAAnnotationAndSecret(); createOrUpdate() ...")
 		if _, err = createOrUpdate(ctx, r, AzureDiskCsiDriverOperatorServingCert, func() error {
-			r.Log.Info("ZZZZZ: -4- After createOrUpdate(); pki.Reconcile ...")
-			z := pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
-			r.Log.Info("ZZZZZ: -5- pki.Reconcile returned: ", z)
-			return z
+			return pki.ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(AzureDiskCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
 		}); err != nil {
-			r.Log.Info("ZZZZZ: -6- err: ", err)
 			return fmt.Errorf("failed to reconcile azure-disk csi driver operator serving cert: %w", err)
 		}
-		r.Log.Info("ZZZZZ: -7- azure-disk csi driver operator: serving cert secret DONE")
@@
-		r.Log.Info("ZZZZZ: -22- Platform specific certs: case hyperv1.AzurePlatform ...")
 		// Azure-file CSI driver Operator metrics Serving Cert
 		AzureFileCsiDriverOperatorServingCert := manifests.AzureFileCSIDriverOperatorServingCertSecret(hcp.Namespace)
 		AzureFileCsiDriverOperatorService := manifests.AzureFileCSIDriverOperatorMetricsService(hcp.Namespace)
@@
-		r.Log.Info("ZZZZZ: -33- After removeServiceCAAnnotationAndSecret(); createOrUpdate() ...")
 		if _, err = createOrUpdate(ctx, r, AzureFileCsiDriverOperatorServingCert, func() error {
-			r.Log.Info("ZZZZZ: -44- After createOrUpdate(); pki.Reconcile ...")
-			z := pki.ReconcileAzureFileCsiDriverOperatorMetricsServingCertSecret(AzureFileCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
-			r.Log.Info("ZZZZZ: -55- pki.Reconcile returned: ", z)
-			return z
+			return pki.ReconcileAzureFileCsiDriverOperatorMetricsServingCertSecret(AzureFileCsiDriverOperatorServingCert, rootCASecret, p.OwnerRef)
 		}); err != nil {
-			r.Log.Info("ZZZZZ: -66- err: ", err)
 			return fmt.Errorf("failed to reconcile azure-file csi driver operator serving cert: %w", err)
 		}
-		r.Log.Info("ZZZZZ: -77- azure-file csi driver operator: serving cert secret DONE")

As per coding guidelines, **: -Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Also applies to: 1735-1753, 1777-1795

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
at line 1696, Remove the temporary debug log lines (e.g., the r.Log.Info("ZZZZZ:
-1- Platform specific certs ...") and similar ZZZZZ messages around lines
handling Azure platform cert reconciliation) and inline the callback-local
temporary variable `z` where it’s used instead of introducing a short-lived `z`
variable; update the Azure platform cert reconciliation code path (methods in
hostedcontrolplane_controller.go that reference `z` and emit ZZZZZ logs) to
eliminate noisy dev logs and simplify by using the expression directly or a
clearly named variable if needed for readability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`:
- Line 1696: Remove the temporary debug log lines (e.g., the r.Log.Info("ZZZZZ:
-1- Platform specific certs ...") and similar ZZZZZ messages around lines
handling Azure platform cert reconciliation) and inline the callback-local
temporary variable `z` where it’s used instead of introducing a short-lived `z`
variable; update the Azure platform cert reconciliation code path (methods in
hostedcontrolplane_controller.go that reference `z` and emit ZZZZZ logs) to
eliminate noisy dev logs and simplify by using the expression directly or a
clearly named variable if needed for readability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: c783125c-e7c6-4808-b416-d543ce40f4fb

📥 Commits

Reviewing files that changed from the base of the PR and between aeb8b6c and eb28188.

📒 Files selected for processing (6)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_file_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/pki.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_file_csi_driver_operator.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_file_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/azure_disk_csi_driver_operator.go
  • control-plane-operator/controllers/hostedcontrolplane/pki/azure_file_csi_driver_operator.go

@mpatlasov
Copy link
Contributor Author

/testwith openshift/cluster-storage-operator/main/hypershift-e2e-aks openshift/cluster-storage-operator#686

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 19, 2026

@mpatlasov, testwith: could not generate prow job. ERROR:

create refs for PR: path alias: resolve config openshift/hypershift@: got unexpected http 400 status code from configresolver: branch query missing or incorrect

@mpatlasov
Copy link
Contributor Author

/testwith openshift/cluster-storage-operator/main/hypershift-e2e-aks openshift/cluster-storage-operator#686

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 19, 2026

@mpatlasov, testwith: could not generate prow job. ERROR:

create refs for PR: path alias: resolve config openshift/hypershift@: got unexpected http 400 status code from configresolver: branch query missing or incorrect

@jsafrane
Copy link
Contributor

/testwith openshift/hypershift/main/e2e-aks openshift/cluster-storage-operator#686

@jsafrane
Copy link
Contributor

/test e2e-aks

@cwbotbot
Copy link

Test Results

e2e-aks

@mpatlasov
Copy link
Contributor Author

This run confirms that we cannot mount metrics certs on AKS witout a help from CPO:

kind: Event
lastTimestamp: "2026-03-19T00:02:05Z"
message: 'MountVolume.SetUp failed for volume "serving-cert" : secret "azure-disk-csi-driver-operator-serving-cert" not found'

And now, with this PR applied, when another PR mounts metrics certs, our storage operators are up and running with certs mounted:

kind: Pod
metadata:
  name: azure-disk-csi-driver-operator-66cfbd4f7b-mzz2n
spec:
            ...
            f:volumeMounts:
              k:{"mountPath":"/var/run/secrets/serving-cert"}:
                .: {}
                f:mountPath: {}
                f:name: {}

(this is from azure-disk-csi-driver-operator-66cfbd4f7b-mzz2n.yaml)

and this message from azure-disk csi driver operator logs confirms that we do not generate self-signed certs anymore:

I0319 11:22:21.188116       1 cmd.go:253] Using service-serving-cert provided certificates

(here is /testwith result with this data: https://github.com/openshift/hypershift/pull/7970/checks?check_run_id=67725457027 --> https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-hypershift-7970-openshift-cluster-storage-operator-686-e2e-aks/2034578885533765632 )

…erving certs by HCP controller

service-ca operator might not be running in some HCP environments. Hence, we need CPO help to generate certs for us.
(see https://redhat.atlassian.net/browse/STOR-2770 for a bigger picture)
@mpatlasov mpatlasov force-pushed the STOR-2770-Reconcile-Azure-Disk-and-File-Metrics-Serving-Certs branch from eb28188 to 5750392 Compare March 19, 2026 23:16
@mpatlasov
Copy link
Contributor Author

Update: this force-push only removed debug r.Log.Info("ZZZZZ: ...) lines from the PR.

@mpatlasov
Copy link
Contributor Author

@bryan-cox , @csrwng , can you please review, this PR copy/pastes logic from NTO PR, and I verified it works as expected -- see my comment above.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 19, 2026

@mpatlasov: This pull request references STOR-2770 which is a valid jira issue.

Details

In response to this:

service-ca operator might not be running in some HCP environments. Hence, we need CPO help to generate certs for us.

This PR must address hypershift-e2e-aks failures on AKS platform (see also openshift/cluster-storage-operator#662 ).

Summary by CodeRabbit

  • New Features

  • Added metrics services and serving-certificate support for Azure Disk and Azure File CSI driver operators.

  • Improvements

  • Extended PKI reconciliation to cover Azure-specific CSI components.

  • Improved error handling: certificate creation/update failures now surface; non-fatal cleanup failures are logged for troubleshooting.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@mpatlasov
Copy link
Contributor Author

/testwith openshift/hypershift/main/e2e-aks openshift/cluster-storage-operator#686

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 20, 2026

@mpatlasov: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

corev1 "k8s.io/api/core/v1"
)

func ReconcileAzureDiskCsiDriverOperatorMetricsServingCertSecret(secret, ca *corev1.Secret, ownerRef config.OwnerRef) error {
Copy link
Contributor

@jsafrane jsafrane Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ReconcileAzureDiskCSIDriverOperatorMetricsServingCertSecret

Abbreviations in the middle of an idenitifer are all uppercase. https://google.github.io/styleguide/go/decisions#initialisms

This applies to the whole PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsafrane , I can easily substitute Csi with CSI in my PR -- that would be simple non-intrusive change. But func AzureDiskCsiDriverControllerMetricsServingCert() was introduced 2 years ago in this commit.

It's not nice to mix pre-existing AzureDiskCsiDriverControllerMetricsServingCert and new ReconcileAzureDiskCSIDriverOperatorMetricsServingCertSecret in the same project and go-file. (cc @bryan-cox )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsafrane , After more thinking I came to a trivial idea: let's merge this PR with Csi (because it's consistent with pre-existing AzureDiskCsiDriverControllerMetricsServingCert), and after that I'll open NO-JIRA PR replacing Csi with CSI everywhere (openshift/hypershift). cc @bryan-cox

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally put the manifests into azure.go, with other azure-specific objects. But I'd let HyperShift folks decide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! My logic was like that: If NTO implements func ClusterNodeTuningOperatorMetricsService() in control-plane-operator/controllers/hostedcontrolplane/manifests/nto.go, then we have to implement func AzureDiskCSIDriverOperatorMetricsService() in control-plane-operator/controllers/hostedcontrolplane/manifests/azure_disk_csi_driver_operator.go. I can re-work if HyperShift folks prefer another place for manifests.

Copy link
Member

@bryan-cox bryan-cox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 23, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox, mpatlasov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/platform/azure PR/issue for Azure (AzurePlatform) platform jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants