@ardaguclu
Member

@ardaguclu ardaguclu commented Dec 3, 2025

This PR is based on #1872 (changes in enhancements/kube-apiserver/kms-encryption-foundations.md).

There are many aspects that need to be implemented to support KMS in OpenShift. We have decided to open more granular EPs to better track the work.

This EP's main aim is to focus on the encryption controller changes in library-go. It defers some concepts to the future in order to start with simpler, manageable iterations.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 3, 2025
@openshift-ci-robot

openshift-ci-robot commented Dec 3, 2025

@ardaguclu: This pull request references CNTRLPLANE-2120 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

PoC PR openshift/library-go#2045 (this is just a PoC, original PR will be opened when this EP merges).

@openshift-ci openshift-ci bot requested review from hasbro17 and yuqi-zhang December 3, 2025 09:35
@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch from c091cbc to f734b05 Compare December 3, 2025 09:43
Member

@flavianmissi flavianmissi left a comment

Have to take a short break from reviewing, but leaving the comments I got so far.

@ardaguclu
Member Author

/cc @ibihim @flavianmissi

@openshift-ci openshift-ci bot requested review from flavianmissi and ibihim December 3, 2025 14:08
@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch 2 times, most recently from 1794054 to 1ddc3d8 Compare December 4, 2025 07:19
@ardaguclu
Member Author

@flavianmissi I was uncomfortable with the disconnects between the sections and with the verbosity, so I overhauled the EP for better clarity. Please let me know your thoughts.

@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch 3 times, most recently from e920a9c to 5804b76 Compare December 4, 2025 08:35
@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch from f39a0d7 to 8f79ed6 Compare December 5, 2025 04:21
@flavianmissi
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2025
@ardaguclu
Member Author

/cc @benluddy

@openshift-ci openshift-ci bot requested a review from benluddy December 5, 2025 13:00
@ardaguclu
Member Author

As agreed with @flavianmissi, a later iteration will add another condition that notifies users to delete unused KMS plugins from the cluster when prune_controller prunes them.

Contributor

@ibihim ibihim left a comment

Great work. I have some questions, though, as a beginner to downstream e2ee with KMS.


**keyController** manages encryption key lifecycle. Creates encryption key secrets in `openshift-config-managed` namespace. For KMS mode, creates empty secrets with KMS configuration hashes.

**stateController** generates EncryptionConfiguration for API server consumption. Implements distributed state machine ensuring all API servers converge to same revision. For KMS mode, generates configuration with deterministic Unix socket paths.
Contributor

What is a deterministic Unix Socket path? Are there probabilistic paths?

Why do we want this? To identify if the KMS is out of date and the address changed? To run several KMS plugins somehow in parallel?

Member Author

> To run several KMS plugins somehow in parallel?

This is one of the reasons. "Deterministic" means that the plugin lifecycle runs the KMS plugin on a Unix socket whose path the encryption controllers can generate on their own (the plugin lifecycle and the encryption controllers are not otherwise connected). The path therefore acts as a contract that lets both sides agree on where to communicate.

However, since we will likely decide to support only external KMS plugins, we won't need this functionality. I proposed an API definition in openshift/api#2622 that directly uses whatever the user sets, so the Unix socket generation logic will be removed from the EP.

Won't the socket path observed by an apiserver be chosen by the operator that mounts it as a host volume? In that case, the endpoints in the apiserver-facing config file wouldn't be 1:1 with user-provided absolute paths.

Member Author

I've updated the EP to use the Unix domain socket path from apiserver.config.openshift.io directly. The cluster admin sets this field and we use it as is. There is no longer any Unix domain socket generation mechanism.
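
As a rough illustration of "use it as is", the sketch below turns an admin-supplied socket path into the `unix://` endpoint of a KMS v2 provider entry. It is not the library-go implementation: `kmsProvider`, `endpointFromSocketPath`, `newKMSProvider`, the provider name and the example path are all hypothetical, and it assumes the API server sees the socket at the same absolute path the admin configured.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// kmsProvider mirrors the apiVersion, name and endpoint fields of a KMS
// provider entry in an EncryptionConfiguration. It is a local illustrative
// type (assumption), not the upstream Kubernetes type.
type kmsProvider struct {
	APIVersion string
	Name       string
	Endpoint   string
}

// endpointFromSocketPath converts an absolute Unix domain socket path into a
// unix:// endpoint. Nothing is generated: the path is only validated and
// normalized.
func endpointFromSocketPath(socketPath string) (string, error) {
	if !path.IsAbs(socketPath) {
		return "", errors.New("KMS socket path must be absolute: " + socketPath)
	}
	return "unix://" + path.Clean(socketPath), nil
}

// newKMSProvider builds a KMS v2 provider entry for the given socket path.
func newKMSProvider(name, socketPath string) (kmsProvider, error) {
	endpoint, err := endpointFromSocketPath(socketPath)
	if err != nil {
		return kmsProvider{}, err
	}
	return kmsProvider{APIVersion: "v2", Name: name, Endpoint: endpoint}, nil
}

func main() {
	// Hypothetical provider name and socket path.
	p, err := newKMSProvider("kms-1", "/var/run/kmsplugin/socket.sock")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", p)
}
```

If the operator mounts the socket at a different path inside the API server pod, the mapping step would have to live where `endpointFromSocketPath` is in this sketch.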

@ardaguclu
Member Author

I'll update this EP based on the changes in openshift/api#2622
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 12, 2025


**KMS** is the external Key Management Service (AWS KMS, HashiCorp Vault, etc.) that stores and manages the Key Encryption Key (KEK).

**KMS plugin** is a gRPC service implementing Kubernetes KMS v2 API, running as a sidecar to API server pods. It communicates with the external KMS to encrypt/decrypt data encryption keys (DEKs).

We don't imagine plugins will run as true sidecar containers anymore, correct?

Member Author

That is correct. I've updated the EP based on the last discussions. Please let me know your thoughts about this.


#### Variation: Configuration Changes (Key Rotation)

When cluster admin updates KMS configuration (e.g., new key ARN, different region):

Isn't key_id -- or a hash of key_id -- sufficient by itself to consider previously-encrypted resources "stale"? I would have expected any significant change to a KMS provider config to result in a new key_id (and insignificant changes would not).

Member Author

> Isn't key_id -- or a hash of key_id -- sufficient by itself to consider previously-encrypted resources "stale"?

There are two dimensions. The first is key rotation within the same KMS plugin (we'll track key_id to detect this). The second is migration from an old KMS plugin to a new one (we track the Unix domain socket path to detect this).
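
To make the two dimensions concrete, here is a minimal sketch that classifies a change from the previously observed state to the current one. It is only an illustration under stated assumptions: `kmsObservation`, `changeKind` and `classifyChange` are hypothetical names, and treating a socket path change as taking precedence over a key_id change is a choice made for this example.

```go
package main

import "fmt"

// kmsObservation is a hypothetical snapshot of what the controllers track:
// the configured Unix domain socket path and the key_id reported by the plugin.
type kmsObservation struct {
	SocketPath string
	KeyID      string
}

// changeKind names the kind of change between two observations.
type changeKind string

const (
	noChange       changeKind = "none"
	keyRotated     changeKind = "key-rotated"
	pluginMigrated changeKind = "plugin-migrated"
)

// classifyChange compares two observations. A different socket path is
// treated as a migration to a new plugin; the same path with a different
// key_id is treated as a key rotation within the same plugin.
func classifyChange(prev, cur kmsObservation) changeKind {
	if prev.SocketPath != cur.SocketPath {
		return pluginMigrated
	}
	if prev.KeyID != cur.KeyID {
		return keyRotated
	}
	return noChange
}

func main() {
	prev := kmsObservation{SocketPath: "/var/run/kms/a.sock", KeyID: "key-1"}
	cur := kmsObservation{SocketPath: "/var/run/kms/a.sock", KeyID: "key-2"}
	fmt.Println(classifyChange(prev, cur))
}
```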

2. Compares new hash with hash in most recent encryption key secret annotation.
3. If hashes differ:
   - Creates new encryption key secret with new hash
   - migrationController automatically triggers re-encryption
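
The compare-and-create steps above can be sketched as follows. This is a simplified illustration, not the keyController code: the annotation key, the `keySecret` type, the secret naming scheme and the choice of SHA-256 over length-prefixed fields are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashAnnotation is a hypothetical annotation key for the KMS config hash.
const hashAnnotation = "example.openshift.io/kms-config-hash"

// keySecret is a stand-in for an encryption key secret (assumption).
type keySecret struct {
	Name        string
	Annotations map[string]string
}

// configHash hashes the KMS configuration fields. Each field is
// length-prefixed so that ("ab", "c") and ("a", "bc") hash differently.
func configHash(fields ...string) string {
	h := sha256.New()
	for _, f := range fields {
		fmt.Fprintf(h, "%d:%s;", len(f), f)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// reconcile compares newHash with the hash on the most recent secret. If
// they match, it returns the existing secret and false. Otherwise it returns
// a new empty secret carrying the new hash and true; in the real flow the
// migrationController would then trigger re-encryption.
func reconcile(latest *keySecret, newHash string, nextID int) (*keySecret, bool) {
	if latest != nil && latest.Annotations[hashAnnotation] == newHash {
		return latest, false
	}
	return &keySecret{
		Name:        fmt.Sprintf("encryption-key-%d", nextID),
		Annotations: map[string]string{hashAnnotation: newHash},
	}, true
}

func main() {
	current, _ := reconcile(nil, configHash("/var/run/kms/a.sock", "key-1"), 1)
	next, created := reconcile(current, configHash("/var/run/kms/a.sock", "key-2"), 2)
	fmt.Println(next.Name, created)
}
```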

Do we have any mechanism to provide backpressure in a situation where a KMS provider is rotating key_id very quickly?

Member Author

That is a good point. I don't think we have any, since rotation is managed entirely externally. In the future we could perhaps add a new degraded condition based on how frequently key_id changes (I'm not sure how). cc: @flavianmissi

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 15, 2025
@openshift-ci
Contributor

openshift-ci bot commented Dec 15, 2025

New changes are detected. LGTM label has been removed.

@openshift-ci
Contributor

openshift-ci bot commented Dec 15, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from flavianmissi. For more information see the Code Review Process.

@openshift-ci-robot

openshift-ci-robot commented Dec 15, 2025

@ardaguclu: This pull request references CNTRLPLANE-2120 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

@ardaguclu
Member Author

/hold cancel
This PR is ready for another round of review.

/cc @flavianmissi @ibihim @benluddy

Could you PTAL? Thank you.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 15, 2025
@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2025

@ardaguclu: all tests passed!

@ardaguclu
Member Author

The plugin lifecycle management decision may affect this EP, so I'm adding a hold until the decision is made.
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 16, 2025