-
Notifications
You must be signed in to change notification settings - Fork 591
rbd: make block encryption cipher configurable #5711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rbd: make block encryption cipher configurable #5711
Conversation
|
Hi @nixpanic or @Madhu-1 , since you are both very familiar with this repository, I would appreciate your feedback on my recent changes. I know you're busy, so if it would be helpful or save time, I am happy to schedule a quick meeting to walk you through the modifications. Maybe we can use the next Dev standup slot. Please let me know what works best for you. |
@Greenpepper15 Thanks for the work 🎉 , requested for reviews on this one. Lets continue the discussion in the PR itself, if required we can have a this as a topic in one of the standup. |
|
|
||
| # (optional) Instruct the plugin it has to encrypt the volume with this specific cipher. | ||
| # By default it is disabled. If encrypted is false then this flag will be ignored. | ||
| # (optional) Instruct the plugin it has to encrypt the volume |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these changes are part of previous commit.
iPraveenParihar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add e2e test with these options enabled in StorageClass?
| "aes-xts-plain64": {}, | ||
| "serpent-xts-plain64": {}, | ||
| "twofish-xts-plain64": {}, | ||
| "aegis128-random": {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think we need to validate keysize for these ciphers?
Like what happens If a user provides cipher aegis128-random and 256 as keysize? Do the luks.Format() fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes if a key size is used that is not compatible with the cipher than luks.format() will fail.
The aegis128-random case is an easy case because if any other key size than 128 is used it will fail. But I am unsure if we should do it to all possible ciphers because than we would couple our implementation with the cipher suites available on the host system.
At the end the behavior would be the same. Either our sanity check will generate an error or luks.format() will generate an error.
But far as I can see from my local testing the quality of the error message from luks.format() varies widely.
For example this
sudo cryptsetup luksFormat --type luks2 --cipher aegis128-random --integrity aead --key-size 256 /dev/vdb
results in
Cipher aegis128-random (key size 256 bits) is not available.
which is good.
But something like
sudo cryptsetup luksFormat --type luks2 --cipher aes-xts-random --integrity hmac-sha256 --key-size 1024 /dev/vdb
(xts modes only allow key sizes of 256 and 512 on my system)
results in
device-mapper: reload ioctl on temporary-cryptsetup-fa35e15a-2c7f-4b6f-aea2-02ff3bfe7624 (253:1) failed: Invalid argument
which will is not very expressive.
Based on this I think a Ceph-csi local (cipher, key size) pair validation seems like a good idea. If people want to support different key sizes they can open a small PR.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about keeping a known good list, and reporting a warning about an unknown combination? That way users may send a PR for additional known good combinations, and are not completely surprised when luksFormat spits out an incomprehensible error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that sounds like a good compromise. I will implement this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey can you check out my recommendation system?
It is implemented in the GetRecommendation() function in cryptsetup.go.
I would be especially interested in feedback on how I report the recommendation to users. I report it via the UsefulLog function. You guys have other ideas?
// EncryptVolume encrypts provided device with LUKS.
func EncryptVolume(ctx context.Context, devicePath, passphrase string, cipher *cryptsetup.EncryptionOptions) error {
log.DebugLog(ctx, "Encrypting device %q with LUKS", devicePath)
if cipher != nil {
recommendation := cryptsetup.GetRecommendation(*cipher)
log.UsefulLog(ctx, "current encryption configuration is: %v", recommendation)
}
_, stdErr, err := luks.Format(devicePath, passphrase, cipher)
...
internal/rbd/encryption.go
Outdated
| return nil | ||
| } | ||
|
|
||
| func ParseCipherOptions(volOptions map[string]string) (*cryptsetup.EncryptionOptions, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add unit test for this.
|
Yes I am working on adding the e2e tests and unit tests. |
nixpanic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, looks quite ok in general. Missing unit+e2e tests are known and worked on, great!
It would be good to point to the cryptsetup documentation somewhere, in the hope it contains a list of valid/recommended/working combinations for ciphers and keysizes.
| return ve.id | ||
| } | ||
|
|
||
| func (ve *VolumeEncryption) CipherOptions() *cryptsetup.EncryptionOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new functions and stucts ideally get a small comment, here and in cryptsetup.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment for the exported function
| "aes-xts-plain64": {}, | ||
| "serpent-xts-plain64": {}, | ||
| "twofish-xts-plain64": {}, | ||
| "aegis128-random": {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about keeping a known good list, and reporting a warning about an unknown combination? That way users may send a PR for additional known good combinations, and are not completely surprised when luksFormat spits out an incomprehensible error.
docs/rbd/deploy.md
Outdated
| | `csi.storage.k8s.io/provisioner-secret-namespace`, `csi.storage.k8s.io/node-stage-secret-namespace` | yes (for Kubernetes) | namespaces of the above Secret objects | | ||
| | `mounter` | no | if set to `rbd-nbd`, use `rbd-nbd` on nodes that have `rbd-nbd` and `nbd` kernel modules to map rbd images | | ||
| | `encrypted` | no | disabled by default, use `"true"` to enable either LUKS or fscrypt encryption on PVC and `"false"` to disable it. **Do not change for existing storageclasses** | | ||
| | `encryptionCipher` | no | set the cipher that is used for the volume encryption. **Supported Keywords**: `aes-xts-plain64` (default), `serpent-xts-plain64`, `twofish-xts-plain64`, `aegis128-random` (requires `integrityMode`: `aead` and `encryptionKeySize`: `"128"`) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit has a little too long lines in the commit message, commitlint fails because of that.
Madhu-1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall Looks good to me, @Greenpepper15 Thanks for the great work 🎉 Lets add the missing E2E testing for it.
internal/rbd/encryption_test.go
Outdated
| } | ||
| } | ||
|
|
||
| func comparePtr[T comparable](t *testing.T, fieldName string, expected, actual *T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use https://pkg.go.dev/github.com/stretchr/testify/assert package for it? i think with this we can reduce lot of code in test file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote this test suite. You can check it out.
|
Hey I got a question concerning the e2e tests because I ran into a cyclic dependency. For the e2e test I want to expose a function in /internal/util/cryptsetup/cryptsetup.go that returns all the allowed Ciphers to the e2e tests. Then I can do something like err = createRBDStorageClass(
...
map[string]string{"encrypted": "true",
"encryptionType": crypto.EncryptionTypeBlock.String(),
"encryptionCipher": cryptsetup.GetAllowedCiphers()[0], <-- Return an allowed cipher that I definitely know is valid because it comes from the source
},
deletePolicy)But the e2e tests get the cryptsetup functionality through the file /e2e/vendor/github.com/ceph/ceph-csi/internal/until/crypsetup/cryptsetup.go does not have the new GetAllowedCiphers() defined because it references an old version of crypetup.go. Should I go something like this for now err = createRBDStorageClass(
...
map[string]string{"encrypted": "true",
"encryptionType": crypto.EncryptionTypeBlock.String(),
"encryptionCipher": "aes-xts-plain64", <-- Insert a static string that I know will work now but may change in a later version
},
deletePolicy)And then later open another PR to replace the static assignments with a GetAllowedCiphers() call? |
|
@Greenpepper15 have you tried running |
|
Yes I just tired and it works. Thanks for the advice. |
Yes that should be fine |
|
This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏 |
e2e/utils.go
Outdated
| } | ||
|
|
||
| // PraseLuksStatus parses parts of the output of a "cryptsetup luksStatus <device>" command | ||
| func parseLuksStatus(dump string) (luksStatus *cryptsetup.LuksStatus, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the e2e tests I get the encryption state of the PV via cryptsetup status to compare it against the desired encryption state.
Should I put this function into the cryptsetup.go file instead? I initially put it here because only the e2e tests use it but since the cryptsetup.go file encapsulates all the cryptsetup logic I think it would be better suited for it.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parsing code now is in the cryptsetup. go file. I also added a few tests for it.
|
Can somebody trigger the e2e tests? I tested them on my Cluster but I want to see if they work in the CI environment too. |
|
/test ci/centos/mini-e2e-helm/k8s-1.34/rbd |
|
@Greenpepper15 , the tests pass, but I do not see the test-cases you added being executed 🤔 |
|
Yes when I look at the test output I also cannot find the output for my tests. At least I saw that the previous encryption tests succeeded. Does anyone know the reason? On my debug cluster I tested with this test configuration and it worked for me. Or it is because of the merge conflict that is yet to be resolved? |
Maybe the CI job fails to checkout your change because there is a merge conflict? When I execute the same steps, I also do not have your changes to the |
This is only conflict in the merge conflict. I guess if I choose my version it will run without any issues. But we don't want a downgrade version on devel. How about I choose my changes for now and later somebody else calls |
|
I think the issue is that there is a merge conflict. It probably does not matter how you address it. Most practical would be to rebase on top of the latest Once the conflict is resolved, try to leave the |
|
/test ci/centos/mini-e2e-helm/k8s-1.34/rbd |
|
The test failed with this error E1110 20:00:02.940630 35903 crypto.go:238] ID: 244 Req-ID: 0001-0024-3674dcb1-48bf-45a6-8638-8fbb36b033b4-0000000000000004-c95798e0-5f4e-4956-9b64-da1138bba3e3 failed to encrypt device "/dev/rbd0" with LUKS (an error (exit status 1) occurred while running cryptsetup args: [-q luksFormat --type luks2 --hash sha256 --cipher aegis128-random --integrity aead --key-size 128 --luks2-metadata-size 4096k --luks2-keyslots-size 8192k --pbkdf-memory 32768 /dev/rbd0 -d -]): Cipher aegis128-random (key size 128 bits) is not available. This happens when the aegis128 kernel module is not loaded. I will temporarily remove this test for tonight and run the tests again to see the other tests do not fail. Could someone do a |
|
/test ci/centos/mini-e2e-helm/k8s-1.34/rbd |
okay this time it also seems be a time sensitive failure. Only this test uses cloning "validate encrypted (with selected cipher and integrity mode) PVC Clone chained with depth 2". Was it the cause of the failure? |
|
@Greenpepper15 this might be related to the new update in the sidecar which considers the invalid_argument and try to requeue it after 5 minutes, @iPraveenParihar PTAL, we might need to add |
|
Discard my above comments, i thought its related to the PVC but it looks like app, @Greenpepper15 we have not seen the issue on any other CI runs, it might be problem with this PR itself. |
|
We need to increase the test time it looks like 2:30 min is not enough :( |
|
Add a new commit to increase the timeout Line 74 in 468b76c
|
CI tests failed due to timeout. To debug the issue we increase the e2e timeout. Signed-off-by: David Mohren <david.mohren@clyso.com>
Pull request has been modified.
|
Done. Let's try it out. But why do you think this issue arises? When I manually trigger the RBD e2e tests this does not seem to happen. Since AEAD encryption with dmIntegrity and DmCrypt consumes more resources I guess increasing the timeout might help. Otherwise maybe the resource freeing is buggy on larger setups. I will see after we run the tests again. |
|
/test ci/centos/mini-e2e-helm/k8s-1.34/rbd |
|
So what is the current state of things? Should we wait to proceed till the holidays pass? |
nixpanic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☃️
|
@Mergifyio queue |
✅ The pull request has been merged automaticallyDetailsThe pull request has been merged automatically at 004532c |
Merge Queue Status✅ The pull request has been merged at 3d1a5af This pull request spent 3 hours 43 seconds in the queue, including 3 hours 27 seconds running CI. Required conditions to merge
|
|
Yay thanks for the help people! |
Describe what this PR does
The PR adds three new block encryption (RBD encryption) configuration options to customize the encryptions means used.
The option "encryptionCipher" configures the cipher with which the block device is encrypted. The option "encryptionKeySize" is about configuring the key size.
Lastly the option "integrityMode" configures the algorithms the block encryption should use to verify the integrity of consumed data. When this option is not used then no integrity verification happens. This is implemented via the kernel module dmIntegrity. Essentially when cryptsetup receives the “integrityMode” flag then it will set up a dmcrypt mapper device and a dmIntegrity mapper device.
When the integrity option is not used IO stack looks like this
App (Pod) —> dmcrypt mapper —> rbd device
With the integrity option it will look like this
App (Pod) —> dmcrypt mapper —> dmIntegrity mapper —> rbd device
In this setup the dmcrypt device would encrypt data and compute authentication tags (for example hmacs). Meanwhile dmIntegrity would be tasked with storing the encrypted data and corresponding authentication tags on the block device.
Motivation
I want to be able to customize means to employ encryption to match a required security level. Since I added an allowlist for cipher strings not any algorithm can be deployed (which hinders configurability) but it is the most easiest and secure means to validate user input.
Additionally I want to integrate means of verifying the integrity of consumed data because it’s becoming increasingly required from a compliance standpoint. For example the recently released Digital Operational Resilience Act (DORA) legislation from the EU demands that data confidentiality and integrity need to be established. I see the usage of dmIntegrity as the easiest means to achieve this.
Is there anything that requires special attention
Do you have any questions?
Are there concerns around backward compatibility?
Provide any external context for the change, if any.
These sources illustrate why integrating an integrity mode makes sense:
Digital Operational Resilience Act (DORA)
BSI TR-02102-1 "Kryptographische Verfahren: Empfehlungen und Schlüssellängen" Version: 2025-01
Here is useful dmintegrity documentation
Related issues
Resolves #5686
Future concerns
Checklist:
guidelines in the developer
guide.
Request
notes
updated with breaking and/or notable changes for the next major release.
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>: retest the<job-name>after unrelatedfailure (please report the failure too!)