feat: support NFS storage pools #2025
Conversation
@stgraber if you have any opinions or pointers on the checkboxes, feel free to look them over and I can investigate a bit. I suspect some research needs to be done to figure out how we should interact with the uid/gid and squashing behavior of NFS mounts.
Hi! Is this PR stalled? Do you need help?
Ah, I just left a few comments in the Info function now.
Well, now you’re served :þ I can help review, fix stuff, and even write tests; don’t hesitate to ping me. I’d actually be happy to see it working pretty soon.
@bensmrs thanks! I'll probably not touch this in July, and there's a hacker camp and work stuff happening in August on my end that might not leave me a lot of energy to pick this up after hours. I'd appreciate some pointers on the ID remapping Incus does and how that should play with the ID squashing NFS does. I think there should be some guidance and testing there to make sure it behaves as expected. It also takes a bunch of time, which might not be needed if NFS reassigns UID/GID anyway. If you have time to write tests and stuff, I'd be happy to give you access to my fork.
I don't believe that VFS idmap works on top of NFS at this point, so we'd be dealing with traditional shifting, where Incus rewrites all the uids/gids as needed. What you want to ensure is that NFS is mounted the same way on all machines and that no uid/gid squashing is performed; then that should work fine.
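For illustration, a server-side setup along these lines should satisfy that (a sketch only; the export path, subnet, and mount target are placeholders):

```sh
# /etc/exports on the NFS server: no_root_squash disables root squashing
# (and all_squash is off by default), so Incus' traditional shifting
# sees the real on-disk uids/gids.
/export/incus 10.0.0.0/24(rw,sync,no_subtree_check,no_root_squash)

# Then mount with identical options on every cluster member:
mount -t nfs 10.0.0.1:/export/incus /var/lib/incus/storage-pools/shared
```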
Should we call the driver […]?
I think we should stick to […]. Exactly what kind of server version and configuration we can handle is probably something that's best addressed through the documentation for the driver.
NFS would be SO nice. Just an interested party!
I did some changes so we can pass IPv6 […] What sort of testing do we want on this? Locally, creating pools and creating containers and VMs works; so does attaching additional volumes.
We’d basically extend the available filesystem drivers in the test matrix, to test everything we’re already testing (see incus/.github/workflows/tests.yml, lines 137-144 at f614b4c).
The test initialization routines allow you to define how your driver should be initialized in […].
I started looking at the tests and am a bit confused by the following error: […] Did I do something wrong here?
Uhh, so it turns out that the […]
I was thinking of mimicking the behavior of […]
No problem, I ended up using […]
Ok, I can’t beat this argument :) I’m getting some permission-denied errors in my tests; I’m investigating.
So, maybe I’m missing something when initializing the pool, but I can’t get past permission errors. Container creation and launch work well, but that’s not enough; see https://github.com/bensmrs/incus/actions/runs/18648669885/job/53161380407?pr=4 Am I missing something in my setup? See incus/.github/workflows/tests.yml, lines 394-395 at 5733ce8.
Hmmm, I've only tried […]
Container launch works, so regular operations don’t seem to cause any problem.
Well I’m currently out of ideas.
This draft seems very cool, but why not just mount NFS manually at the operating-system level and then run Incus in dir mode?
If you do that, then Incus doesn't know that the same data can be accessed from all systems in a cluster.
I would really want to avoid having to manage the cluster node […]
Seems like a good reason.
Sorry, I don't understand what you mean.
The […]
I don't think it's a problem for regular operations and maintenance, because the dir backend ultimately relies on the Linux filesystem tree, which opens up more deployment options. For example, operators can choose a remotely mounted NFS server (with a single point of failure), a Ceph or GlusterFS cluster filesystem, or a DRBD disk-synchronization system, and deploy Incus container data on top. The data will always be synchronized automatically by the underlying system. This is Linux: each part only does what it does best.
I don't think arguing against the merits of an NFS storage driver in the PR adding said storage driver is helpful or welcome.
Sorry, I don't understand what you mean.
Please keep debate away from this PR. The forum is there for this kind of thing and, to a lesser extent, the issue section, for discussing early design-stage decisions. Your use case, as legitimate as it may seem to you, unfortunately doesn’t cover a tenth of the use cases of Incus users that we have to account for.
OK, but in fact this is a very typical production-deployment problem. Such users may not use an NFS backend: given the complexity of production systems, they use the dir backend directly.
The more I think about it, the more I feel we need to intercept syscalls. It shouldn’t be incredibly hard, and I think it would look pretty similar to the pre-existing […] (The other idea could be to mount the FS at another location and bind-mount it to the actual rootfs location.)
Can you provide a summary of the issue as it currently stands? There's been a lot of back and forth over the past few days :) In general, if we're talking about intercepting chown, that would be a terrible idea: chown is called extremely frequently (so intercepting it could cause a DoS) and has various variants that may be very hard to handle safely (including calls from within namespaces or chroots).
On an NFS root: […] Minimal reproducer, from what I can tell. It's unclear to me why this is. I have ruled out […]
@brauner can you look into this with your VFS maintainer hat on? To reproduce this on pengar, you can do: […] So basically, doing the chown from within a userns going through the uid/gid map fails, but doing the exact same chown as real root on the host and then accessing the result from within the namespace is fine. Pengar is on 6.16.4; I've also reproduced it on 6.16.11. Not sure what the other two have been running, but probably similarly recent kernels. This is NFSv3, so there shouldn't be too much NFS black magic in theory.
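The exact commands didn't survive here, but based on the description the reproducer is along these lines (a sketch; the mount path and the 100000 base/65536 range are assumptions, and lxc-usernsexec is just one convenient way to enter a mapped userns):

```sh
# File owned by the shifted root (uid/gid 100000) on the NFSv3 mount.
touch /mnt/nfs/foo
chown 100000:100000 /mnt/nfs/foo

# Fails (EPERM): chown done from within a userns whose root maps to
# uid/gid 100000, going through the uid/gid map.
lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- \
    chown 1234:4567 /mnt/nfs/foo

# Fine: the exact same chown done as real root on the host; the result
# is then accessible from within the namespace.
chown 101234:104567 /mnt/nfs/foo
```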
Okay, so @brauner and I got to the bottom of this one, and it's unfortunately not good. There is no fundamental kernel issue here; it's more of a fundamental design issue with how NFS works... Since kernel idmap shift isn't supported on top of NFS, we instead rely only on the user namespace. This means that when root in the container (uid 0 inside, uid 100000 outside) chowns /foo from its current ownership (uid/gid 0/0) to a new ownership (uid/gid 1234/4567), the request actually gets translated to uid 100000 wanting to chown a file owned by 100000/100000 to 101234/104567. That's the request (SETATTR) which gets sent to the server, and the server rejects it, as the credential (uid=100000/gid=100000) isn't privileged over the target uid/gid (101234/104567). There is no provision in the NFS protocol to pass the uidmap/gidmap through to effectively inform the server of what uid/gid range the requestor is privileged over, and so no way to properly handle such a request.
So we have a few things to consider here: […]
And couldn’t an idmapped bind mount be a solution? The host mounts the NFS share somewhere, then bind-mounts it to […] Or maybe it’s bindfs-specific, in which case we’re back to using FUSE…
That's 1): idmapped bind mounts require per-filesystem kernel code to handle them, and NFS has no support for it. FUSE in general does, but requires each filesystem to add some logic, which is why I also brought up 2) as a quicker option to get something done.
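(For the curious, this is easy to see from userspace with a recent util-linux; paths are placeholders and the exact errno may vary:)

```sh
# X-mount.idmap (util-linux >= 2.39) asks the kernel for an idmapped
# bind mount via mount_setattr(); it only works on filesystems with
# idmapped-mount support, which NFS currently lacks, so the kernel
# refuses the request.
mount --bind -o X-mount.idmap=b:0:100000:65536 /mnt/nfs /mnt/shifted
```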
Ok, I had in mind bindfs’ […]. I’ll have a quick look at fuse-nfs.
https://github.com/bensmrs/fuse-nfs should be ok-ish for FUSE3; I’ll try to implement VFS idmap tomorrow.
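For anyone wanting to try it, upstream fuse-nfs mounts are driven roughly like this (the export URL and mountpoint are placeholders):

```sh
# Mount the export through FUSE instead of the kernel NFS client; with
# idmap support in the FUSE layer, the shifting could then happen there
# rather than through a full uid/gid rewrite of the tree.
fuse-nfs -n nfs://192.0.2.10/export/incus -m /mnt/fuse-nfs

# Unmount when done (FUSE3).
fusermount3 -u /mnt/fuse-nfs
```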
My patches should now work (I also integrated a long-pending PR that I feel is desirable for us), but testing will probably be painful. Are there quick ways for me to test the patched libfuse, or do I have to compile @mihalicyn’s tree? (In which case, big meh on my side, as I have little time for it.)
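(One possible shortcut, assuming the goal is only to exercise the patched libfuse: build it with meson and point the dynamic linker at the build tree instead of installing it system-wide; the paths below are assumptions:)

```sh
git clone https://github.com/libfuse/libfuse && cd libfuse
# ...apply the patches under test...
meson setup build && ninja -C build

# Run the FUSE filesystem against the freshly built libfuse3:
LD_LIBRARY_PATH="$PWD/build/lib" fuse-nfs -n nfs://192.0.2.10/export -m /mnt/test
```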
Cool! If we get this working, we'd probably do a scheme like […]
WIP draft PR for the feature. So far it works, though it's a bit slow on my machine due to the ID remapping, which should probably be investigated.
- [ ] vers=4.2, should be fine?
- [ ] source
- [ ] Info array needs some QA. Not sure I understand all of it.
- [ ] nfs can't support xattr; do we need to handle this in other places than migrate?

Fixes: #1311
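For reference, once merged, usage would presumably look something like this (the driver name comes from this thread; the config key shown is an assumption and may differ in the final version):

```sh
# Same export configured on every cluster member, then used like any
# other storage pool.
incus storage create shared nfs source=192.0.2.10:/export/incus
incus launch images:debian/12 c1 --storage shared
```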