Conversation

@Foxboron
Contributor

@Foxboron Foxboron commented Apr 28, 2025

WIP draft PR for the feature. So far it works, but it's a bit slow on my machine due to the ID remapping, which should probably be investigated.

  • Investigate the ID remapping
  • I'm unsure about the different migration options
  • Only support vers=4.2, should be fine?
  • Error out on missing source
  • Any specific options we should include?
  • Info array needs some QA. Not sure I understand all of it.
  • NFS doesn't support xattrs; do we need to handle this in places other than migrate?

Fixes: #1311

@Foxboron
Contributor Author

Foxboron commented May 8, 2025

@stgraber if you have any opinions or pointers on the checkboxes, feel free to look them over and I can investigate a bit.

I suspect some research needs to be done to figure out how we should interact with the uid/gid and squashing behavior of NFS mounts.

@bensmrs
Contributor

bensmrs commented Jul 21, 2025

Hi! Is this PR stalled? Do you need help?

@Foxboron
Contributor Author

@bensmrs I think I need a bit of help to make sure that the Info struct is correct and that I'm not missing any details from the migration steps. I was hoping @stgraber had time to point me in the right direction on this part.

@stgraber
Member

Ah, I just left a few comments in the Info function now.

@bensmrs
Contributor

bensmrs commented Jul 21, 2025

I was hoping @stgraber had time to point me in the right direction on this part.

Well now you’re served :þ

I can help review, fix stuff, and even write tests, so don’t hesitate to ping me. I’d actually be happy to see this working pretty soon.

@Foxboron
Contributor Author

@bensmrs thanks!

I probably won't touch this in July, and there's a hacker camp plus work stuff happening in August on my end, so I might not have a lot of energy to pick this up after hours.

I'd appreciate some pointers on the ID remapping Incus does and how that should interact with the ID squashing NFS does. I think there should be some guidance and testing there to make sure it behaves as expected. The remapping also takes a bunch of time, which might not be needed if NFS reassigns the UID/GID anyway.

If you have time to write tests and stuff I'd be happy to give you access to my fork.

@stgraber
Member

I don't believe that VFS idmap works on top of NFS at this point, so we'd be dealing with traditional shifting where Incus rewrites all the uid/gid as needed.

What you want to ensure is that NFS is mounted the same way on all machines and that no uid/gid squashing is performed, then that should work fine.

@Foxboron
Contributor Author

Should we call the driver nfs4 to make sure we don't end up in a weird situation down the line where we don't want to support a v5 within the same code as the v4 one? Is that realistic?

@stgraber
Member

I think we should stick to nfs, as nfs4 may give the impression that we don't support NFSv3, whereas the feature set we really need works perfectly fine on NFSv3. If anything, some of the bits in NFSv4 may be problematic (built-in uid/gid mapping and such).

Exactly what kind of server version and configuration we can handle is probably something that's best addressed through the documentation for the driver.

@symgryph

NFS would be SO nice. Just an interested party!

@Foxboron Foxboron force-pushed the morten/nfs branch 2 times, most recently from b617326 to 49f1676 on October 19, 2025 15:00
@Foxboron Foxboron marked this pull request as ready for review October 19, 2025 15:00
@Foxboron
Contributor Author

@bensmrs

I made some changes so we can pass IPv6 source= paths. It's not great, but I couldn't come up with a more reliable way to split paths like ::1:/somepath.

What sort of testing do we want on this? Locally, creating pools, containers, and VMs works, as does attaching additional volumes.
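
For reference, the splitting ends up looking roughly like this (an illustrative sketch, not the exact code in the branch; a remote path that itself contains ":/" would still confuse it, which is part of why it's "not great"):

package main

import (
	"fmt"
	"strings"
)

// splitNFSSource splits an NFS "source" value of the form
// "[<remote host>:]<remote path>" into host and path. IPv6 hosts
// contain colons themselves, so instead of cutting on the first ":"
// this cuts on the last ":" that is directly followed by a "/".
func splitNFSSource(source string) (host, path string) {
	idx := strings.LastIndex(source, ":/")
	if idx == -1 {
		// No host component; the whole value is a path.
		return "", source
	}

	return source[:idx], source[idx+1:]
}

func main() {
	fmt.Println(splitNFSSource("10.1.0.227:/media")) // "10.1.0.227" "/media"
	fmt.Println(splitNFSSource("::1:/somepath"))     // "::1" "/somepath"
	fmt.Println(splitNFSSource("/local/path"))       // "" "/local/path"
}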

@bensmrs
Contributor

bensmrs commented Oct 19, 2025

What sort of testing do we want on this? Locally, creating pools, containers, and VMs works, as does attaching additional volumes.

We’d basically extend the available filesystem drivers in the test matrix to test everything we’re already testing:

backend:
- dir
- btrfs
- lvm
- zfs
- ceph
- linstor
- random

The test initialization routines let you define how your driver should be initialized in test/backends, although that alone isn’t enough. I can have a quick look at it as soon as I’m done with my other PR.

@Foxboron Foxboron force-pushed the morten/nfs branch 2 times, most recently from 0080a77 to 7070ad5 on October 19, 2025 20:32
@github-actions github-actions bot added the Documentation (Documentation needs updating) label on Oct 19, 2025
@bensmrs
Contributor

bensmrs commented Oct 20, 2025

I started looking at the tests and am a bit confused by the following error:

> incus storage create incustest-KAg nfs source=10.1.0.227:/media
Error: NFS driver requires "nfs.host" to be specified or included in "source": [<remote host>:]<remote path>

Did I do something wrong here?
IMO, if you have an nfs.host option, I don’t think you should look for a host in source. Just make source be the path on nfs.host, or on localhost if no nfs.host is provided.

@Foxboron
Contributor Author

Did I do something wrong here?

Uhh, so it turns out that the Makefile has moved from building things into build/ to running go install. So there is a slight chance I missed testing an iteration while working on this.

IMO, if you have an nfs.host option, I don’t think you should look for a host in source. Just make source be the path on nfs.host, or on localhost if no nfs.host is provided.

I was thinking of mimicking the behavior of truenas.host and source from the truenas driver, where you can specify nfs.host globally and just pass a path to source. But I suspect I haven't nailed the entire behavior there yet. I'm a little unsure what the most ergonomic behavior for the options is that also harmonizes with the existing drivers.
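
To make the intended precedence concrete, roughly this (an illustrative sketch only; the config keys and the error message are the ones from this branch, the helper itself is made up):

package main

import (
	"fmt"
	"strings"
)

// resolveNFSSource sketches the intended precedence: if "nfs.host" is
// set, "source" is just the remote path on that host; otherwise
// "source" may carry its own host prefix ("[<remote host>:]<remote path>").
func resolveNFSSource(config map[string]string) (host string, path string, err error) {
	source := config["source"]
	if source == "" {
		// Error out on a missing source (illustrative message).
		return "", "", fmt.Errorf("NFS driver requires a source")
	}

	if h := config["nfs.host"]; h != "" {
		// Host set globally, so source is only the remote path.
		return h, source, nil
	}

	// Otherwise split a "host:/path" source on the last ":/" so that
	// IPv6 hosts such as "::1" keep working.
	if idx := strings.LastIndex(source, ":/"); idx != -1 {
		return source[:idx], source[idx+1:], nil
	}

	return "", "", fmt.Errorf(`NFS driver requires "nfs.host" to be specified or included in "source"`)
}

func main() {
	fmt.Println(resolveNFSSource(map[string]string{"nfs.host": "10.1.0.227", "source": "/media"}))
	fmt.Println(resolveNFSSource(map[string]string{"source": "::1:/srv/incus"}))
}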

@bensmrs
Contributor

bensmrs commented Oct 20, 2025

Uhh, so it turns out that the Makefile has moved from building things into build/ to running go install. So there is a slight chance I missed testing an iteration while working on this.

No problem, I ended up using nfs.host for driver initialization.

I was thinking of mimicking the behavior of truenas.host and source from the truenas driver, where you can specify nfs.host globally and just pass a path to source. But I suspect I haven't nailed the entire behavior there yet. I'm a little unsure what the most ergonomic behavior for the options is that also harmonizes with the existing drivers.

Ok I can’t beat this argument :)

I’m getting some permission denied errors in my tests; I’m investigating.

@bensmrs
Contributor

bensmrs commented Oct 20, 2025

So, maybe I’m missing something when initializing the pool, but I can’t get past permission errors. Container creation and launch work well, but that’s not enough; see https://github.com/bensmrs/incus/actions/runs/18648669885/job/53161380407?pr=4

Am I missing something in my setup? See

echo "/media 10.0.0.0/8(rw,sync,no_subtree_check,no_root_squash,no_all_squash)" | sudo tee /etc/exports
sudo exportfs -a

@Foxboron
Contributor Author

Foxboron commented Oct 20, 2025

Hmmm, I've only tried incus exec container bash and manually creating files on NFS volumes on both ends, and that worked. Are we sure that /media doesn't have any weird permissions on Ubuntu? Does it exist, etc.?

@bensmrs
Contributor

bensmrs commented Oct 20, 2025

Container launch works, so regular operations don’t seem to cause any problems.
I’ll debug a bit more later; I was just wondering if your setup had anything specific other than the squash options.

@bensmrs
Contributor

bensmrs commented Oct 20, 2025

Well, I’m currently out of ideas. user_is_instance_user fails with a permission error. We can’t list /root from within the container: it appears owned by 0:0, whoami fails because user 0 is not found, and from the host the mount appears owned by 1000000:1000000. I don’t know where to go from there; maybe I’m missing something obvious happening in the OpenFGA tests.

Suggested-by: Morten Linderud <morten@linderud.pw>
Signed-off-by: Benjamin Somers <benjamin.somers@imt-atlantique.fr>
Signed-off-by: Benjamin Somers <benjamin.somers@imt-atlantique.fr>
Signed-off-by: Benjamin Somers <benjamin.somers@imt-atlantique.fr>
Signed-off-by: Benjamin Somers <benjamin.somers@imt-atlantique.fr>
Signed-off-by: Benjamin Somers <benjamin.somers@imt-atlantique.fr>
@jack9603301

This draft seems very cool, but why not just handle the NFS mounting yourself at the operating system level and then run Incus in dir mode?

@stgraber
Member

This draft seems very cool, but why not just handle the NFS mounting yourself at the operating system level and then run Incus in dir mode?

If you do that, then Incus doesn't know that the same data can be accessed from all systems in a cluster.

@Foxboron
Contributor Author

This draft seems very cool, but why not just handle the NFS mounting yourself at the operating system level and then run Incus in dir mode?

I would really like to avoid having to manage every cluster node's fstab for each new NFS mount I want.

@jack9603301

I would really like to avoid having to manage every cluster node's fstab for each new NFS mount I want.

Seems like a good reason

@jack9603301

If you do that, then Incus doesn't know that the same data can be accessed from all systems in a cluster.

Sorry, I don't understand what you mean

@bensmrs
Contributor

bensmrs commented Oct 22, 2025

If you do that, then Incus doesn't know that the same data can be accessed from all systems in a cluster.

Sorry, I don't understand what you mean

The dir backend is not suited for cluster operation because it only considers the local filesystems available to the host. Moving data from node to node therefore means copying it from one node to the other. There are two problems with using NFS this way: i) copying the data is pointless, as it is already available to every node, and, more critically, ii) moving data, e.g. instances, would essentially make the receiving node write that data to the same place it is currently stored, without any kind of FS locking mechanism.

@jack9603301

The dir backend is not suited for cluster operation because it only considers the local filesystems available to the host. Moving data from node to node therefore means copying it from one node to the other. There are two problems with using NFS this way: i) copying the data is pointless, as it is already available to every node, and, more critically, ii) moving data, e.g. instances, would essentially make the receiving node write that data to the same place it is currently stored, without any kind of FS locking mechanism.

I don't think it's a problem for regular operations and maintenance, because the dir backend ultimately relies on the Linux filesystem tree, which gives more deployment options. For example, operations staff can choose a remote NFS mount (with a single point of failure), a Ceph or GlusterFS cluster filesystem, or a DRBD disk synchronization setup, and deploy the Incus container data on top. The data will always be synchronized automatically by the underlying system. This is Linux: each part only does what it does best.

@Foxboron
Contributor Author

I don't think it's a problem for regular operations and maintenance, because the dir backend ultimately relies on the Linux filesystem tree, which gives more deployment options. For example, operations staff can choose a remote NFS mount (with a single point of failure), a Ceph or GlusterFS cluster filesystem, or a DRBD disk synchronization setup, and deploy the Incus container data on top. The data will always be synchronized automatically by the underlying system. This is Linux: each part only does what it does best.

I don't think arguing against the merits of an NFS storage driver in the PR adding said storage driver is helpful or welcome.

@jack9603301

I don't think arguing against the merits of an NFS storage driver in the PR adding said storage driver is helpful or welcome.

Sorry, I don't understand what you mean

@bensmrs
Contributor

bensmrs commented Oct 23, 2025

Please keep this debate out of this PR. The forum is there for this kind of thing, and, to a lesser extent, the issue section, for discussing early design-stage decisions. Your use case, as legitimate as it may seem to you, unfortunately doesn’t cover a tenth of the use cases of Incus users that we have to account for.

@jack9603301

Please keep this debate out of this PR. The forum is there for this kind of thing, and, to a lesser extent, the issue section, for discussing early design-stage decisions. Your use case, as legitimate as it may seem to you, unfortunately doesn’t cover a tenth of the use cases of Incus users that we have to account for.

OK, but in fact this is a very typical production deployment problem. People may not use the NFS backend; due to the complexity of the production system, they use the dir backend directly.

@bensmrs
Contributor

bensmrs commented Oct 23, 2025

The more I think about it, the more I feel we need to intercept syscalls. It shouldn’t be incredibly hard, and I think it would look pretty similar to the pre-existing setxattr interception. However, special care would need to be taken to filter out non-NFS-mounted paths. WDYT @stgraber? Should we go that way?

(The other idea could be to mount the FS at another location and perform bind-mounting to the actual rootfs location.)

@stgraber
Member

The more I think about it, the more I feel we need to intercept syscalls. It shouldn’t be incredibly hard, and I think it would look pretty similar to the pre-existing setxattr interception. However, special care would need to be taken to filter out non-NFS-mounted paths. WDYT @stgraber? Should we go that way?

(The other idea could be to mount the FS at another location and perform bind-mounting to the actual rootfs location.)

Can you provide a summary of the issue as it currently stands? There's been a lot of back and forth over the past few days :)

In general, if we're talking about intercepting chown, that'd be a terrible idea because it's called extremely frequently (so could cause a DoS) and has various variants that may be very hard to handle safely (including calls from within namespaces or chroots).

@Foxboron
Contributor Author

Can you provide a summary of the issue as it currently stands? There's been a lot of back and forth over the past few days :)

On an NFS root, chown does not work.

Minimal reproducer, from what I can tell:
lxc-usernsexec -m u:0:1000000:1000000000 -m g:0:1000000:1000000000 -- /bin/chown 55:55 testfile

It's unclear to me why this is. I have ruled out busybox bugs as this happens with an Arch userland as well as the busybox test images.

@stgraber
Member

stgraber commented Oct 23, 2025

@brauner can you look into this with your VFS maintainer hat on?

To reproduce this on pengar, you can do:

root@pengar:~# mkdir -p /mnt/nfs
root@pengar:~# mount -t nfs truenas01.shf.lab.linuxcontainers.org:/mnt/test/incus-nfs /mnt/nfs
Created symlink /run/systemd/system/remote-fs.target.wants/rpc-statd.service → /lib/systemd/system/rpc-statd.service.
root@pengar:~# ls -lh /mnt/nfs
total 512
drwxr-xr-x 2 100000 100000 3 Oct 23 21:01 rootfs
root@pengar:~# ls -lh /mnt/nfs/rootfs/
total 512
-rw-r--r-- 1 100000 100000 0 Oct 23 21:01 foo
root@pengar:~# lxc-usernsexec -m b:0:100000:65536 -- chown 1234:1234 /mnt/nfs/rootfs/foo
chown: changing ownership of '/mnt/nfs/rootfs/foo': Operation not permitted
root@pengar:~# lxc-usernsexec -m b:0:100000:65536 -- ls -l /mnt/nfs/rootfs/foo
-rw-r--r-- 1 root root 0 Oct 23 21:01 /mnt/nfs/rootfs/foo
root@pengar:~# chown 101234:101234 /mnt/nfs/rootfs/foo 
root@pengar:~# lxc-usernsexec -m b:0:100000:65536 -- ls -l /mnt/nfs/rootfs/foo
-rw-r--r-- 1 1234 1234 0 Oct 23 21:01 /mnt/nfs/rootfs/foo

So basically, doing the chown from within a userns going through the uid/gid map fails, but doing the exact same chown as real root on the host and then accessing the result from within the namespace is fine.

Pengar is on 6.16.4; I've also reproduced it on 6.16.11. Not sure what the other two have been running, but probably similarly recent kernels. This is NFSv3, so there shouldn't be too much NFS black magic in theory.

@stgraber
Member

Okay, so @brauner and I got to the bottom of this one and it's unfortunately not good.

There is no fundamental kernel issue here; it's more of a fundamental design issue with how NFS works...

Since kernel idmap shift isn't supported on top of NFS, we instead rely only on the user namespace. This means that when root in the container (uid=0 inside, 100000 outside) runs chown /foo to change its current ownership (uid/gid 0/0) to a new ownership (uid/gid 1234/4567), the request actually gets translated to uid 100000 wanting to chown a file owned by 100000/100000 to 101234/104567.

That's the request (SETATTR) that gets sent to the server, and the server rejects it because the credential (uid=100000/gid=100000) isn't privileged over the target uid/gid (101234/104567).

There is no provision in the NFS protocol to pass through the uidmap/gidmap and inform the server of what uid/gid range the requestor is privileged over, so there is no way to properly handle such a request.
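
To spell out the arithmetic (illustrative only, assuming the usual base-100000 map):

package main

import "fmt"

// Illustrative only: what a chown issued by the container's root looks
// like once the user namespace map (container 0 -> host 100000) has
// been applied and the request goes out on the NFS wire.
const base = 100000

func toHost(id int) int { return base + id }

func main() {
	callerUID := toHost(0)    // container root becomes uid 100000
	targetUID := toHost(1234) // requested new owner becomes 101234
	targetGID := toHost(4567) // requested new group becomes 104567

	// The NFS server only sees the mapped values and has no idea that
	// uid 100000 is "root" over the 100000+ range, so the SETATTR is
	// rejected as an unprivileged chown.
	fmt.Printf("SETATTR from uid %d: chown to %d:%d -> EPERM\n",
		callerUID, targetUID, targetGID)
}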

@stgraber
Member

So we have a few things to consider here:

  1. At the kernel level, if we were to implement VFS idmap on top of NFSv3 (v4 would be a massive headache), it would solve this situation, but ONLY for containers where the container's uid=0 is mapped to the host's uid=0; any other situation would still fail the server-side check.

  2. At the userspace level, there is an NFS FUSE client (https://github.com/sahlberg/fuse-nfs), but it's FUSE2-only as far as I can tell and as a result doesn't have @mihalicyn's support for VFS idmap. Getting it to support VFS idmap would provide a (slower) way forward.

  3. We limit ourselves to what works today, that is 1) shared custom volumes (with a clear note about chown not working from within an unpriv container) and 2) VM storage (possibly).

Option 3) seems safe to do now; it's basically just putting some extra checks on this branch to provide the limited subset. Option 2) would be nice to have, as that NFS client codebase isn't too hard to build, so if we can apply something like (mihalicyn/fuse-overlayfs@89a1af3) to it, it would provide a slow but otherwise functional option for unpriv containers. Option 1) would need a motivated kernel dev to work on it; as @brauner was pointing out, the fact that nobody has really talked about this issue before doesn't bode too well as far as general interest for something like this goes.

@bensmrs
Contributor

bensmrs commented Oct 28, 2025

And couldn’t an idmapped bind mount be a solution? The host mounts the NFS share somewhere, then bind-mounts it to /var/lib/incus/containers/<instance>/ with the proper shift.

Or maybe it’s bindfs-specific, in which case we’re back to using FUSE…

@stgraber
Member

And couldn’t an idmapped bind mount be a solution? The host mounts the NFS share somewhere, then bind-mounts it to /var/lib/incus/containers/<instance>/ with the proper shift.

Or maybe it’s bindfs-specific, in which case we’re back to using FUSE…

That's 1). Idmapped bind mounts require per-filesystem kernel code to handle them, and NFS does not have support for that. FUSE in general does, but it requires each filesystem to add some logic, which is why I also brought up 2) as a quicker option to get something done.

@bensmrs
Contributor

bensmrs commented Oct 28, 2025

Ok, I had in mind bindfs’ uid-offset and gid-offset options and just thought that maybe this translation mechanism existed for bind mounts (the bind of bindfs made my brain shortcut to my previous question).

I’ll have a quick look at fuse-nfs.

@stgraber stgraber marked this pull request as draft October 29, 2025 01:16
@bensmrs
Contributor

bensmrs commented Oct 29, 2025

https://github.com/bensmrs/fuse-nfs should be ok-ish for FUSE3; I’ll try to implement VFS idmap tomorrow.

@bensmrs
Contributor

bensmrs commented Oct 29, 2025

My patches should now work (I also integrated a long-standing PR that I feel is desirable for us), but testing will probably be painful. Are there quick ways for me to test the patched libfuse, or do I have to compile @mihalicyn’s tree? (In which case, big meh on my side, as I have little time for it.)
Or you could test on your side, @stgraber, if you already have everything ready…

@stgraber
Member

Cool!

If we get this working, we'd probably use a scheme where /var/lib/storage-pools/my-nfs is the NFS kernel mount and /var/lib/storage-pools/my-nfs/.fuse is the same share mounted over FUSE. Then whenever we're dealing with an unprivileged container, we'd use the .fuse path, and everything else goes through the kernel client.
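
Roughly like this (an entirely hypothetical helper, just to illustrate the scheme; the real base path and naming would follow whatever the driver already uses):

package main

import (
	"fmt"
	"path/filepath"
)

// poolMountPath sketches the path selection described above: the pool
// directory is the kernel NFS mount, and ".fuse" underneath it is the
// same share mounted through the FUSE client with VFS idmap support.
// Hypothetical names and layout, for illustration only.
func poolMountPath(poolName string, unprivilegedContainer bool) string {
	base := filepath.Join("/var/lib/incus/storage-pools", poolName)
	if unprivilegedContainer {
		// Unprivileged containers need the idmap-capable FUSE mount.
		return filepath.Join(base, ".fuse")
	}

	// Everything else (privileged containers, VMs, custom volumes)
	// goes through the kernel client.
	return base
}

func main() {
	fmt.Println(poolMountPath("my-nfs", true))  // .../my-nfs/.fuse
	fmt.Println(poolMountPath("my-nfs", false)) // .../my-nfs
}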
