ISSUE TYPE
- Bug Report
COMPONENT NAME
NFS & SSVM
CLOUDSTACK VERSION
4.17.2
CONFIGURATION
KVM host, using advanced networking.
OS / ENVIRONMENT
KVM host and CS management host, both AlmaLinux 8
SUMMARY
NFS secondary storage fails to mount in the SSVM after an SSVM or CS restart.
STEPS TO REPRODUCE
You will most likely not be able to reproduce this; it may be something caused only in my environment.
I have two NFS secondary storage servers, and they run on a completely external network, entirely remote from my CloudStack network. These are their public IP addresses:
1. NFS One: 102.165.XXX.YYY
2. NFS Two: 102.165.XXX.ZZZ
I have a private network for my CloudStack environment, 192.168.50.0/24, and both my management and storage networks fall within it, using the same 192.168.50.1 gateway. But keep in mind, my actual NFS storage is remote, reached via a public internet IP.
For months I have been connecting to my NFS secondary storage via public IP with zero issues, until one morning it just failed. The failed mount error in the logs shows:
```
2023-02-16 10:24:22,423 ERROR [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:null) GetRootDir for nfs://nfsip/data/secondary failed due to com.cloud.utils.exception.CloudRuntimeException: Unable to create local folder for: /mnt/SecStorage/91de6d1c-4c04-359c-82b5-fdcfe4a83da7 in order to mount nfs://102.165.XXX.ZZZ/data/secondary
com.cloud.utils.exception.CloudRuntimeException: Unable to create local folder for: /mnt/SecStorage/91de6d1c-4c04-359c-82b5-fdcfe4a83da7 in order to mount nfs://102.165.XXX.ZZZ/data/secondary
```
Upon further investigation, I found that when the SSVM or CS is rebooted, the SSVM adds the following entry to its IP route table:
- 102.165.XXX.ZZZ via 192.168.50.1 dev eth1
This is correct according to CloudStack, because that's the default gateway for the storage network. However, if I remove this entry from the SSVM's IP route table, the NFS mount succeeds, because it then connects via the default public route (see the sketch below).
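For reference, this is roughly how I inspect and remove the offending route inside the SSVM (a minimal sketch; the IP, gateway, and device name are from my environment and may differ in yours):
```
# Inside the SSVM: look for the host route CloudStack added for the NFS server IP.
ip route | grep 102.165
# Offending entry: 102.165.XXX.ZZZ via 192.168.50.1 dev eth1

# Remove it so traffic to the NFS server falls back to the default (public) route.
# Note: this is only a temporary workaround; the SSVM re-adds the route on restart.
ip route del 102.165.XXX.ZZZ via 192.168.50.1 dev eth1
```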
So here we can see why the mount fails:
```
root@s-145-VM:~# mount -t nfs 102.165.XXX.ZZZ:/data/secondary /mnt/SecStorage/test
mount.nfs: access denied by server while mounting 102.165.XXX.ZZZ:/data/secondary
root@s-145-VM:~# mount -t nfs -vvv 102.165.XXX.ZZZ:/data/secondary /mnt/SecStorage/test
mount.nfs: timeout set for Thu Feb 16 14:07:11 2023
mount.nfs: trying text-based options 'vers=4.2,addr=102.165.XXX.ZZZ,clientaddr=192.168.50.53'
mount.nfs: mount(2): Operation not permitted
mount.nfs: trying text-based options 'addr=102.165.XXX.ZZZ'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 102.165.XXX.ZZZ prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 102.165.XXX.ZZZ prog 100005 vers 3 prot UDP port 892
mount.nfs: mount(2): Permission denied
mount.nfs: access denied by server while mounting 102.165.XXX.ZZZ:/data/secondary
```
As you can see, it's trying to mount from the private IP address `192.168.50.53`, and because the NFS server is not on that network and does not permit that client address, the mount fails (see the hypothetical export entry below).
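For context, this is the kind of server-side restriction that would produce the `access denied` above; a hypothetical `/etc/exports` entry (the path and client IP are illustrative, not taken from my actual server config):
```
# /etc/exports on the NFS server (illustrative):
# only the SSVM's public IP is allowed, so a request arriving with an
# unknown or private source address is refused with "access denied".
/data/secondary 197.189.XXX.YYY(rw,sync,no_subtree_check)
```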
Now, here's the weird part: as I said, this has worked for months. My second NFS secondary storage is on the same remote network, 102.165.XXX.YYY, and when I mount this NFS storage from the SSVM, it mounts perfectly fine without any issues:
```
root@s-145-VM:~# mount -t nfs -vvv 102.165.XXX.YYY:/data/secondary /mnt/SecStorage/test
mount.nfs: timeout set for Thu Feb 16 14:07:58 2023
mount.nfs: trying text-based options 'vers=4.2,addr=102.165.XXX.YYY,clientaddr=197.189.XXX.YYY'
root@s-145-VM:~#
```
The reason is that the SSVM does not create a route table entry such as `102.165.XXX.YYY via 192.168.50.1` for this server. It keeps using the default (public) route, which is why we can see the SSVM's public IP 197.189.XXX.YYY as the client address. The difference between the two servers can be confirmed as shown below.
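A quick way to confirm which path and source address the kernel picks for each server (run inside the SSVM; the expected results reflect the behaviour described above):
```
# Ask the kernel which route and source address it would use for each NFS server.
ip route get 102.165.XXX.ZZZ   # expected: via 192.168.50.1 dev eth1, src 192.168.50.53
ip route get 102.165.XXX.YYY   # expected: via the default gateway, src = the public IP
```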
This now leaves me with a few questions:
1. Can I not make use of remote/external NFS storage?
2. Why does the SSVM not create a route table entry for the second NFS server, yet create one only for the 102.165.XXX.ZZZ NFS server?
3. Why has this been working for months if the answer to question 1 is no?
4. Why does the SSVM mount the second remote NFS server perfectly fine, but not do the same for the one NFS server I really need?
EXPECTED RESULTS
Secondary storage NFS should mount correctly regardless of whether the NFS server is on a private or public network.
My Goal
If using external NFS servers is not recommended, then I will happily configure new NFS servers on a private network instead. However, I am left with a chicken-and-egg problem. Because the NFS server holding my actual data cannot mount automatically after a CS or SSVM restart, the verification checks on my templates fail. They are 100% downloaded, but their ready state shows as "No". When checking the DB, in `template_view` their state shows as "Migrating". And because NFS does not mount, the checks cannot complete, which prevents me from moving my data from this existing NFS storage to a new NFS storage.
So if we cannot fix the bug above, is there a way I can set the state of these templates that show as "Migrating" to "Ready", so that I can just move my data accordingly? I have tried to update the table, but I get an SQL error that the table is not updatable, which suggests `template_view` is a view over other tables rather than a base table (see the sketch below).
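A loudly hedged sketch of where to look: my assumption, not verified against 4.17.2, is that `template_view` joins base tables such as `vm_template` and `template_store_ref`, and that the per-store state lives in `template_store_ref`. Back up the database and confirm the actual schema before changing anything:
```
# On the management server. Table/column names below are assumptions -- verify first!
mysqldump cloud > cloud-backup.sql                   # back up before touching anything
mysql cloud -e "SHOW CREATE VIEW template_view\G"    # reveals which base tables the view joins
mysql cloud -e "SELECT id, template_id, state, download_state
                FROM template_store_ref WHERE state = 'Migrating';"
# Hypothetical fix of the kind being asked about (do NOT run without verifying):
# mysql cloud -e "UPDATE template_store_ref SET state = 'Ready' WHERE state = 'Migrating';"
```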
And if that is not possible, is there any way I can move my data from this NFS storage to a new NFS storage, for example using rsync?
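For what it's worth, a raw copy of the secondary storage tree would look something like the sketch below (paths are illustrative, and I don't know whether CloudStack will accept a store populated this way without matching database records):
```
# Copy the whole secondary storage tree from the old NFS export to the new one,
# preserving permissions, ownership, hard links and symlinks (illustrative paths).
rsync -aHv --progress /mnt/old-secondary/ /mnt/new-secondary/
```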
I am REALLY stuck with this looping problem and would appreciate any help possible.