Problem
When performing a backup of a VM provisioned from a Fedora DataSource (using dataVolumeTemplates with sourceRef), the KubeVirt VMBackup hotplug step fails with:
```
Warning  HotplugFailed  virtualmachineinstance/test-vm
failed to mount filesystem hotplug volume vmb-...-backup-target-pvc:
lstat /proc/1/root/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/.../disk.img: no such file or directory
```
The VMBackup still reports Done: True despite this failure, resulting in an empty backup target PVC (directory structure exists but no qcow2 files). The datamover pod then fails with no qcow2 files found in /backup-data.
Environment
- OpenShift CNV v4.99.0-0.1771785652
- Storage: ocs-storagecluster-ceph-rbd
- Feature gates enabled: IncrementalBackup, UtilityVolumes, HotplugVolumes
What works vs. what doesn't
| VM | Root Disk | Machine Type | Firmware | Hotplug | Backup |
| --- | --- | --- | --- | --- | --- |
| cirros (containerdisk source, 150Mi, RWO Block) | Small containerdisk | q35 | BIOS | Works | Works |
| test-vm (Fedora DataSource, 30Gi, RWX Block) | Large DataSource | pc-q35-rhel9.2.0 | EFI + SMM | HotplugFailed | Empty PVC |
Both VMs have changedBlockTracking: "true". Both backup target PVCs are created identically (10Gi, RWO, Filesystem mode, same storage class).
How the backup target PVC is created
Our datamover controller creates the backup target PVC via ensureTempPVC() — a plain Filesystem PVC with no special configuration:
```go
pvc := &corev1.PersistentVolumeClaim{
	ObjectMeta: metav1.ObjectMeta{
		Name:      pvcName,   // "kubevirt-backup-<du.Name>"
		Namespace: namespace, // VM namespace
	},
	Spec: corev1.PersistentVolumeClaimSpec{
		AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
		Resources: corev1.VolumeResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceStorage: resource.MustParse("10Gi"),
			},
		},
		// No volumeMode specified (defaults to Filesystem)
		// No storageClassName specified (uses cluster default)
	},
}
```
Question for KubeVirt team: Should the backup target PVC be created differently? Does the VMBackup API expect a specific volumeMode, annotation, or pre-existing disk.img file?
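If the answer turns out to be that the backup target needs Block mode or an explicit storage class, the equivalent manifest would look like the sketch below. Whether VMBackup actually requires any of this is exactly the open question; the names here (`kubevirt-backup-example`, `vm-namespace`) are placeholders:

```yaml
# Sketch only: it is NOT confirmed that VMBackup needs Block mode.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kubevirt-backup-example   # "kubevirt-backup-<du.Name>"
  namespace: vm-namespace         # VM namespace
spec:
  accessModes: ["ReadWriteOnce"]
  volumeMode: Block               # explicit, instead of the Filesystem default
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 10Gi
```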
Observed behavior
- Controller creates temp PVC kubevirt-backup-<name> in VM namespace (Filesystem, 10Gi, empty)
- VMBackup CR is created with forceFullBackup: true
- virt-handler logs "successfully mounted" at mount.go:561 (Kubernetes-level PVC attachment succeeds)
- HotplugFailed: virt-handler's filesystem hotplug into the VM fails; it looks for disk.img inside the freshly provisioned empty PVC
- VMBackup reports VirtualMachineBackupCompletedSuccessfully / Done: True despite the hotplug failure
- QEMU never receives the backup volume: virt-launcher logs show zero backup activity (no backup commands, no NBD export, no volume attachment inside the VM)
- PV is rebound to the OADP namespace; the datamover pod mounts it
- Debug pod inspection confirmed the PVC is empty: the checkpoint directory structure exists (test-vm/<checkpoint-name>/) but contains no files at all
- Datamover pod fails: no qcow2 files found in /backup-data
Reproduced consistently across 4 backup attempts (b, c, d, e) for test-vm.
Key observations from investigation
The disk.img expectation
KubeVirt's filesystem PV disk documentation states that for standard filesystem hotplug volumes, a disk.img file must exist inside the PVC (auto-created for regular VM disks). However, the backup target PVC is not a VM disk — it's a destination for qcow2 export data. Nobody creates disk.img in it, and the hotplug code doesn't distinguish between a regular disk PVC and a backup target PVC.
Two-step mount process
The mount at mount.go:561 (Kubernetes-level PVC attachment to the virt-launcher pod) succeeds for both VMs. The failure happens at the second step — virt-handler's filesystem hotplug into the VM — which tries to bind-mount disk.img from the PVC into QEMU. This second step succeeds for cirros but fails for test-vm, despite identical backup target PVC setup.
Mount duration difference
The backup volume mount duration differs between the two VMs:
- cirros: mounted for ~1 second before unmount
- test-vm: mounted for ~33 seconds before unmount
UtilityVolumes feature gate
The UtilityVolumes feature gate is enabled on the cluster. KubeVirt introduced UtilityVolumes (PR #15922) specifically for backup PVC attachment — UtilityVolumes mount as a directory (no disk.img needed) rather than using the standard filesystem hotplug path. However, the VMBackup controller appears to attach the backup PVC as a regular hotplug volume rather than via spec.utilityVolumes on the VMI.
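For comparison, attaching the backup PVC as a utility volume might look like the fragment below. Only the field name spec.utilityVolumes comes from the observations above; the shape beneath it is our assumption and has not been verified against the KubeVirt API for this version:

```yaml
# Hypothetical sketch: everything under utilityVolumes is an assumed shape.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: test-vm
spec:
  utilityVolumes:
  - name: backup-target
    persistentVolumeClaim:
      claimName: kubevirt-backup-test-vm   # hypothetical PVC name
```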
VMBackup completion status is misleading
- cirros:
VirtualMachineBackupCompletedWithWarning (guest agent not connected) — backup data was written
- test-vm:
VirtualMachineBackupCompletedSuccessfully — but no backup data was written at all
The "successful" completion is incorrect. The VMBackup should detect the HotplugFailed condition and report failure.
Questions for KubeVirt/CNV team
- Why does hotplug work for cirros but fail for test-vm? Both backup target PVCs are identical. The VM configuration differences (EFI firmware, machine type pc-q35-rhel9.2.0, disk size/access mode) may influence which hotplug code path is taken.
- Should the backup target PVC use UtilityVolumes? The UtilityVolumes feature gate is enabled but the VMBackup controller doesn't seem to use it. Is this expected for this KubeVirt version?
- Should we create the backup target PVC differently? Does it need Block volumeMode, a pre-created disk.img, or specific annotations?
- VMBackup should fail when HotplugFailed occurs. Currently it reports Done: True with no data written. This makes it impossible for downstream consumers to detect the failure from the VMBackup status alone.
How to reproduce
- Deploy a Fedora VM from the fedora DataSource in openshift-virtualization-os-images with changedBlockTracking: "true", EFI firmware, and the pc-q35-rhel9.2.0 machine type
- Trigger a Velero backup with snapshotMoveData: true
- Observe HotplugFailed events: oc get events -n <vm-namespace> --field-selector reason=HotplugFailed
- Inspect the backup PVC contents; the checkpoint directory will be empty
- Observe the datamover pod failure: no qcow2 files found in /backup-data