Conversation

@ustcweizhou (Contributor) commented Dec 11, 2020

Description

This PR adds a global setting, migrate.vm.across.clusters, to indicate whether a VM can be live-migrated to other clusters.

If a VM is running on the last host in a cluster and cannot be migrated to another cluster, putting the host into maintenance will fail.

This is based on PR #4378.

This fixes #3707 and #3720.

Some rules:
(1) For VMware, across-cluster migration of VMs with cluster-scoped pools is supported.
(2) For hypervisors other than VMware, a VM can be live-migrated to another cluster (with the same hypervisor type) only if all of its volumes are on zone-wide storage pools.
(3) Migration of system VMs (CPVM, SSVM) is only possible across clusters in the same pod, to avoid potential network issues.
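As a rough illustration, the rules above amount to a small decision function. The sketch below is not the actual CloudStack code; the class, enum, and method names are invented for illustration, and the system-VM same-pod restriction (rule 3) is omitted:

```java
// Illustrative sketch of the migration-eligibility rules described above.
// All names here are hypothetical, not CloudStack's actual classes.
public class CrossClusterMigrationRules {

    enum HypervisorType { VMware, KVM, XenServer }

    // Returns true if a VM may be live-migrated to a host in another cluster.
    static boolean canLiveMigrateAcrossClusters(boolean settingEnabled,
                                                HypervisorType srcType,
                                                HypervisorType destType,
                                                boolean allVolumesOnZoneWidePools) {
        if (!settingEnabled) {
            return false; // migrate.vm.across.clusters is false
        }
        if (srcType != destType) {
            return false; // clusters must run the same hypervisor (rule 2)
        }
        if (srcType == HypervisorType.VMware) {
            return true;  // vMotion also handles cluster-scoped pools (rule 1)
        }
        return allVolumesOnZoneWidePools; // others need zone-wide storage (rule 2)
    }

    public static void main(String[] args) {
        // A KVM VM with a cluster-wide volume stays in its cluster...
        System.out.println(canLiveMigrateAcrossClusters(
                true, HypervisorType.KVM, HypervisorType.KVM, false));       // false
        // ...while a VMware VM may move even with cluster-scoped pools.
        System.out.println(canLiveMigrateAcrossClusters(
                true, HypervisorType.VMware, HypervisorType.VMware, false)); // true
    }
}
```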

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Screenshots (if appropriate):

How Has This Been Tested?

@DaanHoogland (Contributor) left a comment

Thanks for taking this forward @weizhouapache. I have my usual style complaints, but it looks good otherwise. I assume you are already running this in your environment; I wonder whether the functional requirements are met in others' environments, but that is for them to assure. (No tests done.)

Comment on lines 5793 to 5937
DeployDestination dest = null;
if (destinationHost == null) {
    vm.setLastHostId(null); // Do not check last host
    final VirtualMachineProfile profile = new VirtualMachineProfileImpl(vm);
    final Host host = _hostDao.findById(srcHostId);
    final DataCenterDeployment plan = new DataCenterDeployment(host.getDataCenterId(), null, null, null, null, null);
    ExcludeList excludes = new ExcludeList();
    excludes.addHost(srcHostId);
    try {
        dest = _planningMgr.planDeployment(profile, plan, excludes, null);
    } catch (final AffinityConflictException e2) {
        s_logger.warn("Unable to create deployment, affinity rules associated to the VM conflict", e2);
        throw new CloudRuntimeException("Unable to create deployment, affinity rules associated to the VM conflict");
    } catch (final InsufficientServerCapacityException e3) {
        throw new CloudRuntimeException("Unable to find a server to migrate the vm to");
    }
} else {
    dest = checkVmMigrationDestination(vm, srcHostId, destinationHost);
}

// If no suitable destination found then throw exception
if (dest == null) {
    throw new RuntimeException("Unable to find suitable destination to migrate VM " + vm.getInstanceName());
}

UserVmVO uservm = _vmDao.findById(vmId);
if (uservm != null) {
    collectVmDiskStatistics(uservm);
    collectVmNetworkStatistics(uservm);
}
_itMgr.migrate(vm.getUuid(), srcHostId, dest);
VMInstanceVO vmInstance = _vmInstanceDao.findById(vmId);
if (vmInstance.getType().equals(VirtualMachine.Type.User)) {
    return _vmDao.findById(vmId);
} else {
    return vmInstance;
}

I know I sound like a broken record, but I see four methods here. Can you extract these pieces of code, please?

  // If no suitable destination found then throw exception
  if (dest == null) {
-     throw new RuntimeException("Unable to find suitable destination to migrate VM " + vm.getInstanceName());
+     throw new CloudRuntimeException("Unable to find suitable destination to migrate VM " + vm.getInstanceName());

👍

Comment on lines 1267 to 1272
if (hosts == null || hosts.isEmpty()) {
    s_logger.warn("Unable to find a host for vm migration in cluster: " + host.getClusterId());
    if (MIGRATE_VM_ACROSS_CLUSTERS.value()) {
        s_logger.info("Looking for hosts across different clusters in zone: " + host.getDataCenterId());
        hosts = listAllUpAndEnabledHosts(Host.Type.Routing, null, null, host.getDataCenterId());
        if (hosts == null || hosts.isEmpty()) {
            s_logger.warn("Unable to find a host for vm migration in zone: " + host.getDataCenterId());
            return false;
        }
        // Don't migrate vm if it has volumes on cluster-wide pool
        for (final VMInstanceVO vm : vms) {
            if (_vmMgr.checkIfVmHasClusterWideVolumes(vm.getId())) {
                s_logger.warn("Unable to migrate vm " + vm.getInstanceName() + " as it has volumes on cluster-wide pool");
                return false;
            }
        }
    } else {
        s_logger.warn("Not migrating VM across cluster since " + MIGRATE_VM_ACROSS_CLUSTERS.key() + " is false");
        return false;
    }
}

Suggestion:
if (!clusterWideMigrationSupported(...)) return false;
and extract this code into
boolean clusterWideMigrationSupported(...)
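For illustration, here is a self-contained sketch of what such an extraction could look like. The Vm and HostLister types are stand-ins for CloudStack's VMInstanceVO and host-listing DAO calls, and the signature is invented, not the actual one:

```java
import java.util.List;

// Self-contained sketch of the suggested clusterWideMigrationSupported(...)
// extraction. Vm and HostLister are hypothetical stand-ins for CloudStack's
// VMInstanceVO and host DAO; the real method would take the collaborators
// the original block already uses.
public class MigrationSupportCheck {

    static class Vm {
        final String name;
        final boolean hasClusterWideVolumes;
        Vm(String name, boolean hasClusterWideVolumes) {
            this.name = name;
            this.hasClusterWideVolumes = hasClusterWideVolumes;
        }
    }

    interface HostLister {
        List<String> listUpAndEnabledHostsInZone(long zoneId);
    }

    static boolean clusterWideMigrationSupported(boolean settingEnabled, long zoneId,
                                                 List<Vm> vms, HostLister hostLister) {
        if (!settingEnabled) {
            return false; // migrate.vm.across.clusters is false
        }
        List<String> hosts = hostLister.listUpAndEnabledHostsInZone(zoneId);
        if (hosts == null || hosts.isEmpty()) {
            return false; // no candidate host anywhere in the zone
        }
        for (Vm vm : vms) {
            if (vm.hasClusterWideVolumes) {
                return false; // cluster-wide volumes pin the VM to its cluster
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HostLister oneHost = zoneId -> List.of("host-2");
        System.out.println(clusterWideMigrationSupported(true, 1L,
                List.of(new Vm("t1", false)), oneHost)); // true
        System.out.println(clusterWideMigrationSupported(true, 1L,
                List.of(new Vm("t1", true)), oneHost));  // false
    }
}
```

The original block would then collapse to a single guard along the lines of `if (!clusterWideMigrationSupported(...)) { return false; }`.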

@shwstppr (Contributor) commented Dec 23, 2020

@blueorangutan package

@weizhouapache is this ready for review/test?

@blueorangutan

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2513

ustcweizhou pushed a commit to ustcweizhou/cloudstack that referenced this pull request Jan 15, 2021
@shwstppr (Contributor)

@ustcweizhou I guess this will need changes in the new UI now.
I haven't tested yet, but cross-cluster migration could result in failure when there are only cluster-scoped primary stores; do we handle that? In the case of VMware this could fail even with vMotion enabled on current master, something that was changed in #4385.

@shwstppr (Contributor)

Based on branch name marked for 4.15.1.0

@shwstppr shwstppr added this to the 4.15.1.0 milestone Jan 25, 2021
@weizhouapache (Member)

> @ustcweizhou I guess this will need changes in new UI now
> I've not tested yet but cross-cluster migration could result in failure when there are only cluster-scoped primary stores so do we handle that? In case of VMware this could fail even with vMotion enabled with current master something that has been changed in #4385

@shwstppr it is addressed in this PR: if a VM has volumes on cluster-wide storage, the migration will fail.
Migration is possible only if
(1) all volumes are on zone-wide storage, and
(2) the source and destination clusters have the same hypervisor type.

@DaanHoogland DaanHoogland changed the base branch from master to 4.15 January 26, 2021 08:21
ustcweizhou pushed a commit to ustcweizhou/cloudstack that referenced this pull request Jan 26, 2021
@ustcweizhou ustcweizhou force-pushed the 4.15-migrate-vm-across-cluster branch from d4773f5 to 2bfeed0 Compare January 26, 2021 13:37
@weizhouapache weizhouapache changed the base branch from 4.15 to master January 26, 2021 13:58
ustcweizhou added a commit to ustcweizhou/cloudstack that referenced this pull request Jan 26, 2021
@weizhouapache (Member)

@DaanHoogland
this is an improvement, so I changed the destination to master.

I have made some changes per your comments.

@weizhouapache weizhouapache modified the milestones: 4.15.1.0, 4.16.0.0 Feb 4, 2021
@weizhouapache weizhouapache marked this pull request as ready for review February 4, 2021 13:08
@shwstppr (Contributor)

@weizhouapache sorry, but this needs conflicts fixed again.
Also, is the use case here to allow cross-cluster migration when putting the last host into maintenance? We added some improvements for cross-cluster, cross-pod VM migration with #4385, but they were largely for VMware.
Since VMware allows cross-cluster migration even with cluster-scoped pools via vMotion, do we need to mention that the new global setting is not for VMware?

@ustcweizhou ustcweizhou force-pushed the 4.15-migrate-vm-across-cluster branch from ce44bc7 to a54284c Compare February 18, 2021 22:28
@rohityadavcloud (Member)

Re-ping @shwstppr for review
@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖centos7 ✔centos8 ✖debian. JID-2749

@rohityadavcloud (Member)

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

1 similar comment
@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖centos7 ✔centos8 ✔debian. JID-2764

@weizhouapache (Member)

@shwstppr thanks for the review and testing!

As per your comments, I have updated this PR with 5 commits:

fix #4534: an error in 'git merge'
fix #4534: remove useless methods in FirstFitPlanner.java
fix #4534: vms are stopped in host maintenance
fix #4534: across-cluster migration of vms with cluster-scoped pools is supported by vmware vmotion
fix #4534: migrate systemvms is only possible across clusters in same pod to avoid potential network errors.

@weizhouapache (Member)

@blueorangutan package

@blueorangutan

@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1136

@weizhouapache (Member)

@blueorangutan test

@blueorangutan

@weizhouapache a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian Build Failed (tid-1969)

@weizhouapache (Member)

@blueorangutan test

@shwstppr (Contributor) left a comment

LGTM. Tested with KVM and VMware environments, both having two clusters with one host each.
The KVM env had a zone-wide pool while the VMware env only had cluster-wide pools. Logs from the VMware env testing below:

Clusters:

(localcloud) SBCM5> > list clusters filter=id,name,podname
{
  "cluster": [
    {
      "id": "3d79b022-0fd9-4f51-805d-ef71bef562c5",
      "name": "p1-c1",
      "podname": "Pod1"
    },
    {
      "id": "a5b2a7b8-f897-4d5f-889f-a9f1321e7c2b",
      "name": "10.0.35.234/Trillian/p1-c2",
      "podname": "Pod1"
    }
  ],
  "count": 2
}

Hosts:

(localcloud) SBCM5> > list hosts type=Routing filter=id,name,clusterid,clustername
{
  "count": 2,
  "host": [
    {
      "clusterid": "a5b2a7b8-f897-4d5f-889f-a9f1321e7c2b",
      "clustername": "10.0.35.234/Trillian/p1-c2",
      "id": "655b4526-aa78-45e0-978f-41ac2ff45bf1",
      "name": "10.0.34.155"
    },
    {
      "clusterid": "3d79b022-0fd9-4f51-805d-ef71bef562c5",
      "clustername": "p1-c1",
      "id": "4b7798ef-745c-4595-9715-be976dfbe963",
      "name": "10.0.34.154"
    }
  ]
}

Storage Pools:

(localcloud) SBCM5> > list storagepools filter=id,name,scope,clusterid,clustername,
{
  "count": 2,
  "storagepool": [
    {
      "clusterid": "a5b2a7b8-f897-4d5f-889f-a9f1321e7c2b",
      "clustername": "10.0.35.234/Trillian/p1-c2",
      "id": "a57a6026-b377-3fc4-859e-837529f7ff9c",
      "name": "ps2",
      "scope": "CLUSTER"
    },
    {
      "clusterid": "3d79b022-0fd9-4f51-805d-ef71bef562c5",
      "clustername": "p1-c1",
      "id": "4f371379-63f1-317e-9709-563c8f57983e",
      "name": "ps1",
      "scope": "CLUSTER"
    }
  ]
}

VMs:

(localcloud) SBCM5> > list virtualmachines filter=id,name,hostid,hostname,state
{
  "count": 1,
  "virtualmachine": [
    {
      "hostid": "4b7798ef-745c-4595-9715-be976dfbe963",
      "hostname": "10.0.34.154",
      "id": "7ce5dab4-ddad-4f25-85f1-8e3992ccb0a0",
      "name": "t1",
      "state": "Running"
    }
  ]
}
(localcloud) SBCM5> > list systemvms filter=id,name,hostid,hostname,state
{
  "count": 2,
  "systemvm": [
    {
      "hostid": "4b7798ef-745c-4595-9715-be976dfbe963",
      "hostname": "10.0.34.154",
      "id": "cd83b075-4705-4d4f-82b6-f6779645f408",
      "name": "v-3-VM",
      "state": "Running"
    },
    {
      "hostid": "4b7798ef-745c-4595-9715-be976dfbe963",
      "hostname": "10.0.34.154",
      "id": "474492d9-1ed8-497f-8e0d-f5389471cabe",
      "name": "s-4-VM",
      "state": "Running"
    }
  ]
}

Update config to true, enable host maintenance and check VMs: <--Successful inter-cluster live-migration of VMs

(localcloud) SBCM5> > update configuration name=migrate.vm.across.clusters value=true
{
  "configuration": {
    "category": "Advanced",
    "description": "Indicates whether the VM can be migrated to different cluster if no host is found in same cluster",
    "isdynamic": true,
    "name": "migrate.vm.across.clusters",
    "value": "true"
  }
}
(localcloud) SBCM5> > prepare hostformaintenance id=4b7798ef-745c-4595-9715-be976dfbe963
{
  "host": {
    "capabilities": "hvm",
    "clusterid": "3d79b022-0fd9-4f51-805d-ef71bef562c5",
    "clustername": "p1-c1",
    "clustertype": "ExternalManaged",
    "cpuallocated": "0%",
    "cpuallocatedpercentage": "0%",
    "cpuallocatedvalue": 0,
    "cpuallocatedwithoverprovisioning": "0%",
    "cpuloadaverage": 0,
    "cpunumber": 6,
    "cpusockets": 3,
    "cpuspeed": 2100,
    "cpuused": "8.19%",
    "cpuwithoverprovisioning": "12600",
    "created": "2021-09-02T10:45:01+0000",
    "events": "ManagementServerDown; StartAgentRebalance; ShutdownRequested; AgentDisconnected; AgentConnected; Ping; HostDown; PingTimeout; Remove",
    "hahost": false,
    "hostha": {
      "haenable": false,
      "hastate": "Disabled"
    },
    "hypervisor": "VMware",
    "hypervisorversion": "6.7.3",
    "id": "4b7798ef-745c-4595-9715-be976dfbe963",
    "ipaddress": "10.0.34.154",
    "islocalstorageactive": false,
    "jobid": "95229fb2-bc0c-4496-8318-acd8606a82e9",
    "jobstatus": 0,
    "lastpinged": "1970-01-19T10:24:59+0000",
    "managementserverid": "736cf45b-5a00-4f6b-9287-08f654c73792",
    "memoryallocated": 0,
    "memoryallocatedbytes": 0,
    "memoryallocatedpercentage": "0%",
    "memorytotal": 8585134080,
    "memoryused": 4244029440,
    "memorywithoverprovisioning": "8585134080",
    "name": "10.0.34.154",
    "networkkbsread": 0,
    "networkkbswrite": 0,
    "outofbandmanagement": {
      "enabled": false,
      "powerstate": "Disabled"
    },
    "podid": "c9e2eccc-9c94-4a9e-a971-fa15a0bf59c2",
    "podname": "Pod1",
    "resourcestate": "PrepareForMaintenance",
    "state": "Up",
    "type": "Routing",
    "version": "4.16.0.0-SNAPSHOT",
    "zoneid": "1991b455-cebf-4507-88c4-8c8a467971c3",
    "zonename": "pr4774-t1933-vmware-67u3"
  }
}
(localcloud) SBCM5> > list virtualmachines filter=id,name,hostid,hostname,state
{
  "count": 1,
  "virtualmachine": [
    {
      "hostid": "655b4526-aa78-45e0-978f-41ac2ff45bf1",
      "hostname": "10.0.34.155",
      "id": "7ce5dab4-ddad-4f25-85f1-8e3992ccb0a0",
      "name": "t1",
      "state": "Running"
    }
  ]
}
(localcloud) SBCM5> > list systemvms filter=id,name,hostid,hostname,state
{
  "count": 2,
  "systemvm": [
    {
      "hostid": "655b4526-aa78-45e0-978f-41ac2ff45bf1",
      "hostname": "10.0.34.155",
      "id": "cd83b075-4705-4d4f-82b6-f6779645f408",
      "name": "v-3-VM",
      "state": "Running"
    },
    {
      "hostid": "655b4526-aa78-45e0-978f-41ac2ff45bf1",
      "hostname": "10.0.34.155",
      "id": "474492d9-1ed8-497f-8e0d-f5389471cabe",
      "name": "s-4-VM",
      "state": "Running"
    }
  ]
}

Cancel host maintenance on first host:

(localcloud) SBCM5> > cancel hostmaintenance id=4b7798ef-745c-4595-9715-be976dfbe963
{
  "host": {
    "capabilities": "hvm",
    "clusterid": "3d79b022-0fd9-4f51-805d-ef71bef562c5",
    "clustername": "p1-c1",
    "clustertype": "ExternalManaged",
    "cpuallocated": "0%",
    "cpuallocatedpercentage": "0%",
    "cpuallocatedvalue": 0,
    "cpuallocatedwithoverprovisioning": "0%",
    "cpuloadaverage": 0,
    "cpunumber": 6,
    "cpusockets": 3,
    "cpuspeed": 2100,
    "cpuused": "8.19%",
    "cpuwithoverprovisioning": "12600",
    "created": "2021-09-02T10:45:01+0000",
    "events": "ManagementServerDown; StartAgentRebalance; ShutdownRequested; AgentDisconnected; AgentConnected; Ping; HostDown; PingTimeout; Remove",
    "hahost": false,
    "hostha": {
      "haenable": false,
      "hastate": "Disabled"
    },
    "hypervisor": "VMware",
    "hypervisorversion": "6.7.3",
    "id": "4b7798ef-745c-4595-9715-be976dfbe963",
    "ipaddress": "10.0.34.154",
    "islocalstorageactive": false,
    "jobid": "2770a000-ae8b-4daa-b342-3160e9406da5",
    "jobstatus": 0,
    "lastpinged": "1970-01-19T10:24:59+0000",
    "managementserverid": "736cf45b-5a00-4f6b-9287-08f654c73792",
    "memoryallocated": 0,
    "memoryallocatedbytes": 0,
    "memoryallocatedpercentage": "0%",
    "memorytotal": 8585134080,
    "memoryused": 4244029440,
    "memorywithoverprovisioning": "8585134080",
    "name": "10.0.34.154",
    "networkkbsread": 0,
    "networkkbswrite": 0,
    "outofbandmanagement": {
      "enabled": false,
      "powerstate": "Disabled"
    },
    "podid": "c9e2eccc-9c94-4a9e-a971-fa15a0bf59c2",
    "podname": "Pod1",
    "resourcestate": "Enabled",
    "state": "Up",
    "type": "Routing",
    "version": "4.16.0.0-SNAPSHOT",
    "zoneid": "1991b455-cebf-4507-88c4-8c8a467971c3",
    "zonename": "pr4774-t1933-vmware-67u3"
  }
}

Update config to false, enable host maintenance on 2nd host: <-- Fails as config is false and no host is available within the cluster

(localcloud) SBCM5> > update configuration name=migrate.vm.across.clusters value=false
{
  "configuration": {
    "category": "Advanced",
    "description": "Indicates whether the VM can be migrated to different cluster if no host is found in same cluster",
    "isdynamic": true,
    "name": "migrate.vm.across.clusters",
    "value": "false"
  }
}
🙈 Error: async API failed for job a2ef5b55-35e3-48d8-a00c-59820f544eea
(localcloud) SBCM5> > prepare hostformaintenance id=655b4526-aa78-45e0-978f-41ac2ff45bf1
{
  "accountid": "202665fc-0bda-11ec-a29c-1e0094000118",
  "cmd": "org.apache.cloudstack.api.command.admin.host.PrepareForMaintenanceCmd",
  "completed": "2021-09-06T10:50:22+0000",
  "created": "2021-09-06T10:50:22+0000",
  "jobid": "93746f97-c532-42a1-a196-9f4458c02697",
  "jobinstanceid": "655b4526-aa78-45e0-978f-41ac2ff45bf1",
  "jobinstancetype": "Host",
  "jobprocstatus": 0,
  "jobresult": {
    "errorcode": 530,
    "errortext": "Failed to prepare host for maintenance due to: Unable to prepare for maintenance host 5"
  },
  "jobresultcode": 530,
  "jobresulttype": "object",
  "jobstatus": 2,
  "userid": "2027b571-0bda-11ec-a29c-1e0094000118"
}

Logs:

2021-09-06 10:50:22,675 DEBUG [c.c.a.ApiServlet] (qtp1233705144-21:ctx-0ed33ef6) (logid:23dd2023) ===START===  10.0.32.133 -- GET  apiKey=LIN6rqXuaJwMPfGYFh13qDwYz5VNNz1J2J6qIOWcd3oLQOq0WtD4CwRundBL6rzXToa3lQOC_vKjI3nkHtiD8Q&command=queryAsyncJobResult&jobid=93746f97-c532-42a1-a196-9f4458c02697&response=json&signature=XFJ0JF6QvW8ZK%2FfFtvr15m7fQiE%3D
2021-09-06 10:50:22,677 DEBUG [c.c.a.ApiServer] (qtp1233705144-21:ctx-0ed33ef6 ctx-5d99131c) (logid:23dd2023) CIDRs from which account 'Acct[202665fc-0bda-11ec-a29c-1e0094000118-admin] -- Account {"id": 2, "name": "admin", "uuid": "202665fc-0bda-11ec-a29c-1e0094000118"}' is allowed to perform API calls: 0.0.0.0/0,::/0
2021-09-06 10:50:22,683 INFO  [c.c.r.ResourceManagerImpl] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Maintenance: attempting maintenance of host 655b4526-aa78-45e0-978f-41ac2ff45bf1
2021-09-06 10:50:22,684 DEBUG [c.c.a.t.Request] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Seq 5-5937433158734577709: Sending  { Cmd , MgmtId: 32987831861528, via: 5(10.0.34.155), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.MaintainCommand":{"wait":"0","bypassHostMaintenance":"false"}}] }
2021-09-06 10:50:22,685 DEBUG [c.c.a.t.Request] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Seq 5-5937433158734577709: Executing:  { Cmd , MgmtId: 32987831861528, via: 5(10.0.34.155), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.MaintainCommand":{"wait":"0","bypassHostMaintenance":"false"}}] }
2021-09-06 10:50:22,685 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-37:ctx-97d304bf) (logid:88268016) Seq 5-5937433158734577709: Executing request
2021-09-06 10:50:22,685 INFO  [c.c.h.v.r.VmwareResource] (DirectAgent-37:ctx-97d304bf 10.0.34.155, job-67, cmd: MaintainCommand) (logid:93746f97) Executing resource MaintainCommand: {"wait":0,"bypassHostMaintenance":false}
2021-09-06 10:50:22,685 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-37:ctx-97d304bf) (logid:93746f97) Seq 5-5937433158734577709: Response Received: 
2021-09-06 10:50:22,686 DEBUG [c.c.a.t.Request] (DirectAgent-37:ctx-97d304bf) (logid:93746f97) Seq 5-5937433158734577709: Processing:  { Ans: , MgmtId: 32987831861528, via: 5(10.0.34.155), Ver: v1, Flags: 110, [{"com.cloud.agent.api.MaintainAnswer":{"willMigrate":"true","result":"true","details":"Put host in maintaince","wait":"0","bypassHostMaintenance":"false"}}] }
2021-09-06 10:50:22,686 DEBUG [c.c.a.m.AgentAttache] (DirectAgent-37:ctx-97d304bf) (logid:93746f97) Seq 5-5937433158734577709: No more commands found
2021-09-06 10:50:22,686 DEBUG [c.c.a.t.Request] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Seq 5-5937433158734577709: Received:  { Ans: , MgmtId: 32987831861528, via: 5(10.0.34.155), Ver: v1, Flags: 110, { MaintainAnswer } }
2021-09-06 10:50:22,686 DEBUG [c.c.a.m.AgentManagerImpl] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Details from executing class com.cloud.agent.api.MaintainCommand: Put host in maintaince
2021-09-06 10:50:22,690 DEBUG [c.c.r.ResourceState] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Resource state update: [id = 5; name = 10.0.34.155; old state = Enabled; event = AdminAskMaintenance; new state = PrepareForMaintenance]
2021-09-06 10:50:22,692 DEBUG [c.c.a.ApiServlet] (qtp1233705144-21:ctx-0ed33ef6 ctx-5d99131c ctx-d6e3d0a9) (logid:23dd2023) ===END===  10.0.32.133 -- GET  apiKey=LIN6rqXuaJwMPfGYFh13qDwYz5VNNz1J2J6qIOWcd3oLQOq0WtD4CwRundBL6rzXToa3lQOC_vKjI3nkHtiD8Q&command=queryAsyncJobResult&jobid=93746f97-c532-42a1-a196-9f4458c02697&response=json&signature=XFJ0JF6QvW8ZK%2FfFtvr15m7fQiE%3D
2021-09-06 10:50:22,700 WARN  [c.c.r.ResourceManagerImpl] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Unable to find a host for vm migration in cluster: 6
2021-09-06 10:50:22,701 WARN  [c.c.r.ResourceManagerImpl] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) VMs cannot be migrated across cluster since migrate.vm.across.clusters is false for zone ID: 1
2021-09-06 10:50:22,702 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-6:ctx-6ea2ffbd job-67) (logid:93746f97) Complete async job-67, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to prepare host for maintenance due to: Unable to prepare for maintenance host 5"}

Comment on lines 1374 to 1379
for (final VMInstanceVO vm : vms) {
    if (!HypervisorType.VMware.equals(host.getHypervisorType()) && _vmMgr.checkIfVmHasClusterWideVolumes(vm.getId())) {
        s_logger.warn(String.format("VM %s cannot be migrated across cluster as it has volumes on cluster-wide pool", vm));
        return false;
    }
}

Minor nit: this block can be moved to the start of the if to fail early.


@shwstppr
I moved the hypervisor type check.

I did consider the order of the checks. In my opinion, the VM volume checks involve more DB queries than listAllUpAndEnabledHosts, so I put the host check before the volume pool check.
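The ordering argument (run the cheap check first so the expensive one is skipped whenever it cannot matter) can be shown with a toy sketch; the method names and counter below are illustrative only, not CloudStack code:

```java
// Toy illustration of fail-early check ordering: with &&, the DB-heavy
// per-volume check only runs when the cheap host-listing check passes.
public class CheckOrdering {

    static int expensiveCalls = 0;

    static boolean cheapHostCheck(boolean hostsAvailable) {
        return hostsAvailable; // stands in for listAllUpAndEnabledHosts(...)
    }

    static boolean expensiveVolumeCheck() {
        expensiveCalls++;      // stands in for per-volume DB queries
        return true;
    }

    public static void main(String[] args) {
        boolean ok = cheapHostCheck(false) && expensiveVolumeCheck();
        System.out.println(ok);             // false
        System.out.println(expensiveCalls); // 0 -- the expensive check never ran
    }
}
```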

@weizhouapache (Member)

@shwstppr great, thanks for testing!

@weizhouapache (Member)

@blueorangutan package

@blueorangutan

@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1148

@nvazquez (Contributor) commented Sep 7, 2021

@blueorangutan test

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1981)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33813 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4534-t1981-kvm-centos7.zip
Smoke tests completed. 89 look OK, 0 have errors
Only failed test results are shown below:

Test Result Time (s) Test File

@nvazquez (Contributor) left a comment

LGTM


Development

Successfully merging this pull request may close these issues.

When only one host is available and it is put in maintenance, all VMs on it are Stopped.

9 participants