Conversation

@ustcweizhou (Contributor) commented Dec 11, 2020

Description

This PR adds a global setting, migrate.vm.across.clusters, to indicate whether a VM can be live-migrated to other clusters.

If a VM is running on the last host in a cluster and cannot be migrated to another cluster, putting the host into maintenance will fail.

This is based on PR #4378.

This fixes #3707 and #3720.

Some rules:
(1) For VMware, across-cluster migration of VMs with cluster-scoped pools is supported.
(2) For hypervisors other than VMware, a VM can be live-migrated to another cluster (with the same hypervisor type) only if all of its volumes are on zone-wide storage pools.
(3) Migration of system VMs (CPVM, SSVM) is only possible across clusters in the same pod, to avoid potential network issues.
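As a rough illustration, the rules above amount to a small decision function. The sketch below is not the actual CloudStack code; the class, enum, and method names are invented for illustration, and the system-VM same-pod restriction (rule 3) is omitted:

```java
// Illustrative sketch of the migration-eligibility rules described above.
// All names here are hypothetical, not CloudStack's actual classes.
public class CrossClusterMigrationRules {

    enum HypervisorType { VMware, KVM, XenServer }

    // Returns true if a VM may be live-migrated to a host in another cluster.
    static boolean canLiveMigrateAcrossClusters(boolean settingEnabled,
                                                HypervisorType srcType,
                                                HypervisorType destType,
                                                boolean allVolumesOnZoneWidePools) {
        if (!settingEnabled) {
            return false; // migrate.vm.across.clusters is false
        }
        if (srcType != destType) {
            return false; // clusters must run the same hypervisor (rule 2)
        }
        if (srcType == HypervisorType.VMware) {
            return true;  // vMotion also handles cluster-scoped pools (rule 1)
        }
        return allVolumesOnZoneWidePools; // others need zone-wide storage (rule 2)
    }

    public static void main(String[] args) {
        // A KVM VM with a cluster-wide volume stays in its cluster...
        System.out.println(canLiveMigrateAcrossClusters(
                true, HypervisorType.KVM, HypervisorType.KVM, false));       // false
        // ...while a VMware VM may move even with cluster-scoped pools.
        System.out.println(canLiveMigrateAcrossClusters(
                true, HypervisorType.VMware, HypervisorType.VMware, false)); // true
    }
}
```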

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Screenshots (if appropriate):

How Has This Been Tested?

@DaanHoogland (Contributor) left a comment

Thanks for taking this forward @weizhouapache. I have my usual style complaints, but it looks good otherwise. I assume you are already running this in your environment; I wonder whether the functional requirements are met in others' environments, but that is for them to assure. (No tests done.)

Comment on lines 5793 to 5937
DeployDestination dest = null;
if (destinationHost == null) {
    vm.setLastHostId(null); // Do not check last host
    final VirtualMachineProfile profile = new VirtualMachineProfileImpl(vm);
    final Host host = _hostDao.findById(srcHostId);
    final DataCenterDeployment plan = new DataCenterDeployment(host.getDataCenterId(), null, null, null, null, null);
    ExcludeList excludes = new ExcludeList();
    excludes.addHost(srcHostId);
    try {
        dest = _planningMgr.planDeployment(profile, plan, excludes, null);
    } catch (final AffinityConflictException e2) {
        s_logger.warn("Unable to create deployment, affinity rules associated to the VM conflict", e2);
        throw new CloudRuntimeException("Unable to create deployment, affinity rules associated to the VM conflict");
    } catch (final InsufficientServerCapacityException e3) {
        throw new CloudRuntimeException("Unable to find a server to migrate the vm to");
    }
} else {
    dest = checkVmMigrationDestination(vm, srcHostId, destinationHost);
}

// If no suitable destination found then throw exception
if (dest == null) {
    throw new RuntimeException("Unable to find suitable destination to migrate VM " + vm.getInstanceName());
}

UserVmVO uservm = _vmDao.findById(vmId);
if (uservm != null) {
    collectVmDiskStatistics(uservm);
    collectVmNetworkStatistics(uservm);
}
_itMgr.migrate(vm.getUuid(), srcHostId, dest);
VMInstanceVO vmInstance = _vmInstanceDao.findById(vmId);
if (vmInstance.getType().equals(VirtualMachine.Type.User)) {
    return _vmDao.findById(vmId);
} else {
    return vmInstance;
}

I know I sound like a broken record, but I see four methods here. Can you extract these pieces of code, please?

  // If no suitable destination found then throw exception
  if (dest == null) {
-     throw new RuntimeException("Unable to find suitable destination to migrate VM " + vm.getInstanceName());
+     throw new CloudRuntimeException("Unable to find suitable destination to migrate VM " + vm.getInstanceName());

👍

Comment on lines 1267 to 1272
if (hosts == null || hosts.isEmpty()) {
    s_logger.warn("Unable to find a host for vm migration in cluster: " + host.getClusterId());
    if (MIGRATE_VM_ACROSS_CLUSTERS.value()) {
        s_logger.info("Looking for hosts across different clusters in zone: " + host.getDataCenterId());
        hosts = listAllUpAndEnabledHosts(Host.Type.Routing, null, null, host.getDataCenterId());
        if (hosts == null || hosts.isEmpty()) {
            s_logger.warn("Unable to find a host for vm migration in zone: " + host.getDataCenterId());
            return false;
        }
        // Don't migrate vm if it has volumes on cluster-wide pool
        for (final VMInstanceVO vm : vms) {
            if (_vmMgr.checkIfVmHasClusterWideVolumes(vm.getId())) {
                s_logger.warn("Unable to migrate vm " + vm.getInstanceName() + " as it has volumes on cluster-wide pool");
                return false;
            }
        }
    } else {
        s_logger.warn("Not migrating VM across cluster since " + MIGRATE_VM_ACROSS_CLUSTERS.key() + " is false");
        return false;
    }
}

Suggestion:
if (!clusterWideMigrationSupported(...)) return false;
and extract this code into
boolean clusterWideMigrationSupported(...)
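For illustration, here is a self-contained sketch of what such an extraction could look like. The Vm and HostLister types are stand-ins for CloudStack's VMInstanceVO and host-listing DAO calls, and the signature is invented, not the actual one:

```java
import java.util.List;

// Self-contained sketch of the suggested clusterWideMigrationSupported(...)
// extraction. Vm and HostLister are hypothetical stand-ins for CloudStack's
// VMInstanceVO and host DAO; the real method would take the collaborators
// the original block already uses.
public class MigrationSupportCheck {

    static class Vm {
        final String name;
        final boolean hasClusterWideVolumes;
        Vm(String name, boolean hasClusterWideVolumes) {
            this.name = name;
            this.hasClusterWideVolumes = hasClusterWideVolumes;
        }
    }

    interface HostLister {
        List<String> listUpAndEnabledHostsInZone(long zoneId);
    }

    static boolean clusterWideMigrationSupported(boolean settingEnabled, long zoneId,
                                                 List<Vm> vms, HostLister hostLister) {
        if (!settingEnabled) {
            return false; // migrate.vm.across.clusters is false
        }
        List<String> hosts = hostLister.listUpAndEnabledHostsInZone(zoneId);
        if (hosts == null || hosts.isEmpty()) {
            return false; // no candidate host anywhere in the zone
        }
        for (Vm vm : vms) {
            if (vm.hasClusterWideVolumes) {
                return false; // cluster-wide volumes pin the VM to its cluster
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HostLister oneHost = zoneId -> List.of("host-2");
        System.out.println(clusterWideMigrationSupported(true, 1L,
                List.of(new Vm("t1", false)), oneHost)); // true
        System.out.println(clusterWideMigrationSupported(true, 1L,
                List.of(new Vm("t1", true)), oneHost));  // false
    }
}
```

The original block would then collapse to a single guard along the lines of `if (!clusterWideMigrationSupported(...)) { return false; }`.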

@shwstppr (Contributor) commented Dec 23, 2020

@blueorangutan package

@weizhouapache is this ready for review/test?

@blueorangutan

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2513

ustcweizhou pushed a commit to ustcweizhou/cloudstack that referenced this pull request Jan 15, 2021
@shwstppr (Contributor)

@ustcweizhou I guess this will need changes in the new UI now.
I haven't tested yet, but cross-cluster migration could result in failure when there are only cluster-scoped primary stores; do we handle that? In the case of VMware this could fail even with vMotion enabled on current master, something that was changed in #4385.

@shwstppr (Contributor)

Based on branch name marked for 4.15.1.0

@shwstppr shwstppr added this to the 4.15.1.0 milestone Jan 25, 2021
@weizhouapache (Member)

> @ustcweizhou I guess this will need changes in new UI now
> I've not tested yet but cross-cluster migration could result in failure when there are only cluster-scoped primary stores so do we handle that? In case of VMware this could fail even with vMotion enabled with current master something that has been changed in #4385

@shwstppr it is addressed in this PR: if a VM has volumes on cluster-wide storage, the migration will fail.
Migration is possible only if
(1) all volumes are on zone-wide storage, and
(2) the source and destination clusters have the same hypervisor type.

@DaanHoogland DaanHoogland changed the base branch from master to 4.15 January 26, 2021 08:21
ustcweizhou pushed a commit to ustcweizhou/cloudstack that referenced this pull request Jan 26, 2021
@ustcweizhou ustcweizhou force-pushed the 4.15-migrate-vm-across-cluster branch from d4773f5 to 2bfeed0 Compare January 26, 2021 13:37
@weizhouapache weizhouapache changed the base branch from 4.15 to master January 26, 2021 13:58
ustcweizhou added a commit to ustcweizhou/cloudstack that referenced this pull request Jan 26, 2021
@weizhouapache (Member)

@DaanHoogland
this is an improvement, so I changed the destination to master.

I have made some changes per your comments.

@weizhouapache weizhouapache modified the milestones: 4.15.1.0, 4.16.0.0 Feb 4, 2021
@weizhouapache weizhouapache marked this pull request as ready for review February 4, 2021 13:08
@shwstppr (Contributor)

@weizhouapache sorry, but this needs conflicts fixed again.
Also, is the use case here to allow cross-cluster migration when putting the last host into maintenance? We added some improvements for cross-cluster, cross-pod VM migration with #4385, but they were largely for VMware.
Since VMware allows cross-cluster migration even with cluster-scoped pools via vMotion, do we need to mention that the new global setting is not for VMware?

@ustcweizhou ustcweizhou force-pushed the 4.15-migrate-vm-across-cluster branch from ce44bc7 to a54284c Compare February 18, 2021 22:28
@rohityadavcloud (Member)

Re-ping @shwstppr for review
@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖centos7 ✔centos8 ✖debian. JID-2749

@rohityadavcloud (Member)

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

1 similar comment
@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖centos7 ✔centos8 ✔debian. JID-2764

@weizhouapache (Member)

@shwstppr thanks for the review and testing!

As per your comments, I have updated this PR with 5 commits:

fix #4534: an error in 'git merge'
fix #4534: remove useless methods in FirstFitPlanner.java
fix #4534: vms are stopped in host maintenance
fix #4534: across-cluster migration of vms with cluster-scoped pools is supported by vmware vmotion
fix #4534: migrate systemvms is only possible across clusters in same pod to avoid potential network errors.

@weizhouapache (Member)

@blueorangutan package

@blueorangutan

@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1136

@weizhouapache (Member)

@blueorangutan test

@blueorangutan

@weizhouapache a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian Build Failed (tid-1969)

@weizhouapache (Member)

@blueorangutan test

@shwstppr (Contributor) left a comment

LGTM. Tested with KVM and VMware environments, both having two clusters with one host each.
The KVM env had a zone-wide pool while the VMware env only had cluster-wide pools. Logs from the VMware env testing below:

Clusters:

(localcloud) SBCM5> > list clusters filter=id,name,podname
{
  "cluster": [
    {
      "id": "3d79b022-0fd9-4f51-805d-ef71bef562c5",
      "name": "p1-c1",
      "podname": "Pod1"
    },
    {
      "id": "a5b2a7b8-f897-4d5f-889f-a9f1321e7c2b",
      "name": "10.0.35.234/Trillian/p1-c2",
      "podname": "Pod1"
    }
  ],
  "count": 2
}

Hosts:

(localcloud) SBCM5> > list hosts type=Routing filter=id,name,clusterid,clustername
{
  "count": 2,
  "host": [
    {
      "clusterid": "a5b2a7b8-f897-4d5f-889f-a9f1321e7c2b",
      "clustername": "10.0.35.234/Trillian/p1-c2",
      "id": "655b4526-aa78-45e0-978f-41ac2ff45bf1",
      "name": "10.0.34.155"
    },
    {
      "clusterid": "3d79b022-0fd9-4f51-805d-ef71bef562c5",
      "clustername": "p1-c1",
      "id": "4b7798ef-745c-4595-9715-be976dfbe963",
      "name": "10.0.34.154"
    }
  ]
}

Storage Pools:

(localcloud) SBCM5> > list storagepools filter=id,name,scope,clusterid,clustername,
{
  "count": 2,
  "storagepool": [
    {
      "clusterid": "a5b2a7b8-f897-4d5f-889f-a9f1321e7c2b",
      "clustername": "10.0.35.234/Trillian/p1-c2",
      "id": "a57a6026-b377-3fc4-859e-837529f7ff9c",
      "name": "ps2",
      "scope": "CLUSTER"
    },
    {
      "clusterid": "3d79b022-0fd9-4f51-805d-ef71bef562c5",
      "clustername": "p1-c1",
      "id": "4f371379-63f1-317e-9709-563c8f57983e",
      "name": "ps1",
      "scope": "CLUSTER"
    }
  ]
}

VMs:

(localcloud) SBCM5> > list virtualmachines filter=id,name,hostid,hostname,state
{
  "count": 1,
  "virtualmachine": [
    {
      "hostid": "4b7798ef-745c-4595-9715-be976dfbe963",
      "hostname": "10.0.34.154",
      "id": "7ce5dab4-ddad-4f25-85f1-8e3992ccb0a0",
      "name": "t1",
      "state": "Running"
    }
  ]
}
(localcloud) SBCM5> > list systemvms filter=id,name,hostid,hostname,state
{
  "count": 2,
  "systemvm": [
    {
      "hostid": "4b7798ef-745c-4595-9715-be976dfbe963",
      "hostname": "10.0.34.154",
      "id": "cd83b075-4705-4d4f-82b6-f6779645f408",
      "name": "v-3-VM",
      "state": "Running"
    },
    {
      "hostid": "4b7798ef-745c-4595-9715-be976dfbe963",
      "hostname": "10.0.34.154",
      "id": "474492d9-1ed8-497f-8e0d-f5389471cabe",
      "name": "s-4-VM",
      "state": "Running"
    }
  ]
}

Update config to true, enable host maintenance and check VMs: <--Successful inter-cluster live-migration of VMs

(localcloud) SBCM5> > update configuration name=migrate.vm.across.clusters value=true
{
  "configuration": {
    "category": "Advanced",
    "description": "Indicates whether the VM can be migrated to different cluster if no host is found in same cluster",
    "isdynamic": true,
    "name": "migrate.vm.across.clusters",
    "value": "true"
  }
}
(localcloud) SBCM5> > prepare hostformaintenance id=4b7798ef-745c-4595-9715-be976dfbe963
{
  "host": {
    "capabilities": "hvm",
    "clusterid": "3d79b022-0fd9-4f51-805d-ef71bef562c5",
    "clustername": "p1-c1",
    "clustertype": "ExternalManaged",
    "cpuallocated": "0%",
    "cpuallocatedpercentage": "0%",
    "cpuallocatedvalue": 0,
    "cpuallocatedwithoverprovisioning": "0%",
    "cpuloadaverage": 0,
    "cpunumber": 6,
    "cpusockets": 3,
    "cpuspeed": 2100,
    "cpuused": "8.19%",
    "cpuwithoverprovisioning": "12600",
    "created": "2021-09-02T10:45:01+0000",
    "events": "ManagementServerDown; StartAgentRebalance; ShutdownRequested; AgentDisconnected; AgentConnected; Ping; HostDown; PingTimeout; Remove",
    "hahost": false,
    "hostha": {
      "haenable": false,
      "hastate": "Disabled"
    },
    "hypervisor": "VMware",
    "hypervisorversion": "6.7.3",
    "id": "4b7798ef-745c-4595-9715-be976dfbe963",
    "ipaddress": "10.0.34.154",
    "islocalstorageactive": false,
    "jobid": "95229fb2-bc0c-4496-8318-acd8606a82e9",
    "jobstatus": 0,
    "lastpinged": "1970-01-19T10:24:59+0000",
    "managementserverid": "736cf45b-5a00-4f6b-9287-08f654c73792",
    "memoryallocated": 0,
    "memoryallocatedbytes": 0,
    "memoryallocatedpercentage": "0%",
    "memorytotal": 8585134080,
    "memoryused": 4244029440,
    "memorywithoverprovisioning": "8585134080",
    "name": "10.0.34.154",
    "networkkbsread": 0,
    "networkkbswrite": 0,
    "outofbandmanagement": {
      "enabled": false,
      "powerstate": "Disabled"
    },
    "podid": "c9e2eccc-9c94-4a9e-a971-fa15a0bf59c2",
    "podname": "Pod1",
    "resourcestate": "PrepareForMaintenance",
    "state": "Up",
    "type": "Routing",
    "version": "4.16.0.0-SNAPSHOT",
    "zoneid": "1991b455-cebf-4507-88c4-8c8a467971c3",
    "zonename": "pr4774-t1933-vmware-67u3"
  }
}
(localcloud) SBCM5> > list virtualmachines filter=id,name,hostid,hostname,state
{
  "count": 1,
  "virtualmachine": [
    {
      "hostid": "655b4526-aa78-45e0-978f-41ac2ff45bf1",
      "hostname": "10.0.34.155",
      "id": "7ce5dab4-ddad-4f25-85f1-8e3992ccb0a0",
      "name": "t1",
      "state": "Running"
    }
  ]
}
(localcloud) SBCM5> > list systemvms filter=id,name,hostid,hostname,state
{
  "count": 2,
  "systemvm": [
    {
      "hostid": "655b4526-aa78-45e0-978f-41ac2ff45bf1",
      "hostname": "10.0.34.155",
      "id": "cd83b075-4705-4d4f-82b6-f6779645f408",
      "name": "v-3-VM",
      "state": "Running"
    },
    {
      "hostid": "655b4526-aa78-45e0-978f-41ac2ff45bf1",
      "hostname": "10.0.34.155",
      "id": "474492d9-1ed8-497f-8e0d-f5389471cabe",
      "name": "s-4-VM",
      "state": "Running"
    }
  ]
}

Cancel host maintenance on first host:

(localcloud) SBCM5> > cancel hostmaintenance id=4b7798ef-745c-4595-9715-be976dfbe963
{
  "host": {
    "capabilities": "hvm",
    "clusterid": "3d79b022-0fd9-4f51-805d-ef71bef562c5",
    "clustername": "p1-c1",
    "clustertype": "ExternalManaged",
    "cpuallocated": "0%",
    "cpuallocatedpercentage": "0%",
    "cpuallocatedvalue": 0,
    "cpuallocatedwithoverprovisioning": "0%",
    "cpuloadaverage": 0,
    "cpunumber": 6,
    "cpusockets": 3,
    "cpuspeed": 2100,
    "cpuused": "8.19%",
    "cpuwithoverprovisioning": "12600",
    "created": "2021-09-02T10:45:01+0000",
    "events": "ManagementServerDown; StartAgentRebalance; ShutdownRequested; AgentDisconnected; AgentConnected; Ping; HostDown; PingTimeout; Remove",
    "hahost": false,
    "hostha": {
      "haenable": false,
      "hastate": "Disabled"
    },
    "hypervisor": "VMware",
    "hypervisorversion": "6.7.3",
    "id": "4b7798ef-745c-4595-9715-be976dfbe963",
    "ipaddress": "10.0.34.154",
    "islocalstorageactive": false,
    "jobid": "2770a000-ae8b-4daa-b342-3160e9406da5",
    "jobstatus": 0,
    "lastpinged": "1970-01-19T10:24:59+0000",
    "managementserverid": "736cf45b-5a00-4f6b-9287-08f654c73792",
    "memoryallocated": 0,
    "memoryallocatedbytes": 0,
    "memoryallocatedpercentage": "0%",
    "memorytotal": 8585134080,
    "memoryused": 4244029440,
    "memorywithoverprovisioning": "8585134080",
    "name": "10.0.34.154",
    "networkkbsread": 0,
    "networkkbswrite": 0,
    "outofbandmanagement": {
      "enabled": false,
      "powerstate": "Disabled"
    },
    "podid": "c9e2eccc-9c94-4a9e-a971-fa15a0bf59c2",
    "podname": "Pod1",
    "resourcestate": "Enabled",
    "state": "Up",
    "type": "Routing",
    "version": "4.16.0.0-SNAPSHOT",
    "zoneid": "1991b455-cebf-4507-88c4-8c8a467971c3",
    "zonename": "pr4774-t1933-vmware-67u3"
  }
}

Update config to false, enable host maintenance on 2nd host: <-- Fails as config is false and no host is available within the cluster

(localcloud) SBCM5> > update configuration name=migrate.vm.across.clusters value=false
{
  "configuration": {
    "category": "Advanced",
    "description": "Indicates whether the VM can be migrated to different cluster if no host is found in same cluster",
    "isdynamic": true,
    "name": "migrate.vm.across.clusters",
    "value": "false"
  }
}
🙈 Error: async API failed for job a2ef5b55-35e3-48d8-a00c-59820f544eea
(localcloud) SBCM5> > prepare hostformaintenance id=655b4526-aa78-45e0-978f-41ac2ff45bf1
{
  "accountid": "202665fc-0bda-11ec-a29c-1e0094000118",
  "cmd": "org.apache.cloudstack.api.command.admin.host.PrepareForMaintenanceCmd",
  "completed": "2021-09-06T10:50:22+0000",
  "created": "2021-09-06T10:50:22+0000",
  "jobid": "93746f97-c532-42a1-a196-9f4458c02697",
  "jobinstanceid": "655b4526-aa78-45e0-978f-41ac2ff45bf1",
  "jobinstancetype": "Host",
  "jobprocstatus": 0,
  "jobresult": {
    "errorcode": 530,
    "errortext": "Failed to prepare host for maintenance due to: Unable to prepare for maintenance host 5"
  },
  "jobresultcode": 530,
  "jobresulttype": "object",
  "jobstatus": 2,
  "userid": "2027b571-0bda-11ec-a29c-1e0094000118"
}

Logs:

2021-09-06 10:50:22,675 DEBUG [c.c.a.ApiServlet] (qtp1233705144-21:ctx-0ed33ef6) (logid:23dd2023) ===START===  10.0.32.133 -- GET  apiKey=LIN6rqXuaJwMPfGYFh13qDwYz5VNNz1J2J6qIOWcd3oLQOq0WtD4CwRundBL6rzXToa3lQOC_vKjI3nkHtiD8Q&command=queryAsyncJobResult&jobid=93746f97-c532-42a1-a196-9f4458c02697&response=json&signature=XFJ0JF6QvW8ZK%2FfFtvr15m7fQiE%3D
2021-09-06 10:50:22,677 DEBUG [c.c.a.ApiServer] (qtp1233705144-21:ctx-0ed33ef6 ctx-5d99131c) (logid:23dd2023) CIDRs from which account 'Acct[202665fc-0bda-11ec-a29c-1e0094000118-admin] -- Account {"id": 2, "name": "admin", "uuid": "202665fc-0bda-11ec-a29c-1e0094000118"}' is allowed to perform API calls: 0.0.0.0/0,::/0
2021-09-06 10:50:22,683 INFO  [c.c.r.ResourceManagerImpl] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Maintenance: attempting maintenance of host 655b4526-aa78-45e0-978f-41ac2ff45bf1
2021-09-06 10:50:22,684 DEBUG [c.c.a.t.Request] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Seq 5-5937433158734577709: Sending  { Cmd , MgmtId: 32987831861528, via: 5(10.0.34.155), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.MaintainCommand":{"wait":"0","bypassHostMaintenance":"false"}}] }
2021-09-06 10:50:22,685 DEBUG [c.c.a.t.Request] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Seq 5-5937433158734577709: Executing:  { Cmd , MgmtId: 32987831861528, via: 5(10.0.34.155), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.MaintainCommand":{"wait":"0","bypassHostMaintenance":"false"}}] }
2021-09-06 10:50:22,685 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-37:ctx-97d304bf) (logid:88268016) Seq 5-5937433158734577709: Executing request
2021-09-06 10:50:22,685 INFO  [c.c.h.v.r.VmwareResource] (DirectAgent-37:ctx-97d304bf 10.0.34.155, job-67, cmd: MaintainCommand) (logid:93746f97) Executing resource MaintainCommand: {"wait":0,"bypassHostMaintenance":false}
2021-09-06 10:50:22,685 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-37:ctx-97d304bf) (logid:93746f97) Seq 5-5937433158734577709: Response Received: 
2021-09-06 10:50:22,686 DEBUG [c.c.a.t.Request] (DirectAgent-37:ctx-97d304bf) (logid:93746f97) Seq 5-5937433158734577709: Processing:  { Ans: , MgmtId: 32987831861528, via: 5(10.0.34.155), Ver: v1, Flags: 110, [{"com.cloud.agent.api.MaintainAnswer":{"willMigrate":"true","result":"true","details":"Put host in maintaince","wait":"0","bypassHostMaintenance":"false"}}] }
2021-09-06 10:50:22,686 DEBUG [c.c.a.m.AgentAttache] (DirectAgent-37:ctx-97d304bf) (logid:93746f97) Seq 5-5937433158734577709: No more commands found
2021-09-06 10:50:22,686 DEBUG [c.c.a.t.Request] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Seq 5-5937433158734577709: Received:  { Ans: , MgmtId: 32987831861528, via: 5(10.0.34.155), Ver: v1, Flags: 110, { MaintainAnswer } }
2021-09-06 10:50:22,686 DEBUG [c.c.a.m.AgentManagerImpl] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Details from executing class com.cloud.agent.api.MaintainCommand: Put host in maintaince
2021-09-06 10:50:22,690 DEBUG [c.c.r.ResourceState] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Resource state update: [id = 5; name = 10.0.34.155; old state = Enabled; event = AdminAskMaintenance; new state = PrepareForMaintenance]
2021-09-06 10:50:22,692 DEBUG [c.c.a.ApiServlet] (qtp1233705144-21:ctx-0ed33ef6 ctx-5d99131c ctx-d6e3d0a9) (logid:23dd2023) ===END===  10.0.32.133 -- GET  apiKey=LIN6rqXuaJwMPfGYFh13qDwYz5VNNz1J2J6qIOWcd3oLQOq0WtD4CwRundBL6rzXToa3lQOC_vKjI3nkHtiD8Q&command=queryAsyncJobResult&jobid=93746f97-c532-42a1-a196-9f4458c02697&response=json&signature=XFJ0JF6QvW8ZK%2FfFtvr15m7fQiE%3D
2021-09-06 10:50:22,700 WARN  [c.c.r.ResourceManagerImpl] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) Unable to find a host for vm migration in cluster: 6
2021-09-06 10:50:22,701 WARN  [c.c.r.ResourceManagerImpl] (API-Job-Executor-6:ctx-6ea2ffbd job-67 ctx-c79733b1) (logid:93746f97) VMs cannot be migrated across cluster since migrate.vm.across.clusters is false for zone ID: 1
2021-09-06 10:50:22,702 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-6:ctx-6ea2ffbd job-67) (logid:93746f97) Complete async job-67, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to prepare host for maintenance due to: Unable to prepare for maintenance host 5"}

Comment on lines 1374 to 1379
for (final VMInstanceVO vm : vms) {
    if (!HypervisorType.VMware.equals(host.getHypervisorType()) && _vmMgr.checkIfVmHasClusterWideVolumes(vm.getId())) {
        s_logger.warn(String.format("VM %s cannot be migrated across cluster as it has volumes on cluster-wide pool", vm));
        return false;
    }
}

Minor nit: this block can be moved to the start of the if to fail early.


@shwstppr
I moved the hypervisor type check.

I did consider the order of the checks. In my opinion, the VM volume checks involve more DB queries than listAllUpAndEnabledHosts, so I put the host check before the volume pool check.
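The ordering argument (run the cheap check first so the expensive one is skipped whenever it cannot matter) can be shown with a toy sketch; the method names and counter below are illustrative only, not CloudStack code:

```java
// Toy illustration of fail-early check ordering: with &&, the DB-heavy
// per-volume check only runs when the cheap host-listing check passes.
public class CheckOrdering {

    static int expensiveCalls = 0;

    static boolean cheapHostCheck(boolean hostsAvailable) {
        return hostsAvailable; // stands in for listAllUpAndEnabledHosts(...)
    }

    static boolean expensiveVolumeCheck() {
        expensiveCalls++;      // stands in for per-volume DB queries
        return true;
    }

    public static void main(String[] args) {
        boolean ok = cheapHostCheck(false) && expensiveVolumeCheck();
        System.out.println(ok);             // false
        System.out.println(expensiveCalls); // 0 -- the expensive check never ran
    }
}
```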

@weizhouapache (Member)

@shwstppr great, thanks for testing!

@weizhouapache (Member)

@blueorangutan package

@blueorangutan

@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1148

@nvazquez (Contributor) commented Sep 7, 2021

@blueorangutan test

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1981)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33813 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4534-t1981-kvm-centos7.zip
Smoke tests completed. 89 look OK, 0 have errors
Only failed test results are shown below:

Test Result Time (s) Test File

@nvazquez (Contributor) left a comment

LGTM


Development

Successfully merging this pull request may close these issues.

When only one host is available and it is put in maintenance, all VMs on it are Stopped.

9 participants