over in bhyve, vm_run tries to keep vCPU-specific resources (cyclics, etc) on the same CPU the vCPU thread is currently running on. vm_localize_resources makes this happen, and right at the top of it is if (vcpu->lastloccpu == curcpu). if we're not on the same CPU, we have to move resources (such as vlapic_localize_resources moving a cyclic), and in the case of vlapic_localize_resources that move is serialized through cpu_lock.
on a largeish system (256 threads) with a largeish VM (192 vCPUs), that serialization limits us to somewhere in the range of 500-700k vm_run() calls/second(!)
ideally we don't want to be doing 500-700k interrupts/second in the first place, but we can save a bunch of system time by binding vCPUs to physical CPUs so that if (vcpu->lastloccpu == curcpu) holds. at the very least, perhaps bind vCPU threads to a smaller CPU group so the odds are improved.
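
for illustration, here is a minimal model of the check described above. this is a sketch, not the actual bhyve code: struct vcpu_model, its moves counter, and localize_resources_model are hypothetical names, and the real slow path (moving the cyclic under cpu_lock) is only represented by a counter. it also assumes the check skips the move when the CPUs match.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for per-vCPU state.
 * Only lastloccpu mirrors a field named in this issue.
 */
struct vcpu_model {
	int lastloccpu;		/* CPU where resources were last localized */
	unsigned long moves;	/* slow-path relocations taken (model only) */
};

/*
 * Model of the check at the top of vm_localize_resources.
 * Returns true if the slow path (resource relocation) was taken.
 */
static bool
localize_resources_model(struct vcpu_model *vcpu, int curcpu)
{
	/* Fast path: resources are already local to this CPU. */
	if (vcpu->lastloccpu == curcpu)
		return (false);

	/*
	 * Slow path: in the real code this is where resources such as
	 * the vlapic cyclic would be moved, serialized through cpu_lock.
	 * Here we only count the event.
	 */
	vcpu->moves++;
	vcpu->lastloccpu = curcpu;
	return (true);
}
```

in this model, a vCPU thread that stays on one CPU takes the slow path once and the fast path on every later call, which is the behavior binding is meant to get. a thread that bounces between CPUs takes the slow path on every migration.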