Monday, February 18, 2019

Module 'CPUID' power on failed

When building my shiny new homelab I had enabled nested virtualization globally on all the hosts:

echo 'vhv.allow = "TRUE"' >> /etc/vmware/config

This seemed like a great idea, and installing nested ESXi VMs went smoothly. Then I wanted shared storage for those nested labs and had the bright idea of using NetApp OnTap virtual appliances or simulators to provide it. The OnTap appliance eval provides more storage than I need, but only for 60 days, while the simulator is limited to 210GB but not in duration. For my lab I think the simulator is plenty, and lab use is exactly what it is provided for.

Anyhow: deploy the OVF, power on, and get the error "Module 'CPUID' power on failed". Deploy the OnTap virtual appliance, same result.

Some Googling later, I figured out it was to do with the vhv setting. I turned that off on one host and rebooted, and sure enough both VMs then worked fine. Migrating a powered-off ESXi VM to the modified host worked fine too, as the VMX file has the nested virtualization settings anyway.
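Turning the setting off on a host amounts to deleting the vhv.allow line from /etc/vmware/config and rebooting. A minimal sketch of that edit follows. The function name remove_vhv_allow and the .bak backup suffix are made up for illustration, and it assumes the line looks like the one added by the echo command above.

```shell
# Hypothetical helper: strip any vhv.allow line from a VMware config file.
# A copy of the original is kept next to it as <file>.bak.
remove_vhv_allow() {
    cfg="$1"
    [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }

    # Keep a backup, then rewrite the file in place from that backup.
    cp "$cfg" "$cfg.bak" || return 1

    # grep -v exits with 1 when it prints no lines (the file held only
    # the vhv.allow line); that is not an error here, so accept 0 or 1.
    grep -v '^[[:space:]]*vhv\.allow[[:space:]]*=' "$cfg.bak" > "$cfg" || [ $? -eq 1 ]
}
```

On a host this would be called as remove_vhv_allow /etc/vmware/config, followed by a reboot as described above.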

Now for the pain. I decided to remove vhv.allow from all my hosts and clicked maintenance mode, and the vMotions all failed with 'Failed to receive migration', because the destinations have different CPU capabilities than the originating hosts. I understand this fully for the VMs using nested virtualization, but this was all VMs. Such fun going through every host, powering down all the VMs in order to cold migrate them. All done now, but the lesson is: DO NOT turn on vhv.allow globally. It's better to turn on VT passthrough on individual VMs and not have other VMs that won't power on.
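For the per-VM approach, the equivalent setting lives in the VM's own VMX file as vhv.enable = "TRUE" (the "Expose hardware assisted virtualization to the guest OS" checkbox in the VM's CPU settings). A minimal sketch of adding it from the shell is below. The function name enable_vm_vhv is made up for illustration, and it assumes the VM is powered off while its VMX file is edited.

```shell
# Hypothetical helper: enable nested virtualization for one VM only
# by adding vhv.enable to its .vmx file (VM assumed to be powered off).
enable_vm_vhv() {
    vmx="$1"
    [ -f "$vmx" ] || { echo "no such file: $vmx" >&2; return 1; }

    # Do nothing if the key is already present, whatever its value,
    # so an existing setting is never duplicated or overridden.
    if grep -q '^[[:space:]]*vhv\.enable[[:space:]]*=' "$vmx"; then
        echo "vhv.enable already present in $vmx; leaving it alone" >&2
        return 0
    fi

    # If the file does not end in a newline, add one first so the new
    # key does not get glued onto the last existing line.
    if [ -n "$(tail -c 1 "$vmx")" ]; then
        echo >> "$vmx"
    fi

    echo 'vhv.enable = "TRUE"' >> "$vmx"
}
```

It takes the path to the VM's .vmx file as its only argument. Only VMs that actually need nested virtualization get the flag, so the host-wide /etc/vmware/config stays untouched.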