Error starting domain: Guest CPU doesn't match specification

Symptom: A virtual machine does not start any more

The error message is:

Error starting domain: operation failed: guest CPU doesn't match
specification: extra features: ..., missing features: tsx, hle, rtm

What does that mean?

To improve the performance of virtual machines, they are not completely emulated in software. Instead, modern CPUs offer features that perform the necessary functions in hardware and thus much faster.

Sometimes, however, these features get deactivated, mostly for security reasons; see for example our article on issues like Spectre and Meltdown. This can be done either by the CPU vendor with a microcode update or by the Linux kernel developers with a kernel update. In both cases the change does not become active immediately, but only after a reboot of the hypervisor.

That, however, can lead to the problem described above. When virtual machines are created, they usually use all CPU features their host has to offer. If the host loses some of these capabilities, the virtual machines will no longer start.

Who is affected?

The feature tsx has been disabled in Linux kernel package 4.9.189-3+deb9u2 and the intel-microcode package 3.20191115.2~deb9u1.
These versions were released 2019-12-18.
If both packages are updated to or above these versions, the problem might occur.

Intel CPUs since the Haswell architecture, introduced in 2013, can lose support for hle (Hardware Lock Elision) and rtm (Restricted Transactional Memory) through these updates and can therefore be affected by the issue.

The command grep -e hle -e rtm /proc/cpuinfo can be used on the virtualization host, before rebooting into an updated kernel, to determine whether CPUs are present that might be affected by these updates.
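The same check can be wrapped in a small shell function that reports each flag separately. This is a minimal sketch; the function name report_tsx_flags and its optional file argument are illustrative assumptions and not part of UCS.

```shell
#!/bin/sh
# Hypothetical helper: report whether the hle and rtm CPU flags appear
# in a cpuinfo-style file (defaults to /proc/cpuinfo).
report_tsx_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    for flag in hle rtm; do
        # Match the flag only as a whole, space-separated word on a
        # "flags" line, so that longer flag names do not count.
        if grep -Eq "^flags[[:space:]]*:.*[[:space:]]${flag}([[:space:]]|\$)" "$cpuinfo"; then
            echo "$flag: present"
        else
            echo "$flag: absent"
        fi
    done
}
```

Running report_tsx_flags without arguments on the virtualization host inspects /proc/cpuinfo. If a flag is reported as present before the reboot, virtual machines on that host might be affected.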

Owners of such systems are advised to temporarily re-enable the kernel features in order to boot, cleanly shut down and reconfigure the virtual machines and then reset their CPU model, as described below. The administrator can either change the UCR variable grub/append or interrupt the grub boot and edit the Linux boot options. Both options are described below.


Fix if the virtual machine is powered off (preferred)

If the machine was powered off before the reboot, the solution is simple. Reset the CPU. In UVMM under Advanced set the CPU model to Default and click save. That should fix the problem and the machine should start again.

It is recommended to make it a habit to stop all virtual machines manually after a microcode or kernel update and before a reboot.


Fix if the virtual machine is in the managed save state

If the virtual machine was running before the update and the reboot of the hypervisor, libvirt places the VM in a shut-down state with managed save. That means that the content of the RAM is saved in a snapshot, and such snapshots can only be restored when all CPU features are available again. There are two ways to get the machines running again.
Both require a reconfiguration of the CPU (as described above) after the virtual machine was powered off.

Quick but risky solution: Discard the RAM by forcing a shutdown

If backups are available and the risk is manageable, the RAM can be discarded by enforcing a power-off. While this method is much quicker, it could leave the virtual machine in an inconsistent state.
Keep in mind that this method is similar to pulling the power plug of a physical machine.
After the machine has shut down, the CPU can be reconfigured as described above.
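For administrators who work on the hypervisor's command line instead of in UVMM, libvirt's virsh offers a comparable operation. The following is only a sketch and not the UVMM procedure described in this article: it assumes shell access to the hypervisor and a virsh that talks to the local libvirt daemon, and the function names are hypothetical. Discarding the managed save image loses the saved RAM content, just like the forced power-off described above.

```shell
#!/bin/sh
# Hypothetical helpers, assuming virsh is installed on the hypervisor.

# Print the names of all domains that currently have a managed save image.
list_saved_domains() {
    virsh list --all --with-managed-save --name
}

# Discard the managed save image of one domain. The saved RAM content is
# lost, so the next start is a cold boot of the virtual machine.
discard_saved_state() {
    domain="$1"
    if [ -z "$domain" ]; then
        echo "usage: discard_saved_state DOMAIN" >&2
        return 1
    fi
    virsh managedsave-remove "$domain"
}
```

A call such as discard_saved_state myvm (with myvm as a placeholder name) would then be followed by the CPU reconfiguration described above.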

Re-enabling the CPU features and then shutting down the affected machines (preferred)

A safer solution is to re-enable all necessary CPU features temporarily.
This will allow you to resume the affected machines and then cleanly shut them down.
These CPU features can be re-enabled by using the UCR variable grub/append. To enable the tsx feature, for example, use
ucr set grub/append="$(ucr get grub/append) tsx=on"
and then reboot the server.

After the machines have been shut down, the CPU can be reconfigured as described above.

Please note that tsx has been deactivated because of the TAA vulnerability.
Reactivating tsx is not recommended in the long term. After performing this workaround, the UCR variable should be reset to its previous value and the hypervisor should be rebooted.
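To make restoring the previous value less error-prone, the old setting can be saved before tsx=on is appended. This is a minimal sketch built on the ucr get and ucr set calls shown above; the backup file path and the function names are illustrative assumptions and not part of UCS.

```shell
#!/bin/sh
# Hypothetical backup location; adjust as needed.
BACKUP_FILE="${BACKUP_FILE:-/root/grub_append.before-tsx}"

# Save the current value of grub/append, then append tsx=on.
# A reboot of the hypervisor is still required afterwards.
enable_tsx_temporarily() {
    old="$(ucr get grub/append)"
    printf '%s\n' "$old" > "$BACKUP_FILE"
    ucr set grub/append="$old tsx=on"
}

# Write the saved value back. If the variable was unset before, this
# sets it to an empty string. Reboot again to deactivate tsx.
restore_grub_append() {
    old="$(cat "$BACKUP_FILE")"
    ucr set grub/append="$old"
}
```

The intended order would be: enable_tsx_temporarily, reboot, shut down and reconfigure the virtual machines, restore_grub_append, reboot.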

As an alternative to changing the UCR variable, the kernel parameters can be modified for the next boot only. To do so, interrupt grub by pressing e. This allows editing the current boot entry so that custom kernel parameters can be added. The F10 key then continues the boot process with the one-time parameters.

Edit the Linux-Kernel parameters in grub
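After the reboot it can be useful to confirm that the kernel was really started with the added parameter, whichever of the two methods was used. The sketch below reads the kernel command line from /proc/cmdline; the function name and the optional file argument are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given parameter (for example tsx=on)
# is present on the kernel command line of the running system.
kernel_param_active() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Pad with spaces so the parameter only matches as a whole word.
    case " $(cat "$cmdline_file") " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, kernel_param_active tsx=on && echo "tsx=on is active" prints the message only if the parameter was passed to the running kernel.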

More technical information is available in the Univention bugtracker
