[Openstack-operators] Libvirt CPU map (host-model)

0 votes

Hi,
the CPU model that we expose to the guest VMs varies depending on the
compute node use case.
We use "cpu_mode=host-passthrough" for the compute nodes that run batch
processing VMs and "cpu_mode=host-model" for the compute nodes that run
service VMs. We chose "cpu_mode=host-model" because we assumed that
new CPUs (in the libvirt map) would continue to support previous features,
allowing live migration when we need to move the VMs to a new CPU
generation.

We recently upgraded from CentOS 7.3 (libvirt 2.0.0) to CentOS 7.4 (libvirt
3.2.0) and noticed that libvirt now maps a slightly different CPU for the
guests. For example, it is still "Haswell no-TSX" but the feature "cmt" is
no longer mentioned. This prevents suspended VMs from being restored and
blocks live migration.
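
A minimal sketch of how such a difference could be spotted, assuming the guest CPU definitions are available as libvirt-style `<cpu>` XML fragments. The two fragments below are hypothetical and shortened for illustration; they are not the real dumps from the affected hosts, and the helper names `cpu_features` and `feature_diff` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical <cpu> fragments shaped like libvirt domain XML (not real dumps).
OLD_CPU_XML = """
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Haswell-noTSX</model>
  <feature policy='require' name='vme'/>
  <feature policy='disable' name='cmt'/>
</cpu>
"""

NEW_CPU_XML = """
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Haswell-noTSX</model>
  <feature policy='require' name='vme'/>
</cpu>
"""


def cpu_features(cpu_xml):
    """Return {feature name: policy} for a <cpu> element given as a string."""
    root = ET.fromstring(cpu_xml.strip())
    return {f.get("name"): f.get("policy") for f in root.findall("feature")}


def feature_diff(old_xml, new_xml):
    """Return (names only in old, names only in new) as sorted lists."""
    old, new = cpu_features(old_xml), cpu_features(new_xml)
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    return only_old, only_new


if __name__ == "__main__":
    only_old, only_new = feature_diff(OLD_CPU_XML, NEW_CPU_XML)
    print("only in old definition:", only_old)
    print("only in new definition:", only_new)
```

Running the same comparison against the `<cpu>` section of a guest started before the upgrade and one started after it should list any feature, such as "cmt", that appears in only one of the two definitions.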

Has anyone experienced this same problem?

We are considering a few solutions, but none of them is nice (downgrade
libvirt? hard reboot instances? ...)

thanks,
Belmiro


OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
asked Oct 9, 2017 in openstack-operators by Belmiro_Moreira (2,620 points)

2 Responses

0 votes

Hello Belmiro,

We ran into this issue recently while upgrading a RHEL 7.3 OpenStack
Platform Overcloud to RHEL 7.4, which also upgraded libvirtd.

Instances that were spawned before the upgrade have the CPU flags in [1],
while instances spawned after it have the CPU flags in [2]. Notably, the
CMT=disabled flag is present in [1] but absent in [2].

As in your case, this prevents live migration of the older instances,
because the CMT=disabled flag is rejected.

A RH bugzilla [3] was opened on the issue, and it attracted a lot of really
good contributions from libvirt maintainers. The one sure-fire workaround
we have found is to cold-boot the instance so that it starts under the
new libvirtd. That BZ also describes a slightly more hack-ish workaround:
hand-edit the running domain XML and clear the offending CMT flag
(comment 12 on that BZ).
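
To illustrate only the XML-editing part of that workaround, here is a minimal sketch that removes a named CPU feature from a domain XML string. The sample document and the helper name `drop_cpu_feature` are hypothetical, and the sketch does not reproduce the exact procedure from the BZ comment; how the edited definition is applied to a running domain is left to that comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened domain XML used only for illustration.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>instance-example</name>
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>Haswell-noTSX</model>
    <feature policy='disable' name='cmt'/>
    <feature policy='require' name='vme'/>
  </cpu>
</domain>
"""


def drop_cpu_feature(domain_xml, feature_name):
    """Return the XML with every <cpu>/<feature> of that name removed."""
    root = ET.fromstring(domain_xml.strip())
    # Materialise the list first so the tree is not modified while walking it.
    for cpu in list(root.iter("cpu")):
        for feature in list(cpu.findall("feature")):
            if feature.get("name") == feature_name:
                cpu.remove(feature)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(drop_cpu_feature(SAMPLE_DOMAIN_XML, "cmt"))
```

Only the matching `<feature>` element is dropped; the CPU model and all other features are left untouched.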

Hope this helps some,

Thanks,
Paul Browne

[1] https://pastebin.com/JshWi6i3
[2] https://pastebin.com/5b8cAanP
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1495171

--


Paul Browne
Research Computing Platforms
University Information Services
Roger Needham Building
JJ Thompson Avenue
University of Cambridge
Cambridge
United Kingdom
E-Mail: pfb29@cam.ac.uk
Tel: 0044-1223-746548


responded Oct 9, 2017 by Paul_Browne (400 points)  
0 votes

Hi Paul,
yes, this is exactly what we are observing.
Thanks for the bugzilla pointer.

We have now also opened a ticket with Red Hat support.

thanks,
Belmiro

responded Oct 10, 2017 by Belmiro_Moreira (2,620 points)
...