
[openstack-dev] [nova] Why don't we unbind ports or terminate volume connections on shelve offload?


This came up in the nova/cinder meeting today, but I can't for the life
of me think of why we don't unbind ports or terminate volume
connections when we shelve offload an instance from a compute host.

When you unshelve, if the instance was shelved offloaded, the conductor
asks the scheduler for a new set of hosts to build the instance on
(unshelve it). That could be a totally different host.

So am I just missing something super obvious? Or is this the most latent
bug ever?

--

Thanks,

Matt


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked May 31, 2017 in openstack-dev by mriedemos_at_gmail.c (15,720 points)   2 4 10

4 Responses


On 04/13/2017 10:45 AM, Matt Riedemann wrote:
> This came up in the nova/cinder meeting today, but I can't for the life of me
> think of why we don't unbind ports or terminate volume connections when we
> shelve offload an instance from a compute host.
>
> When you unshelve, if the instance was shelved offloaded, the conductor asks the
> scheduler for a new set of hosts to build the instance on (unshelve it). That
> could be a totally different host.
>
> So am I just missing something super obvious? Or is this the most latent bug ever?

Does anyone actually use shelve?

Chris


responded Apr 13, 2017 by Chris_Friesen (20,420 points)   3 16 24

On Thu, Apr 13, 2017, at 12:45 PM, Matt Riedemann wrote:
> This came up in the nova/cinder meeting today, but I can't for the life
> of me think of why we don't unbind ports or terminate volume
> connections when we shelve offload an instance from a compute host.
>
> When you unshelve, if the instance was shelved offloaded, the conductor
> asks the scheduler for a new set of hosts to build the instance on
> (unshelve it). That could be a totally different host.
>
> So am I just missing something super obvious? Or is this the most latent
> bug ever?

It's at the very least a hack, and may be a bug depending on what
behaviour is being seen while the instance is offloaded or unshelved.

Networks and volumes are left in place because that is/was the only way
to prevent them from being used by another instance and causing a
subsequent unshelve to fail. During the unshelve operation it is
expected that they will then be shifted over to the new host the
instance lands on, if it switches hosts.

This is similar to how resize is handled. From an implementation point
of view you can think of shelve as being a really, really long
resize/migration operation.

There very well may be issues with this approach.

responded Apr 13, 2017 by andrew_at_lascii.com (6,820 points)   1 2 5
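
The reasoning in the reply above can be illustrated with a toy model. This is only a sketch under stated assumptions: the Volume class, the instance names, and unshelve_succeeds are hypothetical and are not Nova or Cinder code. It shows why fully detaching a volume on shelve offload could let another instance claim it, so that a later unshelve fails.

```python
# Toy model only: none of these names are Nova or Cinder APIs.


class Volume:
    def __init__(self, name):
        self.name = name
        self.attached_to = None  # instance name, or None when free

    def attach(self, instance):
        # Refuse the attach if a different instance already holds the volume.
        if self.attached_to not in (None, instance):
            raise RuntimeError(f"{self.name} is in use by {self.attached_to}")
        self.attached_to = instance

    def detach(self):
        self.attached_to = None


def unshelve_succeeds(detach_on_offload):
    """Return True if the shelved instance can get its volume back."""
    vol = Volume("vol-1")
    vol.attach("shelved-vm")
    if detach_on_offload:
        vol.detach()  # hypothetical: offload releases the volume entirely
    try:
        vol.attach("other-vm")  # another instance tries to grab the volume
    except RuntimeError:
        pass  # the volume stayed reserved for shelved-vm
    try:
        vol.attach("shelved-vm")  # unshelve needs the volume back
        return True
    except RuntimeError:
        return False
```

In this model, unshelve_succeeds(False) keeps the volume reserved and the unshelve works, while unshelve_succeeds(True) lets other-vm take the volume and the unshelve fails.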

On 4/13/2017 6:53 PM, Andrew Laski wrote:
> On Thu, Apr 13, 2017, at 12:45 PM, Matt Riedemann wrote:
>> This came up in the nova/cinder meeting today, but I can't for the life
>> of me think of why we don't unbind ports or terminate volume
>> connections when we shelve offload an instance from a compute host.
>>
>> When you unshelve, if the instance was shelved offloaded, the conductor
>> asks the scheduler for a new set of hosts to build the instance on
>> (unshelve it). That could be a totally different host.
>>
>> So am I just missing something super obvious? Or is this the most latent
>> bug ever?
>
> It's at the very least a hack, and may be a bug depending on what
> behaviour is being seen while the instance is offloaded or unshelved.
>
> Networks and volumes are left in place because that is/was the only way
> to prevent them from being used by another instance and causing a
> subsequent unshelve to fail. During the unshelve operation it is
> expected that they will then be shifted over to the new host the
> instance lands on, if it switches hosts.
>
> This is similar to how resize is handled. From an implementation point
> of view you can think of shelve as being a really, really long
> resize/migration operation.
>
> There very well may be issues with this approach.


I'm not advocating that we detach the volumes or ports. We can totally
leave those coupled with the instance in the database (the
port.device_id still points at the instance even though the port's
binding details don't have a host set). The thing with the volume though
is that we need to terminate the connection to the backend for that host
before we offload; otherwise, when we unshelve and initialize a new
volume connection, there will be two connections.

As noted elsewhere in the thread, there is a reported bug for this and
some history around it. Calling terminate_connection will fix the issue
for some backends in Cinder but not all (for example, it won't fix LVM).
There is some other internal 'remove_export' call in Cinder that fixes it
for LVM, but that is not exposed out of the API except through the
os-detach API, which is precisely the thing we can't call for shelve
offload for the reason you described.

--

Thanks,

Matt


responded Apr 14, 2017 by mriedemos_at_gmail.c (15,720 points)   2 4 10
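
The distinction drawn in the reply above (keep the database association, but drop the host-specific state) can be sketched with a toy model. Everything here is an assumption for illustration: the Backend class, unbind_port, and shelve_offload_then_unshelve are hypothetical and are not Nova, Neutron, or Cinder code. The port is modeled as a plain dict whose "device_id" and "binding:host_id" keys stand in for the port attributes mentioned in the message.

```python
# Toy model only: the classes and functions are hypothetical, not Nova,
# Neutron, or Cinder code.


class Backend:
    """Tracks which hosts hold a connection to a single volume."""

    def __init__(self):
        self.connections = []

    def initialize_connection(self, host):
        self.connections.append(host)

    def terminate_connection(self, host):
        self.connections.remove(host)


def unbind_port(port):
    """Return a copy of the port with no host bound; device_id is kept."""
    unbound = dict(port)
    unbound["binding:host_id"] = None
    return unbound


def shelve_offload_then_unshelve(terminate_on_offload):
    """Return the hosts still connected after unshelving onto host-b."""
    backend = Backend()
    backend.initialize_connection("host-a")  # instance runs on host-a
    if terminate_on_offload:
        backend.terminate_connection("host-a")  # clean up before offload
    backend.initialize_connection("host-b")  # unshelve lands on host-b
    return backend.connections
```

In this model, skipping the terminate step leaves connections to both host-a and host-b after the unshelve, which is the double-connection situation described above. Unbinding the port clears only the host binding, so the port still points at the instance.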

On 4/13/2017 11:45 AM, Matt Riedemann wrote:
> This came up in the nova/cinder meeting today, but I can't for the life
> of me think of why we don't unbind ports or terminate volume
> connections when we shelve offload an instance from a compute host.
>
> When you unshelve, if the instance was shelved offloaded, the conductor
> asks the scheduler for a new set of hosts to build the instance on
> (unshelve it). That could be a totally different host.
>
> So am I just missing something super obvious? Or is this the most latent
> bug ever?

Looks like this is a known bug:

https://bugs.launchpad.net/nova/+bug/1547142

The fix on the nova side apparently depends on some changes on the
cinder side. The new v3.27 APIs in cinder might help with all of this,
but they don't fix old attachments.

By the way, search for shelve + volume in nova bugs and you're rewarded
with a treasure trove of bugs:

https://bugs.launchpad.net/nova/?field.searchtext=shelved+volume&search=Search&field.status%3Alist=NEW&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.assignee=&field.bug_reporter=&field.omit_dupes=on&field.has_patch=&field.has_no_package=

--

Thanks,

Matt


responded May 31, 2017 by mriedemos_at_gmail.c (15,720 points)   2 4 10