
[openstack-dev] [nova] Queens PTG recap - everything else


There was a whole lot of other stuff discussed at the PTG. The details
are in [1]. I won't go into everything here, so I'm just highlighting
some of the more concrete items that had owners or TODOs.

Ironic


The Ironic team came over on Wednesday afternoon. We talked a bit, had
some laughs, it was a good time. Since I don't speak fluent baremetal,
Dmitry Tantsur is going to recap those discussions in the mailing list.
Thanks again, Dmitry.

Privsep


Michael Still has been going hog wild converting the nova libvirt driver
code to use privsep instead of rootwrap. He has a series of changes
tracked under this blueprint [2]. Most of the discussion was a refresh
on privsep and a recap of what's already been merged and some discussion
on outstanding patches. The goal for Queens is to get the entire libvirt
driver converted and also try to get all of nova-compute converted, but
we want to limit that to getting things merged early in the release to
flush out bugs since a lot of these are weird, possibly untested code
paths. There was also discussion of a kind of privsep heartbeat daemon
to tell if it's running (even though it's not a separate service) but
this is complicated and is not something we'll pursue for Queens.
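
To make the pattern concrete, here is a minimal sketch of a
privsep-based helper; the context and function names are illustrative,
though nova defines very similar contexts under nova/privsep:

    from oslo_concurrency import processutils
    from oslo_privsep import capabilities, priv_context

    # A privileged context similar to the ones nova defines; the config
    # section and capability list here are illustrative.
    sys_admin_pctxt = priv_context.PrivContext(
        'nova',
        cfg_section='nova_sys_admin',
        pypath=__name__ + '.sys_admin_pctxt',
        capabilities=[capabilities.CAP_NET_ADMIN,
                      capabilities.CAP_SYS_ADMIN],
    )

    @sys_admin_pctxt.entrypoint
    def set_interface_mtu(interface, mtu):
        # This body runs in the privileged daemon process, so the caller
        # needs no rootwrap filter entry or sudo configuration.
        processutils.execute('ip', 'link', 'set', interface,
                             'mtu', str(mtu))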

Websockify security proxy framework


This is a long-standing security hardening feature [3] which has changed
hands a few times and hasn't gotten much review. Sean Dague and Melanie
Witt agreed to focus on reviewing this for Queens.

Certificate validation


This is another item [4] that's been discussed since at least the Ocata
summit but hasn't made much progress. Sean Dague agreed to help review
this, and Eric Fried said he knew someone that could help review the
security aspects of this change. Sean also suggested scheduling a
hangout so the Johns Hopkins University team working on this can give a
primer on the feature and what to look out for during review. We also
suggested getting a scenario test written for this in the barbican
tempest plugin, which runs as an experimental queue job for nova.

Notifications


Given the state of the Searchlight project, and given that we don't
plan on using Searchlight as a global proxy for the compute REST API,
we are not going to work on parity with versioned notifications there.
There are some cleanups we still need to do in Nova for versioned
notifications from a performance perspective. We also agreed that we
aren't going to consider deprecating the legacy unversioned
notifications until we have parity with the versioned notifications,
especially since consumers of the legacy unversioned notifications have
not yet moved to the versioned ones.

vGPU support


This depends on nested resource providers (like lots of other things).
It was not clear from the discussion if this is static or dynamic
support, e.g. can we hot plug vGPUs using Cyborg? I assume we will not
support hot plugging at first. We also need improved functional testing
of this space before we can make big changes.

Preemptible (spot) instances


This was continuing the discussion from the Boston forum session [5].
The major issue in Nova is that we don't want Nova to be in charge of
orchestrating preempting instances when a request comes in for a "paid"
instance. We agreed to start small where you can't burst over quota.
Blazar also delivered some reservation features in Pike [6], including
something like expiration policies, which sound like they could be
built on here. Someone will have to prototype an external (to nova)
"reaper" which culls the preemptible instances based on some
configurable policy; a rough sketch of that idea is below. Honestly the
notes here are confusing, so we're going to need someone to drive this
forward. That might mean picking up John Garbutt's draft spec for this
(link not available right now).
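
To illustrate the reaper idea, a rough sketch under some big
assumptions: preemptible instances are marked with a (made-up)
'preemptible' metadata key, and the policy is simply oldest-first:

    from novaclient import client as nova_client

    def reap_preemptible(sess, how_many):
        # 'sess' is an authenticated keystoneauth1 session; admin
        # credentials are assumed so all tenants are visible.
        nova = nova_client.Client('2.1', session=sess)
        servers = nova.servers.list(search_opts={'all_tenants': 1})
        # The 'preemptible' metadata key is a convention invented for
        # this sketch; the real marker is yet to be designed.
        victims = [s for s in servers
                   if s.metadata.get('preemptible') == 'true']
        victims.sort(key=lambda s: s.created)  # oldest-first policy
        for server in victims[:how_many]:
            server.delete()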

Driver updates


Various teams from IBM gave updates on plans for their drivers in Queens.

PowerVM (in tree): the team is proposing a few more capabilities to the
driver in Queens. Details are in the spec [7].

zDPM (out of tree): this out-of-tree driver has had two releases (Ocata
and Pike) and is working on third-party CI. One issue they have with
Tempest is that they can only boot from volume.

zVM (out of tree): the team is working on refactoring some code into a
library, similar to os-xenapi, os-powervm and oslo.vmware. They have CI
running but are not yet reporting against nova changes.

Endpoint discovery


This is carry-over work from Ocata and Pike to standardize how Nova does
endpoint discovery with other services, like
keystone/placement/cinder/glance/neutron/ironic/barbican. The spec is
here [8]. The dependent keystoneauth1 changes were released in Pike so
we should be able to make quick progress on this early in Queens to
flush out bugs.
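
For context, the keystoneauth1 version-discovery support released in
Pike lets a service be consumed through an Adapter roughly like this
(the service type and interface here are just examples):

    from keystoneauth1 import adapter

    # 'sess' is an authenticated keystoneauth1 session (setup elided).
    placement = adapter.Adapter(
        session=sess,
        service_type='placement',
        interface='internal',   # example endpoint interface
        min_version='1.0',      # ksa handles the version discovery
    )
    resp = placement.get('/resource_providers')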

Documentation


We talked about the review process for docs changes and agreed it's not
easy to define when we can have a single +2 to approve a docs change
(typo fixes and such are fine). We also noted that it's OK for people to
ping other cores in IRC when they think a docs patch is ready to go, so
that we can help move docs patches along faster.

We also talked a bit about the proposed docs tree structure laid out
here [9] and everyone seemed OK with that. Note that we never had a
cross-project discussion about how to organize the various project team
main pages; if the broader docs team did, it would be great if someone
could recap that for us.

Deprecations


We talked about several things worth deprecating in Queens.

  • The os-cells REST API: we aren't going to do a formal microversion
    deprecation for this. It's just going to go away when the cells v1 code
    goes away, hopefully in Rocky.

  • Running nova-api under eventlet: we've been running nova-api under
    uwsgi in CI since Pike, so we agreed to deprecate running nova-api under
    eventlet in Queens and remove that support in Rocky. Doing this
    deprecation needs an owner...

  • Deprecating (personality) file injection: we talked about this at the
    Pike PTG and agreed we should still do this. People can use userdata
    instead (see the sketch after this list). I'm going to write a spec
    for what this looks like in the API. We also need to consider, as
    part of this, whether we should allow the user to specify new
    userdata during rebuild. Note that the backend code (with libguestfs)
    will continue to work with older microversions using file injection;
    the deprecation is how we signal that it's going away and shouldn't
    be used.

  • Ephemeral and swap disks: these aren't going away, but Matthew Booth
    would like to remove image caching for them, since it causes a lot of
    problems. This seems OK to do.
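
For reference, the userdata path that replaces file injection looks
something like this with novaclient (the session, image and flavor IDs
are placeholders):

    from novaclient import client as nova_client

    nova = nova_client.Client('2.1', session=sess)
    script = "#!/bin/sh\necho 'configured via cloud-init' > /etc/motd\n"
    nova.servers.create(
        name='demo',
        image=image_id,
        flavor=flavor_id,
        userdata=script)  # novaclient base64-encodes this for the API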

Strict isolation of hosts for image and flavor


This was a re-hash of a spec [10] and discussion we had at the Pike PTG.
Tushar Patil's team is going to take over the spec and update it. And
just like we said in Atlanta, stakeholders that are already doing some
variant of this scheduler filter (Intel, Wind River) should review the
spec to make sure it will cover their existing use cases and
out-of-tree scheduler filters; a simplified sketch of that kind of
filter is below.
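
As a rough illustration of the kind of out-of-tree filter in question,
assuming isolation is keyed off aggregate metadata and a made-up
'isolated_group' image property:

    from nova.scheduler import filters

    class IsolatedGroupFilter(filters.BaseHostFilter):
        """Simplified sketch: only pass hosts whose aggregate metadata
        matches a (hypothetical) 'isolated_group' image property."""

        def host_passes(self, host_state, spec_obj):
            props = spec_obj.image.properties if spec_obj.image else None
            required = getattr(props, 'isolated_group', None)
            if not required:
                return True  # non-isolated requests can land anywhere
            return any(
                agg.metadata.get('isolated_group') == required
                for agg in host_state.aggregates)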

Updating server instance keypairs


Several API consumers, including OpenStack Infra, have asked for the
ability to update the keypair associated with an instance. We initially
talked about just being able to specify a new keypair during rebuild,
but then it was pointed out that depending on how cloud-init is
configured, all one might need to do is update the keypair and reboot
the instance. Kevin Zheng is going to take over the spec [11] and update
it for the rebuild/reboot cases, which will probably determine how we
want the API to behave.
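
Purely as a hypothetical sketch of the rebuild direction (the parameter
name and microversion are not decided, and the spec may well end up
elsewhere):

    from novaclient import client as nova_client

    # Hypothetical: assumes a future microversion lets rebuild accept a
    # new keypair; today this call would be rejected.
    nova = nova_client.Client('2.latest', session=sess)
    server = nova.servers.get(server_id)
    server.rebuild(image_ref, key_name='replacement-keypair')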

More instance action (event) records


We track instance action start/end events for a lot of server action
APIs, but not all of them, e.g. attaching/detaching interfaces and
volumes. Kevin Zheng is going to work on filling in some of these gaps
in Queens.
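
The existing os-instance-actions API that these records feed can be
walked like this, assuming 'nova' is a novaclient Client as in the
earlier sketches and server_id is a placeholder:

    # List the recorded actions for a server, then drill into the
    # start/end events of each one.
    for action in nova.instance_action.list(server_id):
        detail = nova.instance_action.get(server_id, action.request_id)
        for event in detail.events:
            print(event['event'], event.get('result'))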

Performance issue of listing instances and filtering on IP


A long-standing known performance issue [12] is that when listing
instances and filtering on IP, we don't do the IP filtering in the DB
SQL query; we do it in code. We talked about a few ways to solve this,
such as adding a new mapping table for the filter/join, but this might
get out of sync with the instance_info_caches table. Another idea was
doing an up-front filter of instances using the Neutron ports API such
that we could figure out which instances (via port.device_id) have
certain fixed IPs; a rough sketch of that idea is below. The issue here
is that the Neutron ports API does not perform a regex filter on the
IPs, which the compute API does. It also means more proxying to other
services. We kicked around the idea of implementing this client-side in
openstackclient and then having that be a template for other client/SDK
code to model (something I'm not personally in love with). The TODO on
this is to see what it would take to get the Neutron ports API to
support regex matching on IP filters.
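
A sketch of the up-front Neutron filter idea, with the regex applied
client-side since the ports API only does exact-match IP filtering
today (the function name is made up):

    import re

    from neutronclient.v2_0 import client as neutron_client

    def server_ids_matching_ip(sess, ip_regex):
        neutron = neutron_client.Client(session=sess)
        pattern = re.compile(ip_regex)
        device_ids = set()
        for port in neutron.list_ports()['ports']:
            for fixed_ip in port.get('fixed_ips', []):
                if pattern.search(fixed_ip['ip_address']):
                    # For compute ports, device_id is the server UUID.
                    device_ids.add(port['device_id'])
        return device_ids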

Add a description field to flavors


There was general agreement on doing this [13], and on allowing users
to update the description for a flavor. We said we wouldn't persist the
flavor.description with the instance though, so showing an embedded
flavor with server details (microversion >= 2.47) wouldn't include the
description.

Deprecating the keymap option


The proposal to deprecate the keymap option [14] will wait until there
is an alternative, which will not be available until a new release of
the noVNC package happens (version 1.0.0?). As for specifying the keymap
when creating a server, we said to not do this with flavor extra specs,
but instead pass userdata through to cloud-init.

Availability zones with ':' in the name


This was a discussion about a latent bug [15] and the forced hosts
capability for admins to specify AZ:HOST:NODE in the API. If the
availability zone name has a colon in it, it breaks the parsing (see
the simplified sketch below). We agreed that we will update the schema
for AZs to not allow colons in the name. We realize this means requests
which used to work to create an AZ with ':' in the name will now fail,
and we are not going to do this with a microversion: even though you
can create AZs with colons in the name, you can't use them - it will
just result in an obscure failure later. We also agreed to backport the
fix for this and make sure it's clearly documented in the API reference.
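
To show why the colon breaks things, a simplified sketch of forced-host
parsing (this is not nova's exact code):

    def parse_forced_host(az_spec):
        # Simplified: nova splits the requested availability_zone on ':'
        # to extract an optional forced host and node.
        parts = az_spec.split(':')
        if len(parts) == 3:
            return parts[0], parts[1], parts[2]   # az, host, node
        if len(parts) == 2:
            return parts[0], parts[1], None       # az, host
        return az_spec, None, None                # plain az

    # An AZ literally named 'my:az' is misparsed as a forced host:
    # parse_forced_host('my:az') -> ('my', 'az', None)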

[1] https://etherpad.openstack.org/p/nova-ptg-queens
[2] https://blueprints.launchpad.net/nova/+spec/hurrah-for-privsep
[3] https://review.openstack.org/#/c/496160/
[4] https://review.openstack.org/#/q/topic:bp/nova-validate-certificates+status:open
[5] https://etherpad.openstack.org/p/BOS-forum-advanced-instance-scheduling
[6] http://blazar.readthedocs.io/en/latest/userdoc/using.instance-reservation.html
[7] https://review.openstack.org/#/c/503061/
[8] https://review.openstack.org/500190
[9] https://etherpad.openstack.org/p/ideal-nova-docs-landing-page
[10] https://review.openstack.org/#/c/381912/
[11] https://review.openstack.org/#/c/375221/
[12] https://bugs.launchpad.net/nova/+bug/1711303
[13] https://review.openstack.org/#/c/501017/
[14] https://review.openstack.org/#/c/483994/
[15] https://review.openstack.org/#/c/490722/

--

Thanks,

Matt


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Sep 18, 2017 in openstack-dev by mriedemos_at_gmail.com

2 Responses


Hi Matt,
thanks for these great summaries.

I didn't find any mention of nested quotas.
Was it discussed at the PTG, and what can we expect for Queens?

thanks,
Belmiro
CERN

responded Sep 19, 2017 by Belmiro_Moreira

On 9/19/2017 7:52 AM, Belmiro Moreira wrote:
I didn't find any mention of nested quotas.
Was it discussed at the PTG, and what can we expect for Queens?

There is no one driving the unified limits work [1], so the nested
quotas effort is stalled and we didn't spend any time discussing it at
the PTG.

[1]
http://lists.openstack.org/pipermail/openstack-dev/2017-September/121944.html

--

Thanks,

Matt


responded Sep 19, 2017 by mriedemos_at_gmail.com