
[openstack-dev] [ironic] my notes from the PTG in Denver


Hi all!

Here are my notes from the ironic (and a bit of nova) room in Denver.
The same content in a nicely rendered form is on my blog:

Here goes the raw rst-formatted version. Feel free to comment and ask questions
here or there.

Status of Pike priorities

In the Pike cycle, we had 22 priority items. Quite a few planned priorities
did land completely, despite the well-known staffing problems.


Booting from cinder volumes

This includes the iRMC implementation, but excludes the iLO one. There is
also a nova patch for updating IP addresses for volume connectors on review:

Next, we need to update cinder to support FCoE - then we'll be able to
support it in the generic PXE boot interface. Finally, there is some interest
in implementing out-of-band BFV for UCS drivers too.

Rolling (online) upgrades between releases

We've found a bug; the fix was backported to stable/pike soon after the release
and now awaits a point release. We also need developer documentation and
some post-Pike cleanups.

We also discussed fast-forward upgrades. We may need an explicit migration
for VIFs from port.extra to port.internal_info; rloo will track this.
Overall, we need to always make our migrations explicit and runnable without
the services running.

The driver composition reform

Finished, with hardware types created for all supported hardware, and the
classic drivers pending deprecation in Queens.

Removing the classic drivers_ is planned for Rocky.

Standalone jobs (jobs without nova)

These are present and voting, but we're not using their potential. The
discussion is summarized below in Future development of our CI_.

Feature parity between two CLI implementations

The openstack baremetal CLI is now complete and preferred, with the
deprecation of the ironic CLI expected in Queens.

We would like OSC to have fewer dependencies though. There were talks about
having a standalone openstack command without dependencies on other
clients, only on keystoneauth1. rloo will follow up here.

TheJulia will check if there are any implications from the
interoperability team point of view.

Redfish hardware type

The redfish hardware type now provides all the basic stuff we need, i.e.
power and boot device management. There is an ongoing effort to implement
inspection. It is unclear whether more features can be implemented in a
vendor-agnostic fashion; rpioso is looking into Dell, while crushil
is looking into Lenovo.


Also finished are:

  • Post-deploy VIF attach/detach.

  • Physical network awareness.

Not finished

OSC default API version

We now issue a warning if no explicit version is provided to the CLI.
The next step will be to change the version to latest, but our current
definition of latest does not fit this goal really well. We use the latest
version known to the client, which will prevent it from working out-of-box
with older clouds. Instead, we need to finally implement API version
negotiation in ironicclient, and negotiate the latest version.
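The negotiation described above could look roughly like the sketch below. The helper names ``parse_version`` and ``negotiate_version`` are made up for illustration and are not part of ironicclient; in practice the server's minimum and maximum versions would be discovered from the API rather than passed in by hand.

```python
def parse_version(text):
    """Turn a microversion string such as "1.34" into a comparable tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def negotiate_version(client_max, server_min, server_max):
    """Pick the highest microversion that both sides support (hypothetical).

    Raises ValueError when the client is too old for the server.
    """
    chosen = min(parse_version(client_max), parse_version(server_max))
    if chosen < parse_version(server_min):
        raise ValueError("no mutually supported API version")
    return "%d.%d" % chosen


# A newer client talking to an older cloud falls back to the cloud's maximum.
print(negotiate_version("1.34", server_min="1.1", server_max="1.31"))
```

Comparing tuples rather than strings matters here: as strings, "1.9" would sort above "1.31".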

Reference architectures guide

There is one patch that lays out considerations that are going to be shared
between all proposed architectures. The use cases we would like to cover:

  • Admin-only provisioner (standalone architectures)

    • Small fleet and/or rare provisions.

      Here a non-HA architecture may be acceptable, and noop or flat
      networking can be used.

    • Large fleet or frequent provisions.

      Here we will recommend HA and neutron networking. Noop networking is
      also acceptable.

  • Bare metal cloud for end users (with nova)

    • Smaller single-site cloud.

      Non-HA architecture and flat or noop networking are acceptable.
      Ironic conductors can live on OpenStack controller nodes.

    • Large single-site cloud.

      HA is required, and it is recommended to split ironic conductors with
      their TFTP/HTTP servers to separate machines. Neutron networking
      should be used, and thus compatible switches will be required, as well
      as their ML2 mechanism drivers.

      It is preferred to use virtual media instead of PXE/iPXE for deployment
      and cleaning, if supported by hardware. Otherwise, especially large
      clouds may consider splitting away TFTP servers.

    • Large multi-site cloud.

      The same as a single-site cloud plus using Cells v2.

Deploy steps

We agreed to continue this effort, even though the ansible deploy driver solves
some of its use cases. The crucial point is how to pass the requested deploy
steps parameters from a user to ironic. For a non-standalone case it means
passing them through nova.

In a discussion in the nova room we converged on the idea of introducing a new
CRUD API for deploy templates (the exact name to be defined) on the ironic
side. Each such template will have a unique name and will correspond to a
deploy step and a set of arguments for it. On the nova side, a trait can
be requested with a name matching (in some sense) the name of a deploy
template. It will be passed to ironic, and ironic will apply the action,
specified in the template, during deployment.

The exact implementation and API will be defined in a spec that johnthetubaguy
is writing.
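To make the idea more concrete, here is a minimal sketch of how traits could be resolved into deploy steps. Everything in it is an assumption for illustration: the ``DeployTemplate`` shape, the exact-name matching rule, and the step arguments are not defined anywhere yet and will be settled in the spec.

```python
from dataclasses import dataclass, field


@dataclass
class DeployTemplate:
    """Hypothetical deploy template: a unique name, a step, its arguments."""
    name: str
    step: str
    args: dict = field(default_factory=dict)


def steps_for_traits(traits, templates):
    """Return (step, args) pairs for each requested trait naming a template.

    Matching is by exact name here; the real matching rule is undecided.
    """
    by_name = {t.name: t for t in templates}
    return [(by_name[trait].step, by_name[trait].args)
            for trait in traits if trait in by_name]


templates = [
    DeployTemplate("CUSTOM_RAID1", "raid.create_configuration", {"level": "1"}),
    DeployTemplate("CUSTOM_HYPERTHREADING_OFF", "bios.apply_configuration",
                   {"hyperthreading": "off"}),
]
# Traits that do not match any template are simply ignored in this sketch.
print(steps_for_traits(["CUSTOM_RAID1", "CUSTOM_GOLD"], templates))
```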

Networking features

Routed network support is close to completion, we need to finish a patch for

The neutron event processing work is at the spec stage, but does not look
controversial for now.

We also have patches up for deprecating DHCP providers and for making our DHCP
code less dnsmasq-specific.

ironic-inspector HA

Preparation work is under way. We are making our PXE boot management
pluggable, with a new implementation on review that manages a dnsmasq
process directly, instead of changing iptables.

We seem to agree that rolling upgrades are not a priority for
ironic-inspector, as it's never hit by end users either directly or through
another service. It's a purely admin-only API, and admins can plan for a
potential outage.

There is a proposal to support ironic boot interfaces instead of a home-grown
implementation for boot management. Discussing it launched a broader
discussion about the future of ironic-inspector, which continued the next day.

Just Do It

The following former priorities have all or most of their patches up for review,
and just require some attention:

  • Node tags

  • IPA API versioning

  • Rescue mode

  • Supported power states API

  • E-Tags in API

.. _public etherpads:
.. _Removing the classic drivers:

OpenStack goals status

We have not completed either of the two goals for the Pike cycle, and now we
have two more goals to complete. All four goals are relatively close to

Python 3

We have a non-voting integration job on ironic and a voting functional test
job on ironic-inspector. The missing steps are:

  • make the python 3 job voting on ironic
  • implement a job with IPA running on python 3 (blocked by pyudev weirdness)
  • create an integration job with python 3 for ironic-inspector (mostly blocked
    by swift, will have reduced coverage; an alternative is to try RadosGW)

Switching to uWSGI

Ironic standalone tests are running with mod_wsgi and voting, so we only need to
switch to uWSGI.

For ironic-inspector it's much more complicated: it does not have a separate
API service for now at all. It's unclear if we'll be able to just launch the
current service as it is behind a WSGI container, as we actively use green
threads. We probably have to wait until the HA work is done.

Splitting away the tempest plugin

We have a script to extract git history for a sub-tree. We need to create a
separate git repository somewhere, so that we do not submit 60-80 related
patches to zuul. Then this repository will be imported by the infra team, and
we'll proceed with the migration.

At the previous (ATL) PTG we decided to have ironic and ironic-inspector
plugins co-located. This will be less confusing for external users, as many of
them do not understand the difference clearly, but it will also complicate the

We will need to plan the actual migration in advance, and freeze the version
in-tree for some time.

Policy in the code

The ironic part is essentially done; we just need to change the way we
document policy:

No policy support exists in ironic-inspector, and it's unclear if this goal
assumes adding it. There is a desire to do so anyway.

Future development of our CI

Standalone tests

We have standalone tests voting, but we're not fully using their potential.
In the end, we want to reduce the number of non-standalone jobs to:

#. a whole disk image job,

#. a partition images job,

#. a boot-from-volume job,

#. a multi-node job with advanced networking (can be merged with one of the
   first two),

#. two grenade jobs: full and partial.

The following tests can likely be part of the standalone job:

  • tests for all combinations of disk types and deploy methods,
  • tests covering all community-supported drivers (snmp, redfish),
  • tests on different boot options (local vs network boot),
  • tests on root device hints (we plan to cover serial number, wwn and size
    with operators),
  • node adoption.

Take over testing

The take over feature is very important for our HA model, but is completely
untested. We discussed the two most important test cases:

#. conductor failure during deployment with node in deploy wait,

#. conductor failure for an active node using network boot.

We discussed two ways of implementing the test: using a multi-node job with two
conductors or using only one conductor. The latter requires a trick: after
killing the conductor, change its host name, so that it looks like a new
conductor. In either case, we can combine both tests into one run:

1. Start deploying two nodes with netboot:

   #. ``driver=manual-management deploy_interface=iscsi``,
   #. ``driver=manual-management deploy_interface=direct``.

   The remaining steps will be repeated for both nodes.

2. Wait for the nodes' provision_state to become deploy wait.

3. Kill the conductor.

4. Manually clean up the files from the TFTP and HTTP directories and the
   master image cache.

5. Change the conductor host name in ironic.conf.

6. Wait for directories to be populated again.

   .. note:: We should aim to remove this step eventually.

7. virsh start the nodes to continue their deployment.

8. Wait for nodes to become active.

Here is where the second test starts:

9. Repeat steps 3 - 6.

10. virsh reboot the nodes.

11. Check SSH connection to the rebooted instances.

In the future, we would also like to have negative tests on failed take over
for nodes in deploying. We should also have similar tests for cleaning.
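Several steps in the scenario above are "wait until something becomes true" checks (deploy wait reached, directories repopulated, node active). A generic polling helper for them might look like the sketch below; ``wait_for`` is a hypothetical name, and the canned list of states stands in for real calls to the ironic API.

```python
import time


def wait_for(predicate, timeout=60.0, interval=1.0, sleep=time.sleep,
             clock=time.monotonic):
    """Poll predicate() until it returns True or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Stand-in for "ask ironic for the node's provision_state"; a real test would
# query the API instead of walking through a canned list.
states = iter(["deploying", "deploying", "deploy wait"])
reached = wait_for(lambda: next(states) == "deploy wait",
                   timeout=5, interval=0, sleep=lambda _: None)
print(reached)
```

Injecting ``sleep`` and ``clock`` keeps the helper testable without real delays.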

Pike retrospective

We've had a short retrospective. Positive items:

  • Virtual midcycle
  • Weekly bug liaison (action: start doing it again)
  • Weekly priorities
  • Landed some big features
  • Acknowledge that vendors need more attention
  • Did not drive our PTL away :)

Not so positive:

  • Loss of people
  • Gate breakages (action: better hand off of current mitigation actions
    between timezones, report on IRC and the whiteboard what you've done and
    what's left)
  • Took too many priorities (action: take fewer, make the community understand
    that priorities != full backlog)
  • Still not enough attention to vendors (action: accept one patch per vendor
    as part of weekly priorities; the same for subteams)
  • Soft feature freeze
  • Need more folks reviewing (action: jlvillal considers picking up the
    weekly review call)
  • Releasing and cutting stable/pike was a mess (discussed in Release cycle_)
  • No alignment between OpenStack releases and vendor hardware releases.

Release cycle

We had a really hard time releasing Pike. Grenade was branched before us,
essentially messing up our upgrade testing. We had to cut stable/pike at a
random point, and then backport quite a few features, after repairing the CI.

When discussing that, we noted that we committed to releasing often and early,
but we'd never done it, at least not for ironic itself. Having regular
releases can help us avoid getting overloaded at the end of the cycle.
We've decided:

  • Keep master as close to a releasable state as possible, including not
    exposing incomplete features to users and keeping release notes polished.
  • Release regularly, especially when we feel that something is ready to go
    out. Let us aim for releasing roughly once a month.
  • Let us cut stable/queens at the same time as the other projects. We will use
    the last released version as a basis for it.
  • We are going back to feature freeze at the same time as the other projects,
    two weeks before the branching at milestone 3. This will allow us to finish
    anything requiring finishing, particularly rolling upgrade preparation,
    documentation and release notes.

Nova virt driver API compatibility

Currently, we hardcode the required Bare Metal API microversion in our virt
driver. This introduces a hard dependency on a certain version of ironic, even
when it is not mandatory in reality, and enforces a particular upgrade order
between nova and ironic. For example, when we introduced boot-from-volume
support, we had to bump the required version, even though the feature itself
is optional. Cinder support, on the other hand, has multiple code paths
in nova, depending on which API version is available.

We would like to support the current and the previous versions of ironic in
the virt driver. For that we will need more advanced support for API
microversion negotiation in ironicclient. Currently it's only possible to
request one version during client creation. What we want to end up with is to
request the minimum version in ``get_client``, and then provide an ability
to specify a version in each call. For example,

.. code-block:: python

 ir_client = ironicclient.get_client(session=session,
                                     os_ironic_api_version="1.28")
 nodes = ir_client.node.list()  # using 1.28
 ports = ir_client.port.list(os_ironic_api_version="1.34")  # overriding

Another idea was to allow specifying several versions in ``get_client``. The
highest available version will be chosen and used for all calls:

.. code-block:: python

 ir_client = ironicclient.get_client(session=session,
                                     os_ironic_api_version=["1.28", "1.34"])
 if ir_client.negotiated_api_version == (1, 34):
     pass  # do something

Nothing prevents us from implementing both, but the former seems to be what
the API SIG recommends (unofficially; dtantsur to follow up with a formal
guideline). It seems that we can reuse the newly introduced version discovery
support from the keystoneauth1 library. TheJulia will look into it.

.. getclient:

What do we consider a deploy?

We had a heated discussion on our deploy interfaces. Currently, the whole
business logic of provisioning, unprovisioning, taking over and cleaning nodes
is spread between the conductor and a deploy driver, with the deploy driver
containing most of it. This ends up with a lot of duplication, and also
with vendor-specific deploy interfaces, which is something we would want to
avoid. It also ends up with a lot of conditionals in the deploy interfaces
code, as e.g. boot-from-volume does not need half of the actions.
A few options were considered without a clear winner:

#. Move orchestration to the conductor, keep only image flashing logic in
   deploy interfaces. This is arguably how we planned on using deploy
   interfaces. But doing so would limit the ability of drivers to change how
   deploy is orchestrated, if e.g. they need to change the order of some
   operations or add a driver-specific operation in between them.

#. Create a new orchestration interface, keep only image flashing logic in
   deploy interfaces. That will fix the problem with customization, but it
   will complicate our interfaces matrix even further. And such a change would
   break all out-of-tree drivers with custom deploy interfaces.

#. Do nothing and just try our best to clean up the duplication.

The last option is what we're going to do for Queens. Then we will re-evaluate
the remaining options.

Available clean steps API

We currently have no way to indicate which clean steps are available for which
node. Implementing such an API is complicated by the fact that some clean steps
come from hardware interfaces, while some come from the ramdisk (at least for
IPA-based drivers). The exact API was discussed in the API SIG room, and then
later in the ironic room.

We agreed that clean steps need to be cached to make sure we can return them
in a synchronous GET request, like GET /v1/nodes/<UUID>/cleaning/steps
(the exact URI to be discussed in the spec). The caching itself will happen in
two cases:

#. Implicitly on every cleaning

#. Explicitly when a user requests manual cleaning without clean steps

A standard updated_at field will be provided, so that users know when the
cached steps were last updated. rloo to follow up on the spec with it.

We decided to not take any actions to invalidate the cache for now.
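A minimal sketch of such a cache follows. The class name, the in-memory storage, and the shape of the returned entry are assumptions for illustration only; the real design belongs to the spec. As decided above, nothing in it invalidates entries.

```python
from datetime import datetime, timezone


class CleanStepsCache:
    """Hypothetical per-node cache of clean steps with a refresh timestamp."""

    def __init__(self):
        self._entries = {}

    def update(self, node_uuid, steps):
        """Store steps for a node; called after cleaning or an explicit refresh."""
        self._entries[node_uuid] = {
            "steps": list(steps),
            "updated_at": datetime.now(timezone.utc).isoformat(),
        }

    def get(self, node_uuid):
        """Return the cached entry, or an empty one if nothing was cached yet."""
        return self._entries.get(node_uuid, {"steps": [], "updated_at": None})


cache = CleanStepsCache()
# The step below is only an example of what a ramdisk might report.
cache.update("node-1", [{"interface": "deploy", "step": "erase_devices",
                         "priority": 10}])
print(cache.get("node-1")["steps"])
print(cache.get("node-2"))
```

A synchronous GET can then be answered from ``get`` without contacting the ramdisk.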

Rethinking the vendor passthru API

Two problems were discussed:

#. For dynamic drivers, the driver vendor passthru API only works with
   the default *vendor* interface implementation

#. No more support for mixing several vendor passthru implementations

For the first issue, we probably need to do the same thing as we plan to do
with driver properties: This does
not seem to be a high priority, so dtantsur will just file an RFE and
leave it there.

For the second issue, we don't have a clean solution now. It can be worked
around by changing node.vendor_interface on the fly. pas-ha will
document it.

Future of bare metal scheduling

We have discussed the present and the future of scheduling bare metal
instances using nova. The discussion started in the nova room and
continued in our room afterwards and on Friday.

Node availability

First, we discussed marking a node as unavailable for nova. Currently, when a
node is cleaning or otherwise unavailable, we set its resource classes count
to zero. This is, of course, hacky, and we want to get rid of it. I was
thinking about a new virt driver method to express availability, like

.. code-block:: python

 def is_operational(self, hostname):
     """Returns whether the host can be used for deployment."""

However, it was pointed out that ironic would probably be the only user of
such a feature. Instead, it was proposed to use the RESERVED field when
reporting resource classes. Indeed, cleaning can be treated as a temporary
reservation of the node by ironic for its internal business.

We will return RESERVED=0 when a node is active or available. Otherwise,
RESERVED will equal the total amount of reported resources (1
in case of a custom resource class). This will ensure that no resources are
available for scheduling without messing with the reported inventory.
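The rule above is small enough to write down as code. This is only a sketch: the function name and the way the provision state is passed in are assumptions, not the actual virt driver code.

```python
# Provision states in which the node is not held back by ironic itself.
SCHEDULABLE_STATES = {"active", "available"}


def reserved_amount(provision_state, total=1):
    """Return the RESERVED value to report for a node's resource class."""
    return 0 if provision_state in SCHEDULABLE_STATES else total


for state in ("available", "cleaning", "active"):
    print(state, reserved_amount(state))
```

The reported total never changes, so the inventory stays stable while a node is being cleaned.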

Advanced configuration

Then we discussed means of passing from nova to ironic such information as
BIOS configuration or requested RAID layout. We agreed (again) that we don't
want nova to just pipe JSON blobs from a user to ironic. Instead, we will use
traits on the nova side and a new entity, tentatively called deploy templates,
on the ironic side.

A user will request a deploy template to be applied on a node by requesting
an appropriate trait. All matched traits will be passed from nova to ironic in
a similar way to how capabilities are passed now. Then ironic will fetch
deploy templates corresponding to traits and apply them.

The exact form of a deploy template is to be defined. A deploy template
will probably contain a deploy step name and its arguments. Thus, this work
will require the deploy steps work to be revived and finished.

johnthetubaguy will write specs on both topics.

Ownership of bare metal nodes

We want to allow nodes to be optionally owned by a particular tena^Wproject.
We discussed how to make the nova side work, with ironic still being the source
of truth for who owns which node. We decided that we can probably make it work
with traits as well.

Quantitative scheduling

Next, by request of some of the community members, we have discussed bringing
back the ability to use quantitative scheduling with bare metal instances.
We ended up with the same outcome as previously. Starting with Pike, bare
metal scheduling has to be done in terms of custom resource classes and
traits (ah, those magical traits!), and quantitative scheduling is not
coming back.

Inspection and resource classes

After the switch to resource classes, inspection is much less useful.
Previously the information it provided was enough for scheduling. Now we don't
care too much about CPU/memory/disk properties, but we do care about the
resource class. Essentially, inspection is only useful for discovering ports
and capabilities.

In-band inspection (using ironic-inspector) has a good work-around though: its
introspection rules (mini-DSL to run on the discovered data) can be used to
set the resource class based on logic provided by an operator. These rules are
part of the ironic-inspector API, and thus out-of-band inspection does not
benefit from them.
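For readers unfamiliar with introspection rules, the sketch below shows the general idea of conditions and actions applied to discovered data. It is a deliberately simplified stand-in written for this post: the rule format, the ``attribute`` key, and the resource class names are assumptions and do not match the real ironic-inspector rule syntax.

```python
import operator

# A small subset of comparison operators, keyed by the names used in the rules.
OPS = {"eq": operator.eq, "ne": operator.ne, "ge": operator.ge, "lt": operator.lt}


def apply_rules(rules, data, node):
    """Run each rule against discovered data and update the node dict in place."""
    for rule in rules:
        matched = all(OPS[c["op"]](data[c["field"]], c["value"])
                      for c in rule["conditions"])
        if matched:
            for action in rule["actions"]:
                node[action["attribute"]] = action["value"]
    return node


rules = [
    {"conditions": [{"op": "ge", "field": "memory_mb", "value": 65536}],
     "actions": [{"attribute": "resource_class", "value": "baremetal-large"}]},
    {"conditions": [{"op": "lt", "field": "memory_mb", "value": 65536}],
     "actions": [{"attribute": "resource_class", "value": "baremetal-small"}]},
]
print(apply_rules(rules, {"memory_mb": 131072, "cpus": 32}, {}))
```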

A potential solution is to move the introspection rules API to ironic itself.
That would require agreeing on a common inventory format for both in-band and
out-of-band inspection. This is likely to be the IPA inventory format.
Then we'll have to change the inspect interface. Currently we have one call
that does the whole inspection process; we need a call that returns
an inventory. Then ironic itself will run introspection rules, create ports
and update properties and capabilities.

A big problem here is that the discovery process, implemented purely within
ironic-inspector, also heavily relies on introspection rules. We cannot
remove/deprecate the introspection rules API in ironic-inspector until this is
solved. The two APIs will have to co-exist for the time being. We should
probably put the mechanism behind introspection rules into ironic-lib.

sambetts plans to summarize a potential solution on the ML.

We also discussed potentially having the default resource class to use for new
nodes, if none is provided. That would simplify things for some consumers,
like TripleO. Another option is to generate a resource class based on some
template. We can even implement both:

.. code-block:: ini

 default_resource_class = baremetal

results in the baremetal resource class for new nodes, while

.. code-block:: ini

 inspected_resource_class = bm-{memory_mb}-{cpus}-{cpu_arch}

results in a templated resource class to be set for inspected nodes that do
not have a resource class already set.
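The templating part could be as simple as Python string formatting over the inspected properties. A minimal sketch, assuming a hypothetical helper name and made-up property values:

```python
def inspected_resource_class(current, template, properties):
    """Keep an existing resource class; otherwise render the template."""
    if current:
        return current
    return template.format(**properties)


# Made-up inspection results; extra keys not used by the template are ignored.
properties = {"memory_mb": 65536, "cpus": 16, "cpu_arch": "x86_64",
              "local_gb": 500}
template = "bm-{memory_mb}-{cpus}-{cpu_arch}"
print(inspected_resource_class(None, template, properties))
print(inspected_resource_class("gold", template, properties))
```

A real implementation would also need to decide what to do when a property named in the template was not discovered; here that raises ``KeyError``.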

.. _IPA inventory format:

Future ironic-inspector architecture

The discussion in Inspection and resource classes_ led us to the idea of
slowly merging most of ironic-inspector into ironic. Ironic will benefit by
receiving introspection rules and optional inventory storage, while
ironic-inspector will benefit from using the boot interface and from the
existing HA architecture. In the end, the only part remaining in a separate
project will be PXE handling for introspection of nodes without ports and
for auto-discovery.

It's not clear how that will look. We could not discuss it in-depth, as a core
contributor (milan) was not able to come to the PTG. However, we have a
rough plan for the next steps:

#. Implement optional support for using boot interfaces in the Inspector
   *inspect* interface:

   When discussing its technical details, we agreed that instead of having a
   configuration option in ironic to force using a boot interface, we should
   introduce a configuration option in ironic-inspector to completely disable
   its boot management.

#. Implement optional support for using network interfaces in the Inspector
   *inspect* interface:

#. Move introspection rules to ironic itself as discussed in `Inspection
   and resource classes`_.

#. Move the whole data processing to ironic and stop using ironic-inspector
   when a boot interface has all required information.

The first item is planned for Queens, the second can fit as well. The timeline
for the other items is unclear. A separate call will be scheduled soon to
discuss this.

BIOS configuration

This feature has been discussed several times already. This time we came up
with a more or less solid plan to implement it in Queens.

  • We have confirmed the current plan to use clean steps for starting the
    configuration, similar to how RAID already works. There will be two new clean
    steps: bios.apply_configuration and bios.factory_reset.

  • We discussed having a new BIOS interface versus introducing new methods on
    the management interface. We agreed that we want to allow mix-and-match of
    interfaces, e.g. using Redfish power with a vendor BIOS interface.

  • We also discussed the name of the new interface. While the name "BIOS" is
    not ideal, as some systems use UEFI and some don't even have a BIOS, we
    could not come up with a better proposal.

  • We will apply only minimal validation to requested parameters.

Eventually, we will want to expose this feature as a deploy step as well.

A point of contention was how to display available BIOS configuration to a
user. Vendor representatives told us that available configurable parameters
may vary from node to node even within the same generation, so doing it
per-driver is not an option. We decided to go with the following approach:

  • Introduce a new API endpoint to return cached available parameters. The
    response will contain the standard updated_at field, informing a user
    when the cache was last updated.

  • The cache will be updated every time the configuration is changed via
    the clean steps mentioned above.

  • The cache will also be updated on moving a node from enroll to
    manageable provision states.

API for single request deploy

This idea has been in the air for a really long time. Currently, a deployment
via the ironic API involves:

  • locking a node by setting instance_uuid,
  • attaching VIFs via the VIF API,
  • updating instance_info with a few fields,
  • requesting provision state active, providing a configdrive.

In addition to being not user-friendly, this complex procedure makes it harder
to configure policies in a way to allow a user to only deploy/undeploy nodes
and nothing else.
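To show how much a client has to do today, the sketch below lists the four steps above as (method, path, body) tuples. The paths and bodies reflect my understanding of the Bare Metal API and should be checked against the API reference; the function name and all values are placeholders.

```python
def deploy_requests(node_uuid, instance_uuid, vif_id, instance_info,
                    configdrive):
    """List the (method, path, body) calls a deployment needs today.

    Paths and bodies are an approximation, not an authoritative reference.
    """
    node = "/v1/nodes/%s" % node_uuid
    return [
        # 1. Lock the node by setting instance_uuid.
        ("PATCH", node, [{"op": "add", "path": "/instance_uuid",
                          "value": instance_uuid}]),
        # 2. Attach a VIF via the VIF API.
        ("POST", node + "/vifs", {"id": vif_id}),
        # 3. Fill in instance_info, one JSON patch operation per field.
        ("PATCH", node, [{"op": "add", "path": "/instance_info/%s" % key,
                          "value": value}
                         for key, value in sorted(instance_info.items())]),
        # 4. Request the active provision state, passing the configdrive.
        ("PUT", node + "/states/provision",
         {"target": "active", "configdrive": configdrive}),
    ]


for method, path, _body in deploy_requests(
        "NODE", "INSTANCE", "VIF", {"root_gb": 10, "image_source": "IMAGE"},
        "CONFIGDRIVE"):
    print(method, path)
```

Each of the four calls is guarded by its own policy rule, which is what makes a "deploy only" role hard to express.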

Essentially, three ideas were considered:

#. Introduce a completely new API endpoint. This may complicate our already
   quite complex API.

#. Make working with the existing node more RESTful. For example, allow a PUT
   request against a node updating both ``instance_uuid`` and
   ``instance_info``, and changing ``provision_state`` to ``active``.

   It was noted, however, that directly changing ``provision_state`` is
   confusing, as the result will not match it (the value of ``provision_state``
   will become ``deploying``, not ``active``). This can be fixed by setting
   ``target_provision_state`` instead.

#. Introduce a new deployment object and CRUD API associated with it. A UUID
   of this object will replace ``instance_uuid``, while its body will contain
   what we have in ``instance_info`` now. A deploy request would look like::

     POST /v1/deployments {'node_uuid': '...', 'root_gb': '...', 'config_drive': 

   A request to undeploy will be just::

     DELETE /v1/deployments/<DEPLOY UUID>

   Finally, an update of this object will cause a reprovision::

     PUT /v1/deployments/<DEPLOY UUID> {'config_drive': '...'}

   This is also a RESTful option, but it is the hardest to implement.

We did not agree to implement any (or some) of these options. Instead,
pas-ha will look into possible policy adjustments to allow a non-admin
user to provision and unprovision instances. A definition of success is to be
able to switch nova to a non-admin user.

Bare metal instance HA

This session was dedicated to the proposal of implementing nova migrate
for bare metal instances: This spec
is against nova, and no ironic changes are expected.

The idea is to enable moving an instance from one ironic node to another,
assuming that any valuable data is stored only on remote volumes. We agreed
that in the cloud case local disks should not be treated as reliable
persistent storage.

We discussed using nova migrate vs nova evacuate and decided that the
former probably will work better, as we won't mark a nova compute handling the
source node as down (doing so would bring down many more nodes). The only caveat is
that the users should not set any destination for the migration API call,
allowing nova to pick the destination itself.

Two more potential issues were spotted that need clarifying in the spec:

  • How to update the hash ring? The compute services for ironic are organized in a
    hash ring, but once a node is provisioned, it is attached to a compute
    service. Probably just a database update is enough.

  • How exactly to replug VIFs.

A bonus point for implementing this feature will be support for resizing bare
metal instances, as migration is implemented as resizing without changing the

hshiina will update and clarify the spec.

Ansible deploy method

This was a short session. The proposed ansible deploy interface already
exists in ironic-staging-drivers and has a voting CI job. We are more or less
in agreement that we need it to satisfy cases requiring extensive

pas-ha presented a benchmark, showing that this method is only slightly
slower than the direct deploy method: A major optimization
would be calling ansible only once, when deploying several nodes, but
the current ironic architecture does not quite allow that.

Console log

We already have support for serial console, so it feels natural to also
implement console log. Not everything, however, is obvious in the

First, we discussed the amount of data to store. The current proposal captures
the log indefinitely, which is not perfect. It looks like we can document
enabling logrotate to handle this problem outside of ironic. A mailing list
thread can be started to learn what people are using. In any case, we should
return only the last N KiB to nova, where N is to be defined.
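Returning only the tail of the log is straightforward; a minimal sketch with a hypothetical helper name (a real implementation would read from the end of the file rather than hold the whole log in memory):

```python
def tail_kib(log_bytes, n_kib):
    """Return at most the last n_kib KiB of a console log."""
    limit = n_kib * 1024
    if limit <= 0:
        # Guard needed because log_bytes[-0:] would return the whole log.
        return b""
    return log_bytes[-limit:]


log = b"x" * 3000 + b"login: "
print(len(tail_kib(log, 1)))
```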

Next, we discussed when exactly to start the logging. Logging during
cleaning/provisioning may be helpful, but can potentially expose sensitive
information to end users. We agreed to start logging on starting a provisioned

tiendc will update the spec with the outcome of this discussion.

Graphical console

This has been discussed several times already. We confirmed our plan to
introduce a new hardware interface - graphical_console_interface.
pas-ha will update the existing spec, as well as the implementation for
the idrac hardware type.

Queens priorities

This time we decided to take fewer priorities for the cycle, and make it clear
to the community that the priorities list is not our complete backlog.
That means we will accept work that is not on the priorities list, so not
everything has to be fitted in it.

The list was finalized as a spec after the PTG:

OpenStack Development Mailing List (not for usage questions)
asked Oct 6, 2017 in openstack-dev by Dmitry_Tantsur