
[openstack-dev] [ironic] ironic and traits


Hi all,

I promised John to dump my thoughts on traits to the ML, so here we go :)

I see two roles of traits (or kinds of traits) for bare metal:
1. traits that say what the node can do already (e.g. "the node is
doing UEFI boot")
2. traits that say what the node can be configured to do (e.g. "the node can
boot in UEFI mode")

This seems confusing, but it's actually very useful. Say, I have a flavor that
requests UEFI boot via a trait. It will match both the nodes that are already in
UEFI mode, as well as nodes that can be put in UEFI mode.

This idea goes further with deploy templates (new concept we've been thinking
about). A flavor can request something like CUSTOM_RAID_5, and it will match the
nodes that already have RAID 5, or, more interestingly, the nodes on which we
can build RAID 5 before deployment. The UEFI example above can be treated in a
similar way.
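
For illustration, such a flavor request might look roughly as follows, using
the required-traits extra spec syntax being added to nova (a sketch; the
resource class name is made up):

# flavor extra specs: schedule to a node of the given resource class that
# either already has RAID 5 or can have it built before deployment
flavor_extra_specs = {
    "resources:CUSTOM_BAREMETAL_GOLD": "1",  # example resource class
    "trait:CUSTOM_RAID_5": "required",
}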

This ends up with two sources of knowledge about traits in ironic:
1. Operators setting something they know about hardware ("this node is in UEFI
mode"),
2. Ironic drivers reporting something they
2.1. know about hardware ("this node is in UEFI mode" - again)
2.2. can do about hardware ("I can put this node in UEFI mode")

For case #1 we are planning on a new CRUD API to set/unset traits for a node.
Case #2 is more interesting. We have two options, I think:

a) Operators still set traits on nodes, drivers are simply validating them. E.g.
an operator sets CUSTOM_RAID_5, and the node's RAID interface checks if it is
possible to do. The downside is obvious - with a lot of deploy templates
available it can be a lot of manual work.

b) Drivers report the traits, and they get somehow added to the traits provided
by an operator. Technically, there are sub-cases again:
b.1) The new traits API returns a union of operator-provided and
driver-provided traits
b.2) The new traits API returns only operator-provided traits; driver-provided
traits are returned e.g. via a new field (node.driver_traits). Then nova will
have to merge the lists itself.

My personal favorite is the last option: I'd like a clear distinction between
different "sources" of traits, but I'd also like to reduce manual work for
operators.
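
For illustration, a minimal sketch of what b.2 would mean on the nova side;
the field names follow the proposal above and are not an existing API:

# nova's ironic virt driver would merge the two lists itself before
# reporting the node's traits to placement
def effective_traits(node):
    operator_traits = set(node.traits or [])       # set via the CRUD API
    driver_traits = set(node.driver_traits or [])  # proposed new field
    return operator_traits | driver_traits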

A valid counter-argument is: what if an operator wants to override a
driver-provided trait? E.g. a node can do RAID 5, but I don't want this
particular node to do it for any reason. I'm not sure if it's a valid case, and
what to do about it.

Let me know what you think.

Dmitry


asked Oct 16, 2017 in openstack-dev by Dmitry_Tantsur

14 Responses

  • Adding references to the specs: ironic side [1]; nova side [2] (which
    just merged).

  • Since Jay is on vacation, I'll tentatively note his vote by proxy [3]
    that ironic should be the source of truth - i.e. option (a). I think
    the upshot is that it's easier for Ironic to track and resolve conflicts
    than for the virt driver to do so.

The downside is obvious - with a lot of deploy templates
available it can be a lot of manual work.

  • How does option (b) help with this?

  • I suggested a way to maintain the "source" of a trait (operator,
    inspector, etc.) [4] which would help with resolving conflicts.
    However, I agree it would be better to avoid this extra complexity if
    possible.

  • This is slightly off topic, but it's related and will eventually need
    to be considered: How are you going to know whether a
    UEFI-capable-but-not-enabled node should have its UEFI mode turned on?
    Are you going to parse the traits specified in the flavor? (This might
    work for Ironic, but will be tough in the general case.)

[1] https://review.openstack.org/504531
[2] https://review.openstack.org/507052
[3]
https://review.openstack.org/#/c/507052/4/specs/queens/approved/ironic-traits.rst@88
[4]
https://review.openstack.org/#/c/504531/4/specs/approved/node-traits.rst@196

On 10/16/2017 11:24 AM, Dmitry Tantsur wrote:
[snip]
responded Oct 16, 2017 by Eric_Fried

On 16 October 2017 at 17:55, Eric Fried <openstack@fried.cc> wrote:

  • Adding references to the specs: ironic side [1]; nova side [2] (which
    just merged).

  • Since Jay is on vacation, I'll tentatively note his vote by proxy [3]
    that ironic should be the source of truth - i.e. option (a). I think
    the upshot is that it's easier for Ironic to track and resolve conflicts
    than for the virt driver to do so.

As I see it, all of these options have Ironic as the source of truth for
Nova.

"Driver" here is about the Ironic drivers, not the Nova virt driver.

The downside is obvious - with a lot of deploy templates
available it can be a lot of manual work.

  • How does option (b) help with this?

The operator defines the configuration templates. The driver could then
report traits for any configuration templates that it knows a given node
can support.

But I suspect a node would have to boot up an image to check if a given set
of RAID or BIOS parameters are valid. Is that correct? I am sure there are
ways to cache things that could help somewhat.

  • I suggested a way to maintain the "source" of a trait (operator,
    inspector, etc.) [4] which would help with resolving conflicts.
    However, I agree it would be better to avoid this extra complexity if
    possible.

That is basically (b.2).

  • This is slightly off topic, but it's related and will eventually need
    to be considered: How are you going to know whether a
    UEFI-capable-but-not-enabled node should have its UEFI mode turned on?
    Are you going to parse the traits specified in the flavor? (This might
    work for Ironic, but will be tough in the general case.)

[1] https://review.openstack.org/504531

Also the other ironic spec: https://review.openstack.org/#/c/504952

[2] https://review.openstack.org/507052
[3]
https://review.openstack.org/#/c/507052/4/specs/queens/approved/ironic-traits.rst@88
[4]
https://review.openstack.org/#/c/504531/4/specs/approved/node-traits.rst@196

On 10/16/2017 11:24 AM, Dmitry Tantsur wrote:

[snip]

Case #2 is more interesting. We have two options, I think:

a) Operators still set traits on nodes, drivers are simply validating
them. E.g. an operator sets CUSTOM_RAID_5, and the node's RAID interface
checks if it is possible to do. The downside is obvious - with a lot of
deploy templates available it can be a lot of manual work.

b) Drivers report the traits, and they get somehow added to the traits
provided by an operator. Technically, there are sub-cases again:
b.1) The new traits API returns a union of operator-provided and
driver-provided traits
b.2) The new traits API returns only operator-provided traits;
driver-provided traits are returned e.g. via a new field
(node.driver_traits). Then nova will have to merge the lists itself.

As an alternative, we could enable a configuration template by Resource
Class. That way it's explicit, but you don't have to set it on every node?

I think we would then need a version of (b.1) to report that extra trait up
to Nova, based on the given Resource Class.
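
Something like the following, purely as a sketch of the idea (the mapping
and the names in it are made up):

# hypothetical operator-defined mapping: each resource class enables a set
# of deploy templates, reported as traits for every node of that class
TEMPLATES_BY_RESOURCE_CLASS = {
    "baremetal-gold": {"CUSTOM_RAID_5", "CUSTOM_RAID_10"},
    "baremetal-silver": {"CUSTOM_RAID_1"},
}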

My personal favorite is the last option: I'd like a clear distinction
between different "sources" of traits, but I'd also like to reduce manual
work for operators.

I am all for making operators' lives easier, but personally I lean
towards explicitly enabling things, hence my current preference for (a).

I would be tempted to add (b.2) as a second step, after we get (a) working
and tested.

A valid counter-argument is: what if an operator wants to override a
driver-provided trait? E.g. a node can do RAID 5, but I don't want this
particular node to do it for any reason. I'm not sure if it's a valid
case, and what to do about it.

I could claim some horrid performance bug in a RAID controller might mean
you want that, but I am just making that up.

I was imagining that for a given set of nodes, you want to QA only a
certain set of RAID configs, and those are the ones you offer, and that is
a different set from some other set of nodes, even if they could support
the configs supplied for the other nodes. Right now those restrictions will
just map to Nova flavors you create, but longer term that might cause
problems. Maybe it is possible to have six disks configured the same way as
five disks, just with one disk unused, and maybe you don't want that?

I am curious, can we validate if the params are valid for RAID and BIOS
config without trying it out on a given host? How would we do that for all
nodes once a new configuration template is added?

Thanks,
John


responded Oct 16, 2017 by John_Garbutt

Hi!

Answering both Eric and John inline.

On 10/16/2017 07:26 PM, John Garbutt wrote:
On 16 October 2017 at 17:55, Eric Fried <openstack@fried.cc> wrote:

* Adding references to the specs: ironic side [1]; nova side [2] (which
just merged).

* Since Jay is on vacation, I'll tentatively note his vote by proxy [3]
that ironic should be the source of truth - i.e. option (a).  I think
the upshot is that it's easier for Ironic to track and resolve conflicts
than for the virt driver to do so.

As I see it, all of these options have Ironic as the source of truth for Nova.

"Driver" here is about the Ironic drivers, not the Nova virt driver.

This is correct, sorry for confusion.

The downside is obvious - with a lot of deploy templates
available it can be a lot of manual work.

* How does option (b) help with this?

The operator defines the configuration templates. The driver could then report
traits for any configuration templates that it knows a given node can support.

Yeah, this avoids explicit

openstack baremetal node trait set CUSTOM_RAID_5

for many nodes.

But I suspect a node would have to boot up an image to check if a given set of
RAID or BIOS parameters are valid. Is that correct? I am sure there are ways to
cache things that could help somewhat.

BIOS - no. RAID - well, some drivers do RAID in-band, but I think we can leave
only driver-side validation here to simplify things.

* I suggested a way to maintain the "source" of a trait (operator,
inspector, etc.) [4] which would help with resolving conflicts.
However, I agree it would be better to avoid this extra complexity if
possible.

That is basically (b.2).

* This is slightly off topic, but it's related and will eventually need
to be considered: How are you going to know whether a
UEFI-capable-but-not-enabled node should have its UEFI mode turned on?
Are you going to parse the traits specified in the flavor?  (This might
work for Ironic, but will be tough in the general case.)

We have a nova spec approved for passing matched traits to ironic. Ironic will
then use them to figure out what to do. Currently it works the same way with
capabilities.
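
For comparison, the capabilities flow today looks roughly like this (a
sketch following the usual ironic/nova conventions):

# ironic side: the operator records the capability on the node
node_properties = {"capabilities": "boot_mode:uefi"}

# nova side: the flavor requests it via a capabilities extra spec, and
# ironic uses the match to set the boot mode on deploy
flavor_extra_specs = {"capabilities:boot_mode": "uefi"}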

[1] https://review.openstack.org/504531

Also the other ironic spec: https://review.openstack.org/#/c/504952

[2] https://review.openstack.org/507052
[3]
https://review.openstack.org/#/c/507052/4/specs/queens/approved/ironic-traits.rst@88
[4]
https://review.openstack.org/#/c/504531/4/specs/approved/node-traits.rst@196

On 10/16/2017 11:24 AM, Dmitry Tantsur wrote:
 > [snip]
 >
 > b) Drivers report the traits, and they get somehow added to the traits
 > provided by an operator. Technically, there are sub-cases again:
 >   b.1) The new traits API returns a union of operator-provided and
 > driver-provided traits
 >   b.2) The new traits API returns only operator-provided traits;
 > driver-provided traits are returned e.g. via a new field
 > (node.driver_traits). Then nova will have to merge the lists itself.

As an alternative, we could enable a configuration template by Resource Class.
That way it's explicit, but you don't have to set it on every node?

This assumes that every resource class corresponds to only one template. We
already have people upset by having only one resource class per node :)

I think we would then need a version of (b.1) to report that extra trait up to
Nova, based on the given Resource Class.

 > My personal favorite is the last option: I'd like a clear distinction between
 > different "sources" of traits, but I'd also like to reduce manual work for
 > operators.

I am all for making operators' lives easier, but personally I lean towards
explicitly enabling things, hence my current preference for (a).

This is certainly easier to implement.

I would be tempted to add (b.2) as a second step, after we get (a) working and
tested.

I'm not sure how it will work with both ways, to be honest...

 > A valid counter-argument is: what if an operator wants to override a
 > driver-provided trait? E.g. a node can do RAID 5, but I don't want this
 > particular node to do it for any reason. I'm not sure if it's a valid
case, and
 > what to do about it.

I could claim some horrid performance bug in a RAID controller might mean you
want that, but I am just making that up.

Well, it sounds like a very valid case actually.

I was imagining that for a given set of nodes, you want to QA only a certain
set of RAID configs, and those are the ones you offer, and that is a different
set from some other set of nodes, even if they could support the configs
supplied for the other nodes. Right now those restrictions will just map to
Nova flavors you create, but longer term that might cause problems. Maybe it is
possible to have six disks configured the same way as five disks, just with one
disk unused, and maybe you don't want that?

I think you've convinced me to go with an explicit case.

I am curious, can we validate if the params are valid for RAID and BIOS config
without trying it out on a given host? How would we do that for all nodes once a
new configuration template is added?

We can have only limited validation, and that's probably fine. I'm mostly
worried about operators adding CUSTOM_RAID_5 to e.g. a node whose driver does
not support RAID. Or RAID 5.

Thanks,
John


responded Oct 17, 2017 by Dmitry_Tantsur

Sorry for delay, took a week off before starting a new job. Comments inline.

On 10/16/2017 12:24 PM, Dmitry Tantsur wrote:
Hi all,

I promised John to dump my thoughts on traits to the ML, so here we go :)

I see two roles of traits (or kinds of traits) for bare metal:
1. traits that say what the node can do already (e.g. "the node is
doing UEFI boot")
2. traits that say what the node can be configured to do (e.g. "the node can
boot in UEFI mode")

There's only one role for traits. #2 above. #1 is state information.
Traits are not for state information. Traits are only for communicating
capabilities of a resource provider (baremetal node).

For example, let's say we add the following to the os-traits library [1]

  • STORAGE_RAID_0
  • STORAGE_RAID_1
  • STORAGE_RAID_5
  • STORAGE_RAID_6
  • STORAGE_RAID_10

The Ironic administrator would add all RAID-related traits to the
baremetal nodes that had the capability of supporting that particular
RAID setup [2]
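
(The standard names would come from os-traits; anything site-specific, like
the deploy template traits discussed in this thread, has to use placement's
custom trait namespace. A quick sketch of that naming rule:)

import re

# custom traits must start with CUSTOM_ and use only uppercase letters,
# digits and underscores, per the placement API rules
CUSTOM_TRAIT_RE = re.compile(r"^CUSTOM_[A-Z0-9_]+$")

assert CUSTOM_TRAIT_RE.match("CUSTOM_RAID_5")
assert not CUSTOM_TRAIT_RE.match("custom_raid_5")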

When provisioned, the baremetal node would either have RAID configured
in a certain level or not configured at all.

A very important note: the Placement API and Nova scheduler (or future
Ironic scheduler) don't care about this. At all. I know it sounds like
I'm being callous, but I'm not. Placement and scheduling don't care
about the state of things. They only care about the capabilities of
target destinations. That's it.

This seems confusing, but it's actually very useful. Say, I have a flavor that
requests UEFI boot via a trait. It will match both the nodes that are already in
UEFI mode, as well as nodes that can be put in UEFI mode.

No :) It will only match nodes that have the UEFI capability. The set of
providers that have the ability to be booted via UEFI is always a
superset of the set of providers that have been booted via UEFI.
Placement and scheduling decisions only care about that superset -- the
providers with a particular capability.

This idea goes further with deploy templates (new concept we've been thinking
about). A flavor can request something like CUSTOM_RAID_5, and it will match the
nodes that already have RAID 5, or, more interestingly, the nodes on which we
can build RAID 5 before deployment. The UEFI example above can be treated in a
similar way.

This ends up with two sources of knowledge about traits in ironic:
1. Operators setting something they know about hardware ("this node is in UEFI
mode"),
2. Ironic drivers reporting something they
2.1. know about hardware ("this node is in UEFI mode" - again)
2.2. can do about hardware ("I can put this node in UEFI mode")

You're correct that both pieces of information are important. However,
only the "can do about hardware" part is relevant to Placement and Nova.

For case #1 we are planning on a new CRUD API to set/unset traits for a node.

I would strongly advise against this. Traits are not for state
information.

Instead, consider having a DB (or JSON) schema that lists state
information in fields that are explicitly for that state information.

For example, a schema that looks like this:

{
  "boot": {
    "mode": <one of 'bios' or 'uefi'>,
    "params": <dict>
  },
  "disk": {
    "raid": {
      "level": <int>,
      "controller": <one of 'sw' or 'hw'>,
      "driver": <string>,
      "params": <dict>
    }, ...
  },
  "network": {
    ...
  }
}

etc, etc.

Don't use trait strings to represent state information.

Best,
-jay

Case #2 is more interesting. We have two options, I think:

a) Operators still set traits on nodes, drivers are simply validating them. E.g.
an operator sets CUSTOM_RAID_5, and the node's RAID interface checks if it is
possible to do. The downside is obvious - with a lot of deploy templates
available it can be a lot of manual work.

b) Drivers report the traits, and they get somehow added to the traits provided
by an operator. Technically, there are sub-cases again:
b.1) The new traits API returns a union of operator-provided and
driver-provided traits
b.2) The new traits API returns only operator-provided traits; driver-provided
traits are returned e.g. via a new field (node.driver_traits). Then nova will
have to merge the lists itself.

My personal favorite is the last option: I'd like a clear distinction between
different "sources" of traits, but I'd also like to reduce manual work for
operators.

A valid counter-argument is: what if an operator wants to override a
driver-provided trait? E.g. a node can do RAID 5, but I don't want this
particular node to do it for any reason. I'm not sure if it's a valid case, and
what to do about it.

Let me know what you think.

Dmitry

[1] http://git.openstack.org/cgit/openstack/os-traits/tree/
[2] Based on how many attached disks the node had, the presence and
abilities of a hardware RAID controller, etc


responded Oct 22, 2017 by Jay_Pipes

Hi Jay!

I appreciate your comments, but I think you're approaching the problem from a
purely VM point of view. Things simply don't work the same way in bare metal,
at least not if we want to provide the same user experience.

On Sun, Oct 22, 2017 at 2:25 PM, Jay Pipes <jaypipes@gmail.com> wrote:

Sorry for delay, took a week off before starting a new job. Comments
inline.

On 10/16/2017 12:24 PM, Dmitry Tantsur wrote:

Hi all,

I promised John to dump my thoughts on traits to the ML, so here we go :)

I see two roles of traits (or kinds of traits) for bare metal:
1. traits that say what the node can do already (e.g. "the node is
doing UEFI boot")
2. traits that say what the node can be configured to do (e.g. "the
node can
boot in UEFI mode")

There's only one role for traits. #2 above. #1 is state information.
Traits are not for state information. Traits are only for communicating
capabilities of a resource provider (baremetal node).

These are not different, that's what I'm talking about here. No users care
about the difference between "this node was put in UEFI mode by an operator
in advance", "this node was put in UEFI mode by an ironic driver on demand"
and "this node is always in UEFI mode, because it's AARCH64 and it does not
have BIOS". These situations produce the same result (the node is booted in
UEFI mode), and thus it's up to ironic to hide this difference.

My suggestion with traits is one way to do it; I'm not sure what you
suggest, though.

For example, let's say we add the following to the os-traits library [1]

  • STORAGE_RAID_0
  • STORAGE_RAID_1
  • STORAGE_RAID_5
  • STORAGE_RAID_6
  • STORAGE_RAID_10

The Ironic administrator would add all RAID-related traits to the
baremetal nodes that had the capability of supporting that particular
RAID setup [2]

When provisioned, the baremetal node would either have RAID configured in
a certain level or not configured at all.

A very important note: the Placement API and Nova scheduler (or future
Ironic scheduler) don't care about this. At all. I know it sounds like
I'm being callous, but I'm not. Placement and scheduling don't care about
the state of things. They only care about the capabilities of target
destinations. That's it.

Yes, because VMs always start with a clean state, and the hypervisor is there
to ensure that. We don't have this luxury in ironic :) E.g. our SNMP driver
is not even aware of boot modes (or RAID, or BIOS configuration), which
does not mean that a node using it cannot be in UEFI mode (or have RAID or
BIOS pre-configured, etc, etc).

This seems confusing, but it's actually very useful. Say, I have a flavor that
requests UEFI boot via a trait. It will match both the nodes that are already
in UEFI mode, as well as nodes that can be put in UEFI mode.

No :) It will only match nodes that have the UEFI capability. The set of
providers that have the ability to be booted via UEFI is always a
superset of the set of providers that have been booted via UEFI.
Placement and scheduling decisions only care about that superset -- the
providers with a particular capability.

Well, no, it will. Again, you're purely basing this on the VM idea, where a VM
is always put in UEFI mode, no matter what the hypervisor looks like. It
is simply not the case for us. You have to care what state the node is in,
because many drivers cannot change this state.

This idea goes further with deploy templates (new concept we've been thinking
about). A flavor can request something like CUSTOM_RAID_5, and it will match
the nodes that already have RAID 5, or, more interestingly, the nodes on which
we can build RAID 5 before deployment. The UEFI example above can be treated
in a similar way.

This ends up with two sources of knowledge about traits in ironic:
1. Operators setting something they know about hardware ("this node is in UEFI
mode"),
2. Ironic drivers reporting something they
2.1. know about hardware ("this node is in UEFI mode" - again)
2.2. can do about hardware ("I can put this node in UEFI mode")

You're correct that both pieces of information are important. However,
only the "can do about hardware" part is relevant to Placement and Nova.

For case #1 we are planning on a new CRUD API to set/unset traits for a
node.

I would strongly advise against this. Traits are not for state
information.

Instead, consider having a DB (or JSON) schema that lists state
information in fields that are explicitly for that state information.

For example, a schema that looks like this:

{
  "boot": {
    "mode": <one of 'bios' or 'uefi'>,
    "params": <dict>
  },
  "disk": {
    "raid": {
      "level": <int>,
      "controller": <one of 'sw' or 'hw'>,
      "driver": <string>,
      "params": <dict>
    }, ...
  },
  "network": {
    ...
  }
}

etc, etc.

Don't use trait strings to represent state information.

I don't see an alternative proposal that will satisfy what we have to solve.

Best,
-jay

Case #2 is more interesting. We have two options, I think:

a) Operators still set traits on nodes, drivers are simply validating
them. E.g. an operator sets CUSTOM_RAID_5, and the node's RAID interface
checks if it is possible to do. The downside is obvious - with a lot of
deploy templates available it can be a lot of manual work.

b) Drivers report the traits, and they get somehow added to the traits
provided by an operator. Technically, there are sub-cases again:
b.1) The new traits API returns a union of operator-provided and
driver-provided traits
b.2) The new traits API returns only operator-provided traits;
driver-provided traits are returned e.g. via a new field
(node.driver_traits). Then nova will have to merge the lists itself.

My personal favorite is the last option: I'd like a clear distinction between
different "sources" of traits, but I'd also like to reduce manual work for
operators.

A valid counter-argument is: what if an operator wants to override a
driver-provided trait? E.g. a node can do RAID 5, but I don't want this
particular node to do it for any reason. I'm not sure if it's a valid case,
and what to do about it.

Let me know what you think.

Dmitry

[1] http://git.openstack.org/cgit/openstack/os-traits/tree/
[2] Based on how many attached disks the node had, the presence and
abilities of a hardware RAID controller, etc


responded Oct 23, 2017 by Dmitry_Tantsur

Writing from my phone... May I ask that before you proceed with any plan
that uses traits for state information that we have a hangout or
videoconference to discuss this? Unfortunately today and tomorrow I'm not
able to do a hangout but I can do one on Wednesday any time of the day.

Lemme know!
-jay

On Oct 23, 2017 5:01 AM, "Dmitry Tantsur" <dtantsur@redhat.com> wrote:
[snip]
responded Oct 23, 2017 by Jay_Pipes

Actually, I was suggesting the same to John the other day :) I can throw
together a doodle later today to pick the time.

On 10/23/2017 01:19 PM, Jay Pipes wrote:
Writing from my phone... May I ask that before you proceed with any plan that
uses traits for state information that we have a hangout or videoconference to
discuss this? Unfortunately today and tomorrow I'm not able to do a hangout but
I can do one on Wednesday any time of the day.

Lemme know!
-jay

On Oct 23, 2017 5:01 AM, "Dmitry Tantsur" <dtantsur@redhat.com> wrote:
[snip]
responded Oct 23, 2017 by Dmitry_Tantsur

From: Jay Pipes [mailto:jaypipes@gmail.com]
Sent: Monday, October 23, 2017 12:20 PM
To: OpenStack Development Mailing List <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [ironic] ironic and traits

Writing from my phone... May I ask that before you proceed with any plan that uses traits for state information that we have a hangout or videoconference to discuss this? Unfortunately today and tomorrow I'm not able to do a hangout but I can do one on Wednesday any time of the day.

[Mooney, Sean K] On the UEFI boot topic: I did bring up at the PTG that we wanted to standardize traits for "verified boot",
which included a trait for UEFI secure boot enabled and one to indicate a hardware root of trust, e.g. Intel Boot Guard or similar.
We distinctly wanted to be able to tag nova compute hosts with those new traits so we could require that VMs that request
a host with UEFI secure boot enabled and a hardware root of trust are scheduled only to those nodes.

There are many other examples that affect both VMs and bare metal, such as ECC/interleaved memory, cluster on die,
L3 cache code and data prioritization, VT-d/VT-c, HPET, hyper-threading, power states... all of these features may be present on the platform,
but I also need to know if they are turned on. Ruling out state in traits means all of this logic will eventually get pushed to scheduler filters,
which will be suboptimal long term as more state is tracked. Software defined infrastructure may be the future, but hardware defined software
is sadly the present...

I do however think there should be a separation between asking for a host that provides x via a trait, and asking for x to be configured via
a trait. The trait secure_boot_enabled should never result in the feature being enabled; it should just find a host with it on. If you want
to request it to be turned on, you would request a host with secure_boot_capable as a trait and have a flavor extra spec or image property to request
Ironic to enable it. These are two very different requests and should not be treated the same.

Lemme know!
-jay

On Oct 23, 2017 5:01 AM, "Dmitry Tantsur" <dtantsur@redhat.com> wrote:
[snip]
responded Oct 23, 2017 by Mooney,_Sean_K (3,580 points)   3 8
0 votes

I agree with Sean. In general terms:

  • A resource provider should be marked with a trait if that feature
    • Can be turned on or off (whether it's currently on or not); or
    • Is always on and can't ever be turned off.
  • A consumer wanting that feature present (doesn't matter whether it's
    on or off) should specify it as a required trait.
  • A consumer wanting that feature present and turned on should
    • Specify it as a required trait; AND
    • Indicate that it be turned on via some other mechanism (e.g. a
      separate extra_spec).

I believe this satisfies Dmitry's (Ironic's) needs, but also Jay's drive
for placement purity.
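To make that concrete: hypothetical flavor extra specs for the two cases might
look like this (using the trait syntax being discussed for nova plus a separate
capabilities-style key; the names are invented for illustration):

# Hypothetical flavor extra specs illustrating the distinction above.
# "trait:..." only asks for hosts where the capability is present;
# the separate "capabilities:..." key asks for it to actually be on.
flavor_present = {
    "trait:CUSTOM_UEFI_SECURE_BOOT": "required",
}
flavor_present_and_on = {
    "trait:CUSTOM_UEFI_SECURE_BOOT": "required",  # find a capable host
    "capabilities:secure_boot": "true",           # and have it turned on
}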

Please invite me to the hangout or whatever.

Thanks,
Eric

On 10/23/2017 07:22 AM, Mooney, Sean K wrote:

From: Jay Pipes [mailto:jaypipes@gmail.com]
Sent: Monday, October 23, 2017 12:20 PM
To: OpenStack Development Mailing List <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [ironic] ironic and traits

 

Writing from my phone... May I ask that before you proceed with any plan
that uses traits for state information that we have a hangout or
videoconference to discuss this? Unfortunately today and tomorrow I'm
not able to do a hangout but I can do one on Wednesday any time of the day.

 

[Mooney, Sean K] On the UEFI boot topic: I did bring up at the PTG that
we wanted to standardize traits for "verified boot". That included a
trait for UEFI secure boot being enabled, and one to indicate a hardware
root of trust, e.g. Intel Boot Guard or similar. We distinctly wanted to
be able to tag nova compute hosts with those new traits so we could
require that VMs requesting a host with UEFI secure boot enabled and a
hardware root of trust are scheduled only to those nodes.

There are many other examples that affect both VMs and bare metal, such
as ECC/interleaved memory, cluster-on-die, L3 cache code and data
prioritization, VT-d/VT-c, HPET, hyper-threading, power states … all of
these features may be present on the platform, but I also need to know
if they are turned on. Ruling out state in traits means all of this
logic will eventually get pushed to scheduler filters, which will be
suboptimal long term as more state is tracked. Software-defined
infrastructure may be the future, but hardware-defined software is sadly
the present…

I do however think there should be a separation between asking, via a
trait, for a host that provides x, and asking for x to be configured via
a trait. The trait "secure boot enabled" should never result in the
feature being enabled; it should just find a host with it on. If you
want to request it to be turned on, you would request a host with
"secure boot capable" as a trait and have a flavor extra spec or image
property to request ironic to enable it. These are two very different
requests and should not be treated the same.

Lemme know!

-jay

 

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Oct 23, 2017 by Eric_Fried (1,900 points)   2
0 votes

On Mon, Oct 23, 2017 at 2:54 PM, Eric Fried <openstack@fried.cc> wrote:

I agree with Sean. In general terms:

  • A resource provider should be marked with a trait if that feature
    • Can be turned on or off (whether it's currently on or not); or
    • Is always on and can't ever be turned off.

No, traits are not boolean. If a resource provider stops providing a
capability, the related trait should simply be removed, that's it.
If you see a trait, that just means the related capability is supported
by the Resource Provider, that's it too.
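In placement-API terms, "just removed" could look like the following
minimal sketch (endpoint, provider UUID and token are placeholders;
assumes the 1.6 microversion that added the traits calls):

# Minimal sketch, not production code: drop a trait from a resource
# provider by PUTting back its reduced trait set.
import requests

PLACEMENT = "http://placement.example.com"          # placeholder endpoint
RP_UUID = "6f2b5c38-0000-0000-0000-000000000000"    # placeholder UUID
HEADERS = {
    "X-Auth-Token": "<token>",                      # placeholder token
    "OpenStack-API-Version": "placement 1.6",
}

# Fetch the current traits plus the provider generation the PUT needs.
resp = requests.get(
    "%s/resource_providers/%s/traits" % (PLACEMENT, RP_UUID),
    headers=HEADERS)
body = resp.json()

# Drop the capability that is no longer provided and write the set back.
new_traits = [t for t in body["traits"] if t != "CUSTOM_RAID_5"]
requests.put(
    "%s/resource_providers/%s/traits" % (PLACEMENT, RP_UUID),
    headers=HEADERS,
    json={
        "resource_provider_generation":
            body["resource_provider_generation"],
        "traits": new_traits,
    })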

MHO.

-Sylvain

  • A consumer wanting that feature present (doesn't matter whether it's
    on or off) should specify it as a required trait.
  • A consumer wanting that feature present and turned on should
    • Specify it as a required trait; AND
    • Indicate that it be turned on via some other mechanism (e.g. a
      separate extra_spec).

I believe this satisfies Dmitry's (Ironic's) needs, but also Jay's drive
for placement purity.

Please invite me to the hangout or whatever.

Thanks,
Eric

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Oct 23, 2017 by Sylvain_Bauza (14,100 points)   1 3 4
...