
[openstack-dev] [TripleO][Heat] Selectively disabling deployment resources


I've been working on this spec for TripleO:
https://review.openstack.org/#/c/431745/

which allows users to selectively disable Heat deployment resources
for a given server (or servers, in the case of a *DeploymentGroup
resource).

Some of the main use cases in TripleO for such a feature are scaling
out compute nodes where you do not need to rerun Puppet (or make any
changes at all) on non-compute nodes, or to exclude nodes from hanging
a stack-update if you know they are unreachable or degraded for some
reason. There are others, but those are 2 of the major use cases.

I started by taking an approach that would be specific to TripleO.
Basically mapping all the deployment resources to a nested stack
containing the logic to selectively disable servers from the
deployment (using yaql) based on a provided parameter value. Here's
the main patch: https://review.openstack.org/#/c/442681/

After considering that complexity, particularly the yaql expression,
I'm wondering if it would be better to add this support natively to
Heat.

I was looking at the restricted_actions key in the resource_registry
and was thinking this might be a reasonable place to add such support.
It would require some changes to how restricted_actions work.

One change would be a method for specifying that restricted_actions
should not fail the stack operation if an action would have otherwise
been triggered. Currently the behavior is to raise an exception and
mark the stack failed if an action needs to be taken but has been
marked restricted. That would need to be tweaked to allow specifying
that we don't want the stack to fail. One thought would be to
change the allowed values of restricted_actions to:

replace_fail
replace_ignore
update_fail
update_ignore
replace
update

where replace and update would be synonyms for replace_fail/update_fail to
maintain backwards compatibility.
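
A minimal Python sketch of how the proposed values might be interpreted. This
is an illustration only, not Heat code: Outcome, SYNONYMS and resolve are
hypothetical names, and the value names (replace_fail, replace_ignore,
update_fail, update_ignore) are a reading of the proposal above.

```python
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"  # the action is not restricted
    FAIL = "fail"        # current behavior: raise and mark the stack failed
    IGNORE = "ignore"    # proposed behavior: skip the action, stack continues


# Hypothetical: the bare values keep their existing (failing) meaning.
SYNONYMS = {"replace": "replace_fail", "update": "update_fail"}


def resolve(restricted_actions, action):
    """Decide what happens when `action` ('update' or 'replace') is needed."""
    for value in restricted_actions:
        canonical = SYNONYMS.get(value, value)
        kind, _, policy = canonical.partition("_")
        if kind == action:
            return Outcome.IGNORE if policy == "ignore" else Outcome.FAIL
    return Outcome.PROCEED
```

With this reading, an existing environment that sets "replace" would still
fail the stack, while "replace_ignore" would let the stack operation continue
without touching the resource.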

Another change would be to add logic to the Deployment resources
themselves to consider if any restricted_actions have been set on any
Server resources before triggering an updated deployment for a given
server.

It also might be nice to allow specifying restricted_actions on the
server's name property (which typically is the hostname) instead of
having to use the resource name. The reason is that it is not
really feasible to expect operators/users to have to represent the
full nested_stack structure in their resource_registry. They would
have to query and record nested_stack names just to refer to a given
server resource. Each ResourceGroup nested stack would have to be
individually represented, etc. Unless there is another way I'm
overlooking.

Whether or not the restricted_actions approach is taken, is Heat
interested in this functionality natively? I think it would make for a
much cleaner implementation than something TripleO specific. I can
work on a Heat spec if there's interest, though I'd like to get some
early feedback.

Thanks.

--
-- James Slagle
--


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Mar 8, 2017 in openstack-dev by James_Slagle (7,000 points)   1 3 4

5 Responses


On 07/03/17 14:34, James Slagle wrote:
I've been working on this spec for TripleO:
https://review.openstack.org/#/c/431745/

which allows users to selectively disable Heat deployment resources
for a given server (or servers, in the case of a *DeploymentGroup
resource).

I'm not completely clear on what this means. You can selectively disable
resources with conditionals. But I think you mean that you want to
selectively disable changes to resources?

Some of the main use cases in TripleO for such a feature are scaling
out compute nodes where you do not need to rerun Puppet (or make any
changes at all) on non-compute nodes, or to exclude nodes from hanging
a stack-update if you know they are unreachable or degraded for some
reason. There are others, but those are 2 of the major use cases.

I think you're running up against a limitation of the scaling group
implementation in Heat. In AWS Autoscaling, you have a LaunchConfig
associated with a group that is used when scaling up to create new
members, but existing members are not changed when you specify a new
LaunchConfig unless you also specifically include a rolling update
UpdatePolicy. (That isn't a great interface in CloudFormation, but it
works and I can't actually think of anything better.)

Heat's AWS-style resources work similarly. Heat's native autoscaling
group resources don't have a separate LaunchConfig, and although they
used to work similarly to the AWS ones with respect to when they would
update existing members, IIRC somebody decided that was a "bug" and
"fixed" it.

In any event, TripleO uses ResourceGroup, and the very existence of
ResourceGroup is predicated on the idea that you can just generate the
nested template by making copies of the inline resource definition -
that is, the idea that you'll never need this feature which it turns
out you do, in fact, need. TripleO can't move away from ResourceGroup
because it relies on it to auto-assign pre-chosen names for specific
servers.

Senlin, for the record, gets this right.

I started by taking an approach that would be specific to TripleO.
Basically mapping all the deployment resources to a nested stack
containing the logic to selectively disable servers from the
deployment (using yaql) based on a provided parameter value. Here's
the main patch: https://review.openstack.org/#/c/442681/

After considering that complexity, particularly the yaql expression,
I'm wondering if it would be better to add this support natively to
Heat.

I was looking at the restricted_actions key in the resource_registry
and was thinking this might be a reasonable place to add such support.
It would require some changes to how restricted_actions work.

One change would be a method for specifying that restricted_actions
should not fail the stack operation if an action would have otherwise
been triggered. Currently the behavior is to raise an exception and
mark the stack failed if an action needs to be taken but has been
marked restricted. That would need to be tweaked to allow specifying
that we don't want the stack to fail. One thought would be to
change the allowed values of restricted_actions to:

replace_fail
replace_ignore
update_fail
update_ignore
replace
update

where replace and update would be synonyms for replace_fail/update_fail to
maintain backwards compatibility.

Anything that involves the resource definition in the template changing
but Heat not modifying the resource is problematic, because that messes
with Heat's internal bookkeeping.

Another change would be to add logic to the Deployment resources
themselves to consider if any restricted_actions have been set on any
Server resources before triggering an updated deployment for a given
server.

Why not just a property, "no_new_deployments_please: true"?

responded Mar 8, 2017 by Zane_Bitter (21,640 points)   4 6 11

On Tue, Mar 07, 2017 at 02:34:50PM -0500, James Slagle wrote:
I've been working on this spec for TripleO:
https://review.openstack.org/#/c/431745/

which allows users to selectively disable Heat deployment resources
for a given server (or servers, in the case of a *DeploymentGroup
resource).

Some of the main use cases in TripleO for such a feature are scaling
out compute nodes where you do not need to rerun Puppet (or make any
changes at all) on non-compute nodes, or to exclude nodes from hanging
a stack-update if you know they are unreachable or degraded for some
reason. There are others, but those are 2 of the major use cases.

Thanks for raising this, I know it's been a pain point for some users of
TripleO.

However I think we're conflating two different issues here:

  1. Don't re-run puppet (or yum update) when no other changes have happened

  2. Disable deployment resources when changes have happened

(1) is actually very simple, and is the default behavior of Heat
(SoftwareDeployment resources never update unless either the config
referenced or the input_values change). We just need to provide an option
to disable the DeployIdentifier/UpdateIdentifier timestamps from being
generated in tripleoclient.

(2) is harder, because the whole point of SoftwareDeploymentGroup is to run
the exact same configuration on a group of servers, with no exceptions.

As Zane mentions, (2) is related to the way ResourceGroup works, but the
problem here isn't ResourceGroup per se, as it would in theory be pretty
easy to reimplement SoftwareDeploymentGroup to generate its nested stack
without inheriting from ResourceGroup (which may be needed if you want a
flag to make existing Deployments in the group immutable).

I'd suggest we solve (1) and do some testing, it may be enough to solve the
"don't change computes on scale-out" case at least?
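
Point (1) can be sketched in a few lines of Python. This is only an
illustration of the behavior described above, not Heat or tripleoclient code:
needs_update, build_inputs and the dict layout are hypothetical, and
DeployIdentifier is the input named in this message.

```python
import time


def needs_update(old, new):
    """A deployment re-runs only if its config or its input_values differ."""
    return (old["config"] != new["config"]
            or old["input_values"] != new["input_values"])


def build_inputs(hiera, add_identifier, now=None):
    """Assemble input_values, optionally adding a timestamp identifier."""
    inputs = dict(hiera)
    if add_identifier:
        # A fresh timestamp makes input_values differ on every stack update,
        # so the deployment is always re-applied.
        inputs["DeployIdentifier"] = int(time.time()) if now is None else now
    return inputs
```

With add_identifier=False and unchanged hiera data, needs_update returns
False and the existing nodes are left alone; with a new timestamp on each
update it always returns True.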

One way to potentially solve (2) would be to unroll the
SoftwareDeploymentGroup resources and instead generate the Deployment
resources via jinja2 - this would enable completely removing them on update
if that's what is desired, similar to what we already do for upgrades to
e.g. not upgrade any compute nodes.

Steve

--
Steve Hardy
Red Hat Engineering, Cloud


responded Mar 8, 2017 by Steven_Hardy (16,900 points)   2 7 12

On Tue, Mar 7, 2017 at 7:24 PM, Zane Bitter <zbitter@redhat.com> wrote:
On 07/03/17 14:34, James Slagle wrote:

I've been working on this spec for TripleO:
https://review.openstack.org/#/c/431745/

which allows users to selectively disable Heat deployment resources
for a given server (or servers, in the case of a *DeploymentGroup
resource).

I'm not completely clear on what this means. You can selectively disable
resources with conditionals. But I think you mean that you want to
selectively disable changes to resources?

Yes, that's right. The reason I can't use conditionals is that I still
want the SoftwareDeploymentGroup resources to be updated, but I may
want to selectively exclude servers from the group that is passed in
via the servers property. E.g., instead of updating the deployment
metadata for all computes, I may want to exclude a single compute
that is temporarily unreachable, without that failing the whole
stack-update.

I started by taking an approach that would be specific to TripleO.
Basically mapping all the deployment resources to a nested stack
containing the logic to selectively disable servers from the
deployment (using yaql) based on a provided parameter value. Here's
the main patch: https://review.openstack.org/#/c/442681/

After considering that complexity, particularly the yaql expression,
I'm wondering if it would be better to add this support natively to
Heat.

I was looking at the restricted_actions key in the resource_registry
and was thinking this might be a reasonable place to add such support.
It would require some changes to how restricted_actions work.

One change would be a method for specifying that restricted_actions
should not fail the stack operation if an action would have otherwise
been triggered. Currently the behavior is to raise an exception and
mark the stack failed if an action needs to be taken but has been
marked restricted. That would need to be tweaked to allow specifying
that we don't want the stack to fail. One thought would be to
change the allowed values of restricted_actions to:

replace_fail
replace_ignore
update_fail
update_ignore
replace
update

where replace and update would be synonyms for replace_fail/update_fail to
maintain backwards compatibility.

Anything that involves the resource definition in the template changing but
Heat not modifying the resource is problematic, because that messes with
Heat's internal bookkeeping.

I don't think this case would violate that principle. The template +
environment files would match what Heat has done. After an update, the
two would be in sync as to which servers the updated Deployment
resource was triggered on.

Another change would be to add logic to the Deployment resources
themselves to consider if any restricted_actions have been set on any
Server resources before triggering an updated deployment for a given
server.

Why not just a property, "no_new_deployments_please: true"?

That would actually work and be pretty straightforward I think. We
could have a map parameter with server names and the property that the
user could use to set the value.

The reason why I was initially not considering this route was because
it doesn't allow the user to disable only some deployments for a given
server. It's all or nothing. However, it's much simpler than a totally
flexible option, and it addresses 2 of the largest use cases of this
feature. I'll look into this route a bit more.

--
-- James Slagle
--


responded Mar 8, 2017 by James_Slagle (7,000 points)   1 3 4

On Wed, Mar 8, 2017 at 4:08 AM, Steven Hardy <shardy@redhat.com> wrote:
On Tue, Mar 07, 2017 at 02:34:50PM -0500, James Slagle wrote:

I've been working on this spec for TripleO:
https://review.openstack.org/#/c/431745/

which allows users to selectively disable Heat deployment resources
for a given server (or servers, in the case of a *DeploymentGroup
resource).

Some of the main use cases in TripleO for such a feature are scaling
out compute nodes where you do not need to rerun Puppet (or make any
changes at all) on non-compute nodes, or to exclude nodes from hanging
a stack-update if you know they are unreachable or degraded for some
reason. There are others, but those are 2 of the major use cases.

Thanks for raising this, I know it's been a pain point for some users of
TripleO.

However I think we're conflating two different issues here:

  1. Don't re-run puppet (or yum update) when no other changes have happened

  2. Disable deployment resources when changes have happened

Yea, possibly, but (1) doesn't really solve the use cases in the spec.
It'd certainly be a small improvement, but it's not really what users
are asking for.

(2) is much more difficult to reason about because we in fact have to
execute puppet to fully determine if changes have happened.

I don't really think these two are conflated. For some purposes, the
2nd is just a more abstract definition of the first. For better or
worse, part of the reason people are asking for this feature is
because they don't want to undo manual changes. While that's not
something we should really spend a lot of time solving for, the fact
is that OpenStack architecture allows for horizontally scaling compute
nodes without having to touch every other single node in your deployment,
but TripleO can't take advantage of that.

So, just giving users a way to opt out of the generated unique
identifier triggering the puppet applies and other deployments
wouldn't help them if they unintentionally changed some other hiera
data that triggers a deployment.

Plus, we have some deployments that are going to execute every time
outside of unique identifiers being generated (hosts-config.yaml).

(1) is actually very simple, and is the default behavior of Heat
(SoftwareDeployment resources never update unless either the config
referenced or the input_values change). We just need to provide an option
to disable the DeployIdentifier/UpdateIdentifier timestamps from being
generated in tripleoclient.

(2) is harder, because the whole point of SoftwareDeploymentGroup is to run
the exact same configuration on a group of servers, with no exceptions.

As Zane mentions, (2) is related to the way ResourceGroup works, but the
problem here isn't ResourceGroup per se, as it would in theory be pretty
easy to reimplement SoftwareDeploymentGroup to generate its nested stack
without inheriting from ResourceGroup (which may be needed if you want a
flag to make existing Deployments in the group immutable).

I'd suggest we solve (1) and do some testing, it may be enough to solve the
"don't change computes on scale-out" case at least?

Possibly, as long as no other deployments are triggered. I think of
the use case more as:

add a compute node(s), don't touch any existing nodes to minimize risk

as opposed to:

add a compute node(s), don't re-run puppet on any existing nodes as I
know that it's not needed

For the scale out case, the desire to minimize risk is a big part of
why other nodes don't need to be touched.

One way to potentially solve (2) would be to unroll the
SoftwareDeploymentGroup resources and instead generate the Deployment
resources via jinja2 - this would enable completely removing them on update
if that's what is desired, similar to what we already do for upgrades to
e.g. not upgrade any compute nodes.

Thanks, I hadn't considered that approach, but will look into it. I'd
guess you'd still need a parameter or map data fed into the jinja2
templating, so that it would not generate the deployment resources
based on what was desired to be disabled. Or, this could use
conditionals perhaps.
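
To make the "unrolled" idea concrete, here is a rough Python stand-in for what
a jinja2 loop fed with a map of disabled servers might produce. It is a sketch
under assumptions: unroll_deployments, the deployment_<name> resource naming
and the disabled parameter are hypothetical, and plain Python is used here in
place of the actual jinja2 templating.

```python
def unroll_deployments(servers, config_ref, disabled=()):
    """Build one OS::Heat::SoftwareDeployment per server that is not disabled.

    `servers` maps a server name to its server ID; `disabled` lists the
    names for which no deployment resource should be generated.
    """
    skip = set(disabled)
    return {
        f"deployment_{name}": {
            "type": "OS::Heat::SoftwareDeployment",
            "properties": {"server": server_id, "config": config_ref},
        }
        for name, server_id in sorted(servers.items())
        if name not in skip
    }
```

Note that leaving a server out of the generated resources removes its
deployment from the template entirely on update, which is the behavior
described in the quoted suggestion rather than a "leave unchanged" flag.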

--
-- James Slagle
--


responded Mar 8, 2017 by James_Slagle (7,000 points)   1 3 4

On 08/03/17 10:05, James Slagle wrote:
On Tue, Mar 7, 2017 at 7:24 PM, Zane Bitter <zbitter@redhat.com> wrote:

On 07/03/17 14:34, James Slagle wrote:

I've been working on this spec for TripleO:
https://review.openstack.org/#/c/431745/

which allows users to selectively disable Heat deployment resources
for a given server (or servers, in the case of a *DeploymentGroup
resource).

I'm not completely clear on what this means. You can selectively disable
resources with conditionals. But I think you mean that you want to
selectively disable changes to resources?

Yes, that's right. The reason I can't use conditionals is that I still
want the SoftwareDeploymentGroup resources to be updated, but I may
want to selectively exclude servers from the group that is passed in
via the servers property. E.g., instead of updating the deployment
metadata for all computes, I may want to exclude a single compute
that is temporarily unreachable, without that failing the whole
stack-update.

Have you seen the filter function?

http://git.openstack.org/cgit/openstack/heat/tree/heat/engine/hot/functions.py#n1279
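
For readers unfamiliar with it, the HOT filter function takes a list of values
and a list, and returns the list with those values removed. A minimal Python
sketch of that semantics follows; hot_filter and the server names are
hypothetical, and this is not the Heat implementation linked above.

```python
def hot_filter(values, sequence):
    """Return `sequence` with every element that appears in `values` removed."""
    return [item for item in sequence if item not in values]


# Hypothetical example: drop an unreachable compute from a list of names.
computes = ["compute-0", "compute-1", "compute-2"]
unreachable = ["compute-1"]
remaining = hot_filter(unreachable, computes)
```

Since the servers property of a deployment group is a map rather than a list,
applying this to the use case above would presumably need some adaptation.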

I started by taking an approach that would be specific to TripleO.
Basically mapping all the deployment resources to a nested stack
containing the logic to selectively disable servers from the
deployment (using yaql) based on a provided parameter value. Here's
the main patch: https://review.openstack.org/#/c/442681/

After considering that complexity, particularly the yaql expression,
I'm wondering if it would be better to add this support natively to
Heat.

I was looking at the restricted_actions key in the resource_registry
and was thinking this might be a reasonable place to add such support.
It would require some changes to how restricted_actions work.

One change would be a method for specifying that restricted_actions
should not fail the stack operation if an action would have otherwise
been triggered. Currently the behavior is to raise an exception and
mark the stack failed if an action needs to be taken but has been
marked restricted. That would need to be tweaked to allow specifying
that we don't want the stack to fail. One thought would be to
change the allowed values of restricted_actions to:

replace_fail
replace_ignore
update_fail
update_ignore
replace
update

where replace and update would be synonyms for replace_fail/update_fail to
maintain backwards compatibility.

Anything that involves the resource definition in the template changing but
Heat not modifying the resource is problematic, because that messes with
Heat's internal bookkeeping.

I don't think this case would violate that principle. The template +
environment files would match what Heat has done. After an update, the
two would be in sync as to which servers the updated Deployment
resource was triggered on.

I'm afraid I can't agree; it isn't that straightforward. Also, if you
want to implement a generic mechanism that applies to every kind of
resource (like restricted_actions do) then it isn't enough for it to
work in one particular use case.

Another change would be to add logic to the Deployment resources
themselves to consider if any restricted_actions have been set on any
Server resources before triggering an updated deployment for a given
server.

Why not just a property, "no_new_deployments_please: true"?

That would actually work and be pretty straightforward I think. We
could have a map parameter with server names and the property that the
user could use to set the value.

The tricky part, since this would presumably be implemented in the
software deployment API itself, would be how to keep the Heat
SoftwareDeployment resource in sync with what's actually happening, so
that the Right Thing happens again when you start doing new deployments.

cheers,
Zane.

responded Mar 9, 2017 by Zane_Bitter (21,640 points)   4 6 11
...