
[openstack-dev] [heat] Application level HA via Heat

0 votes

Hi all,

So, lately I've been having various discussions around $subject, and I know
it's something several folks in our community are interested in, so I
wanted to get some ideas I've been pondering out there for discussion.

I'll start with a proposal of how we might replace HARestarter with
AutoScaling group, then give some initial ideas of how we might evolve that
into something capable of a sort-of active/active failover.

  1. HARestarter replacement.

My position on HARestarter has long been that equivalent functionality
should be available via AutoScalingGroups of size 1. Turns out that
shouldn't be too hard to do:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 1
      resource:
        type: ha_server.yaml

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_group_id: {get_resource: server_group}
      scaling_adjustment: 1

So, currently our ScalingPolicy resource can only support three adjustment
types, all of which change the group capacity. AutoScalingGroup already
supports batched replacements for rolling updates, so if we modify the
interface to allow a signal to trigger replacement of a group member, then
the snippet above should be logically equivalent to HARestarter AFAICT.

The steps to do this should be:

  • Standardize the ScalingPolicy-AutoScaling group interface, so
    asynchronous adjustments (e.g. signals) between the two resources don't use
    the "adjust" method.

  • Add an option to replace a member to the signal interface of
    AutoScalingGroup

  • Add the new "replace" adjustment type to ScalingPolicy

I posted a patch which implements the first step, and the second will be
required for TripleO, i.e. we should be doing it soon.

https://review.openstack.org/#/c/143496/
https://review.openstack.org/#/c/140781/

  2. A possible next step towards active/active HA failover

The next part is the ability to notify before replacement that a scaling
action is about to happen (just like we do for LoadBalancer resources
already) and orchestrate some or all of the following:

  • Attempt to quiesce the currently active node (may be impossible if it's
    in a bad state)

  • Detach resources (e.g volumes primarily?) from the current active node,
    and attach them to the new active node

  • Run some config action to activate the new node (e.g run some config
    script to fsck and mount a volume, then start some application).

The first step is possible by putting a SoftwareConfig/SoftwareDeployment
resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the
node is too bricked to respond and specifying DELETE action so it only runs
when we replace the resource).
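As a minimal sketch, such a deployment inside ha_server.yaml might look
something like this - the script body, mount point and the "server"
resource name are illustrative assumptions, not part of the proposal:

  quiesce_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config: |
        #!/bin/sh
        # Best-effort quiesce; the node may already be unreachable.
        umount /mnt/shared || true

  quiesce_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: {get_resource: quiesce_config}
      # assumes the member template names its instance "server"
      server: {get_resource: server}
      # Only run when the member is deleted/replaced, and don't wait for a
      # signal in case the node is too bricked to respond.
      actions: [DELETE]
      signal_transport: NO_SIGNAL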

The third step is possible either via a script inside the box which polls
for the volume attachment, or possibly via an update-only software config.
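Similarly, a rough sketch of the update-only variant for the third step
(again inside ha_server.yaml; the device path, mount point and service
name are placeholders):

  activate_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config: |
        #!/bin/sh
        # Illustrative only: check and mount the shared volume, then
        # start the application on the newly active node.
        fsck -a /dev/vdb
        mount /dev/vdb /mnt/shared
        systemctl start myapp

  activate_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: {get_resource: activate_config}
      server: {get_resource: server}
      # Re-run whenever the deployment is updated, i.e. after the volume
      # has been re-attached to this member.
      actions: [CREATE, UPDATE]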

The second step is the missing piece AFAICS.

I've been wondering if we can do something inside a new heat resource,
which knows what the current "active" member of an ASG is, and gets
triggered on a "replace" signal to orchestrate e.g deleting and creating a
VolumeAttachment resource to move a volume between servers.

Something like:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 2
      resource:
        type: ha_server.yaml

  server_failover_policy:
    type: OS::Heat::FailoverPolicy
    properties:
      auto_scaling_group_id: {get_resource: server_group}
      resource:
        type: OS::Cinder::VolumeAttachment
        properties:
          # FIXME: "refs" is a ResourceGroup interface not currently
          # available in AutoScalingGroup
          instance_uuid: {get_attr: [server_group, refs, 1]}

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_policy_id: {get_resource: server_failover_policy}
      scaling_adjustment: 1

By chaining policies like this we could trigger an update on the attachment
resource (or a nested template via a provider resource containing many
attachments or other resources) every time the ScalingPolicy is triggered.

For the sake of clarity, I've not included the existing stuff like
ceilometer alarm resources etc. above, but hopefully it gets the idea
across so we can discuss further. What are people's thoughts? I'm quite
happy to iterate on the idea if folks have suggestions for a better
interface etc. :)
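For illustration, such an alarm might be wired to the replacement policy
roughly like this - the meter, threshold and metadata matching below are
placeholders rather than a recommendation:

  server_down_alarm:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: instance            # placeholder meter/threshold
      statistic: count
      period: 60
      evaluation_periods: 1
      threshold: 1
      comparison_operator: lt
      # Fire the replacement policy defined above when the alarm triggers.
      alarm_actions:
        - {get_attr: [server_replacement_policy, alarm_url]}
      matching_metadata:
        metadata.user_metadata.groupname: {get_resource: server_group}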

One problem I see with the above approach is that you'd have to trigger a
failover after stack create to get the initial volume attached; I'm still
pondering ideas on how best to solve that.

Thanks,

Steve

asked Dec 22, 2014 in openstack-dev by Steven_Hardy (16,900 points)   2 7 13
retagged Jan 28, 2015 by admin

9 Responses

0 votes

On 22/12/14 13:21, Steven Hardy wrote:
Hi all,

So, lately I've been having various discussions around $subject, and I know
it's something several folks in our community are interested in, so I
wanted to get some ideas I've been pondering out there for discussion.

I'll start with a proposal of how we might replace HARestarter with
AutoScaling group, then give some initial ideas of how we might evolve that
into something capable of a sort-of active/active failover.

  1. HARestarter replacement.

My position on HARestarter has long been that equivalent functionality
should be available via AutoScalingGroups of size 1. Turns out that
shouldn't be too hard to do:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 1
      resource:
        type: ha_server.yaml

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_group_id: {get_resource: server_group}
      scaling_adjustment: 1

One potential issue with this is that it is a little bit too
equivalent to HARestarter - it will replace your whole scaled unit
(ha_server.yaml in this case) rather than just the failed resource inside.

So, currently our ScalingPolicy resource can only support three adjustment
types, all of which change the group capacity. AutoScalingGroup already
supports batched replacements for rolling updates, so if we modify the
interface to allow a signal to trigger replacement of a group member, then
the snippet above should be logically equivalent to HARestarter AFAICT.

The steps to do this should be:

  • Standardize the ScalingPolicy-AutoScaling group interface, so
    asynchronous adjustments (e.g. signals) between the two resources don't use
    the "adjust" method.

  • Add an option to replace a member to the signal interface of
    AutoScalingGroup

  • Add the new "replace" adjustment type to ScalingPolicy

I think I am broadly in favour of this.

I posted a patch which implements the first step, and the second will be
required for TripleO, e.g we should be doing it soon.

https://review.openstack.org/#/c/143496/
https://review.openstack.org/#/c/140781/

  1. A possible next step towards active/active HA failover

The next part is the ability to notify before replacement that a scaling
action is about to happen (just like we do for LoadBalancer resources
already) and orchestrate some or all of the following:

  • Attempt to quiesce the currently active node (may be impossible if it's
    in a bad state)

  • Detach resources (e.g volumes primarily?) from the current active node,
    and attach them to the new active node

  • Run some config action to activate the new node (e.g run some config
    script to fsck and mount a volume, then start some application).

The first step is possible by putting a SoftwareConfig/SoftwareDeployment
resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the
node is too bricked to respond and specifying DELETE action so it only runs
when we replace the resource).

The third step is possible either via a script inside the box which polls
for the volume attachment, or possibly via an update-only software config.

The second step is the missing piece AFAICS.

I've been wondering if we can do something inside a new heat resource,
which knows what the current "active" member of an ASG is, and gets
triggered on a "replace" signal to orchestrate e.g deleting and creating a
VolumeAttachment resource to move a volume between servers.

Something like:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 2
      resource:
        type: ha_server.yaml

  server_failover_policy:
    type: OS::Heat::FailoverPolicy
    properties:
      auto_scaling_group_id: {get_resource: server_group}
      resource:
        type: OS::Cinder::VolumeAttachment
        properties:
          # FIXME: "refs" is a ResourceGroup interface not currently
          # available in AutoScalingGroup
          instance_uuid: {get_attr: [server_group, refs, 1]}

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_policy_id: {get_resource: server_failover_policy}
      scaling_adjustment: 1

This actually fails because a VolumeAttachment needs to be updated in
place; if you try to switch servers but keep the same Volume when
replacing the attachment you'll get an error.

TBH {get_attr: [server_group, refs, 1]} is doing most of the heavy
lifting here, so in theory you could just have an
OS::Cinder::VolumeAttachment instead of the FailoverPolicy and then all
you need is a way of triggering a stack update with the same template &
params. I know Ton added a PATCH method to update in Juno so that you
don't have to pass parameters any more, and I believe it's planned to do
the same with the template.

By chaining policies like this we could trigger an update on the attachment
resource (or a nested template via a provider resource containing many
attachments or other resources) every time the ScalingPolicy is triggered.

For the sake of clarity, I've not included the existing stuff like
ceilometer alarm resources etc above, but hopefully it gets the idea
across so we can discuss further, what are people's thoughts? I'm quite
happy to iterate on the idea if folks have suggestions for a better
interface etc :)

One problem I see with the above approach is you'd have to trigger a
failover after stack create to get the initial volume attached, still
pondering ideas on how best to solve that..

To me this is falling into the same old trap of "hey, we want to run
this custom workflow, all we need to do is add a new resource type to
hang some code on". That's pretty much how we got HARestarter.

Also, like HARestarter, this cannot hope to cover the range of possible
actions that might be needed by various applications.

IMHO the "right" way to implement this is that the Ceilometer alarm
triggers a workflow in Mistral that takes the appropriate action defined
by the user, which may (or may not) include updating the Heat stack to a
new template where the shared storage gets attached to a different server.
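As a very rough sketch of that direction - the DSL layout, the expression
syntax and the heat.stacks_update action name below are assumptions that
would need checking against the current Mistral release - the user-defined
workflow might look something like:

version: '2.0'

failover_shared_storage:
  input:
    - stack_id
    - template
  tasks:
    update_stack:
      # Assumed auto-generated OpenStack action wrapping heatclient's
      # stacks.update; re-runs the stack with a template that attaches
      # the shared volume to the other server.
      action: heat.stacks_update
      input:
        stack_id: <% $.stack_id %>
        template: <% $.template %>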

cheers,
Zane.

responded Dec 22, 2014 by Zane_Bitter (21,640 points)   4 6 12
0 votes

On Tue, Dec 23, 2014 at 6:42 AM, Zane Bitter wrote:

On 22/12/14 13:21, Steven Hardy wrote:

Hi all,

So, lately I've been having various discussions around $subject, and I
know
it's something several folks in our community are interested in, so I
wanted to get some ideas I've been pondering out there for discussion.

I'll start with a proposal of how we might replace HARestarter with
AutoScaling group, then give some initial ideas of how we might evolve
that
into something capable of a sort-of active/active failover.

  1. HARestarter replacement.

My position on HARestarter has long been that equivalent functionality
should be available via AutoScalingGroups of size 1. Turns out that
shouldn't be too hard to do:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 1
      resource:
        type: ha_server.yaml

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_group_id: {get_resource: server_group}
      scaling_adjustment: 1

One potential issue with this is that it is a little bit too equivalent
to HARestarter - it will replace your whole scaled unit (ha_server.yaml in
this case) rather than just the failed resource inside.

So, currently our ScalingPolicy resource can only support three adjustment

types, all of which change the group capacity. AutoScalingGroup already
supports batched replacements for rolling updates, so if we modify the
interface to allow a signal to trigger replacement of a group member, then
the snippet above should be logically equivalent to HARestarter AFAICT.

The steps to do this should be:

  • Standardize the ScalingPolicy-AutoScaling group interface, so
    asynchronous adjustments (e.g. signals) between the two resources don't use
    the "adjust" method.

  • Add an option to replace a member to the signal interface of
    AutoScalingGroup

  • Add the new "replace" adjustment type to ScalingPolicy

I think I am broadly in favour of this.

I posted a patch which implements the first step, and the second will be

required for TripleO, e.g we should be doing it soon.

https://review.openstack.org/#/c/143496/
https://review.openstack.org/#/c/140781/

  1. A possible next step towards active/active HA failover

The next part is the ability to notify before replacement that a scaling
action is about to happen (just like we do for LoadBalancer resources
already) and orchestrate some or all of the following:

  • Attempt to quiesce the currently active node (may be impossible if it's
    in a bad state)

  • Detach resources (e.g volumes primarily?) from the current active node,
    and attach them to the new active node

  • Run some config action to activate the new node (e.g run some config
    script to fsck and mount a volume, then start some application).

The first step is possible by putting a SoftwareConfig/SoftwareDeployment
resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the
node is too bricked to respond and specifying DELETE action so it only
runs
when we replace the resource).

The third step is possible either via a script inside the box which polls
for the volume attachment, or possibly via an update-only software config.

The second step is the missing piece AFAICS.

I've been wondering if we can do something inside a new heat resource,
which knows what the current "active" member of an ASG is, and gets
triggered on a "replace" signal to orchestrate e.g deleting and creating a
VolumeAttachment resource to move a volume between servers.

Something like:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 2
      resource:
        type: ha_server.yaml

  server_failover_policy:
    type: OS::Heat::FailoverPolicy
    properties:
      auto_scaling_group_id: {get_resource: server_group}
      resource:
        type: OS::Cinder::VolumeAttachment
        properties:
          # FIXME: "refs" is a ResourceGroup interface not currently
          # available in AutoScalingGroup
          instance_uuid: {get_attr: [server_group, refs, 1]}

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_policy_id: {get_resource: server_failover_policy}
      scaling_adjustment: 1

This actually fails because a VolumeAttachment needs to be updated in
place; if you try to switch servers but keep the same Volume when replacing
the attachment you'll get an error.

TBH {get_attr: [server_group, refs, 1]} is doing most of the heavy lifting
here, so in theory you could just have an OS::Cinder::VolumeAttachment
instead of the FailoverPolicy and then all you need is a way of triggering
a stack update with the same template & params. I know Ton added a PATCH
method to update in Juno so that you don't have to pass parameters any
more, and I believe it's planned to do the same with the template.

By chaining policies like this we could trigger an update on the

attachment
resource (or a nested template via a provider resource containing many
attachments or other resources) every time the ScalingPolicy is triggered.

For the sake of clarity, I've not included the existing stuff like
ceilometer alarm resources etc above, but hopefully it gets the idea
across so we can discuss further, what are people's thoughts? I'm quite
happy to iterate on the idea if folks have suggestions for a better
interface etc :)

One problem I see with the above approach is you'd have to trigger a
failover after stack create to get the initial volume attached, still
pondering ideas on how best to solve that..

To me this is falling into the same old trap of "hey, we want to run this
custom workflow, all we need to do is add a new resource type to hang some
code on". That's pretty much how we got HARestarter.

Also, like HARestarter, this cannot hope to cover the range of possible
actions that might be needed by various applications.

IMHO the "right" way to implement this is that the Ceilometer alarm
triggers a workflow in Mistral that takes the appropriate action defined by
the user, which may (or may not) include updating the Heat stack to a new
template where the shared storage gets attached to a different server.

I agree, we should really be changing our policies to be implemented as
Mistral workflows. A good first step would be to have a Mistral workflow
Heat resource, so that users can start getting more flexibility in what
they do with alarm actions.
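Just to illustrate the shape of it, something like the following - the
resource type, properties and attribute are all hypothetical, nothing like
this exists in Heat yet:

  failover_workflow:
    type: OS::Mistral::Workflow          # hypothetical resource type
    properties:
      # Hypothetical property: the user-supplied workflow definition that
      # decides what actually happens on failure.
      definition: {get_file: failover_workflow.yaml}

  vm_error_alarm:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: instance               # placeholder meter/threshold
      statistic: count
      period: 60
      evaluation_periods: 1
      threshold: 1
      comparison_operator: lt
      alarm_actions:
        # Hypothetical attribute exposing a URL that triggers the workflow.
        - {get_attr: [failover_workflow, execute_url]}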

-Angus

cheers,
Zane.



responded Dec 22, 2014 by Angus_Salkeld (5,520 points)   1 4 6
0 votes

On Mon, Dec 22, 2014 at 03:42:37PM -0500, Zane Bitter wrote:
On 22/12/14 13:21, Steven Hardy wrote:

Hi all,

So, lately I've been having various discussions around $subject, and I know
it's something several folks in our community are interested in, so I
wanted to get some ideas I've been pondering out there for discussion.

I'll start with a proposal of how we might replace HARestarter with
AutoScaling group, then give some initial ideas of how we might evolve that
into something capable of a sort-of active/active failover.

  1. HARestarter replacement.

My position on HARestarter has long been that equivalent functionality
should be available via AutoScalingGroups of size 1. Turns out that
shouldn't be too hard to do:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 1
      resource:
        type: ha_server.yaml

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_group_id: {get_resource: server_group}
      scaling_adjustment: 1

One potential issue with this is that it is a little bit too equivalent to
HARestarter - it will replace your whole scaled unit (ha_server.yaml in this
case) rather than just the failed resource inside.

Personally I don't see that as a problem, because the interface makes that
explicit - if you put a resource in an AutoScalingGroup, you expect it to
get created/deleted on group adjustment, so anything you don't want
replaced stays outside the group.

Happy to consider other alternatives which do less destructive replacement,
but to me this seems like the simplest possible way to replace HARestarter
with something we can actually support long term.

Even if "just replace failed resource" is somehow made available later,
we'll still want to support AutoScalingGroup, and "replace_oldest" is
likely to be useful in other situations, not just this use-case.

Do you have specific ideas of how the just-replace-failed-resource feature
might be implemented? A way for a signal to declare a resource failed so
convergence auto-healing does a less destructive replacement?

So, currently our ScalingPolicy resource can only support three adjustment
types, all of which change the group capacity. AutoScalingGroup already
supports batched replacements for rolling updates, so if we modify the
interface to allow a signal to trigger replacement of a group member, then
the snippet above should be logically equivalent to HARestarter AFAICT.

The steps to do this should be:

  • Standardize the ScalingPolicy-AutoScaling group interface, so
    asynchronous adjustments (e.g. signals) between the two resources don't use
    the "adjust" method.

  • Add an option to replace a member to the signal interface of
    AutoScalingGroup

  • Add the new "replace" adjustment type to ScalingPolicy

I think I am broadly in favour of this.

Ok, great - I think we'll probably want replace_oldest, replace_newest, and
replace_specific, such that both alarm and operator driven replacement have
flexibility over what member is replaced.
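For example, the signal details for replace_specific might end up carrying
something like this (purely hypothetical keys - none of this exists yet):

  adjustment_type: replace_specific                    # proposed, doesn't exist yet
  scaling_adjustment: 1
  member: <name-or-id-of-the-group-member-to-replace>  # hypothetical key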

I posted a patch which implements the first step, and the second will be
required for TripleO, e.g we should be doing it soon.

https://review.openstack.org/#/c/143496/
https://review.openstack.org/#/c/140781/

  1. A possible next step towards active/active HA failover

The next part is the ability to notify before replacement that a scaling
action is about to happen (just like we do for LoadBalancer resources
already) and orchestrate some or all of the following:

  • Attempt to quiesce the currently active node (may be impossible if it's
    in a bad state)

  • Detach resources (e.g volumes primarily?) from the current active node,
    and attach them to the new active node

  • Run some config action to activate the new node (e.g run some config
    script to fsck and mount a volume, then start some application).

The first step is possible by putting a SoftwareConfig/SoftwareDeployment
resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the
node is too bricked to respond and specifying DELETE action so it only runs
when we replace the resource).

The third step is possible either via a script inside the box which polls
for the volume attachment, or possibly via an update-only software config.

The second step is the missing piece AFAICS.

I've been wondering if we can do something inside a new heat resource,
which knows what the current "active" member of an ASG is, and gets
triggered on a "replace" signal to orchestrate e.g deleting and creating a
VolumeAttachment resource to move a volume between servers.

Something like:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 2
      resource:
        type: ha_server.yaml

  server_failover_policy:
    type: OS::Heat::FailoverPolicy
    properties:
      auto_scaling_group_id: {get_resource: server_group}
      resource:
        type: OS::Cinder::VolumeAttachment
        properties:
          # FIXME: "refs" is a ResourceGroup interface not currently
          # available in AutoScalingGroup
          instance_uuid: {get_attr: [server_group, refs, 1]}

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_policy_id: {get_resource: server_failover_policy}
      scaling_adjustment: 1

This actually fails because a VolumeAttachment needs to be updated in place;
if you try to switch servers but keep the same Volume when replacing the
attachment you'll get an error.

Doh, you're right, so FailoverPolicy would need to know how to delete then
recreate the resource instead of doing an in-place update.

TBH {get_attr: [server_group, refs, 1]} is doing most of the heavy lifting
here, so in theory you could just have an OS::Cinder::VolumeAttachment
instead of the FailoverPolicy and then all you need is a way of triggering a
stack update with the same template & params. I know Ton added a PATCH
method to update in Juno so that you don't have to pass parameters any more,
and I believe it's planned to do the same with the template.

Interesting, any thoughts on what the template-level interface to that
PATCH update might look like? (I'm guessing you'll probably say a mistral
resource?)

By chaining policies like this we could trigger an update on the attachment
resource (or a nested template via a provider resource containing many
attachments or other resources) every time the ScalingPolicy is triggered.

For the sake of clarity, I've not included the existing stuff like
ceilometer alarm resources etc above, but hopefully it gets the idea
across so we can discuss further, what are people's thoughts? I'm quite
happy to iterate on the idea if folks have suggestions for a better
interface etc :)

One problem I see with the above approach is you'd have to trigger a
failover after stack create to get the initial volume attached, still
pondering ideas on how best to solve that..

To me this is falling into the same old trap of "hey, we want to run this
custom workflow, all we need to do is add a new resource type to hang some
code on". That's pretty much how we got HARestarter.

Also, like HARestarter, this cannot hope to cover the range of possible
actions that might be needed by various applications.

IMHO the "right" way to implement this is that the Ceilometer alarm triggers
a workflow in Mistral that takes the appropriate action defined by the user,
which may (or may not) include updating the Heat stack to a new template
where the shared storage gets attached to a different server.

Ok, I'm quite happy to accept this may be a better long-term solution, but
can anyone comment on the current maturity level of Mistral? Questions
which spring to mind are:

  • Is the DSL stable now?
  • What's the roadmap re incubation (there are a lot of TBD's here:
    https://wiki.openstack.org/wiki/Mistral/Incubation)
  • How does deferred authentication work for alarm triggered workflows, e.g
    if a ceilometer alarm (which authenticates as a stack domain user) needs
    to signal Mistral to start a workflow?

I guess a first step is creating a contrib Mistral resource and
investigating it, but it would be great if anyone has first-hand
experiences they can share before we burn too much time digging into it.

Cheers,

Steve

responded Dec 24, 2014 by Steven_Hardy (16,900 points)   2 7 13
0 votes

Hi

Ok, I'm quite happy to accept this may be a better long-term solution, but
can anyone comment on the current maturity level of Mistral? Questions
which spring to mind are:

  • Is the DSL stable now?

You can think "yes", because although we keep adding new features we do it in a backwards-compatible manner. I personally try to be very cautious about this.

Ooh yeah, that wiki page is very, very obsolete, which is actually my fault because I didn't pay a lot of attention to it after I heard all these rumors about the TC changing the whole approach around getting projects incubated/integrated.

I think incubation readiness from a technical perspective is good (various style checks, procedures etc.); even if there's still something that we need to adjust, it shouldn't be difficult or time-consuming. The main question for the last half a year has been "What OpenStack program best fits Mistral?". So far we've had two candidates: Orchestration and some new program (e.g. Workflow Service). However, nothing is decided yet on that.

  • How does deferred authentication work for alarm triggered workflows, e.g
    if a ceilometer alarm (which authenticates as a stack domain user) needs
    to signal Mistral to start a workflow?

It works via Keystone trusts. It works, but there's still an issue that we need to fix: if we authenticate with a previously created trust and then try to call Nova, it fails with an authentication error. I know it's been solved in other projects (e.g. Heat) so we need to look at it.

I guess a first step is creating a contrib Mistral resource and
investigating it, but it would be great if anyone has first-hand
experiences they can share before we burn too much time digging into it.

Yes, we've already started discussing how we can create a Mistral resource for Heat. It looks like there are a couple of volunteers who can do that. Anyway, I'm totally for it, and any help from our side can be provided (including the implementation itself).

Renat Akhmerov
@ Mirantis Inc.

responded Dec 24, 2014 by Renat_Akhmerov (12,320 points)   2 5 8
0 votes

Excerpts from Renat Akhmerov's message of 2014-12-24 03:40:22 -0800:

Hi

Ok, I'm quite happy to accept this may be a better long-term solution, but
can anyone comment on the current maturity level of Mistral? Questions
which spring to mind are:

  • Is the DSL stable now?

You can think "yes" because although we keep adding new features we do it in a backwards compatible manner. I personally try to be very cautious about this.

Ooh yeah, this page is very very obsolete which is actually my fault because I didn't pay a lot of attention to this after I heard all these rumors about TC changing the whole approach around getting projects incubated/integrated.

I think incubation readiness from a technical perspective is good (various style checks, procedures etc.), even if there's still something that we need to adjust it must not be difficult and time consuming. The main question for the last half a year has been "What OpenStack program best fits Mistral?". So far we've had two candidates: Orchestration and some new program (e.g. Workflow Service). However, nothing is decided yet on that.

It's probably worth re-thinking the discussion above given the governance
changes that are being worked on:

http://governance.openstack.org/resolutions/20141202-project-structure-reform-spec.html

responded Dec 24, 2014 by Clint_Byrum (40,940 points)   4 6 10
0 votes

Thanks Clint,

I actually didn't see this before (like I said, just rumors) so I need to read it carefully.

Renat Akhmerov
@ Mirantis Inc.

On 25 Dec 2014, at 00:18, Clint Byrum wrote:

Excerpts from Renat Akhmerov's message of 2014-12-24 03:40:22 -0800:

Hi

Ok, I'm quite happy to accept this may be a better long-term solution, but
can anyone comment on the current maturity level of Mistral? Questions
which spring to mind are:

  • Is the DSL stable now?

You can think "yes" because although we keep adding new features we do it in a backwards compatible manner. I personally try to be very cautious about this.

Ooh yeah, this page is very very obsolete which is actually my fault because I didn't pay a lot of attention to this after I heard all these rumors about TC changing the whole approach around getting projects incubated/integrated.

I think incubation readiness from a technical perspective is good (various style checks, procedures etc.), even if there's still something that we need to adjust it must not be difficult and time consuming. The main question for the last half a year has been "What OpenStack program best fits Mistral?". So far we've had two candidates: Orchestration and some new program (e.g. Workflow Service). However, nothing is decided yet on that.

It's probably worth re-thinking the discussion above given the governance
changes that are being worked on:

http://governance.openstack.org/resolutions/20141202-project-structure-reform-spec.html


responded Dec 25, 2014 by Renat_Akhmerov (12,320 points)   2 5 8
0 votes

On 24/12/14 05:17, Steven Hardy wrote:
On Mon, Dec 22, 2014 at 03:42:37PM -0500, Zane Bitter wrote:

On 22/12/14 13:21, Steven Hardy wrote:

Hi all,

So, lately I've been having various discussions around $subject, and I know
it's something several folks in our community are interested in, so I
wanted to get some ideas I've been pondering out there for discussion.

I'll start with a proposal of how we might replace HARestarter with
AutoScaling group, then give some initial ideas of how we might evolve that
into something capable of a sort-of active/active failover.

  1. HARestarter replacement.

My position on HARestarter has long been that equivalent functionality
should be available via AutoScalingGroups of size 1. Turns out that
shouldn't be too hard to do:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 1
      resource:
        type: ha_server.yaml

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_group_id: {get_resource: server_group}
      scaling_adjustment: 1

One potential issue with this is that it is a little bit too equivalent to
HARestarter - it will replace your whole scaled unit (ha_server.yaml in this
case) rather than just the failed resource inside.

Personally I don't see that as a problem, because the interface makes that
explicit - if you put a resource in an AutoScalingGroup, you expect it to
get created/deleted on group adjustment, so anything you don't want
replaced stays outside the group.

I guess I was thinking about having the same mechanism work when the
size of the scaling group is not fixed at 1.

Happy to consider other alternatives which do less destructive replacement,
but to me this seems like the simplest possible way to replace HARestarter
with something we can actually support long term.

Yeah, I just get uneasy about features that don't compose. Here you have
to decide between the replacement policy feature and the feature of
being able to scale out arbitrary stacks. The two uses are so different
that they almost don't make sense as the same resource. The result will
be a lot of people implementing scaling groups inside scaling groups in
order to take advantage of both sets of behaviour.

Even if "just replace failed resource" is somehow made available later,
we'll still want to support AutoScalingGroup, and "replace_oldest" is
likely to be useful in other situations, not just this use-case.

Do you have specific ideas of how the just-replace-failed-resource feature
might be implemented? A way for a signal to declare a resource failed so
convergence auto-healing does a less destructive replacement?

So, currently our ScalingPolicy resource can only support three adjustment
types, all of which change the group capacity. AutoScalingGroup already
supports batched replacements for rolling updates, so if we modify the
interface to allow a signal to trigger replacement of a group member, then
the snippet above should be logically equivalent to HARestarter AFAICT.

The steps to do this should be:

  • Standardize the ScalingPolicy-AutoScaling group interface, so
    asynchronous adjustments (e.g. signals) between the two resources don't use
    the "adjust" method.

  • Add an option to replace a member to the signal interface of
    AutoScalingGroup

  • Add the new "replace" adjustment type to ScalingPolicy

I think I am broadly in favour of this.

Ok, great - I think we'll probably want replace_oldest, replace_newest, and
replace_specific, such that both alarm and operator driven replacement have
flexibility over what member is replaced.

We probably want to allow users to specify the replacement policy (e.g.
oldest first vs. newest first) for the scaling group itself to use when
scaling down or during rolling updates. If we had that, we'd probably
only need a single "replace" adjustment type - if a particular member is
specified in the message then it would replace that specific one,
otherwise the scaling group would choose which to replace based on the
specified policy.
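Something along these lines, where the replacement_policy property and the
plain "replace" adjustment type are obviously hypothetical:

  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 2
      # Hypothetical property: which member to pick when scaling down or
      # replacing, if the signal doesn't name one explicitly.
      replacement_policy: oldest_first
      resource:
        type: ha_server.yaml

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: replace           # hypothetical single "replace" type
      auto_scaling_group_id: {get_resource: server_group}
      scaling_adjustment: 1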

I posted a patch which implements the first step, and the second will be
required for TripleO, e.g we should be doing it soon.

https://review.openstack.org/#/c/143496/
https://review.openstack.org/#/c/140781/

  1. A possible next step towards active/active HA failover

The next part is the ability to notify before replacement that a scaling
action is about to happen (just like we do for LoadBalancer resources
already) and orchestrate some or all of the following:

  • Attempt to quiesce the currently active node (may be impossible if it's
    in a bad state)

  • Detach resources (e.g volumes primarily?) from the current active node,
    and attach them to the new active node

  • Run some config action to activate the new node (e.g run some config
    script to fsck and mount a volume, then start some application).

The first step is possible by putting a SoftwareConfig/SoftwareDeployment
resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the
node is too bricked to respond and specifying DELETE action so it only runs
when we replace the resource).

The third step is possible either via a script inside the box which polls
for the volume attachment, or possibly via an update-only software config.

The second step is the missing piece AFAICS.

I've been wondering if we can do something inside a new heat resource,
which knows what the current "active" member of an ASG is, and gets
triggered on a "replace" signal to orchestrate e.g deleting and creating a
VolumeAttachment resource to move a volume between servers.

Something like:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 2
      resource:
        type: ha_server.yaml

  server_failover_policy:
    type: OS::Heat::FailoverPolicy
    properties:
      auto_scaling_group_id: {get_resource: server_group}
      resource:
        type: OS::Cinder::VolumeAttachment
        properties:
          # FIXME: "refs" is a ResourceGroup interface not currently
          # available in AutoScalingGroup
          instance_uuid: {get_attr: [server_group, refs, 1]}

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_policy_id: {get_resource: server_failover_policy}
      scaling_adjustment: 1

This actually fails because a VolumeAttachment needs to be updated in place;
if you try to switch servers but keep the same Volume when replacing the
attachment you'll get an error.

Doh, you're right, so FailoverPolicy would need to know how to delete then
recreate the resource instead of doing an in-place update.

Other way around.

Well, actually there are two options, I guess. We could have a
FailoverPolicy that deletes the old resource before creating the new one
- this is the opposite of how existing stack updates work, so that
implies that this would be new code that doesn't rely on the existing
stack updates. The other option is to use the usual update mechanism to
do an in-place update if possible - but in that case you don't require
the FailoverPolicy resource, a regular update on the main template would
have the same effect (as discussed below).

TBH {get_attr: [server_group, refs, 1]} is doing most of the heavy lifting
here, so in theory you could just have an OS::Cinder::VolumeAttachment
instead of the FailoverPolicy and then all you need is a way of triggering a
stack update with the same template & params. I know Ton added a PATCH
method to update in Juno so that you don't have to pass parameters any more,
and I believe it's planned to do the same with the template.

Interesting, any thoughts on what the template-level interface to that
PATCH update might look like? (I'm guessing you'll probably say a mistral
resource?)

Hmm, interesting question. It would be possible to pass a stack ID as a
property to the scaling policy (in the given example you'd pass
{get_param: "OS::stack_id"}) to have it trigger an update on some stack.
(In fact, assuming that OS::Heat::FailoverPolicy is implemented as a
nested stack, that's identical in implementation to what you proposed.)
In a post-convergence world you can even imagine that it wouldn't need
to be specified, and that an update to a child stack would always cause
a re-evaluation of the parent.
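e.g. something like this, where the update_stack property is hypothetical:

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: replace_oldest                  # proposed, doesn't exist yet
      auto_scaling_group_id: {get_resource: server_group}
      scaling_adjustment: 1
      # Hypothetical property: a stack to run a (PATCH) update on after the
      # replacement, so e.g. a VolumeAttachment elsewhere in that stack gets
      # re-evaluated against the new group member.
      update_stack: {get_param: "OS::stack_id"}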

Of course if you want a pluggable framework with potentially multiple
sources of alarms and user-defined (rather than hard-coded) actions,
then it's hard to go past Mistral.

By chaining policies like this we could trigger an update on the attachment
resource (or a nested template via a provider resource containing many
attachments or other resources) every time the ScalingPolicy is triggered.

For the sake of clarity, I've not included the existing stuff like
ceilometer alarm resources etc above, but hopefully it gets the idea
across so we can discuss further, what are people's thoughts? I'm quite
happy to iterate on the idea if folks have suggestions for a better
interface etc :)

One problem I see with the above approach is you'd have to trigger a
failover after stack create to get the initial volume attached, still
pondering ideas on how best to solve that..

To me this is falling into the same old trap of "hey, we want to run this
custom workflow, all we need to do is add a new resource type to hang some
code on". That's pretty much how we got HARestarter.

Also, like HARestarter, this cannot hope to cover the range of possible
actions that might be needed by various applications.

IMHO the "right" way to implement this is that the Ceilometer alarm triggers
a workflow in Mistral that takes the appropriate action defined by the user,
which may (or may not) include updating the Heat stack to a new template
where the shared storage gets attached to a different server.

Ok, I'm quite happy to accept this may be a better long-term solution, but
can anyone comment on the current maturity level of Mistral? Questions
which spring to mind are:

  • Is the DSL stable now?
  • What's the roadmap re incubation (there are a lot of TBD's here:
    https://wiki.openstack.org/wiki/Mistral/Incubation)
  • How does deferred authentication work for alarm triggered workflows, e.g
    if a ceilometer alarm (which authenticates as a stack domain user) needs
    to signal Mistral to start a workflow?

I guess a first step is creating a contrib Mistral resource and
investigating it, but it would be great if anyone has first-hand
experiences they can share before we burn too much time digging into it.

Cheers,

Steve


responded Jan 2, 2015 by Zane_Bitter (21,640 points)   4 6 12
retagged Jan 28, 2015 by admin
0 votes

If we replace an autoscaling group member, we can't ensure the attached resources stay the same, so why not call the evacuate or rebuild API of Nova instead?
We would just need to add meters for HA (VM state or host state) in Ceilometer, and then signal an HA resource (such as HARestarter)?

-----Original Message-----
From: Steven Hardy [mailto:shardy@redhat.com]
Sent: December 23, 2014 2:21
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [heat] Application level HA via Heat

Hi all,

So, lately I've been having various discussions around $subject, and I know it's something several folks in our community are interested in, so I wanted to get some ideas I've been pondering out there for discussion.

I'll start with a proposal of how we might replace HARestarter with AutoScaling group, then give some initial ideas of how we might evolve that into something capable of a sort-of active/active failover.

  1. HARestarter replacement.

My position on HARestarter has long been that equivalent functionality should be available via AutoScalingGroups of size 1. Turns out that shouldn't be too hard to do:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 1
      resource:
        type: ha_server.yaml

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_group_id: {get_resource: server_group}
      scaling_adjustment: 1

So, currently our ScalingPolicy resource can only support three adjustment types, all of which change the group capacity. AutoScalingGroup already supports batched replacements for rolling updates, so if we modify the interface to allow a signal to trigger replacement of a group member, then the snippet above should be logically equivalent to HARestarter AFAICT.

The steps to do this should be:

  • Standardize the ScalingPolicy-AutoScaling group interface, so asynchronous adjustments (e.g. signals) between the two resources don't use the "adjust" method.

  • Add an option to replace a member to the signal interface of AutoScalingGroup

  • Add the new "replace" adjustment type to ScalingPolicy

I posted a patch which implements the first step, and the second will be required for TripleO, e.g we should be doing it soon.

https://review.openstack.org/#/c/143496/
https://review.openstack.org/#/c/140781/

  1. A possible next step towards active/active HA failover

The next part is the ability to notify before replacement that a scaling action is about to happen (just like we do for LoadBalancer resources
already) and orchestrate some or all of the following:

  • Attempt to quiesce the currently active node (may be impossible if it's
    in a bad state)

  • Detach resources (e.g volumes primarily?) from the current active node,
    and attach them to the new active node

  • Run some config action to activate the new node (e.g run some config
    script to fsck and mount a volume, then start some application).

The first step is possible by putting a SoftwareConfig/SoftwareDeployment resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the node is too bricked to respond and specifying DELETE action so it only runs when we replace the resource).

The third step is possible either via a script inside the box which polls for the volume attachment, or possibly via an update-only software config.

The second step is the missing piece AFAICS.

I've been wondering if we can do something inside a new heat resource, which knows what the current "active" member of an ASG is, and gets triggered on a "replace" signal to orchestrate e.g deleting and creating a VolumeAttachment resource to move a volume between servers.

Something like:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 2
      resource:
        type: ha_server.yaml

  server_failover_policy:
    type: OS::Heat::FailoverPolicy
    properties:
      auto_scaling_group_id: {get_resource: server_group}
      resource:
        type: OS::Cinder::VolumeAttachment
        properties:
          # FIXME: "refs" is a ResourceGroup interface not currently
          # available in AutoScalingGroup
          instance_uuid: {get_attr: [server_group, refs, 1]}

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_policy_id: {get_resource: server_failover_policy}
      scaling_adjustment: 1

By chaining policies like this we could trigger an update on the attachment resource (or a nested template via a provider resource containing many attachments or other resources) every time the ScalingPolicy is triggered.

For the sake of clarity, I've not included the existing stuff like ceilometer alarm resources etc above, but hopefully it gets the idea across so we can discuss further, what are people's thoughts? I'm quite happy to iterate on the idea if folks have suggestions for a better interface etc :)

One problem I see with the above approach is you'd have to trigger a failover after stack create to get the initial volume attached, still pondering ideas on how best to solve that..

Thanks,

Steve


responded Apr 2, 2015 by Huangtianhua (1,080 points)   1 2 3
0 votes

Sorry to chime in, but I will throw in another use case for Steven, since it
is about HA/auto-scaling and I think it matches what I asked back in
October.

http://lists.openstack.org/pipermail/openstack-dev/2014-October/049375.html

If you need more info let me know

Dani

On Thu, Apr 2, 2015 at 10:59 AM, Huangtianhua huangtianhua@huawei.com
wrote:

If we replace a autoscaling group member, we can't make sure the attached
resources keep the same, why not to call the evacuate or rebuild api of
nova,
just to add meters for ha(vm state or host state) in ceilometer, and then
signal to HA resource(such as HARestarter)?

-----Original Message-----
From: Steven Hardy [mailto:shardy@redhat.com]
Sent: December 23, 2014 2:21
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [heat] Application level HA via Heat

Hi all,

So, lately I've been having various discussions around $subject, and I
know it's something several folks in our community are interested in, so I
wanted to get some ideas I've been pondering out there for discussion.

I'll start with a proposal of how we might replace HARestarter with
AutoScaling group, then give some initial ideas of how we might evolve that
into something capable of a sort-of active/active failover.

  1. HARestarter replacement.

My position on HARestarter has long been that equivalent functionality
should be available via AutoScalingGroups of size 1. Turns out that
shouldn't be too hard to do:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 1
      resource:
        type: ha_server.yaml

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_group_id: {get_resource: server_group}
      scaling_adjustment: 1

So, currently our ScalingPolicy resource can only support three adjustment
types, all of which change the group capacity. AutoScalingGroup already
supports batched replacements for rolling updates, so if we modify the
interface to allow a signal to trigger replacement of a group member, then
the snippet above should be logically equivalent to HARestarter AFAICT.

The steps to do this should be:

  • Standardize the ScalingPolicy-AutoScaling group interface, so
    asynchronous adjustments (e.g. signals) between the two resources don't use
    the "adjust" method.

  • Add an option to replace a member to the signal interface of
    AutoScalingGroup

  • Add the new "replace" adjustment type to ScalingPolicy

I posted a patch which implements the first step, and the second will be
required for TripleO, e.g we should be doing it soon.

https://review.openstack.org/#/c/143496/
https://review.openstack.org/#/c/140781/

  1. A possible next step towards active/active HA failover

The next part is the ability to notify before replacement that a scaling
action is about to happen (just like we do for LoadBalancer resources
already) and orchestrate some or all of the following:

  • Attempt to quiesce the currently active node (may be impossible if it's
    in a bad state)

  • Detach resources (e.g volumes primarily?) from the current active node,
    and attach them to the new active node

  • Run some config action to activate the new node (e.g run some config
    script to fsck and mount a volume, then start some application).

The first step is possible by putting a SoftwareConfig/SoftwareDeployment
resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the
node is too bricked to respond and specifying DELETE action so it only runs
when we replace the resource).

The third step is possible either via a script inside the box which polls
for the volume attachment, or possibly via an update-only software config.

The second step is the missing piece AFAICS.

I've been wondering if we can do something inside a new heat resource,
which knows what the current "active" member of an ASG is, and gets
triggered on a "replace" signal to orchestrate e.g deleting and creating a
VolumeAttachment resource to move a volume between servers.

Something like:

resources:
  server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 2
      resource:
        type: ha_server.yaml

  server_failover_policy:
    type: OS::Heat::FailoverPolicy
    properties:
      auto_scaling_group_id: {get_resource: server_group}
      resource:
        type: OS::Cinder::VolumeAttachment
        properties:
          # FIXME: "refs" is a ResourceGroup interface not currently
          # available in AutoScalingGroup
          instance_uuid: {get_attr: [server_group, refs, 1]}

  server_replacement_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # FIXME: this adjustment_type doesn't exist yet
      adjustment_type: replace_oldest
      auto_scaling_policy_id: {get_resource: server_failover_policy}
      scaling_adjustment: 1

By chaining policies like this we could trigger an update on the
attachment resource (or a nested template via a provider resource
containing many attachments or other resources) every time the
ScalingPolicy is triggered.

For the sake of clarity, I've not included the existing stuff like
ceilometer alarm resources etc above, but hopefully it gets the idea
across so we can discuss further, what are people's thoughts? I'm quite
happy to iterate on the idea if folks have suggestions for a better
interface etc :)

One problem I see with the above approach is you'd have to trigger a
failover after stack create to get the initial volume attached, still
pondering ideas on how best to solve that..

Thanks,

Steve


responded Apr 3, 2015 by Daniel_Comnea (3,260 points)   1 5 10
...