[openstack-dev] [TripleO] Forming our plans around Ansible

I proposed a session for the PTG
(https://etherpad.openstack.org/p/tripleo-ptg-queens) about forming a
common plan and vision around Ansible in TripleO.

I think it's important however that we kick this discussion off more
broadly before the PTG, so that we can hopefully have some agreement
for deeper discussions and prototyping when we actually meet in
person.

Right now, we have multiple uses of Ansible in TripleO:

(0) tripleo-quickstart which follows the common and well accepted
approach to bundling a set of Ansible playbooks/roles.

(1) Mistral calling Ansible. This is the approach used by
tripleo-validations where Mistral directly executes ansible playbooks
using a dynamic inventory. The inventory is constructed from the
server related stack outputs of the overcloud stack.

(2) Ansible running playbooks against localhost triggered by the
heat-config Ansible hook. This approach is used by
tripleo-heat-templates for upgrade tasks and various tasks for
deploying containers.

(3) Mistral calling Heat calling Mistral calling Ansible. In this
approach, we have Mistral resources in tripleo-heat-templates that are
created as part of the overcloud stack and in turn, the created
Mistral action executions run ansible. This has been prototyped with
using ceph-ansible to install Ceph as part of the overcloud
deployment, and some of the work has already landed. There are also
proposed WIP patches using this approach to install Kubernetes.
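
To make (1) a bit more concrete, here is a minimal Python sketch of how
a dynamic inventory could be built from server related stack outputs.
The shape of `example_outputs` (role name mapped to hostname/IP pairs)
and the `overcloud` parent group are assumptions for illustration only,
not the actual output format of the overcloud stack or the
tripleo-validations inventory code. The returned structure follows the
JSON layout Ansible expects from a dynamic inventory script (groups
with `hosts`/`children`, plus `_meta.hostvars`).

```python
import json


def build_inventory(stack_outputs):
    """Turn {role: {hostname: ip}} into Ansible dynamic-inventory JSON data."""
    inventory = {"_meta": {"hostvars": {}}, "overcloud": {"children": []}}
    for role, servers in sorted(stack_outputs.items()):
        # One inventory group per role, listing its hosts by name.
        inventory[role] = {"hosts": sorted(servers)}
        inventory["overcloud"]["children"].append(role)
        for hostname, ip in servers.items():
            # Per-host variables go under _meta.hostvars.
            inventory["_meta"]["hostvars"][hostname] = {"ansible_host": ip}
    return inventory


# Hypothetical, hand-written stand-in for the server related stack outputs.
example_outputs = {
    "Controller": {"overcloud-controller-0": "192.0.2.10"},
    "Compute": {"overcloud-compute-0": "192.0.2.20"},
}

if __name__ == "__main__":
    print(json.dumps(build_inventory(example_outputs), indent=2))
```

A script like this, printing that JSON when called with --list, is the
kind of thing that can be passed to ansible-playbook with -i.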

There are also some ideas forming around pulling the Ansible playbooks
and vars out of Heat so that they can be rerun (or run initially)
independently from the Heat SoftwareDeployment delivery mechanism:

(4) https://review.openstack.org/#/c/454816/

(5) Another idea I'd like to prototype is a local tool that runs on
the undercloud and pulls all of the SoftwareDeployment data out of
Heat as the stack is being created and generates corresponding Ansible
playbooks to apply those deployments. Once a given playbook is
generated by the tool, the tool would signal back to Heat that the
deployment is complete. Heat then creates the whole stack without
actually applying a single deployment to an overcloud node. At that
point, Ansible (or Mistral->Ansible for an API) would be used to do
the actual deployment of the Overcloud with the Undercloud as the
ansible runner.
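
As a rough illustration of (5), the translation step could look
something like the Python sketch below. None of this exists: the flat
deployment dicts (`server`, `name`, `group`, `config`), the function
name, and the choice to map `script` deployments onto Ansible's `shell`
module are all hypothetical. Pulling the data out of Heat and
signalling back to Heat are left out entirely.

```python
import json
from collections import defaultdict


def deployments_to_playbook(deployments):
    """Group deployment dicts by server and build one play per server."""
    by_server = defaultdict(list)
    for dep in deployments:
        by_server[dep["server"]].append(dep)

    playbook = []
    for server in sorted(by_server):
        tasks = []
        for dep in by_server[server]:
            if dep["group"] == "script":
                # Assumption: a script deployment becomes a shell task.
                tasks.append({"name": dep["name"], "shell": dep["config"]})
            else:
                # Other hook types would each need their own translation;
                # this sketch only records that nothing was generated.
                msg = "no translation for group %s" % dep["group"]
                tasks.append({"name": dep["name"], "debug": {"msg": msg}})
        playbook.append({"hosts": server, "tasks": tasks})
    return playbook


# Hypothetical input, standing in for data pulled out of Heat.
example_deployments = [
    {"server": "overcloud-controller-0", "name": "set-hostname",
     "group": "script", "config": "hostnamectl set-hostname controller-0"},
    {"server": "overcloud-controller-0", "name": "apply-manifest",
     "group": "puppet", "config": "include ::ntp"},
    {"server": "overcloud-compute-0", "name": "set-hostname",
     "group": "script", "config": "hostnamectl set-hostname compute-0"},
]

if __name__ == "__main__":
    print(json.dumps(deployments_to_playbook(example_deployments), indent=2))
```

The point of the sketch is only that the generated plays are plain data
that could be written to disk, reviewed, and rerun without Heat in the
loop.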

All of this work has merit as we investigate longer term plans, and
it's all at different stages: some is for dev/CI (0), some is
already used in production (1 and 2), some is just at the
experimental stage (3 and 4), and some does not exist other than as an
idea (5).

My intent with this mail is to start a discussion around what we've
learned from these approaches and start discussing a consolidated plan
around Ansible. And I'm not saying that whatever we come up with
should only use Ansible a certain way. Just that we ought to look at
how users/operators interact with Ansible and TripleO today and try
and come up with the best solution(s) going forward.

I think that (1) has been pretty successful, and my idea with (5)
would use a similar approach once the playbooks were generated.
Further, my idea with (5) would give us a fully backwards compatible
solution with our existing template interfaces from
tripleo-heat-templates. Longer term (or even in parallel for some
time), the generated playbooks could stop being generated (and just
exist in git), and we could consider moving away from Heat more
permanently.

I recognize that saying "moving away from Heat" may be quite
controversial. While it's not 100% the same discussion as what we are
doing with Ansible, I think it is a big part of the discussion: whether
we want to continue with Heat as the primary orchestration tool in
TripleO.

I've been hearing a lot of feedback from various operators about how
difficult the baremetal deployment is with Heat. While feedback about
Ironic is generally positive, a lot of the negative feedback is around
the Heat->Nova->Ironic interaction. And, if we also move more towards
Ansible for the service deployment, I wonder if there is still a long
term place for Heat at all.

Personally, I'm pretty apprehensive about the approach taken in (3). I
feel that it is a lot of complexity that could be handled more simply if
we took a step back and thought more about a longer term approach. I
recognize that it's mostly an experiment/POC at this stage, and I'm
not trying to directly knock down the approach. It's just that when I
start to see more patches (Kubernetes installation) using the same
approach, I figure it's worth discussing more broadly vs trying to
have a discussion by -1'ing patch reviews, etc.

I'm interested in all feedback of course. And I plan to take a shot at
working on the prototype I mentioned in (5) if anyone would like to
collaborate around that.

I think if we can form some broad agreement before the PTG, we have a
chance at making some meaningful progress during Queens.

--
-- James Slagle
--


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Jul 12, 2017 in openstack-dev by James_Slagle (7,000 points)   1 3 3

32 Responses

I can't offer much in-depth feedback on the pros and cons of each scenario.
My main point would be to try and simplify as much as we can, rather than
adding yet more tooling to the stack. At the moment ooo is spread across
multiple repos and events are handed around multiple tool sets and queues.
This adds to a very steep learning curve for the folk who have to operate
these systems, as there are multiple moving parts to contend with. At the
moment things seem 'duct taped' together, so we should avoid adding more
complexity, and refactor down to a simpler architecture instead.

With that in mind [1] sounds viable to me, but with the caveat that
others might have a better view of how much of a fit that is for what we
need.


responded Jul 7, 2017 by Luke_Hinds (1,500 points)   1

On Fri, Jul 7, 2017 at 5:00 PM, Luke Hinds lhinds@redhat.com wrote:
I can't offer much in-depth feedback on the pros and cons of each scenario.
My main point would be to try and simplify as much as we can, rather than
adding yet more tooling to the stack. At the moment ooo is spread across
multiple repos and events are handed around multiple tool sets and queues.
This adds to a very steep learning curve for the folk who have to operate
these systems, as there are multiple moving parts to contend with. At the
moment things seem 'duct taped' together, so we should avoid adding more
complexity, and refactor down to a simpler architecture instead.

With that in mind [1] sounds viable to me, but with the caveat that
others might have a better view of how much of a fit that is for what we
need.

Agreed, I think the goal ought to be a move towards simplification
with Ansible at the core.

An ideal scenario for me personally would be a situation where
operators could just run Ansible in the typical way that they do today
for any other project. Additionally, we'd have a way to execute the
same Ansible playbook/roles/vars/whatever via Mistral so that we had a
common API for our CLI and UI.

Perhaps the default would be to go through the API, and more advanced
usage could interface with Ansible directly.

Additionally, we must have a way to maintain backwards compatibility
with our existing template interfaces, or at least offer some form of
migration tooling.

Thanks for your feedback.

--
-- James Slagle
--


responded Jul 7, 2017 by James_Slagle (7,000 points)   1 3 3

On Fri, Jul 7, 2017 at 1:50 PM, James Slagle james.slagle@gmail.com wrote:
(0) tripleo-quickstart which follows the common and well accepted
approach to bundling a set of Ansible playbooks/roles.

I don't want to de-rail the thread but I really want to bring some
attention to a pattern that tripleo-quickstart has been using across
its playbooks and roles.
I sincerely hope that we can find a better implementation should we
start developing new things from scratch.

I'll sound like a broken record for those that have heard me mention
this before but for those that haven't, here's a concrete example of
how things are done today:
(Sorry for the link overload, making sure the relevant information is available)

For an example tripleo-quickstart job, here's the console [1] and its
corresponding ARA report [2]:
- A bash script is created [3][4][5] from a jinja template [6]
- A task executes the bash script [7][8][9]

My understanding is that things are done this way in order to provide
automated documentation and make the builds reproducible.

One of Ansible's greatest strengths is supposed to be its simplicity:
making things readable and straightforward ("Automation for Everyone"
is its motto).
It's hard for me to put succinctly into words how complicated and
counter-intuitive the current pattern is making things so I'll provide
some examples.

1) When a task running a bash script fails, you don't know what failed
from the ansible-playbook output.
You need to find the appropriate log file and look at the output
of the bash script there.

2) There is logic, conditionals and variables inside the templated
bash scripts making it non-trivial to guess what the script actually
ends up looking like once it is "compiled".
If you happen to know that this task actually ran a templated bash
script in the first place, you need to know or remember where it is
located in the logs after the job is complete and then open it up.

3) There can be more than one operation inside a bash script so you
don't know which of those operations failed unless you look at the
logs.
This reduces granularity which makes it harder to profile,
identify and troubleshoot errors.

4) You don't know what the bash script actually did (if it did
anything at all) unless you look at the logs

5) Idempotency is handled (or not) inside the bash scripts, so Ansible
has no way of knowing whether running the bash script changed something
or not
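
To illustrate point 5, here is a small Python sketch (not taken from
tripleo-quickstart or from any Ansible module) of what an idempotent
operation that reports its own result looks like. The function name
`ensure_line` and the `{"changed": ...}` return shape are assumptions
chosen to mirror the changed/ok status that Ansible modules report; a
templated bash script that simply exits 0 gives Ansible none of this
information.

```python
import os
import tempfile


def ensure_line(path, line):
    """Append `line` to `path` only if missing; report whether it changed."""
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    if line in content.splitlines():
        # Desired state already present: do nothing and say so.
        return {"changed": False}
    with open(path, "a") as f:
        # Avoid gluing the new line onto an unterminated last line.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return {"changed": True}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.conf")
        print(ensure_line(target, "debug = true"))  # first run modifies the file
        print(ensure_line(target, "debug = true"))  # second run is a no-op
```

With one task per operation reporting like this, the playbook output
itself shows which step changed something and which step failed.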

Here's an example ARA report from openstack-ansible where you're
easily able to tell what went wrong and what happened [10].

Now, I'm not being selfish and trying to say that things should be
written in a specific way so that it can make ARA more useful.
Yes, ARA would be more useful. But this is about following Ansible
best practices and making it more intuitive to understand how things
work and what happens when tasks run.
Puppet is designed the same way: there are resources and modules to do
things. You don't template bash scripts and then use Exec resources.

Documentation and reproducible builds are great things to have, but
not with this kind of tradeoff IMO.
Surely there are other means of providing documentation and reproducible builds.

TripleO is complicated enough already.
Actively making it simpler in every way we can, not just for
developers but for users and operators, should be a priority and a
theme throughout the refactor around Ansible.
We should agree on the best practices and use them.

David Moreau Simard
Senior Software Engineer | Openstack RDO

dmsimard = [irc, github, twitter]


responded Jul 7, 2017 by dms_at_redhat.com (3,780 points)   3 4

On Fri, Jul 7, 2017 at 10:17 PM, James Slagle james.slagle@gmail.com
wrote:

On Fri, Jul 7, 2017 at 5:00 PM, Luke Hinds lhinds@redhat.com wrote:

I can't offer much in-depth feedback on the pros and cons of each scenario.
My main point would be to try and simplify as much as we can, rather than
adding yet more tooling to the stack. At the moment ooo is spread across
multiple repos and events are handed around multiple tool sets and queues.
This adds to a very steep learning curve for the folk who have to operate
these systems, as there are multiple moving parts to contend with. At the
moment things seem 'duct taped' together, so we should avoid adding more
complexity, and refactor down to a simpler architecture instead.

With that in mind [1] sounds viable to me, but with the caveat that
others might have a better view of how much of a fit that is for what we
need.

Agreed, I think the goal ought to be a move towards simplification
with Ansible at the core.

An ideal scenario for me personally would be a situation where
operators could just run Ansible in the typical way that they do today
for any other project. Additionally, we'd have a way to execute the
same Ansible playbook/roles/vars/whatever via Mistral so that we had a
common API for our CLI and UI.

Perhaps the default would be to go through the API, and more advanced
usage could interface with Ansible directly.

I like the sound of this approach, as we then have an API for driving
complex deployments and upgrades, but if an operator needs to troubleshoot
or customise, they can do so with pure Ansible, while Mistral is still
there to drive the main complexity of a full OpenStack deployment.

Additionally, we must have a way to maintain backwards compatibility
with our existing template interfaces, or at least offer some form of
migration tooling.

Thanks for your feedback.

--
-- James Slagle
--



--
Luke Hinds | NFV Partner Engineering | Office of Technology | Red Hat
e: lhinds@redhat.com | irc: lhinds @freenode | m: +44 77 45 63 98 84 | t: +44
12 52 36 2483


responded Jul 7, 2017 by Luke_Hinds (1,500 points)   1

On Fri, Jul 7, 2017 at 5:31 PM, David Moreau Simard dms@redhat.com wrote:
On Fri, Jul 7, 2017 at 1:50 PM, James Slagle james.slagle@gmail.com wrote:

(0) tripleo-quickstart which follows the common and well accepted
approach to bundling a set of Ansible playbooks/roles.

I don't want to de-rail the thread but I really want to bring some
attention to a pattern that tripleo-quickstart has been using across
its playbooks and roles.
I sincerely hope that we can find a better implementation should we
start developing new things from scratch.

Yes, just to clarify...by "well accepted" I just meant how the git
repo is organized and how you are expected to interface with those
playbooks and roles as opposed to what those playbooks/roles actually
do.

I'll sound like a broken record for those that have heard me mention
this before but for those that haven't, here's a concrete example of
how things are done today:
(Sorry for the link overload, making sure the relevant information is available)

For an example tripleo-quickstart job, here's the console [1] and its
corresponding ARA report [2]:
- A bash script is created [3][4][5] from a jinja template [6]
- A task executes the bash script [7][8][9]

From my limited experience, I believe the intent was that the
playbooks should do what a user is expected to do so that it's as
close to reproducing the user interface of TripleO 1:1.

For example, we document users running commands from a shell prompt.
Therefore, oooq ought to do the same thing as close as possible.
Obviously there will be gaps, just as there are with tripleo.sh, but I
feel that both tools (tripleo.sh/oooq) were trying to be faithful to
our published docs as much as possible, and I think there's something
to be commended there.

Not saying it's right or wrong, just that I believe that was the intent.

An alternative would be custom ansible modules that exposed tasks for
interfacing with our API directly. That would also be valuable, as
that code path is mostly untested now outside of the UI and CLI.

I think that tripleo-quickstart is a slightly different class of
"thing" from the other current Ansible uses I mentioned, in that it
sits at a layer above everything else. It's meant to automate TripleO
itself vs TripleO automating things. Regardless, we should certainly
consider how it fits into a larger plan.

--
-- James Slagle
--


responded Jul 7, 2017 by James_Slagle (7,000 points)   1 3 3

What I'd like to dig into more is how Ansible and Heat can live together.
What features does Heat offer that are not covered by Ansible? Is
there still a need to have Heat as the main engine, or could it be
replaced by Ansible entirely in the future?

--

Yolanda Robla Mota

Principal Software Engineer, RHCE

Red Hat

C/Avellana 213

Urb Portugal

yroblamo@redhat.com M: +34605641639


responded Jul 9, 2017 by yroblamo_at_redhat.c (1,040 points)   2 2

On Fri, Jul 7, 2017 at 6:20 PM, James Slagle james.slagle@gmail.com wrote:

On Fri, Jul 7, 2017 at 5:31 PM, David Moreau Simard dms@redhat.com
wrote:

On Fri, Jul 7, 2017 at 1:50 PM, James Slagle james.slagle@gmail.com
wrote:

(0) tripleo-quickstart which follows the common and well accepted
approach to bundling a set of Ansible playbooks/roles.

I don't want to de-rail the thread but I really want to bring some
attention to a pattern that tripleo-quickstart has been using across
its playbooks and roles.
I sincerely hope that we can find a better implementation should we
start developing new things from scratch.

Yes, just to clarify...by "well accepted" I just meant how the git
repo is organized and how you are expected to interface with those
playbooks and roles as opposed to what those playbooks/roles actually
do.

I'll sound like a broken record for those that have heard me mention
this before but for those that haven't, here's a concrete example of
how things are done today:
(Sorry for the link overload, making sure the relevant information is
available)

For an example tripleo-quickstart job, here's the console [1] and its
corresponding ARA report [2]:
- A bash script is created [3][4][5] from a jinja template [6]
- A task executes the bash script [7][8][9]

From my limited experience, I believe the intent was that the
playbooks should do what a user is expected to do so that it's as
close to reproducing the user interface of TripleO 1:1.

For example, we document users running commands from a shell prompt.
Therefore, oooq ought to do the same thing as close as possible.
Obviously there will be gaps, just as there are with tripleo.sh, but I
feel that both tools (tripleo.sh/oooq) were trying to be faithful to
our published docs as much as possible, and I think there's something
to be commended there.

That is exactly right James, CI should be as close to a user driven install
as possible IMHO.

David, you are conflating two use cases as far as I can tell. The first use
case (a) is ansible used in the project/product and launched by
openstack/project commands; the second use case (b) is ansible as a
wrapper around commands that users are expected to execute.

Using native ansible modules as part of the project/product (a) as James
is describing is perfectly fine and ansible, ARA and other tools work
really well here.

If the CI reinterprets user level commands (b) directly into ansible module
calls you basically lose the 1:1 mapping between CI, documentation and
user experience.
The most important function of CI is to guarantee that users can follow the
documentation and have a defect free experience [docs]. Having to "look at
the logs" is a very small price to pay to preserve that experience. I think
we'll be able to get the logs from the templated bash into ARA, we just need
a little time to get that done.
IMHO CI is a very different topic than what James is talking about in this
thread and hopefully won't interrupt this conversation further.

Thanks

[docs]
https://docs.openstack.org/tripleo-quickstart/latest/design.html#problem-help-make-the-deployment-steps-easier-to-understand

Not saying it's right or wrong, just that I believe that was the intent.

An alternative would be custom ansible modules that exposed tasks for
interfacing with our API directly. That would also be valuable, as
that code path is mostly untested now outside of the UI and CLI.

I think that tripleo-quickstart is a slightly different class of
"thing" from the other current Ansible uses I mentioned, in that it
sits at a layer above everything else. It's meant to automate TripleO
itself vs TripleO automating things. Regardless, we should certainly
consider how it fits into a larger plan.

--
-- James Slagle
--


responded Jul 10, 2017 by Wesley_Hayutin (2,320 points)   2

On 07/07/2017 07:50 PM, James Slagle wrote:
I proposed a session for the PTG
(https://etherpad.openstack.org/p/tripleo-ptg-queens) about forming a
common plan and vision around Ansible in TripleO.

I think it's important however that we kick this discussion off more
broadly before the PTG, so that we can hopefully have some agreement
for deeper discussions and prototyping when we actually meet in
person.

Right now, we have multiple uses of Ansible in TripleO:

having worked on one of the versions listed, I would like to add some
comments

(0) tripleo-quickstart which follows the common and well accepted
approach to bundling a set of Ansible playbooks/roles.

this approach does not consume config data from heat; I don't think it
fits in the same category of the others

(1) Mistral calling Ansible. This is the approach used by
tripleo-validations where Mistral directly executes ansible playbooks
using a dynamic inventory. The inventory is constructed from the
server related stack outputs of the overcloud stack.

this approach is actually very similar to (3), with the main difference
that ansible is executed only after the stack is complete to be able
to build the dynamic inventory; in fact the flow looks like this:

tripleoclient -> mistral -> heat -> tripleoclient -> mistral (<< heat)

we couldn't use this same approach for ceph-ansible because we needed
the workflow to be executed during a specific overcloud deployment step;
if we had migrated to splitstack already, it might have been possible
(not sure though, more about this later)

(2) Ansible running playbooks against localhost triggered by the
heat-config Ansible hook. This approach is used by
tripleo-heat-templates for upgrade tasks and various tasks for
deploying containers.

we couldn't use this approach either because we needed to run an
unmodified version of ceph-ansible and provide to it the list of role
hosts in one shot so that ceph-ansible could manage the task
dependencies and ordering by itself; running on localhost wouldn't fit

(3) Mistral calling Heat calling Mistral calling Ansible. In this
approach, we have Mistral resources in tripleo-heat-templates that are
created as part of the overcloud stack and in turn, the created
Mistral action executions run ansible. This has been prototyped with
using ceph-ansible to install Ceph as part of the overcloud
deployment, and some of the work has already landed. There are also
proposed WIP patches using this approach to install Kubernetes.

as per my comment about (1), this allows for execution of the workflows
to happen during the stack creation (at one or multiple deployment steps)

workflow tasks are described on a per-service basis, within the heat
templates and executions have access to the existing roles
config_settings which we also use for puppet

it allows interleaving of the puppet/workflow steps, which is a feature
we use for ceph-ansible for example to configure the firewall on the
nodes (using the established puppet manifests) before ceph-ansible
starts; we run ceph-ansible unmodified and users can provide arbitrary
extra vars to ceph-ansible via a heat parameter; the flow looks like this:

tripleoclient -> mistral -> heat -> mistral

also note, the workflows can run ansible (like it happens for
ceph-ansible) but don't need to, workflows can use any mistral action
and even define custom ones
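
To make the interleaving of puppet and workflow steps a bit more concrete, here is a minimal Python sketch of the ordering only. It is not how Heat or Mistral implement it: the function name, the step numbers, and the rule that the puppet run precedes the workflows within a step are all assumptions made for illustration.

```python
from collections import defaultdict


def build_step_plan(num_steps, workflow_tasks):
    """Return an ordered list of (step, kind, name) actions.

    Assumption: within a deployment step the puppet run comes first,
    followed by any workflow tasks registered for that step.
    """
    by_step = defaultdict(list)
    for task in workflow_tasks:
        by_step[task["step"]].append(task["name"])

    plan = []
    for step in range(1, num_steps + 1):
        plan.append((step, "puppet", "apply manifests"))
        for name in by_step[step]:
            plan.append((step, "workflow", name))
    return plan


# Hypothetical data: ceph-ansible is registered as a workflow at step 2,
# so the puppet runs of steps 1 and 2 (e.g. firewall rules) precede it.
tasks = [{"step": 2, "name": "ceph-ansible"}]
for action in build_step_plan(3, tasks):
    print(action)
```

The point of the sketch is that the step ordering lives in one place, which is what keeping this understanding in heat gives us.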

I have proposed a topic for the ptg to discuss the above, I am sure it
can be extended and improved but IMHO it provides for a compelling set
of features (all of which we wanted/use for ceph-ansible)

There are also some ideas forming around pulling the Ansible playbooks
and vars out of Heat so that they can be rerun (or run initially)
independently from the Heat SoftwareDeployment delivery mechanism:

(4) https://review.openstack.org/#/c/454816/

(5) Another idea I'd like to prototype is a local tool that runs on
the undercloud and pulls all of the SoftwareDeployment data out of
Heat as the stack is being created and generates corresponding Ansible
playbooks to apply those deployments. Once a given playbook is
generated by the tool, the tool would signal back to Heat that the
deployment is complete. Heat then creates the whole stack without
actually applying a single deployment to an overcloud node. At that
point, Ansible (or Mistral->Ansible for an API) would be used to do
the actual deployment of the Overcloud with the Undercloud as the
ansible runner.

this seems interesting to me; do I understand correctly that if we keep
understanding of the deployment steps in heat then the flow would look like:

tripleoclient -> loop(mistral -> heat)

if so I think we'd need to move (or duplicate) some understanding about
the deployment steps from heat into mistral (as opposed to the approach
in (3) which keeps all the understanding in heat); I am not sure if
having this information in two tools will help in the long term but I
guess it has to be weighed against its pros
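
As a rough illustration of the idea in (5), the following Python sketch turns deployment records into Ansible plays and signals completion for each one without touching a node. The record fields (`name`, `server`, `config_file`) and the `signal` callback are hypothetical; the real SoftwareDeployment data and the Heat signalling API look different.

```python
import json


def deployment_to_play(deployment):
    """Translate one (hypothetical) deployment record into a play dict."""
    return {
        "name": deployment["name"],
        "hosts": deployment["server"],
        "tasks": [
            {
                "name": "apply " + deployment["name"],
                # 'script' is the Ansible module that runs a local script
                # on the remote host; the path here is made up.
                "script": deployment["config_file"],
            }
        ],
    }


def generate_playbook(deployments, signal):
    """Build the plays and signal each deployment as complete.

    'signal' stands in for whatever tells Heat the deployment is done;
    nothing is applied to an overcloud node at this point.
    """
    plays = []
    for deployment in deployments:
        plays.append(deployment_to_play(deployment))
        signal(deployment["name"])
    return plays


signalled = []
records = [
    {
        "name": "ControllerDeployment_Step1",
        "server": "overcloud-controller-0",
        "config_file": "/tmp/step1.sh",
    }
]
# Printed as JSON here purely for inspection.
print(json.dumps(generate_playbook(records, signalled.append), indent=2))
print(signalled)
```

Running the generated plays afterwards would be a separate step (Ansible, or Mistral->Ansible), which is where the question about duplicating the step ordering outside heat comes in.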

[...]

I recognize that saying "moving away from Heat" may be quite
controversial. While it's not 100% the same discussion as what we are
doing with Ansible, I think it is a big part of the discussion and if
we want to continue with Heat as the primary orchestration tool in
TripleO.

I think this is a key question for the conversation we'll have; the
approach in (3) is based on the idea that heat stays and keeps
understanding of what/when is happening in the templates; I think we are
testing the use of heat for the deployment of the undercloud with the
intent to reuse this understanding.
--
Giulio Fidente
GPG KEY: 08D733BA


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Jul 10, 2017 by Giulio_Fidente (3,980 points)   4 4

On Fri, Jul 7, 2017 at 6:50 PM, James Slagle james.slagle@gmail.com wrote:
I proposed a session for the PTG
(https://etherpad.openstack.org/p/tripleo-ptg-queens) about forming a
common plan and vision around Ansible in TripleO.

I think it's important however that we kick this discussion off more
broadly before the PTG, so that we can hopefully have some agreement
for deeper discussions and prototyping when we actually meet in
person.

Thanks for starting this James, it's a topic that I've also been
giving quite a lot of thought to lately (and as you've seen, have
pushed some related patches) so it's good to get some broader
discussions going.

Right now, we have multiple uses of Ansible in TripleO:

(0) tripleo-quickstart which follows the common and well accepted
approach to bundling a set of Ansible playbooks/roles.

FWIW I agree with Giulio that quickstart is a separate case, and while
I also agree with David that there's plenty of scope for
improvement of the oooq user experience, I'm going to focus on the
TripleO deployment aspects below.

(1) Mistral calling Ansible. This is the approach used by
tripleo-validations where Mistral directly executes ansible playbooks
using a dynamic inventory. The inventory is constructed from the
server related stack outputs of the overcloud stack.

(2) Ansible running playbooks against localhost triggered by the
heat-config Ansible hook. This approach is used by
tripleo-heat-templates for upgrade tasks and various tasks for
deploying containers.

(3) Mistral calling Heat calling Mistral calling Ansible. In this
approach, we have Mistral resources in tripleo-heat-templates that are
created as part of the overcloud stack and in turn, the created
Mistral action executions run ansible. This has been prototyped with
using ceph-ansible to install Ceph as part of the overcloud
deployment, and some of the work has already landed. There are also
proposed WIP patches using this approach to install Kubernetes.

There are also some ideas forming around pulling the Ansible playbooks
and vars out of Heat so that they can be rerun (or run initially)
independently from the Heat SoftwareDeployment delivery mechanism:

(4) https://review.openstack.org/#/c/454816/

(5) Another idea I'd like to prototype is a local tool that runs on
the undercloud and pulls all of the SoftwareDeployment data out of
Heat as the stack is being created and generates corresponding Ansible
playbooks to apply those deployments. Once a given playbook is
generated by the tool, the tool would signal back to Heat that the
deployment is complete. Heat then creates the whole stack without
actually applying a single deployment to an overcloud node. At that
point, Ansible (or Mistral->Ansible for an API) would be used to do
the actual deployment of the Overcloud with the Undercloud as the
ansible runner.

Yeah so my idea with (4), and subsequent patches such as [1], is to
gradually move the deploy steps performed to configure services (on
baremetal and in containers) to a single ansible playbook.

There's currently still heat orchestration around the host preparation
(although this is performed via ansible) and iteration over each step
(where we re-apply the same deploy-steps playbook with an incrementing
step variable, but this could be replaced by e.g. an ansible or mistral
loop), but my idea was to enable end-to-end configuration of nodes via
ansible-playbook, without the need for any special tools (e.g. we
refactor t-h-t enough that we don't need any special tools, and we
make deploy-steps-playbook.yaml the only method of deployment for the
baremetal and container cases).

[1] https://review.openstack.org/#/c/462211/
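
The per-step iteration described above (re-applying the same playbook with an incrementing step variable) could be driven from outside heat roughly like this. It is only a sketch: the inventory file name and the number of steps are assumptions, and the commands are printed rather than executed.

```python
def deploy_step_commands(inventory,
                         playbook="deploy-steps-playbook.yaml",
                         max_step=5):
    """Build one ansible-playbook command line per deployment step.

    Each run re-applies the same playbook and passes the current step
    as an extra var (-e), which is what the heat loop does today.
    max_step=5 is an assumption made for this example.
    """
    commands = []
    for step in range(1, max_step + 1):
        commands.append([
            "ansible-playbook",
            "-i", inventory,
            playbook,
            "-e", "step=%d" % step,
        ])
    return commands


# 'inventory.json' is a hypothetical inventory file name.
for command in deploy_step_commands("inventory.json"):
    print(" ".join(command))
```

The same loop could equally be expressed as a mistral workflow or inside a wrapping playbook; the sketch only shows that no heat-specific tooling is needed for the iteration itself.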

All of this work has merit as we investigate longer term plans, and
it's all at different stages with some being for dev/CI (0), some
being used already in production (1 and 2), some just at the
experimental stage (3 and 4), and some does not exist other than an
idea (5).

I'd like to get the remaining work for (4) done so it's a supportable
option for minor updates. There's still a bit more t-h-t
refactoring required to enable it, but I think we're already
pretty close to being able to run end-to-end ansible for most of the
PostDeploy steps without any special tooling.

Note this related patch from Matthieu:

https://review.openstack.org/#/c/444224/

I think we'll need to go further here but it's a starting point which
shows how we could expose ansible tasks from the heat stack outputs as
a first step to enabling standalone configuration via ansible (or
mistral->ansible)

My intent with this mail is to start a discussion around what we've
learned from these approaches and start discussing a consolidated plan
around Ansible. And I'm not saying that whatever we come up with
should only use Ansible a certain way. Just that we ought to look at
how users/operators interact with Ansible and TripleO today and try
and come up with the best solution(s) going forward.

I think that (1) has been pretty successful, and my idea with (5)
would use a similar approach once the playbooks were generated.
Further, my idea with (5) would give us a fully backwards compatible
solution with our existing template interfaces from
tripleo-heat-templates. Longer term (or even in parallel for some
time), the generated playbooks could stop being generated (and just
exist in git), and we could consider moving away from Heat more
permanently

Yeah I think working towards aligning more TripleO configuration with
the approach taken by tripleo-validations is fine, and we can e.g add
more heat generated data about the nodes to the dynamic ansible
inventory:

https://github.com/openstack/tripleo-validations/blob/master/tripleo_validations/inventory.py

We've been gradually adding data there, which I hope will enable a
cleaner "split stack", where the nodes are deployed via heat, then
ansible can do the configuration based on data exposed via stack
outputs (which again is a pattern that I think has been proven to work
quite well for tripleo-validations, and is also something I've been
using locally for dev testing quite successfully).
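
For readers unfamiliar with the pattern, here is a minimal sketch of turning stack outputs into an Ansible dynamic inventory. It is not the code in tripleo-validations' inventory.py: the shape of `stack_outputs` (role name mapped to hostname/IP pairs) is an assumption made for the example, while the `_meta`/`hostvars` layout is the standard JSON format Ansible expects from an inventory script.

```python
import json


def build_inventory(stack_outputs):
    """Build an Ansible dynamic-inventory dict from role -> {host: ip} data."""
    inventory = {"_meta": {"hostvars": {}}}
    for role, servers in stack_outputs.items():
        # One inventory group per role, named after the role in lowercase.
        inventory[role.lower()] = {"hosts": sorted(servers)}
        for hostname, address in servers.items():
            inventory["_meta"]["hostvars"][hostname] = {
                "ansible_host": address,
            }
    return inventory


# Hypothetical stack outputs; the real overcloud outputs are shaped differently.
outputs = {
    "Controller": {"overcloud-controller-0": "192.0.2.10"},
    "Compute": {"overcloud-compute-0": "192.0.2.20"},
}
print(json.dumps(build_inventory(outputs), indent=2, sort_keys=True))
```

Any extra heat-generated data about a node would simply become additional keys under that host's `hostvars` entry.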

I recognize that saying "moving away from Heat" may be quite
controversial. While it's not 100% the same discussion as what we are
doing with Ansible, I think it is a big part of the discussion and if
we want to continue with Heat as the primary orchestration tool in
TripleO.

Yeah, I think the first step is to focus on a clean "split stack"
model where the nodes/networks etc are still deployed via heat, then
ansible handles the configuration of the nodes.

In the long term I could see benefits in a "tripleo lite" model,
where, say, we only used mistral+Ironic+ansible, but IMO we're not at
the point yet where that's achievable, primarily because there's
coupling between the heat parameter interfaces and multiple
integrations we can't break (e.g users with environment files,
tripleo-ui, vendor integrations, etc).

It's a good discussion to kick off regardless though, so personally
I'd like to focus on these as the first "baby steps":

  1. How to perform end-to-end configuration via ansible (outside of
    heat, but probably still using data and possibly playbooks generated
    by heat)

  2. How to deploy nodes directly via Ironic, with a mistral workflow
    (e.g no Nova and potentially no Neutron?), I started that in
    https://review.openstack.org/#/c/313048/ but could use some help
    completing it.

I've been hearing a lot of feedback from various operators about how
difficult the baremetal deployment is with Heat. While feedback about
Ironic is generally positive, a lot of the negative feedback is around
the Heat->Nova->Ironic interaction. And, if we also move more towards
Ansible for the service deployment, I wonder if there is still a long
term place for Heat at all.

So while there are plenty of valid complaints, one observation is Heat
always gets blamed because it's the operator visible interface, but
quite often the problems are e.g Nova or some other non-heat issue,
for example "No valid host found" is often perceived as a heat problem by
new users when in reality it's not.

That said, there are valid complaints around the SoftwareDeployment
approach and operator familiarity vs some more traditional tool such
as ansible.

Personally, I'm pretty apprehensive about the approach taken in (3). I
feel that it is a lot of complexity that could be done simpler if we
took a step back and thought more about a longer term approach. I
recognize that it's mostly an experiment/POC at this stage, and I'm
not trying to directly knock down the approach. It's just that when I
start to see more patches (Kubernetes installation) using the same
approach, I figure it's worth discussing more broadly vs trying to
have a discussion by -1'ing patch reviews, etc.

I agree, I think the approach in (3) is a stopgap until we can define
a cleaner approach with fewer layers.

IMO the first step towards that is likely to be a "split stack" which
outputs heat data, then deployment configuration is performed via
mistral->ansible just like we already do in (1).

I'm interested in all feedback of course. And I plan to take a shot at
working on the prototype I mentioned in (5) if anyone would like to
collaborate around that.

I'm very happy to collaborate, and this is quite closely related to
the investigations I've been doing around enabling minor updates for
containers.

Let's sync up about it, but as I mentioned above I'm not yet fully sold
on a new translation tool, vs just more t-h-t refactoring to enable
output of data directly consumable via ansible-playbook (which can
then be run via operators, or heat, or mistral, or whatever).

I think if we can form some broad agreement before the PTG, we have a
chance at making some meaningful progress during Queens.

Agreed, although we probably do need to make some more progress on
some aspects of this for container minor updates that we'll need for
Pike.

Thanks,

Steve


responded Jul 10, 2017 by Steven_Hardy (16,900 points)   2 7 12

On Sun, Jul 9, 2017 at 8:44 AM, Yolanda Robla Mota yroblamo@redhat.com wrote:
What I'd like to dig into more is how Ansible and Heat can live together. And
what features does Heat offer that are not covered by Ansible as well? Is
there still the need to have Heat as the main engine, or could that be
replaced by Ansible totally in the future?

The main interface provided by Heat which AFAIK cannot currently be
replaced by Ansible is the parameters schema, where the template
parameters are exposed (including description, type and constraint
data) in a format that is useful for e.g. building the interfaces for
tripleo-ui.

Ansible has a different approach to role/playbook parameters AFAICT,
which is more a global namespace with no type validation, no way to
include description data or tags with variable declarations, and no
way to specify constraints (other than perhaps having custom modules
or playbook patterns that perform the validations early in the
deployment).

This is kind of similar to how the global namespace for hiera works
with our puppet model, although that at least has the advantage of
namespacing foo::something::variable, which again doesn't have a
direct equivalent in the ansible role model AFAIK (happy to be
corrected here, I'm not an ansible expert :)
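
As an illustration of the "validate early" pattern mentioned above, here is a small Python sketch that checks a flat set of variables against a schema carrying type, description and constraint data. The schema format, the variable names and the `validate` function are all made up for the example; they are neither Heat's parameter schema nor an existing Ansible feature.

```python
# Hypothetical schema: each variable declares a type, a description and
# optionally an allowed numeric range.
SCHEMA = {
    "ntp_server": {"type": str, "description": "NTP server hostname"},
    "controller_count": {
        "type": int,
        "description": "Number of controller nodes",
        "range": (1, 9),
    },
}


def validate(variables, schema):
    """Return a list of error strings; an empty list means all checks passed."""
    errors = []
    for name, spec in schema.items():
        if name not in variables:
            errors.append("%s: missing (%s)" % (name, spec["description"]))
            continue
        value = variables[name]
        # Strict type check, so that e.g. True is not accepted as an int.
        if type(value) is not spec["type"]:
            errors.append("%s: expected %s" % (name, spec["type"].__name__))
            continue
        low, high = spec.get("range", (None, None))
        if low is not None and not low <= value <= high:
            errors.append("%s: must be between %d and %d" % (name, low, high))
    return errors


print(validate({"ntp_server": "pool.ntp.org", "controller_count": 12}, SCHEMA))
```

Something along these lines would have to run before any node is touched to approximate what the heat parameter schema gives us up front.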

For these reasons (as mentioned in my reply to James), I think a first
step of a "split stack" model where heat deploys the nodes/networks
etc, then outputs data that can be consumed by Ansible is reasonable -
it leaves the operator interfaces alone for now, and gives us time to
think about the interface changes that may be needed long term, while
still giving most of the operator-debug and usability/scalability
benefits that I think folks pushing for Ansible are looking for.

Steve

On Sat, Jul 8, 2017 at 12:20 AM, James Slagle james.slagle@gmail.com
wrote:

On Fri, Jul 7, 2017 at 5:31 PM, David Moreau Simard dms@redhat.com
wrote:

On Fri, Jul 7, 2017 at 1:50 PM, James Slagle james.slagle@gmail.com
wrote:

(0) tripleo-quickstart which follows the common and well accepted
approach to bundling a set of Ansible playbooks/roles.

I don't want to de-rail the thread but I really want to bring some
attention to a pattern that tripleo-quickstart has been using across
its playbooks and roles.
I sincerely hope that we can find a better implementation should we
start developing new things from scratch.

Yes, just to clarify...by "well accepted" I just meant how the git
repo is organized and how you are expected to interface with those
playbooks and roles as opposed to what those playbooks/roles actually
do.

I'll sound like a broken record for those that have heard me mention
this before but for those that haven't, here's a concrete example of
how things are done today:
(Sorry for the link overload, making sure the relevant information is
available)

For an example tripleo-quickstart job, here's the console [1] and its
corresponding ARA report [2]:
- A bash script is created [3][4][5] from a jinja template [6]
- A task executes the bash script [7][8][9]
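
For those not familiar with the pattern, the two steps above amount to something like the following Python sketch. It uses the standard library's `string.Template` as a stand-in for Jinja and a trivial script body; the file name, the template contents and the `stack_name` parameter are invented for the example and are not taken from tripleo-quickstart.

```python
import os
import subprocess
import tempfile
from string import Template

# Stand-in for the jinja template: a shell script with one placeholder.
SCRIPT_TEMPLATE = Template(
    "#!/bin/sh\n"
    "set -e\n"
    'echo "deploying $stack_name"\n'
)


def render_script(directory, **params):
    """Step 1: render the template to a script file and return its path."""
    path = os.path.join(directory, "deploy.sh")
    with open(path, "w") as handle:
        handle.write(SCRIPT_TEMPLATE.substitute(**params))
    return path


def run_script(path):
    """Step 2: execute the rendered script and return what it printed."""
    result = subprocess.run(
        ["sh", path], capture_output=True, text=True, check=True
    )
    return result.stdout


with tempfile.TemporaryDirectory() as workdir:
    script = render_script(workdir, stack_name="overcloud")
    print(run_script(script), end="")
```

Note that from Ansible's point of view the whole script is a single opaque task, so its individual commands are not visible as separate results.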

From my limited experience, I believe the intent was that the
playbooks should do what a user is expected to do so that it's as
close to reproducing the user interface of TripleO 1:1.

For example, we document users running commands from a shell prompt.
Therefore, oooq ought to do the same thing as close as possible.
Obviously there will be gaps, just as there is with tripleo.sh, but I
feel that both tools (tripleo.sh/oooq) were trying to be faithful to
our published docs as much as possible, and I think there's something
to be commended there.

Not saying it's right or wrong, just that I believe that was the intent.

An alternative would be custom ansible modules that exposed tasks for
interfacing with our API directly. That would also be valuable, as
that code path is mostly untested now outside of the UI and CLI.

I think that tripleo-quickstart is a slightly different class of
"thing" from the other current Ansible uses I mentioned, in that it
sits at a layer above everything else. It's meant to automate TripleO
itself vs TripleO automating things. Regardless, we should certainly
consider how it fits into a larger plan.

--
-- James Slagle
--



--

Yolanda Robla Mota

Principal Software Engineer, RHCE

Red Hat

C/Avellana 213

Urb Portugal

yroblamo@redhat.com M: +34605641639


responded Jul 10, 2017 by Steven_Hardy (16,900 points)   2 7 12
...