
[openstack-dev] [TripleO] containerized undercloud in Queens


One of the things the TripleO containers team is planning on tackling
in Queens is fully containerizing the undercloud. At the PTG we created
an etherpad [1] that contains a list of features that need to be
implemented to fully replace instack-undercloud.

Benefits of this work:

-Alignment: aligning the undercloud and overcloud installers gets rid
of dual maintenance of services.

-Composability: tripleo-heat-templates and our new Ansible
architecture around it are composable. This means any set of services
can be used to build up your own undercloud. In other words the
framework here isn't just useful for "underclouds". It is really the
ability to deploy TripleO on a single node with no external
dependencies: a single-node TripleO installer (see the sketch after
this list). The containers team has already been leveraging the
existing (experimental) undercloud_deploy installer to develop
services for Pike.

-Development: The containerized undercloud is a great development
tool. It utilizes the same framework as the full overcloud deployment
but takes about 20 minutes to deploy. This means faster iterations,
less waiting, and more testing. Having this be a first class citizen
in the ecosystem will ensure this platform is functioning for
developers to use all the time.

-CI resources: better use of CI resources. At the PTG we received
feedback from the OpenStack infrastructure team that our upstream CI
resource usage is quite high at times (even as high as 50% of the
total). Because of the shared framework and single node capabilities we
can re-architect much of our upstream CI matrix around single node.
We no longer require multinode jobs to be able to test many of the
services in tripleo-heat-templates... we can just use a single cloud VM
instead. We'll still want multinode undercloud -> overcloud jobs for
testing things like HA and baremetal provisioning. But we can cover a
large set of the services (in particular many of the new scenario jobs
we added in Pike) with single node CI test runs in much less time.

-Containers: There are no plans to containerize the existing
instack-undercloud work. By moving our undercloud installer to a
tripleo-heat-templates and Ansible architecture we can leverage
containers. Interestingly, the same installer also supports baremetal
(package) installation at this point. Like the overcloud, however, I
think making containers our undercloud default would better align the
TripleO tooling.
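
To make the single-node idea concrete, here is a minimal sketch of
what a containerized undercloud deployment with the experimental
installer boils down to. The flags, IP, and environment file are
illustrative assumptions that vary by release, not the canonical
invocation:

    # Sketch only: deploy a containerized undercloud on this one node.
    # Flags and paths are examples and differ between releases.
    sudo openstack undercloud deploy \
      --templates=/usr/share/openstack-tripleo-heat-templates \
      --local-ip=192.168.24.2/24 \
      -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml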

We are actively working through a few issues with the deployment
framework Ansible effort to fully integrate that into the undercloud
installer. We are also reaching out to other teams like the UI and
Security folks to coordinate the efforts around those components. If
there are any questions about the effort or you'd like to be involved
in the implementation let us know. Stay tuned for more specific updates
as we organize to get as much of this in M1 and M2 as possible.

On behalf of the containers team,

Dan

[1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers


asked Nov 8, 2017 in openstack-dev by Dan_Prince

33 Responses


Hey Dan,

Thanks for sending out a note about this. I have a few questions inline.

On Mon, Oct 2, 2017 at 6:02 AM, Dan Prince dprince@redhat.com wrote:
One of the things the TripleO containers team is planning on tackling
in Queens is fully containerizing the undercloud. At the PTG we created
an etherpad [1] that contains a list of features that need to be
implemented to fully replace instack-undercloud.

I know we talked about this at the PTG and I was skeptical that this
will land in Queens. With the exception of the containers team
wanting this, I'm not sure there is an actual end user who is looking
for the feature, so I want to make sure we're not just doing more work
because we as developers think it's a good idea. Given that the etherpad
appears to contain a pretty big list of features, are we going to be
able to land all of them by M2? Would it be beneficial to craft a
basic spec related to this to ensure we are not missing additional
things?

Benefits of this work:

-Alignment: aligning the undercloud and overcloud installers gets rid
of dual maintenance of services.

I like reusing existing stuff. +1

-Composability: tripleo-heat-templates and our new Ansible
architecture around it are composable. This means any set of services
can be used to build up your own undercloud. In other words the
framework here isn't just useful for "underclouds". It is really the
ability to deploy TripleO on a single node with no external
dependencies. Single node TripleO installer. The containers team has
already been leveraging existing (experimental) undercloud_deploy
installer to develop services for Pike.

Is this something that is actually being asked for or is this just an
added bonus because it allows developers to reduce what is actually
being deployed for testing?

-Development: The containerized undercloud is a great development
tool. It utilizes the same framework as the full overcloud deployment
but takes about 20 minutes to deploy. This means faster iterations,
less waiting, and more testing. Having this be a first class citizen
in the ecosystem will ensure this platform is functioning for
developers to use all the time.

Seems to go with the previous question about the re-usability for
people who are not developers. Has everyone (including non-container
folks) tried this out and attested that it's a better workflow for them?
Are there use cases that are made worse by switching?

-CI resources: better use of CI resources. At the PTG we received
feedback from the OpenStack infrastructure team that our upstream CI
resource usage is quite high at times (even as high as 50% of the
total). Because of the shared framework and single node capabilities we
can re-architect much of our upstream CI matrix around single node.
We no longer require multinode jobs to be able to test many of the
services in tripleo-heat-templates... we can just use a single cloud VM
instead. We'll still want multinode undercloud -> overcloud jobs for
testing things like HA and baremetal provisioning. But we can cover a
large set of the services (in particular many of the new scenario jobs
we added in Pike) with single node CI test runs in much less time.

I like this idea but would like to see more details around this.
Since this is a new feature we need to make sure that we are properly
covering the containerized undercloud with CI as well. I think we
need 3 jobs to properly cover this feature before marking it done. I
added them to the etherpad but I think we need to ensure the following
3 jobs are defined and voting by M2 to consider actually switching
from the current instack-undercloud installation to the containerized
version.

1) undercloud-containers - a containerized install, should be voting by M1
2) undercloud-containers-update - minor updates run on containerized
underclouds, should be voting by M2
3) undercloud-containers-upgrade - major upgrade from
non-containerized to containerized undercloud, should be voting by M2.

If we have these jobs, is there anything we can drop or mark as
covered that is currently being covered by an overcloud job?
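
To sketch what job (2) could exercise, assuming that re-running the
same deploy command against newer packages performs the minor update
(the commands are illustrative, not the actual job definition):

    # 1) containerized install from the current packages
    sudo openstack undercloud deploy \
      --templates=/usr/share/openstack-tripleo-heat-templates
    # 2) pull in newer packages (and container images) for the branch tip
    sudo yum update -y python-tripleoclient
    # 3) re-run the installer; the job passes if the update converges
    sudo openstack undercloud deploy \
      --templates=/usr/share/openstack-tripleo-heat-templates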

-Containers: There are no plans to containerize the existing instack-
undercloud work. By moving our undercloud installer to a tripleo-heat-
templates and Ansible architecture we can leverage containers.
Interestingly, the same installer also supports baremetal (package)
installation as well at this point. Like to overcloud however I think
making containers our undercloud default would better align the TripleO
tooling.

We are actively working through a few issues with the deployment
framework Ansible effort to fully integrate that into the undercloud
installer. We are also reaching out to other teams like the UI and
Security folks to coordinate the efforts around those components. If
there are any questions about the effort or you'd like to be involved
in the implementation let us know. Stay tuned for more specific updates
as we organize to get as much of this in M1 and M2 as possible.

I would like to see weekly updates on this effort during the IRC
meeting. As previously mentioned around squad status, I'll be asking
for them during the meeting, so it would be nice to get an update on
this on a weekly basis so we can make sure that we'll be OK to cut over.

Also, what does the cut over plan look like? This is something that
might be beneficial to have in a spec. IMHO, I'm ok to continue
pushing the container effort using the openstack undercloud deploy
method for now. Once we have voting CI jobs and the feature list has
been covered, then we can evaluate if we've made the M2 time frame to
switch openstack undercloud deploy to be the new undercloud
install. I want to make sure we don't introduce regressions and are
doing things in a user friendly fashion, since the undercloud is the
first intro an end user gets to TripleO. It would be a good idea to
review what the new install process looks like and make sure it "just
works" given that the current process [0] (with all its flaws) is
fairly trivial to perform.

Thanks,
-Alex

[0] https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud

On behalf of the containers team,

Dan

[1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers


responded Oct 2, 2017 by aschultz_at_redhat.c

On Mon, 2017-10-02 at 15:20 -0600, Alex Schultz wrote:
Hey Dan,

Thanks for sending out a note about this. I have a few questions
inline.

On Mon, Oct 2, 2017 at 6:02 AM, Dan Prince dprince@redhat.com
wrote:

One of the things the TripleO containers team is planning on tackling
in Queens is fully containerizing the undercloud. At the PTG we created
an etherpad [1] that contains a list of features that need to be
implemented to fully replace instack-undercloud.

I know we talked about this at the PTG and I was skeptical that this
will land in Queens. With the exception of the containers team
wanting this, I'm not sure there is an actual end user who is looking
for the feature so I want to make sure we're not just doing more work
because we as developers think it's a good idea.

I've heard from several operators that they were actually surprised we
implemented containers in the Overcloud first. Validating a new
deployment framework on a single node Undercloud (for operators) before
taking over their entire cloud deployment has a lot of merit to it IMO.
When you share the same deployment architecture across the
overcloud/undercloud it puts us in a better position to decide where to
expose new features to operators first (when creating the undercloud or
overcloud for example).

Also, if you read my email again I've explicitly listed the
"Containers" benefit last. While I think moving the undercloud to
containers is a great benefit all by itself this is more of a
"framework alignment" in TripleO and gets us out of maintaining huge
amounts of technical debt. Re-using the same framework for the
undercloud and overcloud has a lot of merit. It effectively streamlines
the development process for service developers, and 3rd parties wishing
to integrate some of their components on a single node. Why be forced
to create a multi-node dev environment if you don't have to (if you
aren't using HA, for example)?

Let's be honest. While instack-undercloud helped solve the old "seed" VM
issue, it was outdated the day it landed upstream. The entire premise of
the tool is that it uses old style "elements" to create the undercloud,
and we moved away from those as the primary means of driving the creation
of the Overcloud years ago at this point. The new 'undercloud_deploy'
installer gets us back to our roots by once again sharing the same
architecture to create the over and underclouds. A demo from long ago
expands on this idea a bit:
https://www.youtube.com/watch?v=y1qMDLAf26Q&t=5s

In short, we aren't just doing more work because developers think it is
a good idea. This has the potential to be one of the most useful
architectural changes in TripleO that we've made in years. It could
significantly decrease our CI resources if we use it to replace the
existing scenario jobs, which take multiple VMs per job. It is a building
block we could use for other features like an HA undercloud. And yes,
it does also have a huge impact on developer velocity in that many of
us already prefer to use the tool as a means of streamlining our
dev/test cycles to minutes instead of hours. Why spend hours running
quickstart Ansible scripts when in many cases you can just doit.sh?
https://github.com/dprince/undercloud_containers/blob/master/doit.sh
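
For context, a paraphrased sketch of what a doit.sh-style script
automates (not the verbatim script; see the linked repo for the real
thing):

    set -eux
    # fetch the templates straight from git for hacking
    git clone https://git.openstack.org/openstack/tripleo-heat-templates
    # deploy a containerized undercloud on this single node;
    # flags here are illustrative and vary by release
    sudo openstack undercloud deploy \
      --templates=$PWD/tripleo-heat-templates \
      --local-ip=192.168.24.2/24 --heat-native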

Lastly, this isn't just a containers team thing. We've been using the
undercloud_deploy architecture across many teams to help develop for
almost an entire cycle now. Huge benefits. I would go as far as saying
that undercloud_deploy was the biggest feature in Pike that enabled
us to bang out a majority of the docker/service templates in
tripleo-heat-templates.

Given that etherpad
appears to contain a pretty big list of features, are we going to be
able to land all of them by M2? Would it be beneficial to craft a
basic spec related to this to ensure we are not missing additional
things?

I'm not sure there is a lot of value in creating a spec at this point.
We've already got an approved blueprint for the feature in Pike here:
https://blueprints.launchpad.net/tripleo/+spec/containerized-undercloud

I think we might get more velocity out of grooming the etherpad and
perhaps dividing this work among the appropriate teams.

Benefits of this work:

-Alignment: aligning the undercloud and overcloud installers gets rid
of dual maintenance of services.

I like reusing existing stuff. +1

-Composability: tripleo-heat-templates and our new Ansible
architecture around it are composable. This means any set of services
can be used to build up your own undercloud. In other words the
framework here isn't just useful for "underclouds". It is really the
ability to deploy TripleO on a single node with no external
dependencies. Single node TripleO installer. The containers team has
already been leveraging the existing (experimental) undercloud_deploy
installer to develop services for Pike.

Is this something that is actually being asked for or is this just an
added bonus because it allows developers to reduce what is actually
being deployed for testing?

There is an implied ask for this feature when a new developer starts to
use TripleO. Right now the resource bar is quite high for TripleO. You
have to have a multi-node development environment at the very least (one
undercloud node, and one overcloud node). The ideas we are talking
about here short-circuit this in many cases... where if you aren't
testing HA services or Ironic you could simply use undercloud_deploy to
test tripleo-heat-template changes on a single VM. Less resources, and
much less time spent learning and waiting.
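
That single-VM iteration loop, sketched with assumed paths for
illustration:

    # hack on a containerized service template in a local checkout
    git clone https://git.openstack.org/openstack/tripleo-heat-templates ~/tht
    vi ~/tht/docker/services/keystone.yaml
    # redeploy against the local templates; roughly 20 minutes per
    # iteration instead of hours for a full multinode rebuild
    sudo openstack undercloud deploy --templates=$HOME/tht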

-Development: The containerized undercloud is a great development
tool. It utilizes the same framework as the full overcloud deployment
but takes about 20 minutes to deploy. This means faster iterations,
less waiting, and more testing. Having this be a first class citizen
in the ecosystem will ensure this platform is functioning for
developers to use all the time.

Seems to go with the previous question about the re-usability for
people who are not developers. Has everyone (including non-container
folks) tried this out and attested that it's a better workflow for them?
Are there use cases that are made worse by switching?

I would let others chime in, but the feedback I've gotten has mostly
been that it improves the dev/test cycle greatly.

-CI resources: better use of CI resources. At the PTG we received
feedback from the OpenStack infrastructure team that our upstream CI
resource usage is quite high at times (even as high as 50% of the
total). Because of the shared framework and single node capabilities we
can re-architect much of our upstream CI matrix around single node.
We no longer require multinode jobs to be able to test many of the
services in tripleo-heat-templates... we can just use a single cloud VM
instead. We'll still want multinode undercloud -> overcloud jobs for
testing things like HA and baremetal provisioning. But we can cover a
large set of the services (in particular many of the new scenario jobs
we added in Pike) with single node CI test runs in much less time.

I like this idea but would like to see more details around this.
Since this is a new feature we need to make sure that we are properly
covering the containerized undercloud with CI as well. I think we
need 3 jobs to properly cover this feature before marking it done. I
added them to the etherpad but I think we need to ensure the following
3 jobs are defined and voting by M2 to consider actually switching
from the current instack-undercloud installation to the containerized
version.

1) undercloud-containers - a containerized install, should be voting by M1
2) undercloud-containers-update - minor updates run on containerized
underclouds, should be voting by M2
3) undercloud-containers-upgrade - major upgrade from
non-containerized to containerized undercloud, should be voting by M2.

If we have these jobs, is there anything we can drop or mark as
covered that is currently being covered by an overcloud job?

-Containers: There are no plans to containerize the existing
instack-undercloud work. By moving our undercloud installer to a
tripleo-heat-templates and Ansible architecture we can leverage
containers. Interestingly, the same installer also supports baremetal
(package) installation at this point. Like the overcloud, however, I
think making containers our undercloud default would better align the
TripleO tooling.

We are actively working through a few issues with the deployment
framework Ansible effort to fully integrate that into the undercloud
installer. We are also reaching out to other teams like the UI and
Security folks to coordinate the efforts around those components. If
there are any questions about the effort or you'd like to be involved
in the implementation let us know. Stay tuned for more specific updates
as we organize to get as much of this in M1 and M2 as possible.

I would like to see weekly updates on this effort during the IRC
meeting. As previously mentioned around squad status, I'll be asking
for them during the meeting, so it would be nice to get an update on
this on a weekly basis so we can make sure that we'll be OK to cut over.

Also, what does the cut over plan look like? This is something that
might be beneficial to have in a spec. IMHO, I'm ok to continue
pushing the container effort using the openstack undercloud deploy
method for now. Once we have voting CI jobs and the feature list has
been covered, then we can evaluate if we've made the M2 time frame to
switch openstack undercloud deploy to be the new undercloud
install. I want to make sure we don't introduce regressions and are
doing things in a user friendly fashion, since the undercloud is the
first intro an end user gets to TripleO. It would be a good idea to
review what the new install process looks like and make sure it "just
works" given that the current process [0] (with all its flaws) is
fairly trivial to perform.

Thanks,
-Alex

[0] https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud

On behalf of the containers team,

Dan

[1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers



responded Oct 3, 2017 by Dan_Prince

On Tue, Oct 3, 2017 at 10:12 AM, Dan Prince dprince@redhat.com wrote:
[...]
I would let others chime in, but the feedback I've gotten has mostly
been that it improves the dev/test cycle greatly.

[...]

I like both aschultz's & dprince's thoughts here; I agree with both of
you on most of the points made here.
I think we need to put more effort into CI (see what Alex wrote about
milestones; we should try to respect that, it has proved to be
helpful) & documentation as well (let's push docs before the end of M2).

If we can make CI & docs work before the end of M2, it's at least a good
step forward.
Also, we could ship this feature as "experimental" in Queens and
"stable" in Rocky (even if it has been developed for more than one
cycle by dprince & co); I think it's a reasonable path for our users.
--
Emilien Macchi


responded Oct 3, 2017 by emilien_at_redhat.co

On Tue, Oct 3, 2017 at 11:12 AM, Dan Prince dprince@redhat.com wrote:
On Mon, 2017-10-02 at 15:20 -0600, Alex Schultz wrote:

Hey Dan,

Thanks for sending out a note about this. I have a few questions
inline.

On Mon, Oct 2, 2017 at 6:02 AM, Dan Prince dprince@redhat.com
wrote:

One of the things the TripleO containers team is planning on tackling
in Queens is fully containerizing the undercloud. At the PTG we created
an etherpad [1] that contains a list of features that need to be
implemented to fully replace instack-undercloud.

I know we talked about this at the PTG and I was skeptical that this
will land in Queens. With the exception of the containers team
wanting this, I'm not sure there is an actual end user who is looking
for the feature so I want to make sure we're not just doing more work
because we as developers think it's a good idea.

I've heard from several operators that they were actually surprised we
implemented containers in the Overcloud first. Validating a new
deployment framework on a single node Undercloud (for operators) before
taking over their entire cloud deployment has a lot of merit to it IMO.
When you share the same deployment architecture across the
overcloud/undercloud it puts us in a better position to decide where to
expose new features to operators first (when creating the undercloud or
overcloud for example).

Also, if you read my email again I've explicitly listed the
"Containers" benefit last. While I think moving the undercloud to
containers is a great benefit all by itself this is more of a
"framework alignment" in TripleO and gets us out of maintaining huge
amounts of technical debt. Re-using the same framework for the
undercloud and overcloud has a lot of merit. It effectively streamlines
the development process for service developers, and 3rd parties wishing
to integrate some of their components on a single node. Why be forced
to create a multi-node dev environment if you don't have to (if you
aren't using HA, for example)?

Let's be honest. While instack-undercloud helped solve the old "seed" VM
issue, it was outdated the day it landed upstream. The entire premise of
the tool is that it uses old style "elements" to create the undercloud,
and we moved away from those as the primary means of driving the creation
of the Overcloud years ago at this point. The new 'undercloud_deploy'
installer gets us back to our roots by once again sharing the same
architecture to create the over and underclouds. A demo from long ago
expands on this idea a bit:
https://www.youtube.com/watch?v=y1qMDLAf26Q&t=5s

In short, we aren't just doing more work because developers think it is
a good idea. This has the potential to be one of the most useful
architectural changes in TripleO that we've made in years. It could
significantly decrease our CI resources if we use it to replace the
existing scenario jobs, which take multiple VMs per job. It is a building
block we could use for other features like an HA undercloud. And yes,
it does also have a huge impact on developer velocity in that many of
us already prefer to use the tool as a means of streamlining our
dev/test cycles to minutes instead of hours. Why spend hours running
quickstart Ansible scripts when in many cases you can just doit.sh?
https://github.com/dprince/undercloud_containers/blob/master/doit.sh

So like I've repeatedly said, I'm not completely against it as I agree
what we have is not ideal. I'm not -2, I'm -1 pending additional
information. I'm trying to be realistic and reduce our risk for this
cycle. IMHO doit.sh is not acceptable as an undercloud installer and
this is what I've been trying to point out as the actual impact to the
end user who has to use this thing. We have an established
installation method for the undercloud that, while it isn't great, isn't
a bash script with git fetches, etc. So as for the implementation,
this is what I want to see properly fleshed out prior to accepting
this feature as complete for Queens (and the new default). I would
like to see a plan of what features need to be added (eg. the stuff on
the etherpad), folks assigned to do this work, and estimated
timelines. Given that we shouldn't be making major feature changes
after M2 (~9 weeks), I want to get an understanding of what is
realistically going to make it. If after reviewing the initial
details we find that it's not actually going to make M2, then let's
agree to this now rather than trying to force it in at the end.

I know you've been a great proponent of the containerized undercloud
and I agree it offers a lot more for development efforts. But I just
want to make sure that we are getting all the feedback we can before
continuing down this path. Since, as you point out, a bunch of this
work is already available for consumption by developers, I don't see
making it the new default as a requirement for Queens unless it's
fully implemented and tested. There's nothing stopping folks from
using it now and making incremental improvements during Queens and we
commit to making it the new default for Rocky.

The point of this cycle was supposed to be more stabilization/getting
all the containers in place. Doing something like this seems to go
against what we were actually trying to achieve. I'd rather make
smaller incremental progress with your proposal being the end goal and
agreeing that perhaps Rocky is more realistic for the default cut
over.

Lastly, this isn't just a containers team thing. We've been using the
undercloud_deploy architecture across many teams to help develop for
almost an entire cycle now. Huge benefits. I would go as far as saying
that undercloud_deploy was the biggest feature in Pike that enabled
us to bang out a majority of the docker/service templates in
tripleo-heat-templates.

Given that etherpad
appears to contain a pretty big list of features, are we going to be
able to land all of them by M2? Would it be beneficial to craft a
basic spec related to this to ensure we are not missing additional
things?

I'm not sure there is a lot of value in creating a spec at this point.
We've already got an approved blueprint for the feature in Pike here:
https://blueprints.launchpad.net/tripleo/+spec/containerized-undercloud

I think we might get more velocity out of grooming the etherpad and
perhaps dividing this work among the appropriate teams.

That's fine, but I would like to see additional efforts made to
organize this work, assign folks and add proper timelines.

Benefits of this work:

-Alignment: aligning the undercloud and overcloud installers gets rid
of dual maintenance of services.

I like reusing existing stuff. +1

-Composability: tripleo-heat-templates and our new Ansible
architecture around it are composable. This means any set of services
can be used to build up your own undercloud. In other words the
framework here isn't just useful for "underclouds". It is really the
ability to deploy TripleO on a single node with no external
dependencies. Single node TripleO installer. The containers team has
already been leveraging the existing (experimental) undercloud_deploy
installer to develop services for Pike.

Is this something that is actually being asked for or is this just an
added bonus because it allows developers to reduce what is actually
being deployed for testing?

There is an implied ask for this feature when a new developer starts to
use TripleO. Right now the resource bar is quite high for TripleO. You
have to have a multi-node development environment at the very least (one
undercloud node, and one overcloud node). The ideas we are talking
about here short-circuit this in many cases... where if you aren't
testing HA services or Ironic you could simply use undercloud_deploy to
test tripleo-heat-template changes on a single VM. Less resources, and
much less time spent learning and waiting.

IMHO I don't think the undercloud install is the limiting factor for
new developers, and I'm not sure this is actually reducing that
complexity. It does reduce the amount of hardware needed to develop
some items, but there's a cost in complexity by moving the
configuration to THT, which is already where many people struggle. As
I previously mentioned, there's nothing stopping us from promoting the
containerized undercloud as a development tool and ensuring it's fully
featured before switching to it as the default at a later date.

-Development: The containerized undercloud is a great development
tool. It utilizes the same framework as the full overcloud deployment
but takes about 20 minutes to deploy. This means faster iterations,
less waiting, and more testing. Having this be a first class citizen
in the ecosystem will ensure this platform is functioning for
developers to use all the time.

Seems to go with the previous question about the re-usability for
people who are not developers. Has everyone (including non-container
folks) tried this out and attested that it's a better workflow for them?
Are there use cases that are made worse by switching?

I would let others chime in, but the feedback I've gotten has mostly
been that it improves the dev/test cycle greatly.

-CI resources: better use of CI resources. At the PTG we received
feedback from the OpenStack infrastructure team that our upstream CI
resource usage is quite high at times (even as high as 50% of the
total). Because of the shared framework and single node capabilities we
can re-architect much of our upstream CI matrix around single node.
We no longer require multinode jobs to be able to test many of the
services in tripleo-heat-templates... we can just use a single cloud VM
instead. We'll still want multinode undercloud -> overcloud jobs for
testing things like HA and baremetal provisioning. But we can cover a
large set of the services (in particular many of the new scenario jobs
we added in Pike) with single node CI test runs in much less time.

I like this idea but would like to see more details around this.
Since this is a new feature we need to make sure that we are properly
covering the containerized undercloud with CI as well. I think we
need 3 jobs to properly cover this feature before marking it done. I
added them to the etherpad but I think we need to ensure the following
3 jobs are defined and voting by M2 to consider actually switching
from the current instack-undercloud installation to the containerized
version.

1) undercloud-containers - a containerized install, should be voting by M1
2) undercloud-containers-update - minor updates run on containerized
underclouds, should be voting by M2
3) undercloud-containers-upgrade - major upgrade from
non-containerized to containerized undercloud, should be voting by M2.

If we have these jobs, is there anything we can drop or mark as
covered that is currently being covered by an overcloud job?

Can you please comment on these expectations as being achievable? If
they are not achievable, I don't think we can agree to switch the
default for Queens. As we shipped 'undercloud deploy' as
experimental for Pike, it's well within reason to continue to do so
for Queens. Perhaps we change the labeling to beta, or work it into
a --containerized option for 'undercloud install'.
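
Spelling out the two user-facing options being weighed (the
--containerized flag is hypothetical here, just naming the suggestion
above):

    # option A: keep shipping the standalone experimental command
    openstack undercloud deploy
    # option B: fold it into the existing installer behind a flag
    openstack undercloud install --containerized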

I think my ask for the undercloud-containers job as non-voting by M1
is achievable today because it's currently green (pending any zuul
freezes). My concern is really that minor updates and upgrades need to
be understood and accounted for ASAP. If we're truly able to reuse some
of the work we did for O->P upgrades, then these should be fairly
straightforward things to accomplish and there would be fewer
blockers to make the switch.

-Containers: There are no plans to containerize the existing
instack-undercloud work. By moving our undercloud installer to a
tripleo-heat-templates and Ansible architecture we can leverage
containers. Interestingly, the same installer also supports baremetal
(package) installation at this point. Like the overcloud, however, I
think making containers our undercloud default would better align the
TripleO tooling.

We are actively working through a few issues with the deployment
framework Ansible effort to fully integrate that into the undercloud
installer. We are also reaching out to other teams like the UI and
Security folks to coordinate the efforts around those components. If
there are any questions about the effort or you'd like to be involved
in the implementation let us know. Stay tuned for more specific updates
as we organize to get as much of this in M1 and M2 as possible.

I would like to see weekly updates on this effort during the IRC
meeting. As previously mentioned around squad status, I'll be asking
for them during the meeting, so it would be nice to get an update on
this on a weekly basis so we can make sure that we'll be OK to cut over.

Also, what does the cut over plan look like? This is something that
might be beneficial to have in a spec. IMHO, I'm ok to continue
pushing the container effort using the openstack undercloud deploy
method for now. Once we have voting CI jobs and the feature list has
been covered, then we can evaluate if we've made the M2 time frame to
switch openstack undercloud deploy to be the new undercloud
install. I want to make sure we don't introduce regressions and are
doing things in a user friendly fashion, since the undercloud is the
first intro an end user gets to TripleO. It would be a good idea to
review what the new install process looks like and make sure it "just
works" given that the current process [0] (with all its flaws) is
fairly trivial to perform.

Basically what I would like to see before making this new default is:
1) minor updates work (with CI)
2) P->Q upgrades work (with CI)
3) Documentation complete
4) no UX impact for installation (eg. how they installed it before is
the same as they install it now for containers)

If these are accounted for and completed before M2 then I would be +2
on the switch.

Thanks,
-Alex

[0] https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud

On behalf of the containers team,

Dan

[1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers



responded Oct 3, 2017 by aschultz_at_redhat.c

On Tue, Oct 3, 2017 at 1:50 PM, Alex Schultz aschultz@redhat.com wrote:
On Tue, Oct 3, 2017 at 11:12 AM, Dan Prince dprince@redhat.com wrote:

On Mon, 2017-10-02 at 15:20 -0600, Alex Schultz wrote:

Hey Dan,

Thanks for sending out a note about this. I have a few questions
inline.

On Mon, Oct 2, 2017 at 6:02 AM, Dan Prince dprince@redhat.com
wrote:

One of the things the TripleO containers team is planning on tackling
in Queens is fully containerizing the undercloud. At the PTG we created
an etherpad [1] that contains a list of features that need to be
implemented to fully replace instack-undercloud.

I know we talked about this at the PTG and I was skeptical that this
will land in Queens. With the exception of the containers team
wanting this, I'm not sure there is an actual end user who is looking
for the feature so I want to make sure we're not just doing more work
because we as developers think it's a good idea.

I've heard from several operators that they were actually surprised we
implemented containers in the Overcloud first. Validating a new
deployment framework on a single node Undercloud (for operators) before
taking over their entire cloud deployment has a lot of merit to it IMO.
When you share the same deployment architecture across the
overcloud/undercloud it puts us in a better position to decide where to
expose new features to operators first (when creating the undercloud or
overcloud for example).

Also, if you read my email again I've explicitly listed the
"Containers" benefit last. While I think moving the undercloud to
containers is a great benefit all by itself this is more of a
"framework alignment" in TripleO and gets us out of maintaining huge
amounts of technical debt. Re-using the same framework for the
undercloud and overcloud has a lot of merit. It effectively streamlines
the development process for service developers, and 3rd parties wishing
to integrate some of their components on a single node. Why be forced
to create a multi-node dev environment if you don't have to (if you
aren't using HA, for example)?

Let's be honest. While instack-undercloud helped solve the old "seed" VM
issue, it was outdated the day it landed upstream. The entire premise of
the tool is that it uses old style "elements" to create the undercloud,
and we moved away from those as the primary means of driving the creation
of the Overcloud years ago at this point. The new 'undercloud_deploy'
installer gets us back to our roots by once again sharing the same
architecture to create the over and underclouds. A demo from long ago
expands on this idea a bit:
https://www.youtube.com/watch?v=y1qMDLAf26Q&t=5s

In short, we aren't just doing more work because developers think it is
a good idea. This has the potential to be one of the most useful
architectural changes in TripleO that we've made in years. It could
significantly decrease our CI resources if we use it to replace the
existing scenario jobs, which take multiple VMs per job. It is a building
block we could use for other features like an HA undercloud. And yes,
it does also have a huge impact on developer velocity in that many of
us already prefer to use the tool as a means of streamlining our
dev/test cycles to minutes instead of hours. Why spend hours running
quickstart Ansible scripts when in many cases you can just doit.sh?
https://github.com/dprince/undercloud_containers/blob/master/doit.sh

So like I've repeatedly said, I'm not completely against it as I agree
what we have is not ideal. I'm not -2, I'm -1 pending additional
information. I'm trying to be realistic and reduce our risk for this
cycle. IMHO doit.sh is not acceptable as an undercloud installer and
this is what I've been trying to point out as the actual impact to the
end user who has to use this thing. We have an established
installation method for the undercloud that, while it isn't great, isn't
a bash script with git fetches, etc. So as for the implementation,
this is what I want to see properly fleshed out prior to accepting
this feature as complete for Queens (and the new default). I would
like to see a plan of what features need to be added (eg. the stuff on
the etherpad), folks assigned to do this work, and estimated
timelines. Given that we shouldn't be making major feature changes
after M2 (~9 weeks), I want to get an understanding of what is
realistically going to make it. If after reviewing the initial
details we find that it's not actually going to make M2, then let's
agree to this now rather than trying to force it in at the end.

I know you've been a great proponent of the containerized undercloud
and I agree it offers a lot more for development efforts. But I just
want to make sure that we are getting all the feedback we can before
continuing down this path. Since, as you point out, a bunch of this
work is already available for consumption by developers, I don't see
making it the new default as a requirement for Queens unless it's
fully implemented and tested. There's nothing stopping folks from
using it now and making incremental improvements during Queens and we
commit to making it the new default for Rocky.

The point of this cycle was supposed to be more stabilization/getting
all the containers in place. Doing something like this seems to go
against what we were actually trying to achieve. I'd rather make
smaller incremental progress with your proposal being the end goal and
agreeing that perhaps Rocky is more realistic for the default cut
over.

For clarification I meant, "stabilization/getting all the overcloud
containers in place." I do see the alignment of having a
containerized undercloud with the overcloud container theme; however, I
think it introduces more risk and there's a bunch of things (undercloud
updates/upgrades/UX) not necessarily being accounted for, which is where
my resistance comes from. I don't want to do yet another architectural
change without having these items accounted for. We've seen many
issues from the Pike cycle around these exact things for the
Overcloud. Let's not repeat our mistakes.

Lastly, this isn't just a containers team thing. We've been using the
undercloud_deploy architecture across many teams to help develop for
almost an entire cycle now. Huge benefits. I would go as far as saying
that undercloud_deploy was the biggest feature in Pike that enabled
us to bang out a majority of the docker/service templates in
tripleo-heat-templates.

Given that etherpad
appears to contain a pretty big list of features, are we going to be
able to land all of them by M2? Would it be beneficial to craft a
basic spec related to this to ensure we are not missing additional
things?

I'm not sure there is a lot of value in creating a spec at this point.
We've already got an approved blueprint for the feature in Pike here:
https://blueprints.launchpad.net/tripleo/+spec/containerized-undercloud

I think we might get more velocity out of grooming the etherpad and
perhaps dividing this work among the appropriate teams.

That's fine, but I would like to see additional efforts made to
organize this work, assign folks and add proper timelines.

Benefits of this work:

-Alignment: aligning the undercloud and overcloud installers gets rid
of dual maintenance of services.

I like reusing existing stuff. +1

-Composability: tripleo-heat-templates and our new Ansible
architecture around it are composable. This means any set of services
can be used to build up your own undercloud. In other words the
framework here isn't just useful for "underclouds". It is really the
ability to deploy TripleO on a single node with no external
dependencies. Single node TripleO installer. The containers team has
already been leveraging the existing (experimental) undercloud_deploy
installer to develop services for Pike.

Is this something that is actually being asked for or is this just an
added bonus because it allows developers to reduce what is actually
being deployed for testing?

There is an implied ask for this feature when a new developer starts to
use TripleO. Right now the resource bar is quite high for TripleO. You
have to have a multi-node development environment at the very least (one
undercloud node, and one overcloud node). The ideas we are talking
about here short-circuit this in many cases... where if you aren't
testing HA services or Ironic you could simply use undercloud_deploy to
test tripleo-heat-template changes on a single VM. Less resources, and
much less time spent learning and waiting.

IMHO I don't think the undercloud install is the limiting factor for
new developers, and I'm not sure this is actually reducing that
complexity. It does reduce the amount of hardware needed to develop
some items, but there's a cost in complexity by moving the
configuration to THT, which is already where many people struggle. As
I previously mentioned, there's nothing stopping us from promoting the
containerized undercloud as a development tool and ensuring it's fully
featured before switching to it as the default at a later date.

-Development: The containerized undercloud is a great development
tool. It utilizes the same framework as the full overcloud deployment
but takes about 20 minutes to deploy. This means faster iterations,
less waiting, and more testing. Having this be a first class citizen
in the ecosystem will ensure this platform is functioning for
developers to use all the time.

Seems to go with the previous question about the re-usability for
people who are not developers. Has everyone (including non-container
folks) tried this out and attested that it's a better workflow for them?
Are there use cases that are made worse by switching?

I would let others chime in, but the feedback I've gotten has mostly
been that it improves the dev/test cycle greatly.

-CI resources: better use of CI resources. At the PTG we received
feedback from the OpenStack infrastructure team that our upstream CI
resource usage is quite high at times (even as high as 50% of the
total). Because of the shared framework and single node capabilities we
can re-architect much of our upstream CI matrix around single node.
We no longer require multinode jobs to be able to test many of the
services in tripleo-heat-templates... we can just use a single cloud VM
instead. We'll still want multinode undercloud -> overcloud jobs for
testing things like HA and baremetal provisioning. But we can cover a
large set of the services (in particular many of the new scenario jobs
we added in Pike) with single node CI test runs in much less time.

I like this idea but would like to see more details around this.
Since this is a new feature we need to make sure that we are properly
covering the containerized undercloud with CI as well. I think we
need 3 jobs to properly cover this feature before marking it done. I
added them to the etherpad but I think we need to ensure the following
3 jobs are defined and voting by M2 to consider actually switching
from the current instack-undercloud installation to the containerized
version.

1) undercloud-containers - a containerized install, should be voting by M1
2) undercloud-containers-update - minor updates run on containerized
underclouds, should be voting by M2
3) undercloud-containers-upgrade - major upgrade from
non-containerized to containerized undercloud, should be voting by M2.

If we have these jobs, is there anything we can drop or mark as
covered that is currently being covered by an overcloud job?

Can you please comment on these expectations as being achievable? If
they are not achievable, I don't think we can agree to switch the
default for Queens. As we shipped 'undercloud deploy' as
experimental for Pike, it's well within reason to continue to do so
for Queens. Perhaps we change the labeling to beta, or work it into
a --containerized option for 'undercloud install'.

I think my ask for the undercloud-containers job as non-voting by M1
is achievable today because it's currently green (pending any zuul
freezes). My concern is really that minor updates and upgrades need to
be understood and accounted for ASAP. If we're truly able to reuse some
of the work we did for O->P upgrades, then these should be fairly
straightforward things to accomplish and there would be fewer
blockers to make the switch.

-Containers: There are no plans to containerize the existing
instack-undercloud work. By moving our undercloud installer to a
tripleo-heat-templates and Ansible architecture we can leverage
containers. Interestingly, the same installer also supports baremetal
(package) installation at this point. Like the overcloud, however, I
think making containers our undercloud default would better align the
TripleO tooling.

We are actively working through a few issues with the deployment
framework Ansible effort to fully integrate that into the undercloud
installer. We are also reaching out to other teams like the UI and
Security folks to coordinate the efforts around those components. If
there are any questions about the effort or you'd like to be involved
in the implementation let us know. Stay tuned for more specific updates
as we organize to get as much of this in M1 and M2 as possible.

I would like to see weekly updates on this effort during the IRC
meeting. As previously mentioned around squad status, I'll be asking
for them during the meeting, so it would be nice to get an update on
this on a weekly basis so we can make sure that we'll be OK to cut over.

Also, what does the cut over plan look like? This is something that
might be beneficial to have in a spec. IMHO, I'm ok to continue
pushing the container effort using the openstack undercloud deploy
method for now. Once we have voting CI jobs and the feature list has
been covered, then we can evaluate if we've made the M2 time frame to
switch openstack undercloud deploy to be the new undercloud
install. I want to make sure we don't introduce regressions and are
doing things in a user friendly fashion, since the undercloud is the
first intro an end user gets to TripleO. It would be a good idea to
review what the new install process looks like and make sure it "just
works" given that the current process [0] (with all its flaws) is
fairly trivial to perform.

Basically what I would like to see before making this new default is:
1) minor updates work (with CI)
2) P->Q upgrades work (with CI)
3) Documentation complete
4) no UX impact for installation (eg. how they installed it before is
the same as they install it now for containers)

If these are accounted for and completed before M2 then I would be +2
on the switch.

Thanks,
-Alex

[0] https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud

On behalf of the containers team,

Dan

[1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers



responded Oct 3, 2017 by aschultz_at_redhat.c

On Tue, Oct 3, 2017 at 3:50 PM, Alex Schultz aschultz@redhat.com wrote:

On Tue, Oct 3, 2017 at 11:12 AM, Dan Prince dprince@redhat.com wrote:

On Mon, 2017-10-02 at 15:20 -0600, Alex Schultz wrote:

Hey Dan,

Thanks for sending out a note about this. I have a few questions
inline.

On Mon, Oct 2, 2017 at 6:02 AM, Dan Prince dprince@redhat.com
wrote:

One of the things the TripleO containers team is planning on tackling
in Queens is fully containerizing the undercloud. At the PTG we created
an etherpad [1] that contains a list of features that need to be
implemented to fully replace instack-undercloud.

I know we talked about this at the PTG and I was skeptical that this
will land in Queens. With the exception of the containers team
wanting this, I'm not sure there is an actual end user who is looking
for the feature so I want to make sure we're not just doing more work
because we as developers think it's a good idea.

I've heard from several operators that they were actually surprised we
implemented containers in the Overcloud first. Validating a new
deployment framework on a single node Undercloud (for operators) before
taking over their entire cloud deployment has a lot of merit to it IMO.
When you share the same deployment architecture across the
overcloud/undercloud it puts us in a better position to decide where to
expose new features to operators first (when creating the undercloud or
overcloud for example).

Also, if you read my email again I've explicitly listed the
"Containers" benefit last. While I think moving the undercloud to
containers is a great benefit all by itself this is more of a
"framework alignment" in TripleO and gets us out of maintaining huge
amounts of technical debt. Re-using the same framework for the
undercloud and overcloud has a lot of merit. It effectively streamlines
the development process for service developers, and 3rd parties wishing
to integrate some of their components on a single node. Why be forced
to create a multi-node dev environment if you don't have to (if you
aren't using HA, for example)?

Let's be honest. While instack-undercloud helped solve the old "seed" VM
issue, it was outdated the day it landed upstream. The entire premise of
the tool is that it uses old style "elements" to create the undercloud,
and we moved away from those as the primary means of driving the creation
of the Overcloud years ago at this point. The new 'undercloud_deploy'
installer gets us back to our roots by once again sharing the same
architecture to create the over and underclouds. A demo from long ago
expands on this idea a bit:
https://www.youtube.com/watch?v=y1qMDLAf26Q&t=5s

In short, we aren't just doing more work because developers think it is
a good idea. This has the potential to be one of the most useful
architectural changes in TripleO that we've made in years. It could
significantly decrease our CI resources if we use it to replace the
existing scenario jobs, which take multiple VMs per job. It is a building
block we could use for other features like an HA undercloud. And yes,
it does also have a huge impact on developer velocity in that many of
us already prefer to use the tool as a means of streamlining our
dev/test cycles to minutes instead of hours. Why spend hours running
quickstart Ansible scripts when in many cases you can just doit.sh?
https://github.com/dprince/undercloud_containers/blob/master/doit.sh

So like I've repeatedly said, I'm not completely against it as I agree
what we have is not ideal. I'm not -2, I'm -1 pending additional
information. I'm trying to be realistic and reduce our risk for this
cycle.

This reduces our complexity greatly, I think, in that once it is
completed it will allow us to eliminate two projects (instack and
instack-undercloud) and the maintenance thereof. Furthermore, this
dovetails nicely with the Ansible effort.

IMHO doit.sh is not acceptable as an undercloud installer and
this is what I've been trying to point out as the actual impact to the
end user who has to use this thing.

doit.sh is an example of where the effort is today. It is essentially the
same stuff we document online here:
http://tripleo.org/install/containers_deployment/undercloud.html.

Similar to quickstart, it is just something meant to help you set up a dev
environment.
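
For the curious, the rough shape of what doit.sh wraps is something like
the following (a sketch only -- the exact flags are illustrative and may
not match the current patches; see the doc link above for real steps):

# Single-node containerized undercloud on one VM.
# Flag names are illustrative of the experimental installer's CLI.
sudo openstack undercloud deploy \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --local-ip 192.168.24.1 \
  --heat-native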

We have an established
installation method for the undercloud that, while it isn't great, isn't
a bash script with git fetches, etc. So as for the implementation,
this is what I want to see properly fleshed out prior to accepting
this feature as complete for Queens (and the new default).

Of course the feature would need to prove itself before it becomes the new
default Undercloud. I'm trying to build consensus and get the team focused
on these things.

What strikes me as odd is your earlier comment about "I want to make sure
we're not just doing more work because we as developers think it's a good
idea." I'm a developer and I do think this is a good idea. Please don't try
to de-motivate this effort just because you happen to believe this. It was
accepted for Pike and unfortunately we didn't get enough buy-in early
enough to get focus on it. Now that is starting to change, and just as it
does you are suggesting we not keep it a priority?

I would
like to see a plan of what features need to be added (e.g. the stuff on
the etherpad), folks assigned to do this work, and estimated
timelines. Given that we shouldn't be making major feature changes
after M2 (~9 weeks), I want to get an understanding of what is
realistically going to make it. If after reviewing the initial
details we find that it's not actually going to make M2, then let's
agree to this now rather than trying to force it in at the end.

All of this is forthcoming. Those details will come in time.

I know you've been a great proponent of the containerized undercloud
and I agree it offers a lot more for development efforts. But I just
want to make sure that we are getting all the feedback we can before
continuing down this path. Since, as you point out, a bunch of this
work is already available for consumption by developers, I don't see
making it the new default as a requirement for Queens unless it's
fully implemented and tested. There's nothing stopping folks from
using it now and making incremental improvements during Queens, and we
can commit to making it the new default for Rocky.

The point of this cycle was supposed to be more stabilization/getting
all the containers in place. Doing something like this seems to go
against what we were actually trying to achieve. I'd rather make
smaller incremental progress, with your proposal being the end goal,
and agree that perhaps Rocky is more realistic for the default cut
over.

I thought the point of this release was full containerization? And part
of that is containerizing the undercloud too, right?

Lastly, this isn't just a containers team thing. We've been using the
undercloud_deploy architecture across many teams to help develop for
almost an entire cycle now. Huge benefits. I would go as far as saying
that undercloud_deploy was the biggest feature in Pike that enabled
us to bang out a majority of the docker/service templates in
tripleo-heat-templates.

Given that the etherpad
appears to contain a pretty big list of features, are we going to be
able to land all of them by M2? Would it be beneficial to craft a
basic spec related to this to ensure we are not missing additional
things?

I'm not sure there is a lot of value in creating a spec at this point.
We've already got an approved blueprint for the feature in Pike here:
https://blueprints.launchpad.net/tripleo/+spec/containerized-undercloud

I think we might get more velocity out of grooming the etherpad and
perhaps dividing this work among the appropriate teams.

That's fine, but I would like to see additional efforts made to
organize this work, assign folks and add proper timelines.

Benefits of this work:

-Alignment: aligning the undercloud and overcloud installers gets rid of
dual maintenance of services.

I like reusing existing stuff. +1

-Composability: tripleo-heat-templates and our new Ansible architecture
around it are composable. This means any set of services can be used to
build up your own undercloud. In other words the framework here isn't
just useful for "underclouds". It is really the ability to deploy
TripleO on a single node with no external dependencies. Single node
TripleO installer. The containers team has already been leveraging the
existing (experimental) undercloud_deploy installer to develop services
for Pike.

Is this something that is actually being asked for or is this just an
added bonus because it allows developers to reduce what is actually
being deployed for testing?

There is an implied ask for this feature when a new developer starts to
use TripleO. Right now the resource bar is quite high for TripleO. You
have to have a multi-node development environment at the very least (one
undercloud node and one overcloud node). The ideas we are talking
about here short-circuit this in many cases... where if you aren't
testing HA services or Ironic you could simply use undercloud_deploy to
test tripleo-heat-template changes on a single VM. Fewer resources, and
much less time spent learning and waiting.
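
As a sketch of what that composability could look like in practice (the
resource keys below come from tripleo-heat-templates, but treat the
exact names, file name and flags as illustrative rather than settled):

# Hypothetical environment file dropping services you aren't testing.
cat > ~/skip-services.yaml <<'EOF'
resource_registry:
  # Not testing baremetal provisioning? Map the Ironic services to no-ops.
  OS::TripleO::Services::IronicApi: OS::Heat::None
  OS::TripleO::Services::IronicConductor: OS::Heat::None
EOF

sudo openstack undercloud deploy \
  --templates /usr/share/openstack-tripleo-heat-templates \
  -e ~/skip-services.yaml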

IMHO I don't think the undercloud install is the limiting factor for
new developers, and I'm not sure this is actually reducing that
complexity. It does reduce the amount of hardware needed to develop
some items, but there's a cost in complexity from moving the
configuration to THT, which is already where many people struggle. As
I previously mentioned, there's nothing stopping us from promoting the
containerized undercloud as a development tool and ensuring it's fully
featured before switching to it as the default at a later date.

Because the new undercloud_deploy installer uses t-h-t we get containers
for free. Additionally, as we convert over to Ansible instead of Heat
software deployments we also get better operator feedback there as well.
Wouldn't it be nice to have an Undercloud installer driven by Ansible
instead of Python and tripleo-image-elements?
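
(To make the operator-feedback point concrete: the direction is that the
generated configuration ends up as plain Ansible you can inspect and
re-run by hand, along these lines -- the directory and file names here
are illustrative, not a settled interface:)

# Re-apply the undercloud configuration and get normal task-by-task
# Ansible output instead of polling opaque Heat events.
cd ~/undercloud-ansible   # wherever the installer wrote its playbooks
ansible-playbook -i inventory.yaml deploy_steps_playbook.yaml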

The reason I linked in doit.sh above is that (if you actually go and
look at the recent patches) we are already wiring these things up right
now (before M1!) and it looks really nice. As we eventually move away
from Puppet for configuration, that too goes away. So I think the idea
here is a net reduction in complexity, because we no longer have to
maintain instack-undercloud, puppet modules, and elements.

It isn't that the undercloud install is a limiting factor. It is that the
set of services making up your "Undercloud" can be anything you want
because t-h-t supports all of our services. Anything you want with minimal
t-h-t, Ansible, and containers. This means you can effectively develop on a
single node for many cases and it will just work in a multi-node Overcloud
setup too because we have the same architecture.

Dan

-Development: The containerized undercloud is a great development tool.
It utilizes the same framework as the full overcloud deployment but
takes about 20 minutes to deploy. This means faster iterations, less
waiting, and more testing. Having this be a first class citizen in the
ecosystem will ensure this platform is functioning for developers to
use all the time.

Seems to go with the previous question about the re-usability for
people who are not developers. Has everyone (including non-container
folks) tried this out and attested that it's a better workflow for them?
Are there use cases that are made worse by switching?

I would let others chime in, but the feedback I've gotten has mostly
been that it improves the dev/test cycle greatly.

-CI resources: better use of CI resources. At the PTG we received
feedback from the OpenStack infrastructure team that our upstream CI
resource usage is quite high at times (even as high as 50% of the
total). Because of the shared framework and single node capabilities we
can re-architect much of our upstream CI matrix around single node.
We no longer require multinode jobs to be able to test many of the
services in tripleo-heat-templates... we can just use a single cloud VM
instead. We'll still want multinode undercloud -> overcloud jobs for
testing things like HA and baremetal provisioning. But we can cover a
large set of the services (in particular many of the new scenario jobs
we added in Pike) with single node CI test runs in much less time.

I like this idea but would like to see more details around this.
Since this is a new feature we need to make sure that we are properly
covering the containerized undercloud with CI as well. I think we
need 3 jobs to properly cover this feature before marking it done. I
added them to the etherpad but I think we need to ensure the following
3 jobs are defined and voting by M2 to consider actually switching
from the current instack-undercloud installation to the containerized
version.

1) undercloud-containers - a containerized install, should be voting by M1
2) undercloud-containers-update - minor updates run on containerized
underclouds, should be voting by M2
3) undercloud-containers-upgrade - major upgrade from non-containerized
to containerized undercloud, should be voting by M2.

If we have these jobs, is there anything we can drop or mark as
covered that is currently being covered by an overcloud job?

Can you please comment on these expectations as being achievable? If
they are not achievable, I don't think we can agree to switch the
default for Queens. As we shipped 'undercloud deploy' as
experimental for Pike, it's well within reason to continue to do so
for Queens. Perhaps we change the labeling to beta, or work it into
a --containerized option for 'undercloud install'.
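
In other words, keep the entry point operators already know and gate the
new path behind an opt-in flag, along the lines of (the option name is
just the suggestion above, not an existing flag):

# Hypothetical UX: same command as today, opt in to the new implementation.
openstack undercloud install --containerized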

I think my ask for the undercloud-containers job as non-voting by M1
is achievable today because it's currently green (pending any zuul
freezes). My concern is really that minor updates and upgrades need to
be understood and accounted for ASAP. If we're truly able to reuse some
of the work we did for O->P upgrades, then these should be fairly
straightforward things to accomplish and there would be fewer
blockers to making the switch.

-Containers: There are no plans to containerize the existing
instack-undercloud work. By moving our undercloud installer to a
tripleo-heat-templates and Ansible architecture we can leverage
containers. Interestingly, the same installer also supports baremetal
(package) installation as well at this point. Like the overcloud,
however, I think making containers our undercloud default would better
align the TripleO tooling.

We are actively working through a few issues with the deployment
framework Ansible effort to fully integrate that into the undercloud
installer. We are also reaching out to other teams like the UI and
Security folks to coordinate the efforts around those components. If
there are any questions about the effort or you'd like to be involved
in the implementation let us know. Stay tuned for more specific updates
as we organize to get as much of this in M1 and M2 as possible.

I would like to see weekly updates on this effort during the IRC
meeting. As previously mentioned around squad status, I'll be asking
for them during the meeting, so it would be nice to get an update on
this on a weekly basis so we can make sure that we'll be OK to cut over.

Also, what does the cut over plan look like? This is something that
might be beneficial to have in a spec. IMHO, I'm ok to continue
pushing the container effort using the openstack undercloud deploy
method for now. Once we have voting CI jobs and the feature list has
been covered, then we can evaluate if we've made the M2 time frame for
switching openstack undercloud deploy to be the new undercloud
install. I want to make sure we don't introduce regressions and are
doing things in a user-friendly fashion, since the undercloud is the
first intro an end user gets to TripleO. It would be a good idea to
review what the new install process looks like and make sure it "just
works", given that the current process[0] (with all its flaws) is
fairly trivial to perform.

Basically what I would like to see before making this the new default is:
1) minor updates work (with CI)
2) P->Q upgrades work (with CI)
3) Documentation complete
4) no UX impact for installation (e.g. how they installed it before is
the same as they install it now with containers)

If these are accounted for and completed before M2 then I would be +2
on the switch.

Thanks,
-Alex

[0] https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud

On behalf of the containers team,

Dan

[1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers



OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Oct 3, 2017 by Dan_Prince (8,160 points)   1 4 6
0 votes

On Tue, Oct 3, 2017 at 2:46 PM, Dan Prince dprince@redhat.com wrote:

I think this reduces our complexity greatly, in that once it is
completed it will allow us to eliminate two projects (instack and
instack-undercloud) and the maintenance thereof. Furthermore, it
dovetails nicely with the Ansible work.

I agree. So I think there are some misconceptions here about my thoughts
on this effort. I am not against this effort. I am for this effort and
wish to see more of it. I want to see the effort communicated publicly
via the ML and IRC meetings. What I am against is switching the default
undercloud method until the containerization of the undercloud has the
appropriate test coverage and documentation to ensure it is on par
with what it is replacing. Does this make sense?

IMHO doit.sh is not acceptable as an undercloud installer and
this is what I've been trying to point out as the actual impact to the
end user who has to use this thing.

doit.sh is an example of where the effort is today. It is essentially the
same stuff we document online here:
http://tripleo.org/install/containers_deployment/undercloud.html.

Similar to quickstart, it is just something meant to help you set up a dev
environment.

Right, providing something that the non-developer uses vs providing
something for hacking are two separate things. Making it consumable by
the end user (not the developer) is what I'm pointing out needs to be
accounted for. This is a recurring theme that I have pushed for in
OpenStack: ensuring that the operator (the actual end user) is accounted
for when making decisions. TripleO has not done a good job of this
either. Sure, the referenced documentation works for the dev case, but
probably not the actual deployer/operator case. There needs to be a
migration guide or documentation of old configuration -> new
configuration for the people who are familiar with the non-containerized
undercloud vs the containerized undercloud. Do we have all the use cases
accounted for, etc.? This is the part that I don't think we have
figured out, and it is what I'm asking that we make sure we account
for with this.

Of course the feature would need to prove itself before it becomes the new
default Undercloud. I'm trying to build consensus and get the team focused
on these things.

What strikes me as odd is your earlier comment about "I want to make sure
we're not just doing more work because we as developers think it's a good
idea." I'm a developer and I do think this is a good idea. Please don't try
to de-motivate this effort just because you happen to believe this. It was
accepted for Pike and unfortunately we didn't get enough buy-in early
enough to get focus on it. Now that is starting to change, and just as it
does you are suggesting we not keep it a priority?

Once again, I agree and I am on board with the end goal that I think is
trying to be achieved by this effort. What I am currently not on board
with is the time frame for Queens, based on concerns previously
mentioned. This is not about trying to demotivate the effort. It's
about ensuring quality and something that is consumable by an
additional set of end users of the software (the operator/deployer,
not the developer). Given that we have not finished the overcloud
deployment and are still working on fixing items found for that, I
personally feel it's a bit early to consider switching the undercloud
default install to a containerized method. That being said, I have
repeatedly stated that if we account for updates, upgrades, docs and
the operator UX there's no problem with this effort. I just don't
think it's realistic given current timelines (~9 weeks). Please feel
free to provide information/patches to the contrary. I have not said
don't work on it. I just want to make sure we have all the pieces in
place needed to consider it a proper replacement for the existing
undercloud installation (by M2). If anything there's probably more
work that needs to be done, and if we want to make it a priority to
happen, then it needs to be documented and communicated so folks can
assist as they have cycles.

I thought the point of this release was full containerization? And part
of that is containerizing the undercloud too, right?

Not that I was aware of. Others have asked because they have not been
aware that it included the undercloud. Given that we are wanting to
eventually look to kubernetes, maybe we don't need to containerize the
undercloud, as it may be that it could be discarded with that switch.
That's probably a longer discussion. It might need to be researched,
which is why it's important to understand why we're doing the
containerization effort and what exactly it entails. Given that I
don't think we're looking to deploy kubernetes via
THT/tripleo-puppet/containers, I wonder what impact this would have
on this effort? That's probably a conversation for another thread.

Because the new undercloud_deploy installer uses t-h-t we get containers
for free. Additionally, as we convert over to Ansible instead of Heat
software deployments we also get better operator feedback there as well.
Wouldn't it be nice to have an Undercloud installer driven by Ansible
instead of Python and tripleo-image-elements?

Yup, and once again I recognize this as a benefit.

It isn't that the undercloud install is a limiting factor. It is that the
set of services making up your "Undercloud" can be anything you want because
t-h-t supports all of our services. Anything you want with minimal t-h-t,
Ansible, and containers. This means you can effectively develop on a single
node for many cases and it will just work in a multi-node Overcloud setup
too because we have the same architecture.

My concern is making sure we aren't moving too fast and introducing
more regressions/bugs/missing use cases/etc. My hope is that by
documenting all of this, ensuring we have proper expectations around a
definition of done (and time frames), and allowing for additional
review, we will reduce the risk introduced by this switch. These types
of things align with what we talked about at the PTG during the retro[0]
(see: start defining a definition of done, start status reporting on the
ML, stop over-committing, stop big changes without tests, less
complexity, etc., etc.). This stuff's complicated, let's make sure we do
it right.

Thanks,
-Alex

[0] http://people.redhat.com/aschultz/denver-ptg/tripleo-ptg-retro.jpg

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Oct 3, 2017 by aschultz_at_redhat.c (5,800 points)   2 2 4
0 votes

On Tue, 2017-10-03 at 16:03 -0600, Alex Schultz wrote:

Right, providing something that the non-developer uses vs providing
something for hacking are two separate things. Making it consumable by
the end user (not the developer) is what I'm pointing out needs to be
accounted for. This is a recurring theme that I have pushed for in
OpenStack: ensuring that the operator (the actual end user) is accounted
for when making decisions. TripleO has not done a good job of this
either. Sure, the referenced documentation works for the dev case, but
probably not the actual deployer/operator case.

This will come in time. What I would encourage us to do upstream is
make as much progress on this in Queens as possible, so that polishing
our documentation becomes the focus... instead of the remaining work.

And to be clear, all of this work advocates for the Operator just as
much as it does for the developer. No regressions, improved Ansible
feedback on the CLI, potential for future features around multi-node
underclouds, and alignment of the architecture around containers. Boom!
I think operators will like all of this. We can and will document it.

There needs to be a
migration guide or documentation of old configuration -> new
configuration for the people who are familiar with the non-containerized
undercloud vs the containerized undercloud. Do we have all the use cases
accounted for, etc.? This is the part that I don't think we have
figured out, and it is what I'm asking that we make sure we account
for with this.

The use case is to replace instack-undercloud with no feature
regressions.

Once again, I agree and I am on board with the end goal that I think is
trying to be achieved by this effort. What I am currently not on board
with is the time frame for Queens, based on concerns previously
mentioned. This is not about trying to demotivate the effort. It's
about ensuring quality and something that is consumable by an
additional set of end users of the software (the operator/deployer,
not the developer). Given that we have not finished the overcloud
deployment and are still working on fixing items found for that, I
personally feel it's a bit early to consider switching the undercloud
default install to a containerized method. That being said, I have
repeatedly stated that if we account for updates, upgrades, docs and
the operator UX there's no problem with this effort. I just don't
think it's realistic given current timelines (~9 weeks). Please feel
free to provide information/patches to the contrary.

Whether this feature makes the release or not, I think it is too early
to say. What I can say is that the amount of work remaining on the
Undercloud feature is IMO a good bit less than what we knocked out in
the last release:

https://etherpad.openstack.org/p/tripleo-composable-containers-undercloud

And regardless of whether we make the release or not, there is huge
value in moving the work forward now... if only to put us in a better
position for the next release.

I've been on the containers team for a while now and I'm more familiar
with the velocity we can handle. Let us motivate ourselves and
give updates along the way over the next 2 months as this effort
progresses. Please don't throw "cold water" on why you don't think we
are going to make the release (especially as PTL; this can be quite
harmful to the effort for some). In fact, let's just stop talking about
Queens and Rocky entirely. I think we can agree that this feature is a
high priority and have people move the effort forward as much as we
can.

This is a very important feature. It can be fun to work on. Let those
of us who are doing the work finish scoping it and at least have a
chance at making progress before you throw weight against us not making
the release months from now.

I have not said
don't work on it. I just want to make sure we have all the pieces in
place needed to consider it a proper replacement for the existing
undercloud installation (by M2). If anything there's probably more
work that needs to be done and if we want to make it a priority to
happen, then it needs to be documented and communicated so folks can
assist as they have cycles.

I thought the point of this release was full containerization? And part
of that is containerizing the undercloud too, right?

Not that I was aware of. Others have asked because they have not been
aware that it included the undercloud. Given that we are wanting to
eventually look to kubernetes, maybe we don't need to containerize the
undercloud, as it may be that it could be discarded with that switch.

I don't think so. The whole point of the initial Undercloud work was
that it aligns the architectures. Using Kubernetes to maintain an
Undercloud would also be a valid approach, I think. Perhaps a bit
overkill, but it would be a super useful dev environment tool to develop
Kubernetes services on regardless.

And again, there are no plans to containerize instack-undercloud
components as-is. I think we have agreement that using containers in
the Undercloud is a high priority and we need to move this effort
forward.

That's probably a longer discussion. It might need to be researched
which is why it's important to understand why we're doing the
containerization effort and what exactly it entails. Given that I
don't think we're looking to deploy kubernetes via
THT/tripleo-puppet/containers, I wonder what impact this would have
with this effort? That's probably a conversation for another thread.

Lastly, this isn't just a containers team thing. We've been
using the
underclouddeploy architecture across many teams to help
develop for
almost an entire cycle now. Huge benefits. I would go as far as
saying
that undercloud
deploy was the biggest feature in Pike that
enabled
us to bang out a majority of the docker/service templates in
tripleo-
heat-templates.

Given that etherpad
appears to contain a pretty big list of features, are we
going to be
able to land all of them by M2? Would it be beneficial to
craft a
basic spec related to this to ensure we are not missing
additional
things?

I'm not sure there is a lot of value in creating a spec at this
point.
We've already got an approved blueprint for the feature in Pike
here: h
ttps://blueprints.launchpad.net/tripleo/+spec/containerized-
undercloud

I think we might get more velocity out of grooming the etherpad
and
perhaps dividing this work among the appropriate teams.

That's fine, but I would like to see additional efforts made to
organize this work, assign folks and add proper timelines.

Benefits of this work:

-Alignment: aligning the undercloud and overcloud
installers gets
rid
of dual maintenance of services.

I like reusing existing stuff. +1

-Composability: tripleo-heat-templates and our new Ansible
architecture around it are composable. This means any set
of
services
can be used to build up your own undercloud. In other words
the
framework here isn't just useful for "underclouds". It is
really
the
ability to deploy Tripleo on a single node with no external
dependencies. Single node TripleO installer. The containers
team
has
already been leveraging existing (experimental)
undercloud_deploy
installer to develop services for Pike.

Is this something that is actually being asked for or is this
just an
added bonus because it allows developers to reduce what is
actually
being deployed for testing?

There is an implied ask for this feature when a new developer
starts to
use TripleO. Right now resource bar is quite high for TripleO.
You have
to have a multi-node development environment at the very least
(one
undercloud node, and one overcloud node). The ideas we are
talking
about here short circuits this in many cases... where if you
aren't
testing HA services or Ironic you could simple use
undercloud_deploy to
test tripleo-heat-template changes on a single VM. Less
resources, and
much less time spent learning and waiting.

IMHO I don't think the undercloud install is the limiting factor
for
new developers and I'm not sure this is actually reducing that
complexity. It does reduce the amount of hardware needed to
develop
some items, but there's a cost in complexity by moving the
configuration to THT which is already where many people
struggle. As
I previously mentioned, there's nothing stopping us from
promoting the
containerized undercloud as a development tool and ensuring it's
full
featured before switching to it as the default at a later date.

Because the new undercloud_deploy installer uses t-h-t we get
containers for
free. Additionally as we convert over to Ansible instead of Heat
software
deployments we also get better operator feedback there as well.
Woudn't it
be nice to have an Undercloud installer driven by Ansible instead
of Python
and tripleo-image-elements?

Yup, and once again I recognize this as a benefit.

The reason I linked in doit.sh above (and if you actually go and
look at the
recent patches) we are already wiring these things up right now
(before M1!)
and it looks really nice. As we eventually move away from Puppet
for
configuration that too goes away. So I think the idea here is a
net-reduction in complexity because we no longer have to maintain
instack-undercloud, puppet modules, and elements.

It isn't that the undercloud install is a limiting factor. It is
that the
set of services making up your "Undercloud" can be anything you
want because
t-h-t supports all of our services. Anything you want with minimal
t-h-t,
Ansible, and containers. This means you can effectively develop on
a single
node for many cases and it will just work in a multi-node Overcloud
setup
too because we have the same architecture.

My concern is making sure we aren't moving too fast and introducing
more regressions/bugs/missing use cases/etc. My hope is by
documenting
all of this, ensuring we have proper expectations around a definition
of done (and time frames), and allowing for additional review, we
will
reduce the risk introduced by this switch. These types of things
align with what we talked about at the PTG in during the retro[0]
(see: start define definition of done, start status reporting on ML,
stop over committing, stop big change without tests, less complexity,
etc, etc). This stuff's complicated, let's make sure we do it right.

Thanks,
-Alex

[0] http://people.redhat.com/aschultz/denver-ptg/tripleo-ptg-retro.jp
g

Dan

-Development: The containerized undercloud is a great
development
tool. It utilizes the same framework as the full overcloud
deployment
but takes about 20 minutes to deploy. This means faster
iterations,
less waiting, and more testing. Having this be a first
class
citizen
in the ecosystem will ensure this platform is functioning
for
developers to use all the time.

Seems to go with the previous question about the re-usability
for
people who are not developers. Has everyone (including non-
container
folks) tried this out and attest that it's a better workflow
for
them?
Are there use cases that are made worse by switching?

I would let other chime in but the feedback I've gotten has
mostly been
that it improves the dev/test cycle greatly.

-CI resources: better use of CI resources. At the PTG we
received
feedback from the OpenStack infrastructure team that our
upstream
CI
resource usage is quite high at times (even as high as 50%
of the
total). Because of the shared framework and single node
capabilities we
can re-architecture much of our upstream CI matrix around
single
node.
We no longer require multinode jobs to be able to test many
of the
services in tripleo-heat-templates... we can just use a
single
cloud VM
instead. We'll still want multinode undercloud -> overcloud
jobs
for
testing things like HA and baremetal provisioning. But we
can cover
a
large set of the services (in particular many of the new
scenario
jobs
we added in Pike) with single node CI test runs in much
less time.

I like this idea but would like to see more details around
this.
Since this is a new feature we need to make sure that we are
properly
covering the containerized undercloud with CI as well. I
think we
need 3 jobs to properly cover this feature before marking it
done. I
added them to the etherpad but I think we need to ensure the
following
3 jobs are defined and voting by M2 to consider actually
switching
from the current instack-undercloud installation to the
containerized
version.

1) undercloud-containers - a containerized install, should be
voting
by m1
2) undercloud-containers-update - minor updates run on
containerized
underclouds, should be voting by m2
3) undercloud-containers-upgrade - major upgrade from
non-containerized to containerized undercloud, should be
voting by
m2.

If we have these jobs, is there anything we can drop or mark
as
covered that is currently being covered by an overcloud job?

Can you please comment on these expectations as being
achievable? If
they are not achievable, I don't think we can agree to switch the
default for Queens. As we shipped the 'undercloud deploy' as
experimental for Pike, it's well within reason to continue to do
so
for Queens. Perhaps we change the labeling to beta or working it
into
a --containerized option for 'undercloud install'.

I think my ask for the undercloud-containers job as non-voting by
m1
is achievable today because it's currently green (pending any
zuul
freezes). My concern is really minor updates and upgrades need to
be
understood and accounted for ASAP. If we're truly able to reuse
some
of the work we did for O->P upgrades, then these should be fairly
straight forward things to accomplish and there would be fewer
blockers to make the switch.

-Containers: There are no plans to containerize the existing
instack-undercloud work. By moving our undercloud installer to a
tripleo-heat-templates and Ansible architecture we can leverage
containers. Interestingly, the same installer also supports baremetal
(package) installation as well at this point. Like the overcloud,
however, I think making containers our undercloud default would better
align the TripleO tooling.

We are actively working through a few issues with the deployment
framework Ansible effort to fully integrate that into the undercloud
installer. We are also reaching out to other teams like the UI and
Security folks to coordinate the efforts around those components. If
there are any questions about the effort or you'd like to be involved
in the implementation let us know. Stay tuned for more specific
updates as we organize to get as much of this in M1 and M2 as
possible.
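
Concretely, the baremetal-vs-containers split described above is an
environment-file choice within the same t-h-t driven installer. A
sketch (illustrative; it borrows the overcloud's existing docker
environment file, and the exact undercloud deploy arguments may
differ):

    # baremetal (package) undercloud from tripleo-heat-templates
    openstack undercloud deploy \
      --templates /usr/share/openstack-tripleo-heat-templates

    # containerized undercloud: same installer, plus the docker
    # environment
    openstack undercloud deploy \
      --templates /usr/share/openstack-tripleo-heat-templates \
      -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml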

I would like to see weekly updates on this effort during the IRC
meeting. As previously mentioned around squad status, I'll be asking
for them during the meeting, so it would be nice to get an update on a
weekly basis so we can make sure that we'll be OK to cut over.

Also, what does the cut over plan look like? This is something that
might be beneficial to have in a spec. IMHO, I'm OK to continue
pushing the container effort using the openstack undercloud deploy
method for now. Once we have voting CI jobs and the feature list has
been covered, then we can evaluate whether we've made the M2 time
frame for switching openstack undercloud deploy to be the new
undercloud install. I want to make sure we don't introduce regressions
and are doing things in a user-friendly fashion, since the undercloud
is the first intro an end user gets to TripleO. It would be a good
idea to review what the new install process looks like and make sure
it "just works", given that the current process[0] (with all its
flaws) is fairly trivial to perform.

Basically what I would like to see before making this new default is:
1) minor updates work (with CI)
2) P->Q upgrades work (with CI)
3) documentation complete
4) no UX impact for installation (e.g. how they installed it before is
the same as they install it now for containers; see the sketch below)

If these are accounted for and completed before M2 then I would be +2
on the switch.
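
On point 4, the bar is roughly that an operator's existing workflow
and configuration carry over unchanged. As a sketch, assuming the
containerized installer learns to consume the same undercloud.conf
(the paths below are today's documented instack-undercloud workflow;
nothing is guaranteed yet for the containerized path):

    # existing, documented workflow -- this should stay the same:
    cp /usr/share/instack-undercloud/undercloud.conf.sample ~/undercloud.conf
    vi ~/undercloud.conf        # same options as before
    openstack undercloud install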

Thanks,
-Alex

[0] https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud

On behalf of the containers team,

Dan

[1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers




OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Oct 4, 2017 by Dan_Prince (8,160 points)   1 4 6
0 votes

(top-posting, as it is not a direct response to a specific line)

This is your friendly reminder that we're not quite there yet on a
containerized ironic-inspector. The THT for it has probably never been
tested at all, and the iptables magic we do may simply not be
containers-compatible. Milan would appreciate any help with his
ironic-inspector rework.
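
For context on the iptables point: ironic-inspector rewrites firewall
rules on the host to fence nodes off from its PXE/DHCP service, so a
containerized inspector would need at least host networking and
NET_ADMIN. A minimal sketch of what that implies for the container
invocation (illustrative only; the image name is a placeholder, and
this is not actual THT-generated configuration):

    docker run --net host --cap-add NET_ADMIN \
      -v /var/lib/ironic-inspector:/var/lib/ironic-inspector \
      centos-binary-ironic-inspector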

Dmitry

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Oct 4, 2017 by Dmitry_Tantsur (18,080 points)   2 3 7
0 votes

On Wed, Oct 4, 2017 at 7:00 AM, Dan Prince dprince@redhat.com wrote:
Whether this feature makes the release or not I think it is too early
to say. What I can say is the amount of work remaining on the
Undercloud feature is IMO a good bit less than what we knocked out in
the last release:

https://etherpad.openstack.org/p/tripleo-composable-containers-undercloud

And regardless of whether we make the release or not, there is huge
value in moving the work forward now... if only to put us in a better
position for the next release.

I've been on the containers team for a while now and I'm more familiar
with the velocity that we could handle. Let us motivate ourselves and
give updates along the way over the next 2 months as this effort
progresses. Please don't throw "cold water" on why you don't think we
are going to make the release (especially as PTL; this can be quite
harmful to the effort for some). In fact, let's just stop talking
about Queens and Rocky entirely. I think we can agree that this
feature is a high priority and have people move the effort forward as
much as we can.

This is a very important feature. It can be fun to work on. Let those
of us who are doing the work finish scoping it and at least have a
chance at making progress before you throw weight against us not
making the release months from now.

I'm not trying to slow you down. I'm only trying to point out that
there are specifics that must be accomplished before we can consider
something as ready, and what you're asking is quite the change. It's a
reminder of deadlines and of ensuring that they are met before
assuming it will make the release. Please use M2 for your planning as
the target destination. If you can make what I have asked by M2, then
switching should not be a problem. If we get to M2 and it's not quite
done, let's evaluate the outstanding work and see if it makes sense.

I encourage additional planning and work items to be publicized as
soon as possible. Given that M1 is in 2 weeks, it's just a reminder
that these are the deadlines. As PTL I don't make the schedule; I'm
just reminding you that it exists[0] and needs to be taken into
consideration. I would also like to remind you that we have additional
containerization items that are necessary for the overcloud, which I
think should take precedence over containerizing the undercloud as the
default. Once again, if you and the containers team feel you can make
it, great, let's get it done. I look forward to reviewing and
approving patches as they show up.

Thanks,
-Alex

[0] https://releases.openstack.org/queens/schedule.html

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Oct 4, 2017 by aschultz_at_redhat.c (5,800 points)   2 2 4
...