On Tue, Oct 3, 2017 at 11:12 AM, Dan Prince firstname.lastname@example.org wrote:
On Mon, 2017-10-02 at 15:20 -0600, Alex Schultz wrote:
Thanks for sending out a note about this. I have a few questions
On Mon, Oct 2, 2017 at 6:02 AM, Dan Prince email@example.com
One of the things the TripleO containers team is planning on
in Queens is fully containerizing the undercloud. At the PTG we
an etherpad  that contains a list of features that need to be
implemented to fully replace instack-undercloud.
I know we talked about this at the PTG and I was skeptical that this
will land in Queens. With the exception of the Containers team
wanting this, I'm not sure there is an actual end user who is looking
for the feature so I want to make sure we're not just doing more work
because we as developers think it's a good idea.
I've heard from several operators that they were actually surprised we
implemented containers in the Overcloud first. Validating a new
deployment framework on a single node Undercloud (for operators) before
overtaking their entire cloud deployment has a lot of merit to it IMO.
When you share the same deployment architecture across the
overcloud/undercloud it puts us in a better position to decide where to
expose new features to operators first (when creating the undercloud or
overcloud for example).
Also, if you read my email again I've explicitly listed the
"Containers" benefit last. While I think moving the undercloud to
containers is a great benefit all by itself this is more of a
"framework alignment" in TripleO and gets us out of maintaining huge
amounts of technical debt. Re-using the same framework for the
undercloud and overcloud has a lot of merit. It effectively streamlines
the development process for service developers, and 3rd parties wishing
to integrate some of their components on a single node. Why be forced
to create a multi-node dev environment if you don't have to (aren't
using HA for example).
Let's be honest. While instack-undercloud helped solve the old "seed" VM
issue it was outdated the day it landed upstream. The entire premise of
the tool is that it uses old style "elements" to create the undercloud
and we moved away from those as the primary means of driving the creation
of the Overcloud years ago at this point. The new 'undercloud_deploy'
installer gets us back to our roots by once again sharing the same
architecture to create the over and underclouds. A demo from long ago
expands on this idea a bit: https://www.youtube.com/watch?v=y1qMDLAf26
In short, we aren't just doing more work because developers think it is
a good idea. This has potential to be one of the most useful
architectural changes in TripleO that we've made in years. Could
significantly decrease our CI resources if we use it to replace the
existing scenario jobs which take multiple VMs per job. It is a building
block we could use for other features like an HA undercloud. And yes,
it does also have a huge impact on developer velocity in that many of
us already prefer to use the tool as a means of streamlining our
dev/test cycles to minutes instead of hours. Why spend hours running
quickstart Ansible scripts when in many cases you can just doit.sh. htt
So like I've repeatedly said, I'm not completely against it as I agree
what we have is not ideal. I'm not -2, I'm -1 pending additional
information. I'm trying to be realistic and reduce our risk for this
cycle. IMHO doit.sh is not acceptable as an undercloud installer and
this is what I've been trying to point out as the actual impact to the
end user who has to use this thing. We have an established
installation method for the undercloud that, while it isn't great, isn't
a bash script with git fetches, etc. So as for the implementation,
this is what I want to see properly fleshed out prior to accepting
this feature as complete for Queens (and the new default). I would
like to see a plan of what features need to be added (eg. the stuff on
the etherpad), folks assigned to do this work, and estimated
timelines. Given that we shouldn't be making major feature changes
after M2 (~9 weeks), I want to get an understanding of what is
realistically going to make it. If after reviewing the initial
details we find that it's not actually going to make M2, then let's
agree to this now rather than trying to force it in at the end.
I know you've been a great proponent of the containerized undercloud
and I agree it offers a lot more for development efforts. But I just
want to make sure that we are getting all the feedback we can before
continuing down this path. Since, as you point out, a bunch of this
work is already available for consumption by developers, I don't see
making it the new default as a requirement for Queens unless it's
fully implemented and tested. There's nothing stopping folks from
using it now and making incremental improvements during Queens while we
commit to making it the new default for Rocky.
The point of this cycle was supposed to be more stabilization/getting
all the containers in place. Doing something like this seems to go
against what we were actually trying to achieve. I'd rather make
smaller incremental progress with your proposal being the end goal and
agreeing that perhaps Rocky is more realistic for the default cut
Lastly, this isn't just a containers team thing. We've been using the
undercloud_deploy architecture across many teams to help develop for
almost an entire cycle now. Huge benefits. I would go as far as saying
that undercloud_deploy was the biggest feature in Pike that enabled
us to bang out a majority of the docker/service templates in tripleo-
Given that etherpad
appears to contain a pretty big list of features, are we going to be
able to land all of them by M2? Would it be beneficial to craft a
basic spec related to this to ensure we are not missing additional
I'm not sure there is a lot of value in creating a spec at this point.
We've already got an approved blueprint for the feature in Pike here: h
I think we might get more velocity out of grooming the etherpad and
perhaps dividing this work among the appropriate teams.
That's fine, but I would like to see additional efforts made to
organize this work, assign folks and add proper timelines.
Benefits of this work:
-Alignment: aligning the undercloud and overcloud installers gets
of dual maintenance of services.
I like reusing existing stuff. +1
-Composability: tripleo-heat-templates and our new Ansible
architecture around it are composable. This means any set of
can be used to build up your own undercloud. In other words the
framework here isn't just useful for "underclouds". It is really
ability to deploy Tripleo on a single node with no external
dependencies. Single node TripleO installer. The containers team
already been leveraging existing (experimental) undercloud_deploy
installer to develop services for Pike.
Is this something that is actually being asked for or is this just an
added bonus because it allows developers to reduce what is actually
being deployed for testing?
There is an implied ask for this feature when a new developer starts to
use TripleO. Right now the resource bar is quite high for TripleO. You have
to have a multi-node development environment at the very least (one
undercloud node, and one overcloud node). The ideas we are talking
about here short-circuit this in many cases... where if you aren't
testing HA services or Ironic you could simply use undercloud_deploy to
test tripleo-heat-template changes on a single VM. Fewer resources, and
much less time spent learning and waiting.
IMHO I don't think the undercloud install is the limiting factor for
new developers and I'm not sure this is actually reducing that
complexity. It does reduce the amount of hardware needed to develop
some items, but there's a cost in complexity by moving the
configuration to THT which is already where many people struggle. As
I previously mentioned, there's nothing stopping us from promoting the
containerized undercloud as a development tool and ensuring it's fully
featured before switching to it as the default at a later date.
-Development: The containerized undercloud is a great development
tool. It utilizes the same framework as the full overcloud
but takes about 20 minutes to deploy. This means faster
less waiting, and more testing. Having this be a first class
in the ecosystem will ensure this platform is functioning for
developers to use all the time.
Seems to go with the previous question about the re-usability for
people who are not developers. Has everyone (including non-container
folks) tried this out and attested that it's a better workflow for
Are there use cases that are made worse by switching?
I would let others chime in but the feedback I've gotten has mostly been
that it improves the dev/test cycle greatly.
-CI resources: better use of CI resources. At the PTG we received
feedback from the OpenStack infrastructure team that our upstream
resource usage is quite high at times (even as high as 50% of the
total). Because of the shared framework and single node
can re-architect much of our upstream CI matrix around single
We no longer require multinode jobs to be able to test many of the
services in tripleo-heat-templates... we can just use a single
instead. We'll still want multinode undercloud -> overcloud jobs
testing things like HA and baremetal provisioning. But we can cover
large set of the services (in particular many of the new scenario
we added in Pike) with single node CI test runs in much less time.
I like this idea but would like to see more details around this.
Since this is a new feature we need to make sure that we are properly
covering the containerized undercloud with CI as well. I think we
need 3 jobs to properly cover this feature before marking it done. I
added them to the etherpad but I think we need to ensure the
3 jobs are defined and voting by M2 to consider actually switching
from the current instack-undercloud installation to the containerized
1) undercloud-containers - a containerized install, should be voting
2) undercloud-containers-update - minor updates run on containerized
underclouds, should be voting by m2
3) undercloud-containers-upgrade - major upgrade from
non-containerized to containerized undercloud, should be voting by
If we have these jobs, is there anything we can drop or mark as
covered that is currently being covered by an overcloud job?
Can you please comment on these expectations as being achievable? If
they are not achievable, I don't think we can agree to switch the
default for Queens. As we shipped the 'undercloud deploy' as
experimental for Pike, it's well within reason to continue to do so
for Queens. Perhaps we change the labeling to beta or work it into
a --containerized option for 'undercloud install'.
I think my ask for the undercloud-containers job as non-voting by m1
is achievable today because it's currently green (pending any zuul
freezes). My concern is really minor updates and upgrades need to be
understood and accounted for ASAP. If we're truly able to reuse some
of the work we did for O->P upgrades, then these should be fairly
straightforward things to accomplish and there would be fewer
blockers to make the switch.
-Containers: There are no plans to containerize the existing
undercloud work. By moving our undercloud installer to a tripleo-
templates and Ansible architecture we can leverage containers.
Interestingly, the same installer also supports baremetal (package)
installation as well at this point. Like the overcloud, however, I
making containers our undercloud default would better align the
We are actively working through a few issues with the deployment
framework Ansible effort to fully integrate that into the
installer. We are also reaching out to other teams like the UI and
Security folks to coordinate the efforts around those components.
there are any questions about the effort or you'd like to be
in the implementation let us know. Stay tuned for more specific
as we organize to get as much of this in M1 and M2 as possible.
I would like to see weekly updates on this effort during the IRC
meeting. As previously mentioned around squad status, I'll be asking
for them during the meeting so it would be nice to get an update on this
on a weekly basis so we can make sure that we'll be OK to cut over.
Also what does the cut over plan look like? This is something that
might be beneficial to have in a spec. IMHO, I'm ok to continue
pushing the container effort using the openstack undercloud deploy
method for now. Once we have voting CI jobs and the feature list has
been covered then we can evaluate if we've made the M2 time frame for
switching openstack undercloud deploy to be the new undercloud
install. I want to make sure we don't introduce regressions and are
doing things in a user-friendly fashion since the undercloud is the
first intro an end user gets to tripleo. It would be a good idea to
review what the new install process looks like and make sure it "just
works" given that the current process (with all its flaws) is
fairly trivial to perform.
Basically what I would like to see before making this the new default is:
1) minor updates work (with CI)
2) P->Q upgrades work (with CI)
3) Documentation complete
4) no UX impact for installation (eg. how they installed it before is
the same as they install it now for containers)
If these are accounted for and completed before M2 then I would be +2
on the switch.