
Re: [openstack-dev] Upstream LTS Releases

0 votes

Hi!

It's amazing to see this discussed! Looking forward to more details.

On 11/08/2017 12:28 AM, Erik McCormick wrote:

Hello Ops folks,

This morning at the Sydney Summit we had a very well attended and very
productive session about how to go about keeping a selection of past
releases available and maintained for a longer period of time (LTS).

There was agreement in the room that this could be accomplished by
moving the responsibility for those releases from the Stable Branch
team down to those who are already creating and testing patches for
old releases: The distros, deployers, and operators.

The concept, in general, is to create a new set of cores from these
groups, and use 3rd party CI to validate patches. There are lots of
details to be worked out yet, but our amazing UC (User Committee) will
begin working out the details.

What worries me the most is the exact "take over" process. Does it mean that
the teams will give away their +2 power to a different team? Or will our (small)
stable teams still be responsible for landing changes? If so, will they have to
learn how to debug 3rd party CI jobs?

Generally, I'm scared of both overloading the teams and losing control over
quality at the same time :) Hopefully the final proposal will clarify this.

Please take a look at the Etherpad from the session if you'd like to
see the details. More importantly, if you would like to contribute to
this effort, please add your name to the list starting on line 133.

https://etherpad.openstack.org/p/SYD-forum-upstream-lts-releases

Thanks to everyone who participated!

Cheers,
Erik



OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Nov 15, 2017 in openstack-dev by Dmitry_Tantsur (18,080 points)   2 3 4

39 Responses

0 votes

On 11/14/2017 11:17 PM, Doug Hellmann wrote:
Excerpts from Chris Friesen's message of 2017-11-14 15:50:08 -0600:

On 11/14/2017 02:10 PM, Doug Hellmann wrote:

Excerpts from Chris Friesen's message of 2017-11-14 14:01:58 -0600:

On 11/14/2017 01:28 PM, Dmitry Tantsur wrote:

The quality of backported fixes is expected to be a direct (and only?)
interest of those new teams of new cores, coming from users and operators and
vendors.

I'm not assuming bad intentions, not at all. But there is a lot involved in a
decision whether to make a backport or not. Will these people be able to
evaluate the risk of each patch? Do they have enough context on how that release
was implemented and what can break? Do they understand why feature backports are
bad? Why they should not skip (supported) releases when backporting?

I know a lot of very reasonable people who do not understand the things above
really well.

I would hope that the core team for upstream LTS would be the (hopefully
experienced) people doing the downstream work that already happens within the
various distros.

Chris

Presumably those are the same people we've been trying to convince
to work on the existing stable branches for the last 5 years. What
makes these extended branches more appealing to those people than
the existing branches? Is it the reduced requirements on maintaining
test jobs? Or maybe some other policy change that could be applied
to the stable branches?

For what it's worth, we often lag more than 6 months behind master and so some
of the things we backport wouldn't be allowed by the existing stable branch
support phases. (i.e. they aren't "critical" or security patches.)

Chris

We should include a review of some of those policies as part of
this discussion. It would seem very odd to have a fix land in master,
not make it into the stable branches, and then show up in a branch
following the LTS policy.

Actually, I'm strongly against skipping any supported (not EOL-ed) branches when
backporting patches. This includes normal stable branches, as well as other LTS
branches.

The main reason, if somebody wonders, is to avoid regressions for people
upgrading from a version with a backport to a version without.
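
To illustrate what I mean (a rough sketch only, not an official workflow; the
branch names and the commit SHA below are made up): a fix that merged to master
would be proposed to every maintained branch in order, newest first, so that no
supported branch ends up missing it:

# Hypothetical sketch: backport a master fix to each maintained branch in
# order, newest first, so that no supported branch is skipped.
import subprocess

FIX_COMMIT = "abc1234"                      # made-up SHA of the fix on master
BRANCHES = ["stable/pike", "stable/ocata"]  # every supported branch, newest first

for branch in BRANCHES:
    topic = "backport-" + branch.replace("/", "-")
    subprocess.check_call(["git", "checkout", "-b", topic, "origin/" + branch])
    subprocess.check_call(["git", "cherry-pick", "-x", FIX_COMMIT])
    # each backport is then proposed for review on that branch, e.g. with
    # git-review: subprocess.check_call(["git", "review", branch])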

Doug



OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Nov 15, 2017 by Dmitry_Tantsur (18,080 points)   2 3 4
0 votes

On 11/14/2017 09:01 PM, Chris Friesen wrote:
On 11/14/2017 01:28 PM, Dmitry Tantsur wrote:

The quality of backported fixes is expected to be a direct (and only?)
interest of those new teams of new cores, coming from users and operators and
vendors.

I'm not assuming bad intentions, not at all. But there is a lot involved in a
decision whether to make a backport or not. Will these people be able to
evaluate the risk of each patch? Do they have enough context on how that release
was implemented and what can break? Do they understand why feature backports are
bad? Why they should not skip (supported) releases when backporting?

I know a lot of very reasonable people who do not understand the things above
really well.

I would hope that the core team for upstream LTS would be the (hopefully
experienced) people doing the downstream work that already happens within the
various distros.

Sure, but policies may vary. People do make feature backports downstream from
time to time ;)

Chris



OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Nov 15, 2017 by Dmitry_Tantsur (18,080 points)   2 3 4
0 votes

Thank you Mathieu for the insights!

To add details to what happened:
* Upgrade was never made a #1 priority. It was a one man show for far
too long. (myself)

I suppose that confirms that upgrades are a nice-to-have for production
deployments, eventually, maybe... (please read below to continue)

  • I also happen to manage and work on other priorities.
  • Lot of work made to prepare for multiple versions support in our
    deployment tools. (we use Puppet)
  • Lot of work in the packaging area to speedup packaging. (we are
    still using deb packages but with virtualenv to stay Puppet
    compatible)
  • We need to forward-port private patches which upstream won't accept
    and/or are private business logic.

... yet long-term maintenance and landing fixes is the ops' reality
and pain #1, and upgrades are only pain #2. LTS cannot directly help
with #2, only indirectly: if the vendors' downstream teams cooperated
better on #1, they would have more time and resources to dedicate to
#2, the upgrade stories for shipped products and distros.

Let's please not downplay the real value of LTS branches, and not
substitute #1 with #2. This topic is not about bureaucracy and policies;
it is about how the community could help vendors cooperate on
maintaining commodity things, with as little bureaucracy as possible,
to ease the operators' pains in the end.

  • Our developer teams didn't have enough free cycles to work right
    away on the upgrade. (this means delays)
  • We need to test compatibility with 3rd party systems which takes
    some time. (and make them compatible)

Perhaps this confirms why it is vital to run only 3rd party CI jobs for
LTS branches?

  • We need to update systems over which we don't have full control.
    This means serious delays when it comes to deployment.
  • We need to test features/stability during some time in our dev environment.
  • We need to test features/stability during some time in our
    staging/pre-prod environment.
  • We need to announce and inform our users at least 2 weeks in advance
    before performing an upgrade.
  • We choose to upgrade one service at a time (in all regions) to avoid
    a huge big bang upgrade. (this means more maintenance windows to plan
    and you can't stack them too much)
  • We need to swiftly respond to bugs discovered by our users. This
    means change of priorities and delay in other service upgrades.
  • We will soon need to upgrade operating systems to support latest
    OpenStack versions. (this means we have to stop OpenStack upgrades
    until all nodes are upgraded)

It seems that the answer to the question "Why are upgrades so painful
and why do they take so much time for ops?" is "because upgrades are not
the priority; Long Term Support and maintenance are".

--
Best regards,
Bogdan Dobrelya,
Irc #bogdando


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Nov 15, 2017 by bdobreli_at_redhat.c (2,260 points)   1 2
0 votes

Rochelle Grober wrote:
Folks,

This discussion and the people interested in it seem like a perfect application of the SIG process. By turning LTS into a SIG, everyone can discuss the issues on the SIG mailing list and the discussion shouldn't end up split. If it turns into a project, great. If a solution is found that doesn't need a new project, great. Even once there is a decision on how to move forward, there will still be implementation issues and enhancements, so the SIG could very well be long-lived. But the important aspect of this is: keeping the discussion in a place where both devs and ops can follow the whole thing and act on recommendations.

That's an excellent suggestion, Rocky.

Moving the discussion to a SIG around LTS / longer-support / post-EOL
support would also be a great way to form a team to work on that.

Yes, there is a one-time pain involved with subscribing to the -sigs ML,
but I'd say that it's a good idea anyway, and this minimal friction
might reduce the discussion to people that might actually help with
setting something up.

So join:
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-sigs

While I'm not sure that's the best name for it, as suggested by Rocky
let's use [lts] as a prefix there.

I'll start a couple of threads.

--
Thierry Carrez (ttx)


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Nov 15, 2017 by Thierry_Carrez (57,480 points)   3 7 12
0 votes

As suggested by Rocky, I moved the discussion to the -sigs list by
posting my promised summary of the session at:

http://lists.openstack.org/pipermail/openstack-sigs/2017-November/000148.html

Please continue the discussion there, to avoid cross-posting.

If you haven't already, please subscribe at:
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-sigs

--
Thierry Carrez (ttx)


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Nov 15, 2017 by Thierry_Carrez (57,480 points)   3 7 12
0 votes

On 2017-11-15 00:37:26 +0000 (+0000), Fox, Kevin M wrote:
[...]
One idea is the one at the root of chaos monkey: if something is
hard, do it frequently. If upgrading is hard, we need to be doing
it constantly so the pain gets largely eliminated. One idea would
be to discourage devs from standing up a fresh devstack all the
time and have them upgrade existing ones instead. If it's hard, then
it's likely someone will chip in to make it less hard.

This is also the idea behind running grenade in CI. The previous
OpenStack release is deployed, an attempt at a representative (if
small) dataset is loaded into it, and then it is upgraded to the
release under development with the proposed change applied and
exercised to make sure the original resources built under the
earlier release are still in working order. We can certainly do more
to make this a better representation of "The Real World" within the
resource constraints of our continuous integration, but we do at
least have a framework in place to attempt it.
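
(Not grenade's actual code, but the pattern it implements boils down to
something like the sketch below; the two shell scripts are stand-ins for the
deploy/upgrade steps, and the image/flavor/network IDs are placeholders.)

# Sketch of the grenade pattern: deploy the previous release, build a small
# "dataset" on it, upgrade in place to the proposed change, then check that
# the pre-upgrade resources still work. The helper scripts are stand-ins.
import subprocess
import openstack  # openstacksdk

IMAGE, FLAVOR, NETWORK = "image-uuid", "flavor-uuid", "net-uuid"  # placeholders

def upgrade_smoke_test():
    subprocess.check_call(["./deploy-previous-release.sh"])   # e.g. the last stable release
    conn = openstack.connect(cloud="devstack")
    server = conn.compute.create_server(
        name="grenade-smoke", image_id=IMAGE, flavor_id=FLAVOR,
        networks=[{"uuid": NETWORK}])
    conn.compute.wait_for_server(server)

    subprocess.check_call(["./upgrade-to-proposed.sh"])       # master plus the patch under test

    conn = openstack.connect(cloud="devstack")                # reconnect after the upgrade
    assert conn.compute.get_server(server.id).status == "ACTIVE"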

Another is devstack in general. The tooling used by devs and that
used by ops are so different as to isolate the devs from ops'
pain. If they used more ops-ish tooling, then they would hit the
same issues and would be more likely to find solutions that work
for both parties.

Keep in mind that DevStack was developed to have a quick framework
anyone could use to locally deploy an all-in-one OpenStack from
source. It was not actually developed for CI automation, to the
extent that we developed a separate wrapper project to make DevStack
usable within our CI (the now somewhat archaically-named
devstack-gate project). It's certainly possible to replace that with
a more mainstream deployment tool, I think, so long as it maintains
the primary qualities we rely on: 1. rapid deployment, 2. can work
on a single system with fairly limited resources, 3. can deploy from
source and incorporate proposed patches, 4. pluggable/extensible so
that new services can be easily integrated even before they're
officially released.

A third one is supporting multiple version upgrades in the gate. I
rarely have a problem with a cloud whose database is one version
back. I have seen lots of issues with databases that contain data
from back when the cloud was instantiated and then upgraded multiple
times.

I believe this will be necessary anyway if we want to officially
support so-called "fast forward" upgrades, since anything that's not
tested is assumed to be (and in fact usually is) broken.

Another option is trying to unify/detangle the upgrade procedure.
Upgrading compute kit should be one or two commands if you can
live with the defaults, not weeks of poring through release notes,
finding correct orders from pages of text and testing vigorously
on test systems.

This also sounds like a defect in our current upgrade testing, if
we're somehow embedding upgrade automation in our testing without
providing the same tools to easily perform those steps in production
upgrades.

How about some tool that does this: dump the database somewhere
temporary, iterate over all the upgrade job components, and see if
it will successfully avoid corrupting your database. That takes a
while to do manually. Ideally it could even upload stack traces back
to a bug tracker for attention.

Without a clearer definition of "successfully not corrupt your
database" suitable for automated checking, I don't see how this one
is realistic. Do we have a database validation tool now? If we do,
is it deficient in some way? If we don't, what specifically should
it be checking? Seems like something we would want to run at the
end of all our upgrade tests too.
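
(For completeness, the shallow version of such a check, namely "the target
release's schema migrations apply cleanly to a copy of the production
database", is about the only thing that can be automated without that clearer
definition. A rough sketch, with made-up database name, dump path and config
file; "nova-manage db sync" is Nova's real migration command:)

# Rough sketch: restore a copy of the production database and verify the
# target release's migrations apply to it. Names and paths are made up.
import subprocess

DUMP = "/backups/nova-prod.sql"  # made-up path to a recent dump

subprocess.check_call(["mysql", "-e", "CREATE DATABASE IF NOT EXISTS nova_upgrade_check"])
subprocess.check_call("mysql nova_upgrade_check < " + DUMP, shell=True)
subprocess.check_call(
    ["nova-manage", "--config-file", "/etc/nova/upgrade-check.conf", "db", "sync"])
# Reaching this point only proves the migrations ran; whether the result is
# "not corrupt" in any stronger sense is exactly the definition we lack.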
--
Jeremy Stanley


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

responded Nov 15, 2017 by Jeremy_Stanley (56,700 points)   3 4 7
0 votes

Excerpts from Fox, Kevin M's message of 2017-11-15 00:37:26 +0000:

I can think of a few ideas, though some sound painful on paper.... Not really recommending anything, just thinking out loud...

One idea is the one at the root of chaos monkey: if something is hard, do it frequently. If upgrading is hard, we need to be doing it constantly so the pain gets largely eliminated. One idea would be to discourage devs from standing up a fresh devstack all the time and have them upgrade existing ones instead. If it's hard, then it's likely someone will chip in to make it less hard.

Another is devstack in general. The tooling used by devs and that used by ops are so different as to isolate the devs from ops' pain. If they used more ops-ish tooling, then they would hit the same issues and would be more likely to find solutions that work for both parties.

A third one is supporting multiple version upgrades in the gate. I rarely have a problem with a cloud whose database is one version back. I have seen lots of issues with databases that contain data from back when the cloud was instantiated and then upgraded multiple times.

Another option is trying to unify/detangle the upgrade procedure. Upgrading compute kit should be one or two commands if you can live with the defaults, not weeks of poring through release notes, finding correct orders from pages of text and testing vigorously on test systems.

This sounds like an opportunity for some knowledge sharing. Maybe when
the Operators' Guide makes it into the wiki?

How about some tool that does this: dump the database somewhere temporary, iterate over all the upgrade job components, and see if it will successfully avoid corrupting your database. That takes a while to do manually. Ideally it could even upload stack traces back to a bug tracker for attention.

Thanks,
Kevin


From: Davanum Srinivas [davanum@gmail.com]
Sent: Tuesday, November 14, 2017 4:08 PM
To: OpenStack Development Mailing List (not for usage questions)
Cc: openstack-oper.
Subject: Re: [openstack-dev] [Openstack-operators] Upstream LTS Releases

On Wed, Nov 15, 2017 at 10:44 AM, John Dickinson me@not.mn wrote:

On 14 Nov 2017, at 15:18, Mathieu Gagné wrote:

On Tue, Nov 14, 2017 at 6:00 PM, Fox, Kevin M Kevin.Fox@pnnl.gov wrote:

The pressure for #2 comes from the inability to skip upgrades and the fact that upgrades are hugely time consuming still.

If you want to reduce the push for #2 and help developers get their wish of getting features into users' hands sooner, the path to upgrade really needs to be much less painful.

+1000

We are upgrading from Kilo to Mitaka. It took 1 year to plan and
execute the upgrade. (and we skipped a version)
Scheduling all the relevant internal teams is a monumental task
because we don't have dedicated teams for those projects and they have
other priorities.
Upgrading affects a LOT of our systems, some we don't fully have
control over. And it can take months to get a new deployment on those
systems. (and after, we have to test compatibility, of course)

So I guess you can understand my frustration when I'm told to upgrade
more often and that skipping versions is discouraged/unsupported.
At the current pace, I'm just falling behind. I need to skip
versions to keep up.

So for our next upgrades, we plan on skipping even more versions if
the database migration allows it. (except for Nova which is a huge
PITA to be honest due to CellsV1)
I just don't see any other ways to keep up otherwise.

?!?!

What does it take for this to never happen again? No operator should need to plan and execute an upgrade for a whole year to upgrade one year's worth of code development.

We don't need new policies, new teams, more releases, fewer releases, or anything like that. The goal is NOT "let's have an LTS release". The goal should be "How do we make sure Mathieu and everyone else in the world can actually deploy and use the software we are writing?"

Can we drop the entire LTS discussion for now and focus on "make upgrades take less than a year" instead? After we solve that, let's come back around to LTS versions, if needed. I know there's already some work around that. Let's focus there and not be distracted about the best bureaucracy for not deleting two-year-old branches.

--John

John,

So... Any concrete ideas on how to achieve that?

Thanks,
Dims

/me puts on asbestos pants

--
Mathieu


OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Nov 15, 2017 by Doug_Hellmann (87,520 points)   3 4 4
0 votes

On 14/11/17 15:10 -0500, Doug Hellmann wrote:
Excerpts from Chris Friesen's message of 2017-11-14 14:01:58 -0600:

On 11/14/2017 01:28 PM, Dmitry Tantsur wrote:

The quality of backported fixes is expected to be a direct (and only?)
interest of those new teams of new cores, coming from users and operators and
vendors.

I'm not assuming bad intentions, not at all. But there is a lot involved in a
decision whether to make a backport or not. Will these people be able to
evaluate the risk of each patch? Do they have enough context on how that release
was implemented and what can break? Do they understand why feature backports are
bad? Why they should not skip (supported) releases when backporting?

I know a lot of very reasonable people who do not understand the things above
really well.

I would hope that the core team for upstream LTS would be the (hopefully
experienced) people doing the downstream work that already happens within the
various distros.

Chris

Presumably those are the same people we've been trying to convince
to work on the existing stable branches for the last 5 years. What
makes these extended branches more appealing to those people than
the existing branches? Is it the reduced requirements on maintaining
test jobs? Or maybe some other policy change that could be applied
to the stable branches?

Guessing based on the feedback so far, I would say that these branches are more
appealing because they are the ones these folks are actually running in
production.

Flavio

--
@flaper87
Flavio Percoco


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

responded Nov 15, 2017 by Flavio_Percoco (36,960 points)   3 6 9
0 votes

Some clarifications below.

On Wed, Nov 15, 2017 at 4:52 AM, Bogdan Dobrelya bdobreli@redhat.com wrote:
Thank you Mathieu for the insights!

To add details to what happened:
* Upgrade was never made a #1 priority. It was a one man show for far
too long. (myself)

I suppose that confirms that upgrades are a nice-to-have for production
deployments, eventually, maybe... (please read below to continue)

  • I also happen to manage and work on other priorities.
  • Lot of work made to prepare for multiple versions support in our
    deployment tools. (we use Puppet)
  • Lot of work in the packaging area to speedup packaging. (we are
    still using deb packages but with virtualenv to stay Puppet
    compatible)
  • We need to forward-port private patches which upstream won't accept
    and/or are private business logic.

... yet long-term maintenance and landing fixes is the ops' reality and
pain #1, and upgrades are only pain #2. LTS cannot directly help with #2,
only indirectly: if the vendors' downstream teams cooperated better on #1,
they would have more time and resources to dedicate to #2, the upgrade
stories for shipped products and distros.

We do not have a vendor. (anymore, if you consider Ubuntu
cloud-archive as a vendor)
We package and deploy ourselves.

Let's please not downplay the real value of LTS branches, and not
substitute #1 with #2. This topic is not about bureaucracy and policies;
it is about how the community could help vendors cooperate on
maintaining commodity things, with as little bureaucracy as possible,
to ease the operators' pains in the end.

  • Our developer teams didn't have enough free cycles to work right
    away on the upgrade. (this means delays)
  • We need to test compatibility with 3rd party systems which takes
    some time. (and make them compatible)

Perhaps this confirms why it is vital to run only 3rd party CI jobs for LTS
branches?

For us, 3rd party systems are internal systems outside our control or
realm of influence.
They are often in-house systems that the outside world would care very
little about.

  • We need to update systems over which we don't have full control.
    This means serious delays when it comes to deployment.
  • We need to test features/stability during some time in our dev
    environment.
  • We need to test features/stability during some time in our
    staging/pre-prod environment.
  • We need to announce and inform our users at least 2 weeks in advance
    before performing an upgrade.
  • We choose to upgrade one service at a time (in all regions) to avoid
    a huge big bang upgrade. (this means more maintenance windows to plan
    and you can't stack them too much)
  • We need to swiftly respond to bugs discovered by our users. This
    means change of priorities and delay in other service upgrades.
  • We will soon need to upgrade operating systems to support latest
    OpenStack versions. (this means we have to stop OpenStack upgrades
    until all nodes are upgraded)

It seems that the answer to the question "Why are upgrades so painful
and why do they take so much time for ops?" is "because upgrades are not
the priority; Long Term Support and maintenance are".

Performing an upgrade consumes both time and resources, which are
both limited.
And you need to sync the world around you to make it happen. It's not
a one-man decision/task.

When you remove all the external factors, dependencies, politics,
etc., upgrading can take an afternoon from A to Z for some projects.
We do have an internal cloud for our developers that lives in a
vacuum. Let me tell you that it's not very easy to upgrade it. We are
talking about hours/days, not years.

So if I can only afford to upgrade once per year, what are my options?

--
Mathieu


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Nov 15, 2017 by mgagne_at_calavera.c (1,700 points)   3
...