
Re: [openstack-dev] Upstream LTS Releases

0 votes

Hi!

It's amazing to see this discussed! Looking forward to more details.

On 11/08/2017 12:28 AM, Erik McCormick wrote:

Hello Ops folks,

This morning at the Sydney Summit we had a very well attended and very
productive session about how to go about keeping a selection of past
releases available and maintained for a longer period of time (LTS).

There was agreement in the room that this could be accomplished by
moving the responsibility for those releases from the Stable Branch
team down to those who are already creating and testing patches for
old releases: The distros, deployers, and operators.

The concept, in general, is to create a new set of cores from these
groups, and use 3rd party CI to validate patches. There are lots of
details to be worked out yet, but our amazing UC (User Committee) will
begin working out the details.

What is most worrying is the exact "take over" process. Does it mean that
the teams will give away the +2 power to a different team? Or will our (small)
stable teams still be responsible for landing changes? If so, will they have to
learn how to debug 3rd party CI jobs?

Generally, I'm scared of both overloading the teams and losing control
over quality at the same time :) Probably the final proposal will
clarify it.

Please take a look at the Etherpad from the session if you'd like to
see the details. More importantly, if you would like to contribute to
this effort, please add your name to the list starting on line 133.

https://etherpad.openstack.org/p/SYD-forum-upstream-lts-releases

Thanks to everyone who participated!

Cheers,
Erik


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


asked Nov 15, 2017 in openstack-dev by Dmitry_Tantsur (18,080 points)   2 3 7

39 Responses

0 votes

The pressure for #2 comes from the inability to skip upgrades and the fact that upgrades are still hugely time-consuming.

If you want to reduce the push for #2 and help developers get their wish of getting features into users' hands sooner, the path to upgrade really needs to be much less painful.

Thanks,
Kevin


From: Erik McCormick [emccormick@cirrusseven.com]
Sent: Tuesday, November 14, 2017 9:21 AM
To: Blair Bethwaite
Cc: OpenStack Development Mailing List (not for usage questions); openstack-oper.
Subject: Re: [openstack-dev] [Openstack-operators] Upstream LTS Releases

On Tue, Nov 14, 2017 at 11:30 AM, Blair Bethwaite
blair.bethwaite@gmail.com wrote:
Hi all - please note this conversation has been split variously across
-dev and -operators.

One small observation from the discussion so far is that it seems as
though there are two issues being discussed under the one banner:
1) maintain old releases for longer
2) do stable releases less frequently

It would be interesting to understand if the people who want longer
maintenance windows would be helped by #2.

I would like to hear from people who do not want #2 and why not.
What are the benefits of 6 months vs. 1 year? I have heard objections
in the hallway track, but I have struggled to retain the rationale for
more than 10 seconds. I think this may be more of a religious
discussion that could take a while though.

#1 is something we can act on right now, with the eventual goal of
being able to skip releases entirely. We are addressing the
maintenance-of-old-releases issue right now. As we get farther down
the road of fast-forward upgrade tooling, we will be able to please
both those wishing for a slower upgrade cadence and those who want to
stay on the bleeding edge.

-Erik

On 14 November 2017 at 09:25, Doug Hellmann doug@doughellmann.com wrote:

Excerpts from Bogdan Dobrelya's message of 2017-11-14 17:08:31 +0100:

The concept, in general, is to create a new set of cores from these
groups, and use 3rd party CI to validate patches. There are lots of
details to be worked out yet, but our amazing UC (User Committee) will
begin working out the details.

What is most worrying is the exact "take over" process. Does it mean that
the teams will give away the +2 power to a different team? Or will our (small)
stable teams still be responsible for landing changes? If so, will they have to
learn how to debug 3rd party CI jobs?

Generally, I'm scared of both overloading the teams and losing the control over
quality at the same time :) Probably the final proposal will clarify it..

The quality of backported fixes is expected to be a direct (and only?)
interest of those new teams of new cores coming from users,
operators, and vendors. The more parties establish their 3rd-party

We have an unhealthy focus on "3rd party" jobs in this discussion. We
should not assume that they are needed or will be present. They may be,
but we shouldn't build policy around the assumption that they will. Why
would we have third-party jobs on an old branch that we don't have on
master, for instance?

checking jobs there are, the better proposed changes get communicated,
which directly affects quality in the end. I also suppose contributors
from the ops world will mostly be pushing to see things get fixed,
rather than to get new features adopted by the legacy deployments they
maintain. So in theory this works, and as a mainstream developer and
maintainer you need not fear losing control over LTS code :)

Another question is how not to block everything on each other, and how
not to push contributors away when things go awry: jobs failing,
merging blocked for a long time, or no consensus reached in a code
review. As a first step, I propose that the LTS policy make CI jobs
non-voting, and maybe give every LTS team member core rights? Not sure
if that works though.

I'm not sure what change you're proposing for CI jobs and their voting
status. Do you mean we should make the jobs non-voting as soon as the
branch passes out of the stable support period?
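For concreteness, Bogdan's non-voting suggestion could amount to a one-line Zuul setting carried on the old branch itself; a hedged sketch (the job name is just an example):

```yaml
# .zuul.yaml on the extended-maintenance branch only: the same check
# jobs keep running and reporting, but stop blocking merges.
- project:
    check:
      jobs:
        - openstack-tox-py27:
            voting: false
```

Because Zuul applies in-repo config per branch, a variant like this on the old stable branch would leave gating on master untouched.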

Regarding the review team, anyone on the review team for a branch
that goes out of stable support will need to have +2 rights in that
branch. Otherwise there's no point in saying that they're maintaining
the branch.
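Doug's point about +2 rights could be expressed as a per-branch Gerrit ACL rather than project-wide core rights; a hedged sketch (group and branch names are hypothetical):

```ini
# project.config (on refs/meta/config): grant the LTS review team the
# full Code-Review and Workflow ranges only on the old stable branch.
[access "refs/heads/stable/newton"]
  label-Code-Review = -2..+2 group lts-maintainers
  label-Workflow = -1..+1 group lts-maintainers
```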

Doug



--
Cheers,
~Blairo


OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




responded Nov 14, 2017 by Fox,_Kevin_M (29,360 points)   1 3 4
0 votes

On Tue, Nov 14, 2017 at 6:00 PM, Fox, Kevin M Kevin.Fox@pnnl.gov wrote:
The pressure for #2 comes from the inability to skip upgrades and the fact that upgrades are hugely time consuming still.

If you want to reduce the push for number #2 and help developers get their wish of getting features into users hands sooner, the path to upgrade really needs to be much less painful.

+1000

We are upgrading from Kilo to Mitaka. It took 1 year to plan and
execute the upgrade. (and we skipped a version)
Scheduling all the relevant internal teams is a monumental task
because we don't have dedicated teams for those projects and they have
other priorities.
Upgrading affects a LOT of our systems, some we don't fully have
control over. And it can take months to get a new deployment on those
systems. (and after, we have to test compatibility, of course)

So I guess you can understand my frustration when I'm told to upgrade
more often and that skipping versions is discouraged/unsupported.
At the current pace, I'm just falling behind. I need to skip
versions to keep up.

So for our next upgrades, we plan on skipping even more versions if
the database migration allows it. (except for Nova which is a huge
PITA to be honest due to CellsV1)
I just don't see any other ways to keep up otherwise.

--
Mathieu


responded Nov 14, 2017 by mgagne_at_calavera.c (1,700 points)   2 3
0 votes

On 14 Nov 2017, at 15:18, Mathieu Gagné wrote:

On Tue, Nov 14, 2017 at 6:00 PM, Fox, Kevin M Kevin.Fox@pnnl.gov wrote:

The pressure for #2 comes from the inability to skip upgrades and the fact that upgrades are hugely time consuming still.

If you want to reduce the push for number #2 and help developers get their wish of getting features into users hands sooner, the path to upgrade really needs to be much less painful.

+1000

We are upgrading from Kilo to Mitaka. It took 1 year to plan and
execute the upgrade. (and we skipped a version)
Scheduling all the relevant internal teams is a monumental task
because we don't have dedicated teams for those projects and they have
other priorities.
Upgrading affects a LOT of our systems, some we don't fully have
control over. And it can takes months to get new deployment on those
systems. (and after, we have to test compatibility, of course)

So I guess you can understand my frustration when I'm told to upgrade
more often and that skipping versions is discouraged/unsupported.
At the current pace, I'm just falling behind. I need to skip
versions to keep up.

So for our next upgrades, we plan on skipping even more versions if
the database migration allows it. (except for Nova which is a huge
PITA to be honest due to CellsV1)
I just don't see any other ways to keep up otherwise.

?!?!

What does it take for this to never happen again? No operator should need to plan and execute an upgrade for a whole year to upgrade one year's worth of code development.

We don't need new policies, new teams, more releases, fewer releases, or anything like that. The goal is NOT "let's have an LTS release". The goal should be "How do we make sure Mathieu and everyone else in the world can actually deploy and use the software we are writing?"

Can we drop the entire LTS discussion for now and focus on "make upgrades take less than a year" instead? After we solve that, let's come back around to LTS versions, if needed. I know there's already some work around that. Let's focus there and not be distracted about the best bureaucracy for not deleting two-year-old branches.

--John

/me puts on asbestos pants

--
Mathieu




responded Nov 14, 2017 by John_Dickinson (10,400 points)   2 4 7
0 votes

For those wondering why operators can’t always upgrade sooner, I can add a little bit of color: In our clouds, we have a couple vendors (one network plugin, one cinder driver) and those vendors typically are 1-3 releases behind ‘cutting edge’. By the time they support the version we want to go to, that version is almost end-of-life, which can make things interesting. On the bright side, by then there are usually some helpful articles out there about the issues upgrading from A to B.

As for the planning time required - for us, it mostly boils down to testing or doing it at a time when some amount of disruption is at least somewhat tolerable. For example, for online retail folks like me, upgrading between October and December would be out of the question due to the busy shopping season that is almost upon us.

I will say that I was very impressed with some of the containerized demos that were given at the Summit last week. I plan to look into some containerized options next year which hopefully could ease the upgrade process for us. Still, there is a lot of testing involved, coordination with 3rd parties, and other stars that would still have to align.

At Overstock we have also started maintaining two completely separate production clouds and have orchestration to build/rebuild VMs on either one as needed. Most of the time we spread all our apps across both clouds. So next year when we get the chance to upgrade cloud A, we can either rebuild things on B, or just shut them down while we rebuild A. Then we would repeat on cloud B. Hopefully this eases our upgrade process…at least that's what we are hoping!

My 2 cents. Thanks

On Nov 14, 2017, at 4:44 PM, John Dickinson me@not.mn wrote:

On 14 Nov 2017, at 15:18, Mathieu Gagné wrote:

On Tue, Nov 14, 2017 at 6:00 PM, Fox, Kevin M Kevin.Fox@pnnl.gov wrote:
The pressure for #2 comes from the inability to skip upgrades and the fact that upgrades are hugely time consuming still.

If you want to reduce the push for number #2 and help developers get their wish of getting features into users hands sooner, the path to upgrade really needs to be much less painful.

+1000

We are upgrading from Kilo to Mitaka. It took 1 year to plan and
execute the upgrade. (and we skipped a version)
Scheduling all the relevant internal teams is a monumental task
because we don't have dedicated teams for those projects and they have
other priorities.
Upgrading affects a LOT of our systems, some we don't fully have
control over. And it can takes months to get new deployment on those
systems. (and after, we have to test compatibility, of course)

So I guess you can understand my frustration when I'm told to upgrade
more often and that skipping versions is discouraged/unsupported.
At the current pace, I'm just falling behind. I need to skip
versions to keep up.

So for our next upgrades, we plan on skipping even more versions if
the database migration allows it. (except for Nova which is a huge
PITA to be honest due to CellsV1)
I just don't see any other ways to keep up otherwise.

?!?!

What does it take for this to never happen again? No operator should need to plan and execute an upgrade for a whole year to upgrade one year's worth of code development.

We don't need new policies, new teams, more releases, fewer releases, or anything like that. The goal is NOT "let's have an LTS release". The goal should be "How do we make sure Mattieu and everyone else in the world can actually deploy and use the software we are writing?"

Can we drop the entire LTS discussion for now and focus on "make upgrades take less than a year" instead? After we solve that, let's come back around to LTS versions, if needed. I know there's already some work around that. Let's focus there and not be distracted about the best bureaucracy for not deleting two-year-old branches.

--John

/me puts on asbestos pants

--
Mathieu






responded Nov 15, 2017 by Mike_Smith (2,280 points)   1 4
0 votes

On Tue, Nov 14, 2017 at 6:44 PM, John Dickinson me@not.mn wrote:

On 14 Nov 2017, at 15:18, Mathieu Gagné wrote:

On Tue, Nov 14, 2017 at 6:00 PM, Fox, Kevin M Kevin.Fox@pnnl.gov wrote:

The pressure for #2 comes from the inability to skip upgrades and the fact that upgrades are hugely time consuming still.

If you want to reduce the push for number #2 and help developers get their wish of getting features into users hands sooner, the path to upgrade really needs to be much less painful.

+1000

We are upgrading from Kilo to Mitaka. It took 1 year to plan and
execute the upgrade. (and we skipped a version)
Scheduling all the relevant internal teams is a monumental task
because we don't have dedicated teams for those projects and they have
other priorities.
Upgrading affects a LOT of our systems, some we don't fully have
control over. And it can takes months to get new deployment on those
systems. (and after, we have to test compatibility, of course)

So I guess you can understand my frustration when I'm told to upgrade
more often and that skipping versions is discouraged/unsupported.
At the current pace, I'm just falling behind. I need to skip
versions to keep up.

So for our next upgrades, we plan on skipping even more versions if
the database migration allows it. (except for Nova which is a huge
PITA to be honest due to CellsV1)
I just don't see any other ways to keep up otherwise.

?!?!

What does it take for this to never happen again? No operator should need to plan and execute an upgrade for a whole year to upgrade one year's worth of code development.

We don't need new policies, new teams, more releases, fewer releases, or anything like that. The goal is NOT "let's have an LTS release". The goal should be "How do we make sure Mattieu and everyone else in the world can actually deploy and use the software we are writing?"

Can we drop the entire LTS discussion for now and focus on "make upgrades take less than a year" instead? After we solve that, let's come back around to LTS versions, if needed. I know there's already some work around that. Let's focus there and not be distracted about the best bureaucracy for not deleting two-year-old branches.

To add details to what happened:
* Upgrade was never made a #1 priority. It was a one-man show for far
too long. (myself)
* I also happen to manage and work on other priorities.
* Lots of work to prepare for multi-version support in our
deployment tools. (we use Puppet)
* Lots of work in the packaging area to speed up packaging. (we are
still using deb packages but with virtualenv to stay Puppet
compatible)
* We need to forward-port private patches which upstream won't accept
and/or are private business logic.
* Our developer teams didn't have enough free cycles to work right
away on the upgrade. (this means delays)
* We need to test compatibility with 3rd party systems which takes
some time. (and make them compatible)
* We need to update systems over which we don't have full control.
This means serious delays when it comes to deployment.
* We need to test features/stability during some time in our dev environment.
* We need to test features/stability during some time in our
staging/pre-prod environment.
* We need to announce and inform our users at least 2 weeks in advance
before performing an upgrade.
* We choose to upgrade one service at a time (in all regions) to avoid
a huge big bang upgrade. (this means more maintenance windows to plan
and you can't stack them too much)
* We need to swiftly respond to bugs discovered by our users. This
means changes of priority and delays in other service upgrades.
* We will soon need to upgrade operating systems to support latest
OpenStack versions. (this means we have to stop OpenStack upgrades
until all nodes are upgraded)

All those details rapidly add up. We are far, far away from a
"git pull && ./stack.sh".
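One item above, forward-porting private patches, is at least mechanizable with git rebase --onto; here is a self-contained toy-repo sketch (branch and file names are made up, not Mathieu's actual setup):

```shell
#!/bin/sh
# Toy repo showing 'git rebase --onto' forward-porting a private patch
# from one stable branch to the next. All names are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email ops@example.com
git config user.name Ops

echo "kilo code" > svc.py
git add svc.py && git commit -qm "kilo base"
git branch stable/kilo                      # upstream Kilo

echo "mitaka code" > svc.py
git commit -aqm "mitaka base"
git branch stable/mitaka                    # upstream Mitaka

# Our private patch, carried on top of Kilo:
git checkout -qb private/kilo stable/kilo
echo "business logic" > local_patch.py
git add local_patch.py && git commit -qm "private: local business logic"

# Replay everything private/kilo has beyond stable/kilo onto Mitaka:
git rebase -q --onto stable/mitaka stable/kilo private/kilo

grep -q "mitaka" svc.py && test -f local_patch.py && echo "forward-port OK"
```

The same invocation works for each stable-branch hop; a real forward-port would of course hit conflicts wherever upstream rewrote code the private patches touch.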

I don't want to sound too harsh, but I feel some people live in a
vacuum or an ideal world far from the reality of some operators.
The above details are just a very small glimpse into my reality. I
hope people will understand and have a different perspective when it
comes to upgrades.

--
Mathieu


responded Nov 15, 2017 by mgagne_at_calavera.c (1,700 points)   2 3
0 votes

On Wed, Nov 15, 2017 at 10:44 AM, John Dickinson me@not.mn wrote:

On 14 Nov 2017, at 15:18, Mathieu Gagné wrote:

On Tue, Nov 14, 2017 at 6:00 PM, Fox, Kevin M Kevin.Fox@pnnl.gov wrote:

The pressure for #2 comes from the inability to skip upgrades and the fact that upgrades are hugely time consuming still.

If you want to reduce the push for number #2 and help developers get their wish of getting features into users hands sooner, the path to upgrade really needs to be much less painful.

+1000

We are upgrading from Kilo to Mitaka. It took 1 year to plan and
execute the upgrade. (and we skipped a version)
Scheduling all the relevant internal teams is a monumental task
because we don't have dedicated teams for those projects and they have
other priorities.
Upgrading affects a LOT of our systems, some we don't fully have
control over. And it can takes months to get new deployment on those
systems. (and after, we have to test compatibility, of course)

So I guess you can understand my frustration when I'm told to upgrade
more often and that skipping versions is discouraged/unsupported.
At the current pace, I'm just falling behind. I need to skip
versions to keep up.

So for our next upgrades, we plan on skipping even more versions if
the database migration allows it. (except for Nova which is a huge
PITA to be honest due to CellsV1)
I just don't see any other ways to keep up otherwise.

?!?!

What does it take for this to never happen again? No operator should need to plan and execute an upgrade for a whole year to upgrade one year's worth of code development.

We don't need new policies, new teams, more releases, fewer releases, or anything like that. The goal is NOT "let's have an LTS release". The goal should be "How do we make sure Mattieu and everyone else in the world can actually deploy and use the software we are writing?"

Can we drop the entire LTS discussion for now and focus on "make upgrades take less than a year" instead? After we solve that, let's come back around to LTS versions, if needed. I know there's already some work around that. Let's focus there and not be distracted about the best bureaucracy for not deleting two-year-old branches.

--John

John,

So... Any concrete ideas on how to achieve that?

Thanks,
Dims

/me puts on asbestos pants

--
Mathieu





--
Davanum Srinivas :: https://twitter.com/dims


responded Nov 15, 2017 by Davanum_Srinivas (35,920 points)   2 4 8
0 votes

On Tue, Nov 14, 2017 at 6:44 PM, John Dickinson me@not.mn wrote:

On 14 Nov 2017, at 15:18, Mathieu Gagné wrote:

On Tue, Nov 14, 2017 at 6:00 PM, Fox, Kevin M Kevin.Fox@pnnl.gov wrote:

The pressure for #2 comes from the inability to skip upgrades and the fact that upgrades are hugely time consuming still.

If you want to reduce the push for number #2 and help developers get their wish of getting features into users hands sooner, the path to upgrade really needs to be much less painful.

+1000

We are upgrading from Kilo to Mitaka. It took 1 year to plan and
execute the upgrade. (and we skipped a version)
Scheduling all the relevant internal teams is a monumental task
because we don't have dedicated teams for those projects and they have
other priorities.
Upgrading affects a LOT of our systems, some we don't fully have
control over. And it can takes months to get new deployment on those
systems. (and after, we have to test compatibility, of course)

So I guess you can understand my frustration when I'm told to upgrade
more often and that skipping versions is discouraged/unsupported.
At the current pace, I'm just falling behind. I need to skip
versions to keep up.

So for our next upgrades, we plan on skipping even more versions if
the database migration allows it. (except for Nova which is a huge
PITA to be honest due to CellsV1)
I just don't see any other ways to keep up otherwise.

?!?!

What does it take for this to never happen again? No operator should need to plan and execute an upgrade for a whole year to upgrade one year's worth of code development.

We don't need new policies, new teams, more releases, fewer releases, or anything like that. The goal is NOT "let's have an LTS release". The goal should be "How do we make sure Mattieu and everyone else in the world can actually deploy and use the software we are writing?"

Can we drop the entire LTS discussion for now and focus on "make upgrades take less than a year" instead? After we solve that, let's come back around to LTS versions, if needed. I know there's already some work around that. Let's focus there and not be distracted about the best bureaucracy for not deleting two-year-old branches.

--John

/me puts on asbestos pants

OK, let's tone down the flamethrower there a bit, Mr. Asbestos Pants
;). The LTS push is not in lieu of the quest for simpler upgrades. There
is also an effort to enable fast-forward upgrades going on. However,
this is a non-trivial task that will take many cycles to get to a
point where it's truly what you're looking for. The long term desire
of having LTS releases encompasses being able to hop from one LTS to
the next without stopping over. We just aren't there yet.

However, what we can do is make it so when mgagne finally gets to
Newton (or Ocata or wherever) on his next run, the code isn't
completely EOL and it can still receive some important patches. This
can be accomplished in the very near term, and that is what a certain
subset of us are focused on.

We still desire to skip versions. We still desire to have upgrades be
non-disruptive and non-destructive. This is just one step on the way
to that. This discussion has been going on for cycle after cycle with
little more than angst between ops and devs to show for it. This is
the first time we've had progress on this ball of goo that really
matters. Let's all be proactive contributors to the solution.

Those interested in having a say in the policy, put your $0.02 here:
https://etherpad.openstack.org/p/LTS-proposal

Peace, Love, and International Grooviness,
Erik

--
Mathieu






responded Nov 15, 2017 by Erik_McCormick (3,880 points)   2 4
0 votes

On Tue, Nov 14, 2017 at 4:10 PM, Rochelle Grober
rochelle.grober@huawei.com wrote:
Folks,

This discussion and the people interested in it seem like a perfect application of the SIG process. By turning LTS into a SIG, everyone can discuss the issues on the SIG mailing list and the discussion shouldn't end up split. If it turns into a project, great. If a solution is found that doesn't need a new project, great. Even once there is a decision on how to move forward, there will still be implementation issues and enhancements, so the SIG could very well be long-lived. But the important aspect of this is: keeping the discussion in a place where both devs and ops can follow the whole thing and act on recommendations.

Food for thought.

--Rocky

Just to add more legs to the spider that is this thread: I think the
SIG idea is a good one. It may evolve into a project team some day,
but for now it's a free-for-all polluting two mailing lists and
multiple etherpads. How do we go about creating one?

-Erik

-----Original Message-----
From: Blair Bethwaite [mailto:blair.bethwaite@gmail.com]
Sent: Tuesday, November 14, 2017 8:31 AM
To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org; openstack-oper.
Subject: Re: [openstack-dev] Upstream LTS Releases

Hi all - please note this conversation has been split variously across -dev and -
operators.

One small observation from the discussion so far is that it seems as though
there are two issues being discussed under the one banner:
1) maintain old releases for longer
2) do stable releases less frequently

It would be interesting to understand if the people who want longer
maintenance windows would be helped by #2.

On 14 November 2017 at 09:25, Doug Hellmann doug@doughellmann.com
wrote:

Excerpts from Bogdan Dobrelya's message of 2017-11-14 17:08:31 +0100:

The concept, in general, is to create a new set of cores from
these groups, and use 3rd party CI to validate patches. There are
lots of details to be worked out yet, but our amazing UC (User
Committee) will begin working out the details.

What is most worrying is the exact "take over" process. Does it
mean that the teams will give away the +2 power to a different
team? Or will our (small) stable teams still be responsible for
landing changes? If so, will they have to learn how to debug 3rd party CI
jobs?

Generally, I'm scared of both overloading the teams and losing the
control over quality at the same time :) Probably the final proposal will
clarify it..

The quality of backported fixes is expected to be a direct (and
only?) interest of those new teams of new cores, coming from users
and operators and vendors. The more parties to establish their 3rd
party

We have an unhealthy focus on "3rd party" jobs in this discussion. We
should not assume that they are needed or will be present. They may
be, but we shouldn't build policy around the assumption that they
will. Why would we have third-party jobs on an old branch that we
don't have on master, for instance?

checking jobs, the better proposed changes communicated, which
directly affects the quality in the end. I also suppose, contributors
from ops world will likely be only struggling to see things getting
fixed, and not new features adopted by legacy deployments they're used
to maintain.
So in theory, this works and as a mainstream developer and
maintainer, you need no to fear of losing control over LTS code :)

Another question is how to avoid blocking everyone on each other, and
how not to push contributors away when things get awry: jobs failing,
merging blocked for a long time, or no consensus reached in a code
review. I propose the LTS policy enforce that CI jobs be non-voting,
as a first step on that way, and maybe give every LTS team member
core rights? Not sure if that works though.

I'm not sure what change you're proposing for CI jobs and their voting
status. Do you mean we should make the jobs non-voting as soon as the
branch passes out of the stable support period?

Regarding the review team, anyone on the review team for a branch that
goes out of stable support will need to have +2 rights in that branch.
Otherwise there's no point in saying that they're maintaining the
branch.

Doug



OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--
Cheers,
~Blairo




OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


responded Nov 15, 2017 by Erik_McCormick

I can think of a few ideas, though some sound painful on paper.... Not really recommending anything, just thinking out loud...

One idea is at the root of chaos monkey: if something is hard, do it frequently. If upgrading is hard, we need to be doing it constantly so the pain gets largely eliminated. For instance, we could discourage devs from standing up a fresh devstack all the time and have them upgrade their existing one instead. If it's hard, then it's likely someone will chip in to make it less hard.

Another is devstack in general. The tooling used by devs and that used by ops are so different as to isolate the devs from ops' pain. If devs used more ops-ish tooling, they would hit the same issues and would be more likely to find solutions that work for both parties.

A third one is supporting multiple-version upgrades in the gate. I rarely have a problem with a cloud whose database is one version back. I have seen lots of issues with databases that contain data from when the cloud was first instantiated and was then upgraded multiple times.
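To illustrate why testing only the last step misses these bugs, here is a toy sketch of replaying a whole migration chain against data seeded at the original release. All names here are illustrative; real projects do this with alembic or sqlalchemy-migrate, not hand-written functions like these.

```python
# Toy harness: replay every migration in order against data created at
# "release 1", instead of only testing the most recent step.

def migrate_2(db):
    # Example migration: backfill a column introduced in release 2.
    for row in db["instances"]:
        row.setdefault("az", "nova")
    return db

def migrate_3(db):
    # A later migration that assumes "az" exists. That assumption only
    # holds if migrate_2 ran, which is exactly what multi-step testing
    # against old data verifies.
    for row in db["instances"]:
        row["az"] = row["az"].lower()
    return db

MIGRATIONS = [migrate_2, migrate_3]

def upgrade_from_scratch(db):
    """Apply the whole chain, as an operator upgrading old data would."""
    for step in MIGRATIONS:
        db = step(db)
    return db

# Data as it looked when the cloud was first stood up.
old_db = {"instances": [{"id": 1}, {"id": 2}]}
new_db = upgrade_from_scratch(old_db)
```

A gate job that seeds data at release N and runs the chain through N+3 would catch exactly the class of bug described above.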

Another option is trying to unify/detangle the upgrade procedure. Upgrading the compute kit should be one or two commands if you can live with the defaults, not weeks of poring through release notes, piecing together the correct order from pages of text, and testing vigorously on test systems.

How about a tool that dumps the database somewhere temporary, iterates over all the upgrade job components, and checks whether they would corrupt your database? That takes a while to do manually. Ideally it could even upload stack traces to a bug tracker for attention.
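No such tool exists today; a rough sketch of its core loop might look like the following. The commands below are placeholders: a real run would first restore the dump into a throwaway database and then invoke each project's actual migration command (e.g. "nova-manage db sync") against it.

```python
import subprocess
import sys

def dry_run_upgrade(migration_cmds):
    """Run each project's DB migration command against a throwaway copy
    of the production database, collecting (project, stderr) failures
    instead of stopping at the first one."""
    failures = []
    for project, cmd in migration_cmds:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((project, result.stderr.strip()))
    return failures

# Placeholder commands standing in for real per-project migrations.
cmds = [
    ("ok-project", [sys.executable, "-c", "print('migrated')"]),
    ("broken-project",
     [sys.executable, "-c", "import sys; sys.exit('schema mismatch')"]),
]
failures = dry_run_upgrade(cmds)
```

The collected failures are what you would attach to a bug report, which is the "upload stack traces to a bug tracker" half of the idea.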

Thanks,
Kevin


From: Davanum Srinivas [davanum@gmail.com]
Sent: Tuesday, November 14, 2017 4:08 PM
To: OpenStack Development Mailing List (not for usage questions)
Cc: openstack-oper.
Subject: Re: [openstack-dev] [Openstack-operators] Upstream LTS Releases

On Wed, Nov 15, 2017 at 10:44 AM, John Dickinson me@not.mn wrote:

On 14 Nov 2017, at 15:18, Mathieu Gagné wrote:

On Tue, Nov 14, 2017 at 6:00 PM, Fox, Kevin M Kevin.Fox@pnnl.gov wrote:

The pressure for #2 comes from the inability to skip upgrades and the fact that upgrades are hugely time consuming still.

If you want to reduce the push for #2 and help developers get their wish of getting features into users' hands sooner, the path to upgrade really needs to be much less painful.

+1000

We are upgrading from Kilo to Mitaka. It took 1 year to plan and
execute the upgrade. (and we skipped a version)
Scheduling all the relevant internal teams is a monumental task
because we don't have dedicated teams for those projects and they have
other priorities.
Upgrading affects a LOT of our systems, some of which we don't fully
control. And it can take months to get a new deployment on those
systems. (and afterwards, we have to test compatibility, of course)

So I guess you can understand my frustration when I'm told to upgrade
more often and that skipping versions is discouraged/unsupported.
At the current pace, I'm just falling behind. I need to skip
versions to keep up.

So for our next upgrades, we plan on skipping even more versions if
the database migration allows it. (except for Nova which is a huge
PITA to be honest due to CellsV1)
I just don't see any other ways to keep up otherwise.

?!?!

What does it take for this to never happen again? No operator should need to plan and execute an upgrade for a whole year to upgrade one year's worth of code development.

We don't need new policies, new teams, more releases, fewer releases, or anything like that. The goal is NOT "let's have an LTS release". The goal should be "How do we make sure Mathieu and everyone else in the world can actually deploy and use the software we are writing?"

Can we drop the entire LTS discussion for now and focus on "make upgrades take less than a year" instead? After we solve that, let's come back around to LTS versions, if needed. I know there's already some work around that. Let's focus there and not be distracted about the best bureaucracy for not deleting two-year-old branches.

--John

John,

So... Any concrete ideas on how to achieve that?

Thanks,
Dims

/me puts on asbestos pants

--
Mathieu





--
Davanum Srinivas :: https://twitter.com/dims




responded Nov 15, 2017 by Fox,_Kevin_M

On 14 Nov 2017, at 16:08, Davanum Srinivas wrote:

On Wed, Nov 15, 2017 at 10:44 AM, John Dickinson me@not.mn wrote:

On 14 Nov 2017, at 15:18, Mathieu Gagné wrote:

On Tue, Nov 14, 2017 at 6:00 PM, Fox, Kevin M Kevin.Fox@pnnl.gov wrote:

The pressure for #2 comes from the inability to skip upgrades and the fact that upgrades are hugely time consuming still.

If you want to reduce the push for #2 and help developers get their wish of getting features into users' hands sooner, the path to upgrade really needs to be much less painful.

+1000

We are upgrading from Kilo to Mitaka. It took 1 year to plan and
execute the upgrade. (and we skipped a version)
Scheduling all the relevant internal teams is a monumental task
because we don't have dedicated teams for those projects and they have
other priorities.
Upgrading affects a LOT of our systems, some of which we don't fully
control. And it can take months to get a new deployment on those
systems. (and afterwards, we have to test compatibility, of course)

So I guess you can understand my frustration when I'm told to upgrade
more often and that skipping versions is discouraged/unsupported.
At the current pace, I'm just falling behind. I need to skip
versions to keep up.

So for our next upgrades, we plan on skipping even more versions if
the database migration allows it. (except for Nova which is a huge
PITA to be honest due to CellsV1)
I just don't see any other ways to keep up otherwise.

?!?!

What does it take for this to never happen again? No operator should need to plan and execute an upgrade for a whole year to upgrade one year's worth of code development.

We don't need new policies, new teams, more releases, fewer releases, or anything like that. The goal is NOT "let's have an LTS release". The goal should be "How do we make sure Mathieu and everyone else in the world can actually deploy and use the software we are writing?"

Can we drop the entire LTS discussion for now and focus on "make upgrades take less than a year" instead? After we solve that, let's come back around to LTS versions, if needed. I know there's already some work around that. Let's focus there and not be distracted about the best bureaucracy for not deleting two-year-old branches.

--John

John,

So... Any concrete ideas on how to achieve that?

Thanks,
Dims

Depends on what the upgrade problems are. I'd think the project teams that can't currently do seamless or skip-level upgrades would know best about what's needed. I suspect there will be both small and large changes needed in some projects.

Mathieu's list of realities in a different reply seems very normal. Operators are responsible for more than just OpenStack projects, and they've got to coordinate changes in deployed OpenStack projects with other systems they are running. Working through that list of realities could help identify some areas of improvement.

Spitballing process ideas...
* use a singular tag in launchpad to track upgrade stories. better yet, report on the status of these across all openstack projects so anyone can see what's needed to get to a smooth upgrade
* redouble efforts on multi-node and rolling upgrade testing. make sure every project is using it
* make smooth (and skip-level) upgrades a cross-project goal and don't set others until that one is achieved
* add upgrade stories and tests to the interop tests
* allocate time for ops to specifically talk about upgrade stories at the PTG. make sure as many devs are in the room as possible.
* add your cell phone number to the project README so that any operator can call you as soon as they try to upgrade (perhaps not 100% serious)
* add testing infrastructure that is locked to distro-provided versions of dependencies (e.g. install on xenial with only apt, or on rhel 7 with only yum)
* only do one openstack release a year. keep N-2 releases around. give ops a chance to upgrade before we delete branches
* do an openstack release every month. severely compress the release cycle and force everything to work with disparate versions. this will drive good testing, strong, stable interfaces, and smooth upgrades
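Spitballing a little further on the first idea above: a single tag plus a cross-project report could be little more than an aggregation pass over tagged bugs. A hedged sketch follows; the record shape is a made-up stand-in for whatever the tracker's API actually returns, not Launchpad's real schema.

```python
from collections import defaultdict

def upgrade_report(bugs):
    """Group upgrade-tagged bugs by project and count open vs. fixed,
    so anyone can see which projects still block a smooth upgrade."""
    report = defaultdict(lambda: {"open": 0, "fixed": 0})
    for bug in bugs:
        if "upgrade" not in bug["tags"]:
            continue  # only the singular upgrade tag feeds the report
        state = "fixed" if bug["status"] == "Fix Released" else "open"
        report[bug["project"]][state] += 1
    return dict(report)

# Made-up records shaped roughly like a tracker API result.
bugs = [
    {"project": "nova", "tags": ["upgrade"], "status": "In Progress"},
    {"project": "nova", "tags": ["upgrade"], "status": "Fix Released"},
    {"project": "glance", "tags": ["api"], "status": "New"},
]
report = upgrade_report(bugs)
```

Published on a dashboard, a report like this is the "anyone can see what's needed" part of the first bullet.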

Ah, just saw Kevin's reply in a different message. I really like his idea of "use ops tooling for day-to-day dev work. stop using devstack".

Ultimately it will come down to typing in some code and merging it into a project. I do not know what's needed there. It's probably different for every project.

--John

/me puts on asbestos pants

--
Mathieu





--
Davanum Srinivas :: https://twitter.com/dims




responded Nov 15, 2017 by John_Dickinson
...