[openstack-dev] [grenade] future direction on partial upgrade support

Back when Nova first wanted to test partial upgrade, we did a bunch of
slightly odd conditionals inside of grenade and devstack to make it so
that if you were very careful, you could just not stop some of the old
services on a single node, upgrade everything else, and as long as the
old services didn't stop, they'd be running cached code in memory, and
it would look a bit like a two-node model where the worker hadn't been
upgraded. It worked, but it was weird.

There has been some interest from the Nova team in expanding what's not
being touched, as well as from the Neutron team in adding partial upgrade
testing support. Both are great initiatives, but I think going about it
the old way is going to add a lot of complexity in weird places, and won't
be as good a test as we really want.

Nodepool now supports allocating multiple nodes. We have a multinode job
in Nova regularly testing live migration using this.

If we slice this problem differently, I think we get a better
architecture, a much easier way to add new configs, and a much more
realistic end test.

Conceptually, use devstack-gate multinode support to set up two nodes, an
all-in-one and a worker. Let grenade upgrade the all-in-one and leave the
worker alone.
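
A minimal Python sketch of this topology, assuming a simplified model in which each node runs a set of services at a single code version. The `Node` class, the `upgrade` helper and the "old"/"new" labels are hypothetical illustrations, not grenade or devstack-gate code; the devstack-style service names are only examples.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Node:
    """Hypothetical test node: a name, the code version it runs, its services."""
    name: str
    version: str
    services: Set[str] = field(default_factory=set)


def upgrade(node: Node, new_version: str) -> Node:
    """Model an upgrade of one node: same services, new code version."""
    return Node(node.name, new_version, set(node.services))


def service_versions(nodes: Iterable[Node]) -> Dict[str, Set[str]]:
    """Map each service to the set of code versions it runs at across nodes."""
    result: Dict[str, Set[str]] = {}
    for node in nodes:
        for svc in node.services:
            result.setdefault(svc, set()).add(node.version)
    return result


# Both nodes are first stacked on the old release.
primary = Node("primary", "old", {"n-api", "n-cond", "n-sch", "n-cpu"})
subnode = Node("subnode", "old", {"n-cpu"})

# Grenade upgrades only the all-in-one; the worker is left alone.
primary = upgrade(primary, "new")

versions = service_versions([primary, subnode])
print(sorted(versions["n-cpu"]))  # ['new', 'old']
print(sorted(versions["n-api"]))  # ['new']
```

After the primary is upgraded, a service enabled on both nodes (n-cpu here) runs at two versions at once, which is the mixed-version situation a partial upgrade test is meant to exercise, without relying on stale code cached in memory.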

I think the only complexity here is the fact that grenade.sh implicitly
drives stack.sh, which means one of the following:

1) devstack-gate could build the worker first, then run grenade.sh

2) we make it so grenade.sh can execute in parts more easily, so it can
hand off running stack.sh to something else.

3) we make grenade understand the subnode for partial upgrade, so it
will run the stack phase on the subnode itself (given credentials).

This kind of approach means deciding which services you don't want to
upgrade doesn't require devstack changes; it's just a change to the
services enabled on the worker.
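
Under this model, choosing what stays on the old release reduces to set arithmetic on two service lists. A minimal sketch, assuming comma-separated lists in the style of devstack's ENABLED_SERVICES and assuming the worker is never upgraded; `parse_services` and `partial_upgrade_plan` are hypothetical helpers.

```python
from typing import Dict, List, Set


def parse_services(value: str) -> Set[str]:
    """Split a comma-separated service list, ignoring blanks and whitespace."""
    return {s.strip() for s in value.split(",") if s.strip()}


def partial_upgrade_plan(primary_services: str,
                         worker_services: str) -> Dict[str, List[str]]:
    """Which services end up on new code, old code, or both.

    Assumes the primary (all-in-one) is upgraded and the worker never is.
    """
    primary = parse_services(primary_services)
    worker = parse_services(worker_services)
    return {
        "new": sorted(primary),
        "old": sorted(worker),
        "mixed": sorted(primary & worker),
    }


# Keep an old nova-compute on the worker.
nova = partial_upgrade_plan("n-api,n-cond,n-sch,n-cpu", "n-cpu")

# Also keep an old nova-network: only the worker's list changes.
nova_net = partial_upgrade_plan("n-api,n-cond,n-sch,n-cpu,n-net", "n-cpu,n-net")

print(nova["old"])        # ['n-cpu']
print(nova_net["mixed"])  # ['n-cpu', 'n-net']
```

Going from the first configuration to the second touches only the worker's list, which is the property argued for above.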

We need a volunteer to take this on, but I think all the follow-on
partial upgrade support will be much, much easier to do once we have
this kind of mechanism in place.

-Sean

--
Sean Dague
http://dague.net


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Jun 16, 2015 in openstack-dev by Sean_Dague (66,200 points)

23 Responses

On Wed, Jun 24, 2015 at 11:03 AM, Sean Dague sean@dague.net wrote:

On 06/24/2015 01:41 PM, Russell Bryant wrote:

On 06/24/2015 01:31 PM, Joe Gordon wrote:

On Tue, Jun 16, 2015 at 9:58 AM, Sean Dague sean@dague.net wrote:

I think this is a great approach for the future of partial upgrade
support in grenade. I would like to point out that step 0 here is to get
tempest passing consistently in multinode.

Currently the neutron job is failing consistently, and nova-network
fails roughly 10% of the time due
to https://bugs.launchpad.net/nova/+bug/1462305
and https://bugs.launchpad.net/nova/+bug/1445569

If multi-node isn't reliable more generally yet, do you think the
simpler implementation of partial-upgrade testing could proceed? I've
already done all of the patches to do it for Neutron. That way we could
quickly get something in place to help block regressions and work on the
longer-term multinode refactoring without as much time pressure.

The thing is, these partial service bits are sneakier than one realizes
over time. There have been all kinds of edge conditions that crept up on
the n-cpu one that are really subtle because code is running in memory
on stale versions of dependencies which are no longer on disk. And the
number of people that have this model in their head is basically down to
a SPOF.

I agree. As the author of the current multinode job, it is definitely an
ugly hack (but one that has worked surprisingly well until now).

The fact that neutron-grenade is at a 40% fail rate right now (and has
been for over a week) is not preventing anyone from just rechecking to
get past it. So I think assuming additional failing grenade tests are
going to keep folks from landing bugs is probably not a good assumption.
Making the whole path more complicated for other people to debug is an
explosion waiting to happen.
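
A rough worked example of why rechecks get past a flaky job, assuming every run fails independently at the 40% rate mentioned above (real gate failures are often correlated, so treat this only as an approximation). The helper names are made up for illustration.

```python
def pass_within(fail_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent runs passes."""
    return 1.0 - fail_rate ** attempts


def expected_runs(fail_rate: float) -> float:
    """Expected number of runs until the first pass (geometric distribution)."""
    return 1.0 / (1.0 - fail_rate)


rate = 0.4
for k in (1, 2, 3):
    print(k, round(pass_within(rate, k), 3))
# 1 0.6
# 2 0.84
# 3 0.936
print(round(expected_runs(rate), 2))  # 1.67
```

Under these assumptions, fewer than two runs are needed on average, so a job at this failure rate is not much of a barrier to landing a change, buggy or not.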

So I do want to take a hard line on doing this right, because the debt
here is higher than you might think. The partial code was always very
conceptually fragile, and fails in really funny ways sometimes, because
old code is not isolated from new code in the way one would expect.

Assuming the smoke jobs work, I don't think making grenade do multinode
should take very long, in which case we get a much more realistic upgrade
situation.

I -1ed the n-net partial upgrade changes for the same reason.

    -Sean

--
Sean Dague
http://dague.net


responded Jun 24, 2015 by Joe_Gordon (24,620 points)

On Wed, Jun 24, 2015 at 11:44 AM, Joe Gordon joe.gordon0@gmail.com wrote:

On Wed, Jun 24, 2015 at 11:03 AM, Sean Dague sean@dague.net wrote:

Assuming the smoke jobs work, I don't think making grenade do multinode
should take very long, in which case we get a much more realistic upgrade
situation.

Good news, it looks like both smoke jobs are working (ignoring failures
from https://review.openstack.org/#/c/195748/).

responded Jun 26, 2015 by Joe_Gordon (24,620 points)

No

On Fri, Jun 26, 2015 at 10:15 AM, Joe Gordon joe.gordon0@gmail.com wrote:

Good news, it looks like both smoke jobs are working (ignoring failures
from https://review.openstack.org/#/c/195748/).

So next step is to teach grenade to do multinode.

responded Jun 26, 2015 by Joe_Gordon (24,620 points)

Hi,

Not sure if we reached any conclusion with this thread, and I would like to
resume it so that we don't derail the initial plan set forth by Russell and
agreed on during the Liberty summit, among other things.

If I look at the thread, I think it can be summarized as follows. Please
correct me if I am wrong:

  1. There is a desire for making Grenade more modular by relying on
    multi-node support. This is beneficial for all the projects that aim at
    testing partial upgrades.
  2. There are a number of steps required to achieve 1. The work required
    is not overly complicated, but it requires some discipline and good
    understanding of the overall OpenStack machine to get it to completion.
  3. Should this effort be given priority, it can impact stuff that is
    currently in flight, like the patches from Russell on Neutron partial
    upgrade, and Dan on improvements for nova-net upgrades.
  4. With minor tweaks, single-node Grenade can be useful in the interim,
    while everything gets ported over to a more robust multi-node Grenade
    job configuration.

Have we identified a volunteer for activity 1? From what I can tell, Joe was
kind enough to set up the infra to start gathering data on the reliability
of the multi-node jobs, but they are clearly flaky [1], and currently
broken. I have seen nothing else. If I am mistaken, please fill me in.

Now, in terms of a resolution for this, would it be fair to say that until
we get 1) bootstrapped, Russell and Dan's efforts are low-hanging fruit
worth picking? I would personally think so: after all, patches [2,3,4] seem
trivial enough:

  • they don't add much complexity
  • they are fairly self-contained, and
  • can be easily swept away with the other grenade 'odd conditionals' in
    the context of 1.

Thoughts?

Thanks,
Armando

[1] http://goo.gl/NPkeZh
[2] https://review.openstack.org/#/q/topic:partial-neutron-upgrade,n,z
[3] https://review.openstack.org/#/q/topic:neutron-agent-control,n,z
[4] https://review.openstack.org/#/c/189478/

On 26 June 2015 at 15:54, Joe Gordon joe.gordon0@gmail.com wrote:

responded Jul 6, 2015 by Armando_M. (23,560 points)

On 2015-07-06 11:54:45 -0700 (-0700), Armando M. wrote:
[...]
For what I can tell, Joe was kind to set the infra to start
gathering data on the reliability of the multi-node jobs, but they
are clearly flaky [1], and currently broken.
[...]

Well, a check-.* pass rate of 25% is likely explained by running
against proposed bad changes (after all these are running in the
check pipeline, not the gate). The recent 100% failure we think will
be fixed with a new release of glean incorporating
https://review.openstack.org/198576 since we recently started
exceeding the 64-byte HOST_NAME_MAX on our test platforms.
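
To illustrate the limit: on Linux, HOST_NAME_MAX is 64 bytes, so a longer hostname cannot be set as the system hostname. The sketch below only checks lengths; `truncate_hostname` is a hypothetical workaround, not what the glean change above does, and the example names are made up.

```python
HOST_NAME_MAX = 64  # Linux limit in bytes, not counting the trailing NUL


def fits_hostname_limit(hostname: str, limit: int = HOST_NAME_MAX) -> bool:
    """True if the hostname's byte length is within the kernel limit."""
    return len(hostname.encode("ascii")) <= limit


def truncate_hostname(hostname: str, limit: int = HOST_NAME_MAX) -> str:
    """Hypothetical workaround: keep the first `limit` bytes and strip
    any dots or hyphens left dangling at the cut."""
    return hostname.encode("ascii")[:limit].decode("ascii").rstrip(".-")


# Made-up names; real nodepool hostnames differ.
short_name = "devstack-trusty-rax-dfw-1234567"
long_name = "devstack-trusty-2-node-" + "x" * 60  # 83 characters

print(fits_hostname_limit(short_name))    # True
print(fits_hostname_limit(long_name))     # False
print(len(truncate_hostname(long_name)))  # 64
```
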
--
Jeremy Stanley


responded Jul 6, 2015 by Jeremy_Stanley (56,700 points)

On 6 July 2015 at 13:13, Jeremy Stanley fungi@yuggoth.org wrote:

Thanks for the heads-up, Jeremy. That said, the failure rate is still
remarkably higher in a like-for-like comparison. If I am not mistaken, we
don't have a way to compare the rate on the gate pipeline, but that's
beside the point of my attempt at reviving this discussion.

Cheers,
Armando

responded Jul 6, 2015 by Armando_M. (23,560 points)

I'd also like to chime in - we've had some discussions on -infra today
about the partial upgrade issue, and collected the following notes on an
etherpad.

https://etherpad.openstack.org/p/neutron-partial-upgrades

One of the things identified was the complexity of the DVR feature in
Neutron, and an attempt to simplify the partial upgrade job by not
enabling the DVR feature.

http://eavesdrop.openstack.org/meetings/networking/2015/networking.2015-07-06-21.00.log.html

Clark Boylan has proposed a patch to create a new job that runs on
multiple nodes, but does not have DVR enabled, in the hopes that having
fewer moving parts will allow the multinode grenade work to continue on a
parallel track, with bugfixes or additional work on the Neutron DVR
feature not blocking the overall effort. The idea would be to eventually
enable DVR once we have taken care of all the other work that needs to
be done.

https://review.openstack.org/#/c/198906/

What are everyone's thoughts?

--
Sean M. Collins


responded Jul 6, 2015 by Sean_M._Collins (11,480 points)

Thanks Sean, comments inline.

On 6 July 2015 at 16:58, Sean M. Collins sean@coreitpro.com wrote:

I'd also like to chime in - we've had some discussions on -infra today
about the partial upgrade issue, and collected the following notes on an
etherpad.

https://etherpad.openstack.org/p/neutron-partial-upgrades

One of the things identified was the complexity of the DVR feature in
Neutron, and an attempt to simplify the partial upgrade job by not
enabling the DVR feature.

The DVR issue is entirely orthogonal to this, but I am willing to play
along.

http://eavesdrop.openstack.org/meetings/networking/2015/networking.2015-07-06-21.00.log.html

Clark Boylan has proposed a patch to create a new job that runs on
multiple nodes, but does not have DVR enabled, in the hopes that having
fewer moving parts will allow the multinode grenade work to continue on a
parallel track,

Who is leading the Grenade effort? Is it Clark?

with bugfixes or additional work on the Neutron DVR
feature not blocking the overall effort. The idea would be to eventually
enable DVR once we have taken care of all the other work that needs to
be done.

https://review.openstack.org/#/c/198906/

I thought that's what Joe did in [1]. Am I barking up the wrong tree?

What are everyone's thoughts?

[1] https://review.openstack.org/#/c/195259/

--
Sean M. Collins


responded Jul 7, 2015 by Armando_M. (23,560 points)

On 07/06/2015 09:02 PM, Armando M. wrote:

Who is leading the Grenade effort? Is it Clark?

Actually, in terms of who stirred the pot, it's me.

There were too many people talking in groups that were too small for me
to stand aside any longer. The grenade job looked like it was going to
continue to get blocked without everyone understanding all the factors,
so I wanted folks to have a discussion.

I thought that's what Joe did in [1]. Am I barking up the wrong tree?

Joe did smoke tests in that patch, yes. Clark's patch just takes a job
that was already running full tempest and turns it into two jobs, one
with DVR on and one with DVR off. That is all Clark's patch does.

Thanks,
Anita.

[1] https://review.openstack.org/#/c/195259/

responded Jul 7, 2015 by Anita_Kuno (21,320 points)

On 07/06/2015 09:31 PM, Anita Kuno wrote:

Who is leading the Grenade effort? Is it Clark?

Actually in terms of who stirred the pot, it's me.

There were too many people talking in too small of groups for me to
stand aside any longer. The grenade job looked like it was going to
continue to get blocked without everyone understanding all the factors
so I wanted to have folks have a discussion.

I was out last week, so I'm still catching up on some of this. Thanks,
Anita, for stirring the pot.

I've got a POC approach proposed in the following 3 patches to do
partial testing in multinode via a post-stack.sh script in grenade (a
way to tell grenade to do another thing after the base stack call is done).

The grenade change - https://review.openstack.org/#/c/199073/

The devstack-gate change that would put subnode setup into post-stack.sh
- https://review.openstack.org/#/c/199091/

And the project-config change to make this experimental on devstack-gate
and grenade is here - https://review.openstack.org/#/c/199103/
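
The post-stack.sh idea can be pictured as a driver with an optional step between stacking and upgrading. A minimal Python sketch, assuming each phase is a plain callable; in grenade the hook is a shell script and the real changes are the reviews listed above, so `run_upgrade_test` and the phase labels are hypothetical.

```python
from typing import Callable, List, Optional

Phase = Callable[[], None]


def run_upgrade_test(stack: Phase,
                     upgrade: Phase,
                     verify: Phase,
                     post_stack: Optional[Phase] = None) -> None:
    """Hypothetical grenade-like driver with a post-stack extension point.

    The base stack phase always runs first. If a post-stack hook is given,
    it runs right after stacking and before any upgrade, which is where a
    subnode could be built on the old release. The upgrade then touches
    only the primary node.
    """
    stack()
    if post_stack is not None:
        post_stack()
    upgrade()
    verify()


log: List[str] = []

run_upgrade_test(
    stack=lambda: log.append("stack old release on primary"),
    post_stack=lambda: log.append("stack old release on subnode"),
    upgrade=lambda: log.append("upgrade primary only"),
    verify=lambda: log.append("run smoke tests across both nodes"),
)
print(log)
```

Without a hook the driver behaves like a single-node run, so existing jobs would be unaffected; subnode setup lives entirely in the hook.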

The first job I created here was a nova-net one, because I know enough
about the paths, and the partial upgrade story on nova (which has been
voting for a year), to know that all bugs introduced here are probably
my own. But if we can get that working, and the nova partial job moved
over, I think expanding it to arbitrary configs is probably pretty simple.

Assistance ploughing through on this direction would be appreciated.

-Sean

--
Sean Dague
http://dague.net


responded Jul 8, 2015 by Sean_Dague (66,200 points)
...