[openstack-dev] [grenade] future direction on partial upgrade support

Back when Nova first wanted to test partial upgrade, we did a bunch of
slightly odd conditionals inside of grenade and devstack to make it so
that if you were very careful, you could just not stop some of the old
services on a single node, upgrade everything else, and as long as the
old services didn't stop, they'd be running cached code in memory, and
it would look a bit like a two-node model in which the worker was not
upgraded. It worked, but it was weird.

There has been some interest by the Nova team to expand what's not being
touched, as well as the Neutron team to add partial upgrade testing
support. Both are great initiatives, but I think going about it the old
way is going to add a lot of complexity in weird places, and not be as
good of a test as we really want.

Nodepool now supports allocating multiple nodes. We have a multinode job
in Nova regularly testing live migration using this.

If we slice this problem differently, I think we get a better
architecture, a much easier way to add new configs, and a much more
realistic end test.

Conceptually, use devstack-gate multinode support to set up 2 nodes, an
all in one, and a worker. Let grenade upgrade the all in one, leave the
worker alone.

I think the only complexity here is the fact that grenade.sh implicitly
drives stack.sh. Which means one of:

1) devstack-gate could build the worker first, then run grenade.sh

2) we make it so grenade.sh can execute in parts more easily, so it can
hand off running stack.sh to something else.

3) we make grenade understand the subnode for partial upgrade, so it
will run the stack phase on the subnode itself (given credentials).

This kind of approach means deciding which services you don't want to
upgrade doesn't require devstack changes, it's just a change of the
services on the worker.
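
As a rough illustration of option 1 above, the ordering could be sketched as follows. This is only a sketch under stated assumptions: the node labels ("worker", "primary"), the Step type, and the way the worker's service list is handed to stack.sh are hypothetical and are not taken from grenade or devstack-gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    node: str      # hypothetical node label: "primary" (all in one) or "worker"
    command: str   # script to run on that node


def partial_upgrade_plan(worker_services):
    """Order the steps for option 1: build the worker first, then run grenade.sh."""
    services = ",".join(worker_services)
    return [
        # The worker is stacked once with only the services we want left
        # un-upgraded. Passing the list through the environment is an
        # assumption made for illustration.
        Step("worker", f"ENABLED_SERVICES={services} ./stack.sh"),
        # grenade.sh drives stack.sh and the upgrade on the all in one;
        # nothing in this plan touches the worker again.
        Step("primary", "./grenade.sh"),
    ]


plan = partial_upgrade_plan(["n-cpu", "n-net"])
for step in plan:
    print(f"{step.node}: {step.command}")
```

Changing which services stay on the old release then only means changing the list passed to partial_upgrade_plan, which mirrors the point that no devstack change is needed.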

We need a volunteer for taking this on, but I think all the follow on
partial upgrade support will be much much easier to do after we have
this kind of mechanism in place.

-Sean

--
Sean Dague
http://dague.net


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Jun 16, 2015 in openstack-dev by Sean_Dague (66,200 points)   4 8 14

23 Responses

0 votes

On 2015-06-16 12:58:18 -0400 (-0400), Sean Dague wrote:
[...]
I think the only complexity here is the fact that grenade.sh
implicitly drives stack.sh. Which means one of:

1) devstack-gate could build the worker first, then run grenade.sh

2) we make it so grenade.sh can execute in parts more easily, so
it can hand off running stack.sh to something else.

3) we make grenade understand the subnode for partial upgrade, so
it will run the stack phase on the subnode itself (given
credentials).
[...]

As a point of reference, have a look at Clark's change which
introduced Ansible for driving commands on arbitrary systems in a
devstack-gate based job:

https://review.openstack.org/172614

The idea is that you wrap all relevant commands in calls to ansible,
and then the only additional logic you need to abstract out is the
decision of which node(s) you want running those commands. It
generalizes fine to a single-node solution so that you don't need to
maintain separate multi-node-vs-single-node frameworks.
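
A minimal sketch of that wrapping idea, not a description of what the change above actually does: every command goes through one helper, and the only thing that varies is the host pattern. The inventory file name and the group names "primary" and "subnodes" are assumptions made for illustration; the argv uses the standard ad-hoc `ansible <pattern> -i <inventory> -m shell -a <command>` form.

```python
import shlex


def ansible_argv(host_pattern, command, inventory="inventory"):
    """Build an ad-hoc ansible invocation that runs `command` on matching hosts."""
    return ["ansible", host_pattern, "-i", inventory, "-m", "shell", "-a", command]


# Hypothetical inventory groups: "primary" is the all in one, "subnodes"
# holds any extra workers. On a single-node job "subnodes" would simply be
# empty, so the same wrapper serves both cases.
on_primary = ansible_argv("primary", "./grenade.sh")
on_workers = ansible_argv("subnodes", "./stack.sh")

# Print the commands instead of executing them, so the sketch has no side effects.
print(shlex.join(on_primary))
print(shlex.join(on_workers))
```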
--
Jeremy Stanley


responded Jun 16, 2015 by Jeremy_Stanley (56,700 points)   3 5 7

On Tue, Jun 16, 2015 at 9:58 AM, Sean Dague sean@dague.net wrote:

[...]

I think this is a great approach for the future of partial upgrade support
in grenade. I would like to point out that step 0 here is to get tempest
passing consistently in multinode.

Currently the neutron job is failing consistently, and nova-network fails
roughly 10% of the time due to https://bugs.launchpad.net/nova/+bug/1462305
and https://bugs.launchpad.net/nova/+bug/1445569

responded Jun 24, 2015 by Joe_Gordon (24,620 points)   2 5 8

On 06/24/2015 01:31 PM, Joe Gordon wrote:

[...]

I think this is a great approach for the future of partial upgrade
support in grenade. I would like to point out step 0 here, is to get
tempest passing consistently in multinode.

Currently the neutron job is failing consistently, and nova-network
fails roughly 10% of the time due
to https://bugs.launchpad.net/nova/+bug/1462305
and https://bugs.launchpad.net/nova/+bug/1445569

If multi-node isn't reliable more generally yet, do you think the
simpler implementation of partial-upgrade testing could proceed? I've
already done all of the patches to do it for Neutron. That way we could
quickly get something in place to help block regressions and work on the
longer-term multinode refactoring without as much time pressure.

For reference, the Neutron partial-upgrade test patches are:

devstack patches:
https://review.openstack.org/189408
https://review.openstack.org/189707
https://review.openstack.org/189710

grenade patches:
https://review.openstack.org/189417
https://review.openstack.org/189712

devstack-gate patches:
https://review.openstack.org/189424
https://review.openstack.org/189715

project-config patches:
https://review.openstack.org/189426
https://review.openstack.org/189727

--
Russell Bryant


responded Jun 24, 2015 by Russell_Bryant (19,240 points)   2 3 8

On 06/24/2015 01:31 PM, Joe Gordon wrote:

[...]

I think this is a great approach for the future of partial upgrade
support in grenade. I would like to point out step 0 here, is to get
tempest passing consistently in multinode.

Currently the neutron job is failing consistently, and nova-network
fails roughly 10% of the time due
to https://bugs.launchpad.net/nova/+bug/1462305
and https://bugs.launchpad.net/nova/+bug/1445569

Grenade is only running tempest smoke, which is quite a small number of
tests (and not the shelve/unshelve one, for instance). I would expect
its pass rate to be much higher.

-Sean

--
Sean Dague
http://dague.net


responded Jun 24, 2015 by Sean_Dague (66,200 points)   4 8 14

If multi-node isn't reliable more generally yet, do you think the
simpler implementation of partial-upgrade testing could proceed? I've
already done all of the patches to do it for Neutron. That way we could
quickly get something in place to help block regressions and work on the
longer-term multinode refactoring without as much time pressure.

I was going to ask the same about the (arguably MUUCH tinier) patch to
include nova-net in the nova-compute partial upgrade bin.

I know the right answer is working on the multinode job, but fixing that
and then extending grenade to work that way is a significant amount of
work. The fact that we're not testing nova-net with nova-compute
properly right now and claiming we're safe is just lying to ourselves.

--Dan


responded Jun 24, 2015 by Dan_Smith (9,860 points)   1 2 4

On 06/24/2015 01:49 PM, Dan Smith wrote:

[...]

I was going to ask the same about the (arguably MUUCH tinier) patch to
include nova-net in the nova-compute partial upgrade bin.

I know the right answer is working on the multinode job, but fixing that
and then extending grenade to work that way is a significant amount of
work. The fact that we're not testing nova-net with nova-compute
properly right now and claiming we're safe is just lying to ourselves.

Yeah, that's certainly a tiny change. There are several Neutron
patches, but they're all pretty simple IMO.

I picked up this task to help with the Neutron and nova-network parity
work. It tests the backend that isn't even my primary focus/interest,
but I still wanted to help the parity work along. If it ends up
requiring this significant extra work, I'll honestly most likely just
drop it completely and hope someone else feels like taking it on.

--
Russell Bryant


responded Jun 24, 2015 by Russell_Bryant (19,240 points)   2 3 8

On Wed, Jun 24, 2015 at 10:45 AM, Sean Dague sean@dague.net wrote:

[...]

Grenade is only running tempest smoke, which is quite a small number of
tests (and not the shelve/unshelve one, for instance). I would expect
its pass rate to be much higher.

One way to find out. Want to get a multinode tempest smoke job running and
see how it looks after running for a few days.

responded Jun 24, 2015 by Joe_Gordon (24,620 points)   2 5 8

On Wed, Jun 24, 2015 at 11:01 AM, Joe Gordon joe.gordon0@gmail.com wrote:

[...]

One way to find out. Want to get a multinode tempest smoke job running and
see how it looks after running for a few days.

smoke jobs*, one for nova-net and one for neutron.

responded Jun 24, 2015 by Joe_Gordon (24,620 points)   2 5 8

On 06/24/2015 01:41 PM, Russell Bryant wrote:
[...]

If multi-node isn't reliable more generally yet, do you think the
simpler implementation of partial-upgrade testing could proceed? I've
already done all of the patches to do it for Neutron. That way we could
quickly get something in place to help block regressions and work on the
longer-term multinode refactoring without as much time pressure.

The thing is, these partial service bits are sneakier than one realizes
over time. There have been all kinds of edge conditions that crept up on
the n-cpu one that are really subtle because code is running in memory
on stale versions of dependencies which are no longer on disk. And the
number of people that have this model in their head is basically down to
a SPOF.

The fact that neutron-grenade is at a 40% fail rate right now (and has
been for over a week) is not preventing anyone from just rechecking to
get past it. So I think assuming additional failing grenade tests are
going to keep folks from landing bugs is probably not a good assumption.
Making the whole path more complicated for other people to debug is an
explosion waiting to happen.
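
To put a rough number on the recheck point above, here is a small worked calculation. It takes the 40% fail rate quoted in this message and assumes, purely for illustration, that each run fails independently of the others; real gate failures are often correlated, so treat the figures as indicative only.

```python
def pass_within(attempts, fail_rate):
    """Probability of at least one passing run in `attempts` independent tries."""
    return 1.0 - fail_rate ** attempts


# 0.40 is the neutron-grenade fail rate quoted above; the independence of
# runs is an assumption of this sketch.
for attempts in (1, 2, 3):
    print(attempts, round(pass_within(attempts, 0.40), 3))
```

Under that assumption a change gets through within three attempts more than nine times out of ten, which is why a flaky job on its own does little to keep bugs from landing.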

So I do want to take a hard line on doing this right, because the debt
here is higher than you might think. The partial code was always very
conceptually fragile, and fails in really funny ways sometimes, because
old is not isolated from new in the way that would be expected.

I -1ed the n-net partial upgrade changes for the same reason.

-Sean

--
Sean Dague
http://dague.net


responded Jun 24, 2015 by Sean_Dague (66,200 points)   4 8 14

On Wed, Jun 24, 2015 at 11:02 AM, Joe Gordon joe.gordon0@gmail.com wrote:

[...]

smoke jobs*, one for nova-net and one for neutron.

Proposal for multinode smoke jobs: https://review.openstack.org/#/c/195259/

responded Jun 24, 2015 by Joe_Gordon (24,620 points)   2 5 8
...