
[openstack-dev] [tripleo] enabling third party CI


Something I like in TripleO is third-party driver enablement, thanks to
the pluggable interface in our Puppet modules & Heat templates.
That said, I don't see any testing of these drivers: we add a lot of
parameters and Puppet code that is supposed to deploy the drivers, but
we never verify that it actually works.

OpenStack Infra provides an easy way to plug in external CI systems,
and some projects (Nova, Neutron, Cinder, etc.) already gate on
third-party systems.
I was wondering whether we would be interested in investigating this
area and perhaps asking our third-party driver contributors (Big Switch,
Nuage, Midonet, Cisco, NetApp, etc.) to run TripleO CI jobs on their own
hardware, deploying their specific environment to enable the drivers.
This CI would be plugged into TripleO CI and would provide awesome
feedback.
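
For anyone not familiar with the mechanics: a third-party system
basically just watches Gerrit's event stream over SSH and runs its own
jobs when a relevant change shows up. A rough Python sketch of the
trigger side (the account name and watched projects below are made up
for illustration):

    import json
    import subprocess

    # CI account on review.openstack.org (illustrative name); assumes
    # its SSH key has been registered with Gerrit.
    GERRIT = ["ssh", "-p", "29418", "vendor-ci@review.openstack.org"]
    WATCHED = {"openstack/tripleo-heat-templates",
               "openstack/puppet-tripleo"}

    def watch_events():
        # "gerrit stream-events" emits one JSON object per line for as
        # long as the SSH connection stays open.
        proc = subprocess.Popen(GERRIT + ["gerrit", "stream-events"],
                                stdout=subprocess.PIPE)
        for line in proc.stdout:
            event = json.loads(line)
            if (event.get("type") == "patchset-created"
                    and event["change"]["project"] in WATCHED):
                yield event

    for event in watch_events():
        # Hand the change over to whatever deploys the vendor-specific
        # environment and runs the job.
        print("would trigger a job for", event["change"]["url"])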

We need to hear from these vendors whether they are interested in such a
thing, which could improve the quality of TripleO, something we both want.
If they agree, we could help them set up this environment (we already
have the framework, so I don't think it would require a lot of work).

I see an opportunity to involve our vendors in the quality of TripleO,
and perhaps to increase the number of contributors as a result.

Any feedback is welcome here.
--
Emilien Macchi


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Mar 10, 2016 in openstack-dev by emilien_at_redhat.co

17 Responses


This is something I would love to enable as well.

One of the requirements we have with Midonet is to be able to
parametrize the image build, since we need to add extra packages to it.

As for the deployment options, I don't think those should be hard either.

If you want to discuss it in any TripleO meeting, let me know.

Count on us, Emilien.

On Thu, 10 Mar 2016 09:50, Emilien Macchi wrote:
[...]

--
Jaume Devesa
Software Engineer at Midokura


responded Mar 10, 2016 by Jaume_Devesa

On 2016-03-10 09:50:03 -0500 (-0500), Emilien Macchi wrote:
[...]
OpenStack Infra provides an easy way to plug in external CI systems,
and some projects (Nova, Neutron, Cinder, etc.) already gate on
third-party systems. I was wondering whether we would be interested in
investigating this area and perhaps asking our third-party driver
contributors (Big Switch, Nuage, Midonet, Cisco, NetApp, etc.) to run
TripleO CI jobs on their own hardware, deploying their specific
environment to enable the drivers. This CI would be plugged into
TripleO CI and would provide awesome feedback.
[...]

It's also worth broadening the discussion to reassess whether the
existing TripleO CI should itself follow our third-party integration
model instead of the current implementation relying on our main
community Zuul/Nodepool/Jenkins servers. When this was first
implemented, there was a promise of adding more regions for
robustness and of being able to use the surplus resources maintained
in the TripleO CI clouds to augment our generic CI workload. It's
been years now and these things have not really come to pass; if
anything, that system and its operators are still struggling to keep
a single region up and operational and providing enough resources to
handle the current TripleO test load.

The majority of unplanned whole-provider outages we've experienced
in Nodepool have been from the TripleO cloud going completely
offline (sometimes for a week or more straight), by far the
longest-running jobs we have are running there (which substantially
hampers our ability to do things like gracefully restart our Jenkins
masters without aborting running jobs), and ultimately the benefits
to TripleO for this model are very minimal anyway (different
pipelines means the jobs aren't effectively even voting, much less
gating).

I'm not trying to slam the TripleO cloud operators, I think they're
doing an amazing job given the limitations they're working under and
much of their work has provided inspiration for our Infra-Cloud
project too. They're helpful and responsive and a joy to collaborate
with, but ultimately I think TripleO might actually realize more
benefit from adding a Zuul/Nodepool/Jenkins of their own to this
(we've massively streamlined the Puppet for maintaining these
recently and have very thorough deployment and operational
documentation) rather than dealing with the issues which arise from
being half-integrated into one they don't control.

I've been meaning to bring that up for discussion for a while, just
keep forgetting, but this thread seems like a good segue into the
topic.
--
Jeremy Stanley


responded Mar 10, 2016 by Jeremy_Stanley

On Thu, Mar 10, 2016 at 05:32:15PM +0000, Jeremy Stanley wrote:
[...]

It's also worth broadening the discussion to reassess whether the
existing TripleO CI should itself follow our third-party integration
model instead of the current implementation relying on our main
community Zuul/Nodepool/Jenkins servers.
[...]

I tend to agree here; I think a lot of great work has been done to allow
new third-party CI systems to come online, especially now that we have
the puppet-openstackci[1] module.

However, I would also like to see TripleO move more in line with our
existing CI tooling, if possible. I know that wouldn't happen overnight,
but it would at least give better insight into how the CI is working.

Now that we have the infracloud too, it might be worth talking about
doing the same thing with the TripleO hardware, again if possible. There
are likely corner cases where it wouldn't work, but it would be
interesting to talk about.

[1] https://github.com/openstack-infra/puppet-openstackci


responded Mar 10, 2016 by pabelanger_at_redhat

This seems to be the week people want to pile on TripleO. Talking
about upstream is great, but I suppose I'd rather debate major changes
after we branch Mitaka. :/

Anyway, we might as well get into it now. Replies inline...

On Thu, 2016-03-10 at 17:32 +0000, Jeremy Stanley wrote:
[...]

It's also worth broadening the discussion to reassess whether the
existing TripleO CI should itself follow our third-party integration
model instead of the current implementation relying on our main
community Zuul/Nodepool/Jenkins servers. When this was first
implemented, there was a promise of adding more regions for
robustness and of being able to use the surplus resources maintained
in the TripleO CI clouds to augment our generic CI workload. It's
been years now and these things have not really come to pass; if
anything, that system and its operators are still struggling to keep
a single region up and operational and providing enough resources to
handle the current TripleO test load.

Yeah. We actually lost a region of hardware this last year too.

I think there is a distinction between our cloud being up and trunk
being broken. We've had some troubles with both over the last couple of
years, but in general I think our CI cloud (which provides instances)
has been up 98, maybe even 99% of the time. To be honest I've not been
tracking our actual uptime for bragging rights, but I think the actual
cloud (the one connected to nodepool) has had good uptime.

We have been dealing with a lot of trunk breakages, however. This
happens because we are not a gate... and it is related to the fact that
we have limited resources and a long job wall time. So taking a step
away from the common infrastructure pipelines, which do act as an
upstream gate, would likely only make this worse for us.

To be fair, the last outage you refer to stretched over several days
because we made a config change and only discovered the breakage days
later (nodepool caches the keystone endpoints). We are learning, and we
do timebox our systems administration a bit more than most pure
administrators, but I think the general uptime of our cloud has been
good.

The majority of unplanned whole-provider outages we've experienced
in Nodepool have been from the TripleO cloud going completely
offline (sometimes for a week or more straight), by far the
longest-running jobs we have are running there (which substantially
hampers our ability to do things like gracefully restart our Jenkins
masters without aborting running jobs), and ultimately the benefits
to TripleO for this model are very minimal anyway (different
pipelines means the jobs aren't effectively even voting, much less
gating).

With regards to Jenkins restarts, I think it is understood that our job
times are long. How often do you find infra needs to restart Jenkins?
And regardless of that, what if we just said we didn't mind the
destructiveness of losing a few jobs now and then (until our job times
are under the line... say 1.5 hours or so)? To be clear, I'd be fine
with infra pulling the rug out from under running jobs if our long job
times are the root cause of the problem.

I think the "benefits are minimal" is bit of an overstatement. The
initial vision for TripleO CI stands and I would still like to see
individual projects entertain the option to use us in their gates.
Perhaps the strongest community influences are within Heat, Ironic, and
Puppet. The ability to manage the interaction with Heat, Ironic, and
Puppet in the common infrastructure is a clear benefit and there are
members of these communities that I think would agree.

I'm not trying to slam the TripleO cloud operators, I think they're
doing an amazing job given the limitations they're working under and
much of their work has provided inspiration for our Infra-Cloud
project too. They're helpful and responsive and a joy to collaborate
with, but ultimately I think TripleO might actually realize more
benefit from adding a Zuul/Nodepool/Jenkins of their own to this
(we've massively streamlined the Puppet for maintaining these
recently and have very thorough deployment and operational
documentation) rather than dealing with the issues which arise from
being half-integrated into one they don't control.

We've actually moved most of our daily management tasks for TripleO CI
into the tripleo-ci project, so we don't have to bother infra with minor
config changes. I don't think we are taking up a huge amount of infra
review time or causing you much of a burden. We have very few 'system'
side changes for infra to deal with... and like I said above, I think it
would be reasonable to give infra a pass on restarting things despite
our long job times.

Anyway, if we go off and run our own Zuul/Nodepool/Jenkins, I agree it
could work. But I think it would be a step backwards for our integration
with some of these communities, and I'm not sure it costs that much for
us to stay where we are, especially given we are willing to make changes
to address any of the pain points you mention.

Dan

responded Mar 10, 2016 by Dan_Prince

On Thu, 2016-03-10 at 13:45 -0500, Paul Belanger wrote:
[...]

I tend to agree here; I think a lot of great work has been done to allow
new third-party CI systems to come online, especially now that we have
the puppet-openstackci[1] module.

However, I would also like to see TripleO move more in line with our
existing CI tooling, if possible. I know that wouldn't happen overnight,
but it would at least give better insight into how the CI is working.

TripleO uses more of OpenStack's tooling than just about any project I
know of. We do have some unique requirements, related to the fact that
our CI actually PXE boots instances in a cloud. Something like this:

http://blog.nemebean.com/tags/quintupleo

We have plans on the table to split our Heat stack (or make it more
configurable) such that we can test the configuration side on normal
cloud instances. I'm all for that effort, and it would get us closer to
what I think you are describing as "normal".

Like it or not, our CI does catch things that nobody else is catching.
Quirky deployment issues happen, and until someone gets nested virt
working (well) on commodity cloud servers, I think we have a case for
our own CI cloud.

Now that we have the infracloud too, it might be worth talking about
doing the same thing with the TripleO hardware, again if possible. There
are likely corner cases where it wouldn't work, but it would be
interesting to talk about.

The corner case would be whether it allows us to PXE boot an instance
(thus allowing provisioning via Ironic, etc.).

We could certainly entertain the option of creating our own OVB cloud
and managing it alongside infracloud, if that is what you are asking.
I just don't think it is the best fit for TripleO today, given our
unique requirements at the moment.

Dan

[1] https://github.com/openstack-infra/puppet-openstackci



responded Mar 10, 2016 by Dan_Prince

On 2016-03-10 16:09:44 -0500 (-0500), Dan Prince wrote:
This seems to be the week people want to pile on TripleO. Talking
about upstream is great, but I suppose I'd rather debate major changes
after we branch Mitaka. :/
[...]

I didn't mean to pile on TripleO, nor did I intend to imply this was
something which should happen ASAP (or even necessarily at all), but
I do want to better understand what actual benefit is currently
derived from this implementation vs. a more typical third-party CI
(which lots of projects are doing when they find their testing needs
are not met by the constraints of our generic test infrastructure).

With regards to Jenkins restarts, I think it is understood that our job
times are long. How often do you find infra needs to restart Jenkins?

We're restarting all 8 of our production Jenkins masters weekly at a
minimum, but generally more often when things are busy (2-3 times a
week). For many months we've been struggling with a thread leak which
their development team has not seen as enough of a priority to even
triage our bug report effectively. At this point I think we've mostly
given up on expecting it to be solved by anything other than our
upcoming migration off of Jenkins, but that's another topic altogether.

And regardless of that, what if we just said we didn't mind the
destructiveness of losing a few jobs now and then (until our job
times are under the line... say 1.5 hours or so)? To be clear, I'd
be fine with infra pulling the rug out from under running jobs if
our long job times are the root cause of the problem.

For manual Jenkins restarts this is probably doable (if an additional
hassle), but I don't know whether it's something we can easily shoehorn
into our orchestrated/automated restarts.

I think the "benefits are minimal" is bit of an overstatement. The
initial vision for TripleO CI stands and I would still like to see
individual projects entertain the option to use us in their gates.
[...]

This is what I'd like to delve deeper into. The current
implementation isn't providing you with any mechanism to prevent
changes which fail jobs running in the tripleo-test cloud from
merging to your repos, is it? You're still having to manually
inspect the job results posted by it? How is that particularly
different from relying on third-party CI integration?

As for other projects making use of the same jobs, right now the
only convenience I'm aware of is that they can add check-tripleo
pipeline jobs in our Zuul layout file instead of having you add it
to yours (which could itself reside in a Git repo under your
control, giving you even more flexibility over those choices). In
fact, with a third-party CI using its own separate Gerrit account,
you would be able to leave clear -1/+1 votes on check results which
is not possible with the present solution.
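
Reporting results back is just as simple for a third-party account;
roughly, via the Gerrit SSH review interface (the account name, change
number and log URL below are placeholders, and the Verified label has
to be granted to the account first):

    import subprocess

    def report(change, patchset, success, log_url):
        # Leave a Verified +1/-1 via Gerrit's SSH "review" command, the
        # same mechanism other third-party CI systems use.
        vote = "+1" if success else "-1"
        msg = "Build {0}: {1}".format(
            "succeeded" if success else "failed", log_url)
        subprocess.check_call([
            "ssh", "-p", "29418", "vendor-ci@review.openstack.org",
            "gerrit", "review", "{0},{1}".format(change, patchset),
            "--verified={0}".format(vote),
            "--message", "'{0}'".format(msg),
        ])

    # e.g. report(123456, 3, False, "http://logs.example.com/123456/3/")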

So anyway, I'm not saying that I definitely believe the third-party
CI route will be better for TripleO, but I'm not (yet) clear on what
tangible benefit you're receiving now that you'd lose by switching to
that model.
--
Jeremy Stanley


responded Mar 10, 2016 by Jeremy_Stanley

On 03/10/2016 05:24 PM, Jeremy Stanley wrote:
[...]

So anyway, I'm not saying that I definitely believe the third-party
CI route will be better for TripleO, but I'm not (yet) clear on what
tangible benefit you're receiving now that you'd lose by switching to
that model.

FWIW, I think third-party CI probably makes sense for TripleO.
Practically speaking, we are third-party CI right now: we run our own
independent hardware infrastructure, we aren't multi-region, and we
can't leave a vote on changes. Since the first two aren't likely to
change any time soon (I believe it's still a long-term goal to get to a
place where we can run in regular infra and just contribute our existing
CI hardware to the general infra pool, but that's a long way off), and
moving to actual third-party CI would get us the ability to vote, I
think it's worth pursuing.

As an added bit of fun, we have a forced move of our CI hardware coming
up in the relatively near future, and if we don't want multiple days
(possibly more, depending on how the move goes) of TripleO CI outage,
we're probably going to need to stand up a new environment in parallel
anyway. If we're doing that, it might make sense to try hooking it in
through the third-party infra instead of the way we do it today.
Hopefully that would allow us to work out the kinks before the old
environment goes away.

Anyway, I'm sure we'll need a bunch more discussion about this, but I
wanted to chime in with my two cents.

-Ben


responded Mar 17, 2016 by Ben_Nemec

On Thu, Mar 17, 2016 at 11:59:22AM -0500, Ben Nemec wrote:
[...]

As an added bit of fun, we have a forced move of our CI hardware coming
up in the relatively near future, and if we don't want multiple days
(possibly more, depending on how the move goes) of TripleO CI outage,
we're probably going to need to stand up a new environment in parallel
anyway. If we're doing that, it might make sense to try hooking it in
through the third-party infra instead of the way we do it today.
Hopefully that would allow us to work out the kinks before the old
environment goes away.

Anyway, I'm sure we'll need a bunch more discussion about this, but I
wanted to chime in with my two cents.

Do you have any ETA on when your outage would be? Is it before or after
the summit in Austin?

Personally, I'm going to attend a few TripleO design sessions wherever
possible in Austin. It would be great to maybe have a fishbowl session
about it.

responded Mar 17, 2016 by pabelanger_at_redhat

On 03/17/2016 01:13 PM, Paul Belanger wrote:
[...]

Do you have any ETA on when your outage would be? Is it before or after
the summit in Austin?

Personally, I'm going to attend a few TripleO design sessions wherever
possible in Austin. It would be great to maybe have a fishbowl session
about it.

It's after, but we'll only have a couple of months or so at that point
to wrap everything up, so I suspect we'll need to have some basic plan
in place before then, or we'll never be able to get hardware in time. It
may be too late already. :-/

Probably the first thing I need to do is follow up with people
internally and find out whether there's already a plan in place for this
that I just don't know about. That's entirely possible.


responded Mar 17, 2016 by Ben_Nemec

On Thu, Mar 17, 2016 at 01:55:24PM -0500, Ben Nemec wrote:
[...]

It's after, but we'll only have a couple of months or so at that point
to wrap everything up, so I suspect we'll need to have some basic plan
in place before then, or we'll never be able to get hardware in time. It
may be too late already. :-/

Probably the first thing I need to do is follow up with people
internally and find out whether there's already a plan in place for this
that I just don't know about. That's entirely possible.

Agreed, it would be great to have the discussion in public as much as
possible. However, if we're moving to third-party CI, I can understand
the need for internal-facing talks.


responded Mar 17, 2016 by pabelanger_at_redhat
...