[openstack-dev] [all] 3rd Party CI vs. Gerrit

It's clear that lots of projects want 3rd Party CI information on
patches. But it's also clear that 6 months into this experiment with a
lot of 3rd Party CI systems, the Gerrit UI is really not great for this.

A couple of things have fallen out of this. 3rd Party CI bot comments
outnumber human comments on changes in some projects (Nova / Neutron).
That hurts the readability of the votes section (on a Neutron change the
files in the change are rarely above the fold) and the readability of
the comments.

3rd Party CI systems haven't become all that reliable. They hit the same
problems that Jenkins does with cloud networking and race bugs in
OpenStack, plus new bugs around site configs. It's kind of a
testament to how much we've learned about how to keep the machine
running that the upstream CI system, even with all its faults, still
trends more reliable than most of the 3rd Party systems.

Commenting in Gerrit was meant as a step towards eventually voting in
Gerrit. But my experience at this point is that reviewers have CI
fatigue and are mostly not paying attention to the votes. Heck, when
we're dealing with a bunch of bugs in the gate most reviewers want to
ignore the Jenkins votes too, which is why you get the recheck grinding
behavior.

This has all gone far enough that someone actually wrote a Greasemonkey
script to purge all the 3rd Party CI content out of the Jenkins UI. People
are writing mail filters to dump all the notifications. Dan Berrange
filters them all out of his gerrit query tools.

It seems what we actually want is a dashboard of these results. We want
them available when we go to Gerrit, but we don't want them in Gerrit
itself.

What if 3rd Party CI didn't vote in Gerrit? What if it instead published
to some 3rd party test reporting site (a thing that doesn't yet exist)?
Gerrit has the facility to let us inject the dashboard content
for this into Gerrit in a little table somewhere, but the data would
fundamentally live outside of Gerrit. It would also mean that all the
aggregate reporting of 3rd Party CI that's being done in custom gerrit
scripts could be integrated directly into such a thing.

I'm not signing up for this particular mission, but I wanted to stick it
out there to figure out if the idea had merit and, if it did, whether it
excited anyone enough to dive in on it.

-Sean

--
Sean Dague
http://dague.net

asked Jun 27, 2014 in openstack-dev by Sean_Dague (66,200 points)   4 11 18
retagged Feb 25, 2015 by admin

20 Responses

On 6/27/2014 7:35 AM, Daniel P. Berrange wrote:
On Fri, Jun 27, 2014 at 07:40:51AM -0400, Sean Dague wrote:

It's clear that lots of projects want 3rd Party CI information on
patches. But it's also clear that 6 months into this experiment with a
lot of 3rd Party CI systems, the Gerrit UI is really not great for this.

That's an understatement about the UI :-)

It seems what we actually want is a dashboard of these results. We want
them available when we go to Gerrit, but we don't want them in Gerrit
itself.

What if 3rd Party CI didn't vote in Gerrit? What if it instead published
to some 3rd party test reporting site (a thing that doesn't yet exist).
Gerrit has the facility so that we could inject the dashboard content
for this in Gerrit in a little table somewhere, but the data would
fundamentally live outside of Gerrit. It would also mean that all the
aggregate reporting of 3rd Party CI that's being done in custom gerrit
scripts, could be integrated directly into such a thing.

Agreed, it would be a great improvement in usability if we stopped all
CI systems, including our default Jenkins, from ever commenting on
reviews. At most gating CIs should +1/-1. Having a table of results
displayed, pulling the data from an external result tracking system
would be a great idea.

Even better if this external system had a nice button you can press
to trigger re-check, so we can stop using comments for that too.

I would disagree with this idea since it's equivalent to 'recheck no
bug' and that's naughty, because then we don't track race bugs as well.

To me the ideal world is where the only things adding comments to
reviews are human and their comments are actually about the code
in the patch :-)

Regards,
Daniel

I would be good with Jenkins not reporting on a successful run, or if
rather than a comment from Jenkins the vote in the table had a link to
the test results, so if you get a -1 from Jenkins you can follow the
link from the -1 in the table rather than the comment (to avoid
cluttering up the review comments, especially if it's a +1).

--

Thanks,

Matt Riedemann

responded Jun 28, 2014 by Matt_Riedemann (48,320 points)   3 10 25

Matt Riedemann writes:

I would be good with Jenkins not reporting on a successful run, or if
rather than a comment from Jenkins the vote in the table had a link to
the test results, so if you get a -1 from Jenkins you can follow the
link from the -1 in the table rather than the comment (to avoid
cluttering up the review comments, especially if it's a +1).

The problem with that is it makes non-voting jobs very difficult to see,
not to mention ancillary information like job runtime. Plus it adds
extra clicks to get to jobs whose output is frequently reviewed even
with positive votes, such as docs jobs (on docs-draft.o.o).

I think a new table on the page, separate from the comment stream, with
only the latest results of each job is the ideal presentation.
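
A minimal sketch of the "latest results of each job" table described above, assuming results are available as simple records. The `JobResult` fields and the `latest_results` helper are hypothetical names for illustration, not part of Gerrit, Zuul, or Jenkins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResult:
    job: str          # job name (illustrative)
    patchset: int     # patchset number the job ran against
    status: str       # "SUCCESS" or "FAILURE"
    voting: bool      # non-voting jobs are still shown in the table
    runtime_s: int    # job runtime in seconds
    finished_at: int  # completion time as a Unix timestamp


def latest_results(results, patchset):
    """Return one row per job for `patchset`, keeping only the newest run."""
    latest = {}
    for r in results:
        if r.patchset != patchset:
            continue
        current = latest.get(r.job)
        if current is None or r.finished_at > current.finished_at:
            latest[r.job] = r
    # Sort by job name so the table order is stable between page loads.
    return [latest[name] for name in sorted(latest)]
```

Because each row keeps the `voting` flag and `runtime_s`, non-voting jobs and job runtime stay visible without reading the comment stream.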

-Jim

responded Jun 28, 2014 by James_E._Blair (5,080 points)   1 6 8

On 6/28/14 10:40 AM, James E. Blair wrote:
An alternate approach would be to have third-party CI systems register
jobs with OpenStack's Zuul rather than using their own account. This
would mean only a single report of all jobs (upstream and 3rd-party)
per-patchset. It significantly reduces clutter and makes results more
accessible -- but even with one system we've never actually wanted to
have Jenkins results in comments, so I think one of the other options
would be preferred. Nonetheless, this is possible with a little bit of
work.

I agree this isn't the preferred solution, but I disagree with the
"little bit of work". This would require CI systems registering with
gearman, which would mean security issues. The biggest problem with this,
though, is that zuul would be stuck waiting for results from 3rd parties,
which often have very slow return times.

responded Jun 29, 2014 by Joshua_Hesketh (1,500 points)   1 2

On Sat, Jun 28, 2014 at 08:26:44AM -0500, Matt Riedemann wrote:

On 6/27/2014 7:35 AM, Daniel P. Berrange wrote:

On Fri, Jun 27, 2014 at 07:40:51AM -0400, Sean Dague wrote:

It's clear that lots of projects want 3rd Party CI information on
patches. But it's also clear that 6 months into this experiment with a
lot of 3rd Party CI systems, the Gerrit UI is really not great for this.

That's an understatement about the UI :-)

It seems what we actually want is a dashboard of these results. We want
them available when we go to Gerrit, but we don't want them in Gerrit
itself.

What if 3rd Party CI didn't vote in Gerrit? What if it instead published
to some 3rd party test reporting site (a thing that doesn't yet exist).
Gerrit has the facility so that we could inject the dashboard content
for this in Gerrit in a little table somewhere, but the data would
fundamentally live outside of Gerrit. It would also mean that all the
aggregate reporting of 3rd Party CI that's being done in custom gerrit
scripts, could be integrated directly into such a thing.

Agreed, it would be a great improvement in usability if we stopped all
CI systems, including our default Jenkins, from ever commenting on
reviews. At most gating CIs should +1/-1. Having a table of results
displayed, pulling the data from an external result tracking system
would be a great idea.

Even better if this external system had a nice button you can press
to trigger re-check, so we can stop using comments for that too.

I would disagree with this idea since it's equivalent to 'recheck no bug'
and that's naughty, because then we don't track race bugs as well.

It could easily have a text field for entering a bug number. The point
is to stop adding comments to gerrit that aren't related to code review.
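
As a rough sketch of the idea above (a recheck button with a bug-number field), the handler behind such a button could refuse requests that do not name a bug. The function name and the request shape are assumptions for illustration; no such external system exists yet.

```python
import re

_BUG_RE = re.compile(r"^\d+$")


def build_recheck_request(change_id, bug_number):
    """Build a recheck request that always names a bug.

    Raises ValueError when no numeric bug number is supplied, so the
    equivalent of 'recheck no bug' cannot go through this path.
    """
    bug = str(bug_number).strip().lstrip("#")
    if not _BUG_RE.match(bug):
        raise ValueError("a numeric bug number is required to recheck")
    return {"change": change_id, "action": "recheck", "bug": int(bug)}
```

This keeps race-bug tracking intact while moving the recheck request out of the review comments.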

responded Jun 30, 2014 by Daniel_P._Berrange (27,920 points)   2 4 10

On 06/29/2014 09:39 AM, Joshua Hesketh wrote:
On 6/28/14 10:40 AM, James E. Blair wrote:

An alternate approach would be to have third-party CI systems register
jobs with OpenStack's Zuul rather than using their own account. This
would mean only a single report of all jobs (upstream and 3rd-party)
per-patchset. It significantly reduces clutter and makes results more
accessible -- but even with one system we've never actually wanted to
have Jenkins results in comments, so I think one of the other options
would be preferred. Nonetheless, this is possible with a little bit of
work.

I agree this isn't the preferred solution, but I disagree with the
little bit of work. This would require CI systems registering with
gearman which would mean security issues. The biggest problem with this
though is that zuul would be stuck waiting for results from 3rd parties
which often have very slow return times.

Right, one of the other issues is the quality of the CI results varies
as well.

I think one of the test result burn out issues right now is based on the
fact that they are too rolled up as it is. For instance, a docs only
change gets Tempest results, which humans know are irrelevant, but
Jenkins insists they aren't. I think that if we rolled up more
information, and waited longer, we'd be in a worse state.

-Sean

--
Sean Dague
http://dague.net

responded Jun 30, 2014 by Sean_Dague (66,200 points)   4 11 18

Joshua Hesketh <joshua.hesketh at rackspace.com> writes:

On 6/28/14 10:40 AM, James E. Blair wrote:

An alternate approach would be to have third-party CI systems register
jobs with OpenStack's Zuul rather than using their own account. This
would mean only a single report of all jobs (upstream and 3rd-party)
per-patchset. It significantly reduces clutter and makes results more
accessible -- but even with one system we've never actually wanted to
have Jenkins results in comments, so I think one of the other options
would be preferred. Nonetheless, this is possible with a little bit of
work.

I agree this isn't the preferred solution, but I disagree with the
little bit of work. This would require CI systems registering with
gearman which would mean security issues. The biggest problem with
this though is that zuul would be stuck waiting for results from 3rd
parties which often have very slow return times.

"Security issues" is a bit vague. They already register with Gerrit;
I'm only suggesting that the point of aggregation would change. I'm
anticipating that they would use authenticated SSL, with ACLs scoped to
the names of jobs each system is permitted to run. From the perspective
of overall security as well as network topology (ie, firewalls), very
little changes. The main differences are third party CI systems don't
have to run Zuul anymore, and we go back to having a smaller number of
votes/comments.
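
To make "ACLs scoped to the names of jobs" concrete, here is a small sketch of such a check. It assumes the system identity comes from the authenticated SSL connection; the system names, job-name patterns, and the `may_run` helper are all hypothetical and do not reflect any actual Zuul or Gearman configuration.

```python
from fnmatch import fnmatchcase

# Hypothetical ACL: authenticated system identity -> allowed job-name patterns.
JOB_ACLS = {
    "vendor-a-ci": ["check-vendor-a-*"],
    "vendor-b-ci": ["check-vendor-b-dsvm", "check-vendor-b-unit"],
}


def may_run(system, job, acls=JOB_ACLS):
    """Return True if `system` is permitted to register or run `job`."""
    return any(fnmatchcase(job, pattern) for pattern in acls.get(system, []))
```

An unknown system has no patterns and is therefore denied by default.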

Part of the "little bit of work" I was referring to was adding a
timeout. That should truly be not much work, and work we're planning on
doing anyway to help with the tripleo cloud.

But anyway, it's not important to design this out if we prefer another
solution (and I prefer the table of results separated from comments).

-Jim

responded Jun 30, 2014 by James_E._Blair (5,080 points)   1 6 8

Dan Smith wrote on 06/27/2014 12:33:48 PM:

What if 3rd Party CI didn't vote in Gerrit? What if it instead
published to some 3rd party test reporting site (a thing that
doesn't yet exist). Gerrit has the facility so that we could inject
the dashboard content for this in Gerrit in a little table
somewhere, but the data would fundamentally live outside of Gerrit.
It would also mean that all the aggregate reporting of 3rd Party CI
that's being done in custom gerrit scripts, could be integrated
directly into such a thing.

If it really does show up right in Gerrit as if it were integrated,
then that would be fine with me. I think the biggest problem we have
right now is that a lot of the CI systems are very inconsistent in
their reporting and we often don't realize when one of them hasn't
voted. If the new thing could fill out the chart based on everything
we expect to see a vote from, so that it's very clear that one is
missing, then that's a net win right there.

There is a similar old bug for that, with a good suggestion for how it
could possibly be done:

https://bugs.launchpad.net/openstack-ci/+bug/1251758

Kurt Taylor (krtaylor)
OpenStack Development Lead - PowerKVM CI
IBM Linux Technology Center

responded Jun 30, 2014 by krtaylor_at_us.ibm.c (280 points)   1

Sean Dague wrote on 06/30/2014 06:03:50 AM:

On 06/29/2014 09:39 AM, Joshua Hesketh wrote:

On 6/28/14 10:40 AM, James E. Blair wrote:

An alternate approach would be to have third-party CI systems register
jobs with OpenStack's Zuul rather than using their own account. This
would mean only a single report of all jobs (upstream and 3rd-party)
per-patchset. It significantly reduces clutter and makes results more
accessible -- but even with one system we've never actually wanted to
have Jenkins results in comments, so I think one of the other options
would be preferred. Nonetheless, this is possible with a little bit of
work.

I agree this isn't the preferred solution, but I disagree with the
little bit of work. This would require CI systems registering with
gearman which would mean security issues. The biggest problem with this
though is that zuul would be stuck waiting for results from 3rd parties
which often have very slow return times.

Right, one of the other issues is the quality of the CI results varies
as well.

Agreed. After the last summit, Anita, Jay and I decided to start gathering
a team of 3rd party testers with the goal of improving the quality of the
third party systems. We are starting with gathering global unwritten
requirements, improving documentation and reaching out to new projects
that will have third party testing needs. We are still in the early stages
but now have weekly meetings to discuss what needs to be done and track
progress.

https://wiki.openstack.org/wiki/Meetings/ThirdParty

I think one of the test result burn out issues right now is based on the
fact that they are too rolled up as it is. For instance, a docs only
change gets Tempest results, which humans know are irrelevant, but
Jenkins insists they aren't. I think that if we rolled up more
information, and waited longer, we'd be in a worse state.

Maybe it could time out promptly and then report the systems that did not
complete? That would also have the benefit of enforcing a time limit on
reporting results.
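
One way to picture the timeout idea is sketched below, assuming each report carries the time it arrived. The function name and the data shapes are illustrative assumptions only, not an existing Zuul feature.

```python
def summarize_at_deadline(expected, reports, started_at, timeout_s):
    """Split expected systems into those that reported and those that timed out.

    `reports` maps system name -> (status, reported_at). Results that
    arrive after the deadline are treated as not completed.
    """
    deadline = started_at + timeout_s
    completed, timed_out = {}, []
    for system in expected:
        report = reports.get(system)
        if report is not None and report[1] <= deadline:
            completed[system] = report[0]
        else:
            timed_out.append(system)
    return completed, timed_out
```

The aggregate report can then go out at the deadline and list the `timed_out` systems by name instead of waiting on them.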

Kurt Taylor (krtaylor)
OpenStack Development Lead - PowerKVM CI
IBM Linux Technology Center

responded Jun 30, 2014 by krtaylor_at_us.ibm.c (280 points)   1

2014-06-30 19:17 GMT+04:00 Kurt Taylor:

Dan Smith wrote on 06/27/2014 12:33:48 PM:

If it really does show up right in Gerrit as if it were integrated,
then that would be fine with me. I think the biggest problem we have
right now is that a lot of the CI systems are very inconsistent in
their reporting and we often don't realize when one of them hasn't
voted. If the new thing could fill out the chart based on everything
we expect to see a vote from, so that it's very clear that one is
missing, then that's a net win right there.

There is a similar old bug for that, with a good suggestion for how it
could possibly be done:

https://bugs.launchpad.net/openstack-ci/+bug/1251758

What about having a report like this:
http://stackalytics.com/report/ci/neutron/7 ?

Thanks,
Ilya

responded Jun 30, 2014 by Ilya_Shakhat (1,460 points)   4

There is a similar old bug for that, with a good suggestion for how
it could possibly be done:

https://bugs.launchpad.net/openstack-ci/+bug/1251758

This isn't what I'm talking about. What we need is, for each new
patchset on a given change, an empty table listing all the answers we
expect to see (i.e. one for each of the usual suspect nova CI
systems). The above bug (AFAICT) is simply for tracking last-status of
each, so that if one stops reporting entirely (as minesweeper often
does), we get some indication that the system is broken.
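
A minimal sketch of the per-patchset table described above: start from the list of systems a vote is expected from and fill in whatever has arrived, so a missing vote shows up as an explicit row. The function names and the "pending" label are assumptions for illustration; where the expected list would come from is left open.

```python
def expected_vote_table(expected_systems, votes):
    """Return (system, status) rows, one per expected CI system.

    `votes` maps system name -> "+1" or "-1" for the current patchset.
    Systems that have not voted yet appear as "pending", so a missing
    vote is visible rather than silently absent.
    """
    return [(name, votes.get(name, "pending")) for name in expected_systems]


def render(rows):
    """Format the rows as a plain-text table with aligned columns."""
    width = max((len(name) for name, _ in rows), default=0)
    return "\n".join(f"{name.ljust(width)}  {status}" for name, status in rows)
```

For a new patchset `votes` is empty, so every expected system starts out as "pending".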

responded Jun 30, 2014 by Dan_Smith (9,860 points)   1 2 4
...