
[openstack-dev] [tripleo] Blocking gate - do not recheck / rebase / approve any patch now (please)

0 votes

We have been working very hard to get a package/container promotion
(for 44 days now), and our current blocker is
https://review.openstack.org/#/c/513701/.

Because the gate queue is huge, we decided to block the gate and kill
all the jobs running there until we can merge
https://review.openstack.org/#/c/513701/ and its backport
https://review.openstack.org/#/c/514584 (both are blocking the whole
production chain).
We hope to promote once these two patches land; if something else
breaks, we will iterate to the next problem.

We hope you understand and support us during this effort.
So please do not recheck, rebase or approve any patch until further notice.

Thank you,
--
Emilien Macchi


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Oct 26, 2017 in openstack-dev by emilien_at_redhat.co (36,940 points)   3 8 13

5 Responses

0 votes

Status:

  • The Heat Convergence switch might be one reason why overcloud
    deployments time out so much. Thomas proposed disabling it:
    https://review.openstack.org/515077
  • Every time a patch fails in the tripleo gate queue, it resets the
    gate for every change behind it. I proposed removing this common
    queue: https://review.openstack.org/515070 (see the toy sketch
    after this list).
  • I cleared the patches in the check and gate queues to make sure the
    two blockers are tested and can be merged with priority. I'll keep
    an eye on it today.
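
To make the shared-queue issue concrete, here is a toy Python sketch
(purely illustrative; the names and numbers are made up and nothing
here comes from the actual Zuul configuration) of why one failure in a
shared dependent queue is so expensive compared to independent
per-project queues:

    # Toy model only: rough illustration of why a shared gate queue
    # amplifies failures. All names here are hypothetical.

    def job_runs_shared_queue(queue_depth, failing_position):
        """Job runs consumed when one change fails in a shared queue.

        Changes behind the failing one had speculative jobs running;
        after the failure those results are discarded and the jobs
        restart, so their work is paid for twice.
        """
        changes_behind = queue_depth - failing_position
        return queue_depth + changes_behind  # initial runs + restarts

    def job_runs_separate_queues(queue_depth):
        """With independent queues a failure only costs the failing
        change its own rerun; nothing behind it is reset."""
        return queue_depth + 1

    if __name__ == "__main__":
        depth, fail_at = 20, 3  # 20 changes queued, the 3rd one fails
        print("shared queue:   ", job_runs_shared_queue(depth, fail_at))  # 37
        print("separate queues:", job_runs_separate_queues(depth))        # 21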

Any help is very welcome.

On Wed, Oct 25, 2017 at 5:58 AM, Emilien Macchi emilien@redhat.com wrote:
We have been working very hard to get a package/container promotion
(for 44 days now), and our current blocker is
https://review.openstack.org/#/c/513701/.

Because the gate queue is huge, we decided to block the gate and kill
all the jobs running there until we can merge
https://review.openstack.org/#/c/513701/ and its backport
https://review.openstack.org/#/c/514584 (both are blocking the whole
production chain).
We hope to promote once these two patches land; if something else
breaks, we will iterate to the next problem.

We hope you understand and support us during this effort.
So please do not recheck, rebase or approve any patch until further notice.

Thank you,
--
Emilien Macchi

--
Emilien Macchi


responded Oct 25, 2017 by emilien_at_redhat.co (36,940 points)   3 8 13
0 votes

Quick update before being afk for some hours:

Once again, we'll need to hold a retrospective and see why we reached
this terrible state, but for now let's focus on bringing our CI back
into good shape.
Thanks a ton to everyone who is involved,

On Wed, Oct 25, 2017 at 7:25 AM, Emilien Macchi emilien@redhat.com wrote:
Status:

  • The Heat Convergence switch might be one reason why overcloud
    deployments time out so much. Thomas proposed disabling it:
    https://review.openstack.org/515077
  • Every time a patch fails in the tripleo gate queue, it resets the
    gate for every change behind it. I proposed removing this common
    queue: https://review.openstack.org/515070
  • I cleared the patches in the check and gate queues to make sure the
    two blockers are tested and can be merged with priority. I'll keep
    an eye on it today.

Any help is very welcome.

On Wed, Oct 25, 2017 at 5:58 AM, Emilien Macchi emilien@redhat.com wrote:

We have been working very hard to get a package/container promotion
(for 44 days now), and our current blocker is
https://review.openstack.org/#/c/513701/.

Because the gate queue is huge, we decided to block the gate and kill
all the jobs running there until we can merge
https://review.openstack.org/#/c/513701/ and its backport
https://review.openstack.org/#/c/514584 (both are blocking the whole
production chain).
We hope to promote once these two patches land; if something else
breaks, we will iterate to the next problem.

We hope you understand and support us during this effort.
So please do not recheck, rebase or approve any patch until further notice.

Thank you,
--
Emilien Macchi

--
Emilien Macchi

--
Emilien Macchi


responded Oct 25, 2017 by emilien_at_redhat.co (36,940 points)   3 8 13
0 votes

On Wed, Oct 25, 2017 at 1:59 PM, Emilien Macchi emilien@redhat.com wrote:
Quick update before being afk for some hours:

Landed.

Done; please be very careful while these jobs are not voting.
If in any doubt, please ping me, fultonj, or gfidente on #tripleo.

Merged - Dougal will work on the real fix this week, but it is not urgent anymore.

In the gate right now and hopefully merged in less than two hours.
Otherwise, please keep rechecking it.
According to Thomas Hervé, it will reduce the chance of a timeout.

  • puppet-tripleo gate broken on stable branches (syntax jobs not
    running properly) - jeblair is looking at it now

jeblair will hopefully provide a fix this week, but this is not
critical at this time.
Thanks, Jim, for your help.

Once again, we'll need to hold a retrospective and see why we reached
this terrible state, but for now let's focus on bringing our CI back
into good shape.
Thanks a ton to everyone who is involved,

I'm now restoring all the patches that I removed from the gate.
You can now recheck / rebase / approve what you want, but please save
our CI resources and do it in moderation. We are not done yet.

I won't declare victory yet, but we've merged almost all of our
blockers; one is still missing and currently in the gate:
https://review.openstack.org/515123 - it needs babysitting until merged.
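
For anyone babysitting that change, here is a hedged sketch of what
watching it could look like, using Gerrit's public REST API (the
/changes/ endpoint and the "status" field are standard Gerrit; the
script itself and the polling interval are just an illustration, not
an official tool):

    import json
    import time

    import requests

    GERRIT = "https://review.openstack.org"
    CHANGE = "515123"  # the last remaining blocker mentioned above

    def change_status(change_id):
        """Return the Gerrit status of a change (NEW, MERGED, ABANDONED)."""
        resp = requests.get("{}/changes/{}".format(GERRIT, change_id))
        resp.raise_for_status()
        # Gerrit prefixes JSON responses with ")]}'" to prevent XSSI.
        return json.loads(resp.text.split("\n", 1)[1])["status"]

    if __name__ == "__main__":
        while True:
            status = change_status(CHANGE)
            print("change {}: {}".format(CHANGE, status))
            if status == "MERGED":
                break
            time.sleep(300)  # poll every 5 minutes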

Now let's see how RDO promotion works. We're close :-)

Thanks everyone,

On Wed, Oct 25, 2017 at 7:25 AM, Emilien Macchi emilien@redhat.com wrote:

Status:

  • The Heat Convergence switch might be one reason why overcloud
    deployments time out so much. Thomas proposed disabling it:
    https://review.openstack.org/515077
  • Every time a patch fails in the tripleo gate queue, it resets the
    gate for every change behind it. I proposed removing this common
    queue: https://review.openstack.org/515070
  • I cleared the patches in the check and gate queues to make sure the
    two blockers are tested and can be merged with priority. I'll keep
    an eye on it today.

Any help is very welcome.

On Wed, Oct 25, 2017 at 5:58 AM, Emilien Macchi emilien@redhat.com wrote:

We have been working very hard to get a package/container promotion
(for 44 days now), and our current blocker is
https://review.openstack.org/#/c/513701/.

Because the gate queue is huge, we decided to block the gate and kill
all the jobs running there until we can merge
https://review.openstack.org/#/c/513701/ and its backport
https://review.openstack.org/#/c/514584 (both are blocking the whole
production chain).
We hope to promote once these two patches land; if something else
breaks, we will iterate to the next problem.

We hope you understand and support us during this effort.
So please do not recheck, rebase or approve any patch until further notice.

Thank you,
--
Emilien Macchi

--
Emilien Macchi

--
Emilien Macchi

--
Emilien Macchi


responded Oct 26, 2017 by emilien_at_redhat.co (36,940 points)   3 8 13
0 votes

Thank you for working on this!
I know it is needed to unblock TripleO development. I have a few
comments inline, though.

On 10/26/17 6:14 AM, Emilien Macchi wrote:
On Wed, Oct 25, 2017 at 1:59 PM, Emilien Macchi emilien@redhat.com wrote:

Quick update before being afk for some hours:

Landed.

Done; please be very careful while these jobs are not voting.
If in any doubt, please ping me, fultonj, or gfidente on #tripleo.

Merged - Dougal will work on the real fix this week, but it is not urgent anymore.

In the gate right now and hopefully merged in less than two hours.
Otherwise, please keep rechecking it.
According to Thomas Hervé, it will reduce the chance of a timeout.

  • puppet-tripleo gate broken on stable branches (syntax jobs not
    running properly) - jeblair is looking at it now

jeblair will hopefully provide a fix this week, but this is not
critical at this time.
Thanks, Jim, for your help.

Once again, we'll need to hold a retrospective and see why we reached
this terrible state, but for now let's focus on bringing our CI back
into good shape.
Thanks a ton to everyone who is involved,

I'm now restoring all the patches that I removed from the gate.
You can now recheck / rebase / approve what you want, but please save
our CI resources and do it in moderation. We are not done yet.

I won't declare victory yet, but we've merged almost all of our
blockers; one is still missing and currently in the gate:
https://review.openstack.org/515123 - it needs babysitting until merged.

I have to warn TripleO folks about instack-only changes these days.
Please make sure each instack-only change, like Hiera overrides, also
has follow-up patches for the containerized cases, which do not use
instack. Otherwise, we put the containerized deployments at high risk
of re-introducing the regressions that were only fixed for the
non-containerized path. That is dangerous, given that we disable
voting on those jobs from time to time.

For this particular case, please add it in a separate review under
puppet/services/zaqar*. Thanks @bandini for confirming that on IRC.

Now let's see how RDO promotion works. We're close :-)

Thanks everyone,

On Wed, Oct 25, 2017 at 7:25 AM, Emilien Macchi emilien@redhat.com wrote:

Status:

  • The Heat Convergence switch might be one reason why overcloud
    deployments time out so much. Thomas proposed disabling it:
    https://review.openstack.org/515077
  • Every time a patch fails in the tripleo gate queue, it resets the
    gate for every change behind it. I proposed removing this common
    queue: https://review.openstack.org/515070
  • I cleared the patches in the check and gate queues to make sure the
    two blockers are tested and can be merged with priority. I'll keep
    an eye on it today.

Any help is very welcome.

On Wed, Oct 25, 2017 at 5:58 AM, Emilien Macchi emilien@redhat.com wrote:

We have been working very hard to get a package/container promotion
(for 44 days now), and our current blocker is
https://review.openstack.org/#/c/513701/.

Because the gate queue is huge, we decided to block the gate and kill
all the jobs running there until we can merge
https://review.openstack.org/#/c/513701/ and its backport
https://review.openstack.org/#/c/514584 (both are blocking the whole
production chain).
We hope to promote once these two patches land; if something else
breaks, we will iterate to the next problem.

We hope you understand and support us during this effort.
So please do not recheck, rebase or approve any patch until further notice.

Thank you,
--
Emilien Macchi

--
Emilien Macchi

--
Emilien Macchi

--
Best regards,
Bogdan Dobrelya,
Irc #bogdando


responded Oct 26, 2017 by bdobreli_at_redhat.c (2,260 points)   2 3
0 votes

On 10/26/2017 06:14 AM, Emilien Macchi wrote:
On Wed, Oct 25, 2017 at 1:59 PM, Emilien Macchi emilien@redhat.com wrote:

Quick update before being afk for some hours:

Landed.

Done; please be very careful while these jobs are not voting.
If in any doubt, please ping me, fultonj, or gfidente on #tripleo.

Merged - Dougal will work on the real fix this week, but it is not urgent anymore.

In the gate right now and hopefully merged in less than two hours.
Otherwise, please keep rechecking it.
According to Thomas Hervé, it will reduce the chance of a timeout.

  • puppet-tripleo gate broken on stable branches (syntax jobs not
    running properly) - jeblair is looking at it now

jeblair will hopefully provide a fix this week, but this is not
critical at this time.
Thanks, Jim, for your help.

Once again, we'll need to hold a retrospective and see why we reached
this terrible state, but for now let's focus on bringing our CI back
into good shape.
Thanks a ton to everyone who is involved,

I'm now restoring all the patches that I removed from the gate.
You can now recheck / rebase / approve what you want, but please save
our CI resources and do it in moderation. We are not done yet.

I won't declare victory yet, but we've merged almost all of our
blockers; one is still missing and currently in the gate:
https://review.openstack.org/515123 - it needs babysitting until merged.

Now let's see how RDO promotion works. We're close :-)

We also have to change the tenant rc file from overcloudrc to
overcloudrc.v3 for the validate-simple role to unblock promotion on master.
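
As a hedged aside on why the v3 file matters (the variable names below
follow common Keystone v3 openrc conventions and are an assumption,
not quoted from overcloudrc.v3 or from the validate-simple role): a
v2-style rc file usually lacks the identity-v3 variables, so a quick
check like this sketch can tell which kind of file a job sourced:

    # Hypothetical helper, not part of the validate-simple role: parse
    # "export KEY=value" lines from an rc file and check whether the
    # variables a Keystone v3 client needs are present.

    V3_REQUIRED = (
        "OS_AUTH_URL",
        "OS_IDENTITY_API_VERSION",
        "OS_PROJECT_NAME",
        "OS_USER_DOMAIN_NAME",
        "OS_PROJECT_DOMAIN_NAME",
    )

    def parse_rc(path):
        """Return a dict of the variables exported by a shell rc file."""
        env = {}
        with open(path) as rc:
            for line in rc:
                line = line.strip()
                if line.startswith("export ") and "=" in line:
                    key, _, value = line[len("export "):].partition("=")
                    env[key.strip()] = value.strip().strip('"').strip("'")
        return env

    def looks_like_v3(path):
        env = parse_rc(path)
        missing = [name for name in V3_REQUIRED if name not in env]
        return not missing, missing

    if __name__ == "__main__":
        ok, missing = looks_like_v3("overcloudrc.v3")
        print("v3 ready" if ok else "missing: " + ", ".join(missing))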

I created a bug to track the rc file problem, and I'm going to post a fix soon:

https://bugs.launchpad.net/tripleo/+bug/1727698

Attila


responded Oct 26, 2017 by Attila_Darazs (840 points)   1 1
...