
[openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

0 votes

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules: we've had very inconsistent failures in
the Xenial gate with the following error:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP
172.24.5.17 to fixed IP 10.100.0.8 for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss. I've tried my best to find
the root cause, but we haven't been able to. It was persistent enough
that we elected to make our Xenial gates non-voting; however, with no
fix in sight, I feel like this is a waste of resources and we need to
either fix this or drop CI for Ubuntu. We don't deploy on Ubuntu, and
most of the developers working on the project don't either at this
point, so we need a bit of outside help.

If you're a user of Puppet on Xenial, we need your help! Without any
resources going towards fixing this, we'd unfortunately have to drop
support for Ubuntu because of the lack of resources (or assistance) to
maintain it. We (the Puppet OpenStack team) would be more than happy to
work together on a fix, so pop into #puppet-openstack or reply to this
email and let's get this issue fixed.

Thanks,
Mohammed


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Nov 14, 2017 in openstack-dev by Mohammed_Naser (3,860 points)   1 3

15 Responses

0 votes

From a quick glance at the logs my guess is that the issue is related to this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But I'm on a plane right now (which is why this is a top post, sorry), so I can't really dig much more than that. I'll try to take a deeper look at things later when I'm on solid ground. (Hopefully someone will beat me to it by then, though.)

-Matt Treinish

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser mnaser@vexxhost.com wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP
172.24.5.17 to fixed IP 10.100.0.8 for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss. I've tried my best in order
to find the root cause however we have not been able to do this. It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu. We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help! Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance). We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed


responded Oct 30, 2017 by Matthew_Treinish (11,200 points)   2 5 5
0 votes

On 10/30/2017 05:46 PM, Matthew Treinish wrote:
From a quick glance at the logs my guess is that the issue is related
to this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane
right now (which is why this is a top post, sorry) so I can't really dig
much more than that. I'll try to take a deeper look at things later when
I'm on solid ground. (hopefully someone will beat me to it by then though)

I don't think that l3-agent trace is it, as the failure is coming from
the API. It's actually a trace that happens due to the async nature
of how the agent runs arping; the fix is
https://review.openstack.org/#/c/507914/ but it only removes the log noise.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
has some tracebacks that look config-related, possibly a missing DB table?
But I haven't looked very closely.

-Brian

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
mnaser@vexxhost.com wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

     http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
     http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
     Details: {u'message': u'Unable to associate floating IP
172.24.5.17   to fixed IP 10.100.0.8   for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in order
to find the root cause however we have not been able to do this.  It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.  We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance).  We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed

responded Oct 30, 2017 by haleyb.dev_at_gmail. (880 points)  
0 votes

On Mon, Oct 30, 2017 at 6:07 PM, Brian Haley haleyb.dev@gmail.com wrote:
On 10/30/2017 05:46 PM, Matthew Treinish wrote:

From a quick glance at the logs my guess is that the issue is related to
this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane
right now (which is why this is a top post, sorry) so I can't really dig
much more than that. I'll try to take a deeper look at things later when I'm
on solid ground. (hopefully someone will beat me to it by then though)

I don't think that l3-agent trace is it, as the failure is coming from the
API. It's actually a trace that's happening due to the async nature of how
the agent runs arping, fix is https://review.openstack.org/#/c/507914/ but
it only removes the log noise.

Indeed, I've reached out to the Neutron team on IRC and Brian informed me
that this was just log noise.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
has some tracebacks that look config related, possible missing DB table?
But I haven't looked very closely.

The tracebacks are because the Neutron server is started before the
MySQL database is synced (AFAIK, Ubuntu behaviour is to start services
on install, so we haven't had a chance to sync the DB yet). You can see
the service restart later with none of these database issues. The
other reason to eliminate config issues is the fact that this happens
intermittently (though often enough that we had to switch the job to
non-voting). If it were a config issue, it would fail constantly, every
time.
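
For what it's worth, here is a minimal sketch of what "the DB isn't synced
yet" looks like in practice. This is illustrative only, not from the job;
the connection URL and credentials are placeholders (use the value from
neutron.conf [database] in the deployment):

    # Hypothetical diagnostic sketch: check whether neutron-db-manage has
    # created/stamped the schema before neutron-server starts answering requests.
    from sqlalchemy import create_engine, inspect, text

    # Placeholder URL; take the real connection string from neutron.conf [database].
    engine = create_engine('mysql+pymysql://neutron:secret@127.0.0.1/neutron')
    inspector = inspect(engine)

    if 'alembic_version' in inspector.get_table_names():
        with engine.connect() as conn:
            head = conn.execute(
                text('SELECT version_num FROM alembic_version')).scalar()
        print('neutron DB is synced, alembic revision: %s' % head)
    else:
        # Matches the symptom above: the service was started (Ubuntu starts
        # services on package install) before the schema existed, so early
        # API calls traceback until the restart.
        print('neutron DB schema is missing -- db sync has not run yet')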

Thank you Brian & Matthew for your help so far.

-Brian

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
mnaser@vexxhost.com wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP
172.24.5.17 to fixed IP 10.100.0.8
for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to

https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in order
to find the root cause however we have not been able to do this.  It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.  We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance).  We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed

responded Oct 30, 2017 by Mohammed_Naser (3,860 points)   1 3
0 votes

I've been staring at this for almost an hour now, going through all the
logs, and I can't really pinpoint where that error message is generated.
I cannot find any references for the timed-out message that the API
returns, or for the "unable to associate" part.

What I'm currently staring at is why the instance fixed IP 172.24.5.17
would be referenced as a network:router_gateway port in the OVS agent logs.

2017-10-29 23:19:27.591 11856 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {}, 'network_qos_policy_id': None, 'qos_policy_id': None, 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id': 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None, 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c', 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway', 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3', 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled': False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type': u'flat', 'security_groups': []}
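
To double-check that port from the job node, something along these lines
should work. This is a rough sketch, not taken from the job, and it assumes
admin credentials are available in the usual OS_* environment variables:

    # Rough sketch: list the neutron ports that carry 172.24.5.17 and show
    # who owns them. Credentials come from the environment (OS_AUTH_URL,
    # OS_USERNAME, OS_PASSWORD, OS_PROJECT_NAME, ...).
    import os
    from keystoneauth1 import identity, session
    from neutronclient.v2_0 import client

    auth = identity.Password(
        auth_url=os.environ['OS_AUTH_URL'],
        username=os.environ['OS_USERNAME'],
        password=os.environ['OS_PASSWORD'],
        project_name=os.environ['OS_PROJECT_NAME'],
        user_domain_name=os.environ.get('OS_USER_DOMAIN_NAME', 'Default'),
        project_domain_name=os.environ.get('OS_PROJECT_DOMAIN_NAME', 'Default'))
    neutron = client.Client(session=session.Session(auth=auth))

    # Filter client-side to avoid depending on server-side query syntax.
    for port in neutron.list_ports()['ports']:
        if any(ip['ip_address'] == '172.24.5.17' for ip in port['fixed_ips']):
            print(port['id'], port['device_owner'], port['fixed_ips'])

A network:router_gateway owner would simply mean it is the router's port on
the external network; whether it should carry that exact address is the open
question here.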

Anybody else seen anything interesting?

On 10/30/2017 11:08 PM, Brian Haley wrote:

On 10/30/2017 05:46 PM, Matthew Treinish wrote:

From a quick glance at the logs my guess is that the issue is related
to this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane
right now (which is why this is a top post, sorry) so I can't really dig
much more than that. I'll try to take a deeper look at things later when
I'm on solid ground. (hopefully someone will beat me to it by then though)

I don't think that l3-agent trace is it, as the failure is coming from
the API. It's actually a trace that's happening due to the async nature
of how the agent runs arping, fix is
https://review.openstack.org/#/c/507914/ but it only removes the log noise.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
has some tracebacks that look config related, possible missing DB table?
But I haven't looked very closely.

-Brian

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
mnaser@vexxhost.com wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

     http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
     http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
     Details: {u'message': u'Unable to associate floating IP
172.24.5.17   to fixed IP 10.100.0.8   for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in order
to find the root cause however we have not been able to do this.  It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.  We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance).  We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed

responded Nov 2, 2017 by Tobias_Urdin (1,300 points)   1
0 votes

On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin tobias.urdin@crystone.com wrote:
I've been staring at this for almost an hour now going through all the logs
and I can't really pin a point from

where that error message is generated. Cannot find any references for the
timed out message that the API returns or the unable to associate part.

What I'm currently staring at is why would the instance fixed ip 172.24.5.17
be references as a network:router_gateway port in the OVS agent logs.

2017-10-29 23:19:27.591 11856 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {}, 'network_qos_policy_id': None, 'qos_policy_id': None, 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id': 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None, 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c', 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway', 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3', 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled': False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type': u'flat', 'security_groups': []}

Anybody else seen anything interesting?

Hi Tobias,

Thanks for looking into it. I've spent a lot of time on this and haven't
successfully identified much, but I can add the following:

  • This issue is intermittent in CI.
  • It does not happen on any specific provider; I've seen it fail on
    both OVH and Rackspace.
  • Not all floating IP assignments fail; if you look at the logs, you
    can see a few attach successfully before failing.

But yeah, I'm still quite at a loss and not having this coverage isn't fun.

On 10/30/2017 11:08 PM, Brian Haley wrote:

On 10/30/2017 05:46 PM, Matthew Treinish wrote:

From a quick glance at the logs my guess is that the issue is related
to this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane
right now (which is why this is a top post, sorry) so I can't really dig
much more than that. I'll try to take a deeper look at things later when
I'm on solid ground. (hopefully someone will beat me to it by then though)

I don't think that l3-agent trace is it, as the failure is coming from
the API. It's actually a trace that's happening due to the async nature
of how the agent runs arping, fix is
https://review.openstack.org/#/c/507914/ but it only removes the log noise.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
has some tracebacks that look config related, possible missing DB table?
But I haven't looked very closely.

-Brian

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
mnaser@vexxhost.com wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP
172.24.5.17 to fixed IP 10.100.0.8
for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to

https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in order
to find the root cause however we have not been able to do this.  It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.  We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance).  We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed

responded Nov 2, 2017 by Mohammed_Naser (3,860 points)   1 3
0 votes

Hi,

I hope that everyone had safe travels and enjoyed their time in Sydney
(and that those who weren't there enjoyed a bit of quiet time!). I'm just
sending this email to ask whether anyone has had a chance to look more
into this (or perhaps we can get some help if there are any Canonical
folks on the list?).

I would be really sad if we had to drop Ubuntu/Debian support because
we cannot test it. I think there are a lot of users out there using
it! I'd be more than happy to provide any assistance/information in
troubleshooting this.

Thank you,
Mohammed

On Thu, Nov 2, 2017 at 1:10 PM, Mohammed Naser mnaser@vexxhost.com wrote:
On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin tobias.urdin@crystone.com wrote:

I've been staring at this for almost an hour now going through all the logs
and I can't really pin a point from

where that error message is generated. Cannot find any references for the
timed out message that the API returns or the unable to associate part.

What I'm currently staring at is why would the instance fixed ip 172.24.5.17
be references as a network:router_gateway port in the OVS agent logs.

2017-10-29 23:19:27.591 11856 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {}, 'network_qos_policy_id': None, 'qos_policy_id': None, 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id': 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None, 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c', 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway', 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3', 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled': False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type': u'flat', 'security_groups': []}

Anybody else seen anything interesting?

Hi Tobias,

Thanks for looking out. I've spent a lot of time and I haven't
successfully identified much, I can add the following

  • This issue is intermittent in CI
  • It does not happen on any specific providers, I've seen it fail on
    both OVH and Rackspace.
  • Not all floating iP assignments fail, if you look at the logs, you
    can see it attach a few successfully before failing

But yeah, I'm still quite at a loss and not having this coverage isn't fun.

On 10/30/2017 11:08 PM, Brian Haley wrote:

On 10/30/2017 05:46 PM, Matthew Treinish wrote:

From a quick glance at the logs my guess is that the issue is related
to this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane
right now (which is why this is a top post, sorry) so I can't really dig
much more than that. I'll try to take a deeper look at things later when
I'm on solid ground. (hopefully someone will beat me to it by then though)

I don't think that l3-agent trace is it, as the failure is coming from
the API. It's actually a trace that's happening due to the async nature
of how the agent runs arping, fix is
https://review.openstack.org/#/c/507914/ but it only removes the log noise.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
has some tracebacks that look config related, possible missing DB table?
But I haven't looked very closely.

-Brian

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
mnaser@vexxhost.com wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP
172.24.5.17 to fixed IP 10.100.0.8
for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to

https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in order
to find the root cause however we have not been able to do this.  It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.  We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance).  We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed

responded Nov 13, 2017 by Mohammed_Naser (3,860 points)   1 3
0 votes

Hey,
Do you know if the bug appears on a specific Ubuntu / OpenStack version?
As far as I remember, it was not related to the Puppet branch; the bug
exists on master but also on the Newton Puppet branches, right?

We are using Ubuntu at my company, so we would love to see that continue ;)
I'll try to take a look again.

Cheers.

Le 14 nov. 2017 00:00, "Mohammed Naser" mnaser@vexxhost.com a écrit :

Hi,

Hope that everyone had safe travels and enjoyed their time at Sydney
(and those who weren't there enjoyed a bit of quiet time!). I'm just
sending this email if anyone had a chance to look more into this (or
perhaps we can get some help if there are any Canonical folks on the
list?)

I would be really sad if we had to drop Ubuntu/Debian support because
we cannot test it. I think there are a lot of users out there using
it! I'd be more than happy to provide any assistance/information in
troubleshooting this.

Thank you,
Mohammed

On Thu, Nov 2, 2017 at 1:10 PM, Mohammed Naser mnaser@vexxhost.com
wrote:

On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin tobias.urdin@crystone.com
wrote:

I've been staring at this for almost an hour now going through all the
logs
and I can't really pin a point from

where that error message is generated. Cannot find any references for
the
timed out message that the API returns or the unable to associate part.

What I'm currently staring at is why would the instance fixed ip
172.24.5.17
be references as a network:router_gateway port in the OVS agent logs.

2017-10-29 23:19:27.591 11856 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {}, 'network_qos_policy_id': None, 'qos_policy_id': None, 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id': 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None, 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c', 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway', 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3', 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled': False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type': u'flat', 'security_groups': []}

Anybody else seen anything interesting?

Hi Tobias,

Thanks for looking out. I've spent a lot of time and I haven't
successfully identified much, I can add the following

  • This issue is intermittent in CI
  • It does not happen on any specific providers, I've seen it fail on
    both OVH and Rackspace.
  • Not all floating iP assignments fail, if you look at the logs, you
    can see it attach a few successfully before failing

But yeah, I'm still quite at a loss and not having this coverage isn't
fun.

On 10/30/2017 11:08 PM, Brian Haley wrote:

On 10/30/2017 05:46 PM, Matthew Treinish wrote:

From a quick glance at the logs my guess is that the issue is related
to this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane
right now (which is why this is a top post, sorry) so I can't really dig
much more than that. I'll try to take a deeper look at things later when
I'm on solid ground. (hopefully someone will beat me to it by then
though)

I don't think that l3-agent trace is it, as the failure is coming from
the API. It's actually a trace that's happening due to the async nature
of how the agent runs arping, fix is
https://review.openstack.org/#/c/507914/ but it only removes the log
noise.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
has some tracebacks that look config related, possible missing DB table?
But I haven't looked very closely.

-Brian

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
mnaser@vexxhost.com wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures

in
the Xenial with the following error:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP
172.24.5.17 to fixed IP 10.100.0.8
for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to

https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in

order
to find the root cause however we have not been able to do this. It
was persistent enough that we elected to go non-voting for our
Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.
We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without

any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance). We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed

responded Nov 14, 2017 by arnaud.morin_at_gmai (320 points)  
0 votes

Hello,

Same here, I will continue looking at this as well.

Would be great if we could get some input from a neutron dev with good insight into the project.

Can we backtrace the timed-out message to where it's thrown/returned?

I'm interested in why we would get a 400 code back; the floating IP
operations should be async, right? So this would be the response from the
API layer, with information from the database, which should return more
information.

Best regards

On 11/14/2017 07:41 AM, Arnaud MORIN wrote:
Hey,
Do you know if the bug appears on a specific Ubuntu / openstack version?
As far as I remember it was not related to the puppet branch? I mean the bug is existing on master but also on newton puppet branches, right?

We are using Ubuntu in my company so we would love to see that continue ;)
I'll try to take a look again.

Cheers.

Le 14 nov. 2017 00:00, "Mohammed Naser" mnaser@vexxhost.com a écrit :
Hi,

Hope that everyone had safe travels and enjoyed their time at Sydney
(and those who weren't there enjoyed a bit of quiet time!). I'm just
sending this email if anyone had a chance to look more into this (or
perhaps we can get some help if there are any Canonical folks on the
list?)

I would be really sad if we had to drop Ubuntu/Debian support because
we cannot test it. I think there are a lot of users out there using
it! I'd be more than happy to provide any assistance/information in
troubleshooting this.

Thank you,
Mohammed

On Thu, Nov 2, 2017 at 1:10 PM, Mohammed Naser mnaser@vexxhost.com wrote:
On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin tobias.urdin@crystone.com wrote:

I've been staring at this for almost an hour now going through all the logs
and I can't really pin a point from

where that error message is generated. Cannot find any references for the
timed out message that the API returns or the unable to associate part.

What I'm currently staring at is why would the instance fixed ip 172.24.5.17
be references as a network:router_gateway port in the OVS agent logs.

2017-10-29 23:19:27.591 11856 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {}, 'network_qos_policy_id': None, 'qos_policy_id': None, 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id': 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None, 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c', 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway', 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3', 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled': False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type': u'flat', 'security_groups': []}

Anybody else seen anything interesting?

Hi Tobias,

Thanks for looking out. I've spent a lot of time and I haven't
successfully identified much, I can add the following

  • This issue is intermittent in CI
  • It does not happen on any specific providers, I've seen it fail on
    both OVH and Rackspace.
  • Not all floating iP assignments fail, if you look at the logs, you
    can see it attach a few successfully before failing

But yeah, I'm still quite at a loss and not having this coverage isn't fun.

On 10/30/2017 11:08 PM, Brian Haley wrote:

On 10/30/2017 05:46 PM, Matthew Treinish wrote:

From a quick glance at the logs my guess is that the issue is related
to this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane
right now (which is why this is a top post, sorry) so I can't really dig
much more than that. I'll try to take a deeper look at things later when
I'm on solid ground. (hopefully someone will beat me to it by then though)

I don't think that l3-agent trace is it, as the failure is coming from
the API. It's actually a trace that's happening due to the async nature
of how the agent runs arping, fix is
https://review.openstack.org/#/c/507914/ but it only removes the log noise.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
has some tracebacks that look config related, possible missing DB table?
But I haven't looked very closely.

-Brian

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
mnaser@vexxhost.com wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP
172.24.5.17 to fixed IP 10.100.0.8
for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to

https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in order
to find the root cause however we have not been able to do this.  It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.  We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance).  We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed

responded Nov 14, 2017 by Tobias_Urdin (1,300 points)   1
0 votes

Trying to trace this: tempest calls the POST /servers/{server_id}/action endpoint of the Nova compute API.

https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82

Nova then takes the request and tries to do the floating IP association using the Neutron server API.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz

2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out: ConnectTimeout: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out

Checking that timestamp in the neutron-server logs:
http://paste.openstack.org/show/626240/

We can see that right before this timestamp, at 23:12:30.377, and then after it, at 23:12:35.611, everything seems to be doing fine.
So there is some connectivity issue to the Neutron API from where the Nova API is running, causing a timeout.
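
To separate "neutron never accepted the connection" from "the client gave up
too early", here is a minimal reproduction sketch of the same call Nova makes.
This is not from the job: the Keystone endpoint and credentials are
placeholders, the floating IP UUID is the one from the log above, and the
self-signed-certificate assumption is mine:

    # Minimal sketch: do the same PUT /v2.0/floatingips/<id> that nova performs
    # via neutronclient, through a keystoneauth1 session.
    from keystoneauth1 import identity, session

    auth = identity.Password(
        auth_url='https://127.0.0.1:5000/v3',   # placeholder keystone endpoint
        username='admin', password='secret', project_name='admin',
        user_domain_name='Default', project_domain_name='Default')
    # verify=False assumes the gate's self-signed certificate; the timeout
    # mirrors the client-side timeout that produces the ConnectTimeout error.
    sess = session.Session(auth=auth, verify=False, timeout=30)

    resp = sess.put(
        'https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c',
        json={'floatingip': {'port_id': None}})  # placeholder body (disassociate)
    print(resp.status_code, resp.text)

If this raises keystoneauth1.exceptions.ConnectTimeout, the problem sits in
front of (or inside) neutron-server's accept loop rather than anywhere
Nova-specific.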

Now some more questions would be:

  • Why is the return code 400? Are we being fooled, or is it actually a connection timeout? (See the sketch after this list.)
  • Is the Neutron API stuck, causing the failed connection? All traffic goes over loopback, so the chance of a problem there is very low.
  • Is any firewall catching this? Not likely, since the agent processes requests right before and after.
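
On the first question: the 400 is almost certainly produced by nova-api, not
by Neutron. Here is a schematic illustration of the pattern; this is not
Nova's actual code, and the function below is made up for illustration:

    # Schematic only: the compute API catches the neutron client failure while
    # associating the floating IP and re-raises it as a 400 Bad Request whose
    # body carries the original message, so tempest sees code 400 even though
    # the root cause was a client-side connect timeout.
    import webob.exc
    from keystoneauth1 import exceptions as ks_exc

    def add_floating_ip(associate, address, fixed_address, instance_uuid):
        try:
            associate()
        except ks_exc.ConnectTimeout as exc:
            msg = ('Unable to associate floating IP %s to fixed IP %s for '
                   'instance %s. Error: %s'
                   % (address, fixed_address, instance_uuid, exc))
            raise webob.exc.HTTPBadRequest(explanation=msg)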

I can't find anything interesting in the other system logs that could explain this, either.
Back to the logs!

Best regards
Tobias

On 11/14/2017 08:35 AM, Tobias Urdin wrote:

Hello,

Same here, I will continue looking at this aswell.

Would be great if we could get some input from a neutron dev with good insight into the project.

Can we backtrace the timed out message from where it's thrown/returned.

Error: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out', u'code': 400}

I'm interested why we would get 400 code back, the floating ip operations should be async right so this would be the response from the API layer with information from the database that should

return more information.

Best regards

On 11/14/2017 07:41 AM, Arnaud MORIN wrote:
Hey,
Do you know if the bug appears on a specific Ubuntu / openstack version?
As far as I remember it was not related to the puppet branch? I mean the bug is existing on master but also on newton puppet branches, right?

We are using Ubuntu in my company so we would love to see that continue ;)
I'll try to take a look again.

Cheers.

Le 14 nov. 2017 00:00, "Mohammed Naser" mnaser@vexxhost.com a écrit :
Hi,

Hope that everyone had safe travels and enjoyed their time at Sydney
(and those who weren't there enjoyed a bit of quiet time!). I'm just
sending this email if anyone had a chance to look more into this (or
perhaps we can get some help if there are any Canonical folks on the
list?)

I would be really sad if we had to drop Ubuntu/Debian support because
we cannot test it. I think there are a lot of users out there using
it! I'd be more than happy to provide any assistance/information in
troubleshooting this.

Thank you,
Mohammed

On Thu, Nov 2, 2017 at 1:10 PM, Mohammed Naser mnaser@vexxhost.com wrote:
On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin tobias.urdin@crystone.com wrote:

I've been staring at this for almost an hour now going through all the logs
and I can't really pin a point from

where that error message is generated. Cannot find any references for the
timed out message that the API returns or the unable to associate part.

What I'm currently staring at is why would the instance fixed ip 172.24.5.17
be references as a network:router_gateway port in the OVS agent logs.

2017-10-29 23:19:27.591 11856 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {}, 'network_qos_policy_id': None, 'qos_policy_id': None, 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id': 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None, 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c', 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway', 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3', 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled': False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type': u'flat', 'security_groups': []}

Anybody else seen anything interesting?

Hi Tobias,

Thanks for looking out. I've spent a lot of time and I haven't
successfully identified much, I can add the following

  • This issue is intermittent in CI
  • It does not happen on any specific providers, I've seen it fail on
    both OVH and Rackspace.
  • Not all floating iP assignments fail, if you look at the logs, you
    can see it attach a few successfully before failing

But yeah, I'm still quite at a loss and not having this coverage isn't fun.

On 10/30/2017 11:08 PM, Brian Haley wrote:

On 10/30/2017 05:46 PM, Matthew Treinish wrote:

From a quick glance at the logs my guess is that the issue is related
to this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane
right now (which is why this is a top post, sorry) so I can't really dig
much more than that. I'll try to take a deeper look at things later when
I'm on solid ground. (hopefully someone will beat me to it by then though)

I don't think that l3-agent trace is it, as the failure is coming from
the API. It's actually a trace that's happening due to the async nature
of how the agent runs arping, fix is
https://review.openstack.org/#/c/507914/ but it only removes the log noise.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
has some tracebacks that look config related, possible missing DB table?
But I haven't looked very closely.

-Brian

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
mnaser@vexxhost.com wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP
172.24.5.17 to fixed IP 10.100.0.8
for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to

https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in order
to find the root cause however we have not been able to do this.  It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.  We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance).  We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed

responded Nov 14, 2017 by Tobias_Urdin (1,300 points)   1
0 votes

Am I actually hallucinating, or is it the Nova API that cannot communicate with Keystone?
I cannot substantiate this with any Keystone logs.

2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floatingips [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out: ConnectTimeout: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating
ips Traceback (most recent call last):
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floatingips File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/floatingips.py", line 267, in addfloatingip
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating
ips fixedaddress=fixedaddress)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floatingips File "/usr/lib/python2.7/dist-packages/nova/network/baseapi.py", line 83, in wrapper
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floatingips res = f(self, context, *args, kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floatingips File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 1759, in associatefloatingip
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating
ips client.updatefloatingip(fip['id'], {'floatingip': param})
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating
ips File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 99, in wrapper
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips ret = obj(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 935, in update_floatingip
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips return self.put(self.floatingip_path % (floatingip), body=body)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 99, in wrapper
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips ret = obj(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 361, in put
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips headers=headers, params=params)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 99, in wrapper
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips ret = obj(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 329, in retry_request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips headers=headers, params=params)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 99, in wrapper
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips ret = obj(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 280, in do_request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips resp, replybody = self.httpclient.do_request(action, method, body=body)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/neutronclient/client.py", line 342, in do_request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips return self.request(url, method, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/neutronclient/client.py", line 330, in request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips resp = super(SessionClient, self).request(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 192, in request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips return self.session.request(url, method, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/positional/__init__.py", line 101, in inner
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips return wrapped(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips File "/usr/lib/python2.7/dist-packages/keystoneauth1/session.py", line 703, in request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips resp = send(
kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating
ips File "/usr/lib/python2.7/dist-packages/keystoneauth1/session.py", line 768, in sendrequest
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floatingips raise exceptions.ConnectTimeout(msg)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating
ips ConnectTimeout: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out
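
For what it's worth, here is a minimal loopback probe (plain requests, nothing nova-specific) that one could run on the node during the failure window to see whether the listener on 127.0.0.1:9696 accepts connections at all; any HTTP status back, even a 401, proves the TCP/TLS path is fine. The URL and verify=False are assumptions based on the gate terminating TLS with a self-signed certificate on loopback:

import time
import requests

URL = "https://127.0.0.1:9696/v2.0/"   # Neutron API endpoint from the trace above

for _ in range(30):
    start = time.time()
    try:
        # Any status code back (even 401 without a token) means the listener
        # accepted the connection; an exception here mirrors the ConnectTimeout.
        resp = requests.get(URL, timeout=5, verify=False)
        print("%.2fs -> HTTP %s" % (time.time() - start, resp.status_code))
    except requests.exceptions.RequestException as exc:
        print("%.2fs -> %s" % (time.time() - start, exc))
    time.sleep(1)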

Is the Nova compute API properly running with WSGI under apache2, and the Nova metadata API under the nova-api process?
Yeah, they must be, otherwise they would fail to bind.

Best regards
Tobias

On 11/14/2017 09:28 AM, Tobias Urdin wrote:
Trying to trace this: tempest calls the POST /servers/{server_id}/action API endpoint of the Nova compute API.

https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82
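
For clarity, that Tempest call boils down to a single addFloatingIp server action. A rough sketch of the equivalent raw request; the endpoint, token and IDs below are placeholders, not values from the failing run:

import json
import requests

NOVA_URL = "http://127.0.0.1:8774/v2.1"      # placeholder compute endpoint
TOKEN = "<keystone-token>"                   # placeholder
SERVER_ID = "<server-uuid>"                  # placeholder
FLOATING_IP = "<floating-ip-address>"        # placeholder

# POST /servers/{server_id}/action with an addFloatingIp body is what the
# Tempest floating IPs client sends under the hood.
resp = requests.post(
    "%s/servers/%s/action" % (NOVA_URL, SERVER_ID),
    headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
    data=json.dumps({"addFloatingIp": {"address": FLOATING_IP}}),
    timeout=30,
)
print(resp.status_code)
print(resp.text)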

Nova then takes the request and tries to do the floating IP association using the Neutron server API.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz

2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out: ConnectTimeout: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out

Checking that timestamp in the neutron-server logs:
http://paste.openstack.org/show/626240/

We can see that right before this timestamp, at 23:12:30.377, and then right after it, at 23:12:35.611, everything seems to be doing fine.
So there is some connectivity issue to the Neutron API from where the Nova API is running, causing a timeout.

Now some more questions would be:

  • Why is the return code 400? Are we being fooled, or is it actually a connection timeout? (See the sketch after this list.)
  • Is the Neutron API stuck, causing the failed connection? All traffic goes over loopback, so the chance of a problem there is very low.
  • Any firewall catching this? Not likely, since the agent processes requests right before and after.
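
On the first question: the 400 looks like it comes from the nova-api layer itself rather than from Neutron, since the nova-api log above shows the ConnectTimeout being caught and logged by nova.api.openstack.compute.floating_ips with the exact message tempest received. A hedged illustration of that pattern (this is not nova's exact code, just the shape of the translation):

import webob.exc
from keystoneauth1 import exceptions as ks_exc


def add_floating_ip(neutron_call):
    # Illustrative only: the API layer swallows the client-side timeout and
    # re-surfaces it as a 400 of its own, so the caller (tempest) never sees
    # the timeout status directly, only the Bad Request wrapper.
    try:
        return neutron_call()
    except ks_exc.ConnectTimeout as exc:
        raise webob.exc.HTTPBadRequest(explanation=str(exc))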

I can't find anything else interesting in the other system logs that could explain this.
Back to the logs!

Best regards
Tobias

On 11/14/2017 08:35 AM, Tobias Urdin wrote:

Hello,

Same here, I will continue looking at this as well.

Would be great if we could get some input from a neutron dev with good insight into the project.

Can we backtrace the timed-out message to where it's thrown/returned?

Error: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out', u'code': 400}

I'm interested in why we would get a 400 code back; the floating IP operations should be async, right? So this would be the response from the API layer, which has information from the database and should return more information.

Best regards

On 11/14/2017 07:41 AM, Arnaud MORIN wrote:
Hey,
Do you know if the bug appears on a specific Ubuntu / OpenStack version?
As far as I remember, it was not related to the puppet branch? I mean, the bug exists on master but also on the Newton puppet branches, right?

We are using Ubuntu in my company so we would love to see that continue ;)
I'll try to take a look again.

Cheers.

Le 14 nov. 2017 00:00, "Mohammed Naser" mnaser@vexxhost.com a écrit :
Hi,

Hope that everyone had safe travels and enjoyed their time at Sydney
(and those who weren't there enjoyed a bit of quiet time!). I'm just
sending this email if anyone had a chance to look more into this (or
perhaps we can get some help if there are any Canonical folks on the
list?)

I would be really sad if we had to drop Ubuntu/Debian support because
we cannot test it. I think there are a lot of users out there using
it! I'd be more than happy to provide any assistance/information in
troubleshooting this.

Thank you,
Mohammed

On Thu, Nov 2, 2017 at 1:10 PM, Mohammed Naser mnaser@vexxhost.com wrote:
On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin tobias.urdin@crystone.com wrote:

I've been staring at this for almost an hour now, going through all the logs, and I can't really pinpoint where that error message is generated. I cannot find any references for the timed-out message that the API returns, or for the "unable to associate" part.

What I'm currently staring at is why the instance fixed IP 172.24.5.17 would be referenced as a network:router_gateway port in the OVS agent logs.

2017-10-29 23:19:27.591 11856 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port
053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {},
'network_qos_policy_id': None, 'qos_policy_id': None,
'allowed_address_pairs': [], 'admin_state_up': True, 'network_id':
'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None,
'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c',
'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway',
'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3',
'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled':
False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type':
u'flat', 'security_groups': []}
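
One way to check which port(s) actually hold that address, and with which device_owner, would be a port listing filtered on the fixed IP. This is only a sketch: the token is a placeholder, verify=False assumes the self-signed loopback certificate, and the fixed_ips query form is the standard Neutron port filter as I understand it:

import requests

NEUTRON_URL = "https://127.0.0.1:9696/v2.0"
TOKEN = "<keystone-token>"   # placeholder

# List every port carrying 172.24.5.17 and print who owns it
# (e.g. network:floatingip vs. network:router_gateway).
resp = requests.get(
    "%s/ports" % NEUTRON_URL,
    headers={"X-Auth-Token": TOKEN},
    params={"fixed_ips": "ip_address=172.24.5.17"},
    timeout=10,
    verify=False,
)
for port in resp.json().get("ports", []):
    print(port["id"], port["device_owner"], port["fixed_ips"])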

Anybody else seen anything interesting?

Hi Tobias,

Thanks for looking out. I've spent a lot of time and I haven't
successfully identified much, I can add the following

  • This issue is intermittent in CI.
  • It does not happen on any specific provider; I've seen it fail on both OVH and Rackspace.
  • Not all floating IP assignments fail; if you look at the logs, you can see it attach a few successfully before failing.

But yeah, I'm still quite at a loss and not having this coverage isn't fun.

On 10/30/2017 11:08 PM, Brian Haley wrote:

On 10/30/2017 05:46 PM, Matthew Treinish wrote:

From a quick glance at the logs my guess is that the issue is related
to this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane
right now (which is why this is a top post, sorry) so I can't really dig
much more than that. I'll try to take a deeper look at things later when
I'm on solid ground. (hopefully someone will beat me to it by then though)

I don't think that l3-agent trace is it, as the failure is coming from
the API. It's actually a trace that's happening due to the async nature
of how the agent runs arping, fix is
https://review.openstack.org/#/c/507914/ but it only removes the log noise.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
has some tracebacks that look config related, possible missing DB table?
But I haven't looked very closely.

-Brian

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
mnaser@vexxhost.com wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP
172.24.5.17 to fixed IP 10.100.0.8
for instance
for instance
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to

https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in order
to find the root cause however we have not been able to do this.  It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.  We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance).  We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed

------------------------------------------------------------------------

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Nov 14, 2017 by Tobias_Urdin (1,300 points)   1
...