
[openstack-dev] [neutron]OVS connection tracking cleanup

0 votes

Hi,
I am performing a scale test, and after creating 500 VMs with ping traffic between them it took almost an hour for connection tracking cleanup to complete. The OVS agent was busy with this cleanup and unable to service new port-bind requests on some computes for that entire period.

I found the following bug:
https://bugs.launchpad.net/neutron/+bug/1513765

and I have applied the fix below:
https://git.openstack.org/cgit/openstack/neutron/commit/?id=d7aeb8dd4b1d122e17eef8687192cd122b79fd6e

I still see very long conntrack cleanup times.

What is the solution to this problem in scale scenarios?
Ajay


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Sep 28, 2017 in openstack-dev by Ajay_Kalambur_(akala (2,680 points)   7 14

8 Responses

0 votes

The biggest improvement will be switching to native netlink calls:
https://review.openstack.org/#/c/470912/

How many VMs were on a single compute node?
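To see why exec-based cleanup is so slow at this scale: each delete goes through sudo + rootwrap + a `conntrack` fork/exec, so total cleanup time is roughly per-spawn overhead times entry count. A back-of-envelope sketch (the per-spawn and per-message costs below are assumptions for illustration, not measured Neutron numbers):

```python
# Back-of-envelope comparison of exec-based vs netlink conntrack deletion.
# per_spawn_s is an assumed cost of sudo + rootwrap + conntrack fork/exec per
# entry; per_msg_s is an assumed cost per netlink message.
entries = 38528          # entry count reported later in this thread
per_spawn_s = 0.08
per_msg_s = 0.0001

exec_based_s = entries * per_spawn_s   # one process spawned per entry
netlink_s = entries * per_msg_s        # one in-process message per entry

print(f"exec-based: ~{exec_based_s / 60:.0f} min")
print(f"netlink:    ~{netlink_s:.1f} s")
```

With these assumed costs the exec-based estimate lands near the hour-long cleanup reported above, which is why eliminating the per-entry process spawn dominates any other tuning.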

responded Sep 11, 2017 by kevin_at_benton.pub (15,600 points)   2 3 4
0 votes

Hi Kevin,
Thanks for your response; it was about 50 VMs per compute node.
Ajay

responded Sep 11, 2017 by Ajay_Kalambur_(akala (2,680 points)   7 14
0 votes

Hi Kevin,
Here is the information you asked for. For one compute node with 45 VMs, this is the number of connection tracking entries getting deleted:

cat conntrack.file | wc -l
38528

The file with the output is 14 MB, so I'll email it to Ian and he can share it if needed.

Security group rules:

Direction  Ether Type  IP Protocol  Port Range  Remote IP Prefix  Remote Security Group
Egress     IPv4        Any          Any         0.0.0.0/0
Ingress    IPv6        Any          Any         -                 default
Egress     IPv6        Any          Any         ::/0              -
Ingress    IPv4        Any          Any         -

Please let me know if you need the dump of conntrack entries; if so, I can email it to an address of your choice.
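As a sanity check on that count: 45 local VMs each pinging the other 499 VMs would create on the order of one tracked ICMP flow per local VM per peer, which is the same order of magnitude as the 38,528 entries reported. Illustrative arithmetic only; the exact count also depends on conntrack zones and entry lifetimes:

```python
# Illustrative only: order-of-magnitude estimate of conntrack entries on one
# compute node in a 500-VM all-to-all ping test.
local_vms = 45
total_vms = 500
flows = local_vms * (total_vms - 1)   # one tracked ICMP flow per local VM per peer

print(flows)   # 22455 -- same order of magnitude as the observed 38528
```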

Ajay

responded Sep 11, 2017 by Ajay_Kalambur_(akala (2,680 points)   7 14
0 votes

Can you start a bug on launchpad and upload the conntrack attachment to the
bug?

Switching to the rootwrap daemon should also help significantly.

responded Sep 11, 2017 by kevin_at_benton.pub (15,600 points)   2 3 4
0 votes

Hi Kevin,
Sure, I will log a bug.
Also, does the config change involve having both of these lines in the neutron.conf file?

[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
root_helper_daemon = sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

If I have only the second line, I see the exception below when the neutron openvswitch agent comes up:

2017-09-12 09:23:03.633 35 DEBUG neutron.agent.linux.utils [req-0f8fe685-66bd-44d7-beac-bb4c24f0ccfa - - - - -] Running command: ['ps', '--ppid', '103', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2017-09-12 09:23:03.762 35 ERROR ryu.lib.hub [req-0f8fe685-66bd-44d7-beac-bb4c24f0ccfa - - - - -] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py", line 42, in agent_main_wrapper
    ovs_agent.main(bridge_classes)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2184, in main
    agent.daemon_loop()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 154, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2100, in daemon_loop
    self.ovsdb_monitor_respawn_interval) as pm:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/polling.py", line 35, in get_polling_manager
    pm.start()
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/polling.py", line 57, in start
    while not self.is_active():
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py", line 100, in is_active
    self.pid, self.cmd_without_namespace)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py", line 159, in pid
    run_as_root=self.run_as_root)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 297, in get_root_helper_child_pid
    pid = find_child_pids(pid)[0]
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 179, in find_child_pids
    log_fail_as_error=False)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 128, in execute
    _stdout, _stderr = obj.communicate(_process_input)
  File "/usr/lib64/python2.7/subprocess.py", line 800, in communicate
    return self._communicate(input)
  File "/usr/lib64/python2.7/subprocess.py", line 1403, in _communicate
    stdout, stderr = self._communicate_with_select(input)
  File "/usr/lib64/python2.7/subprocess.py", line 1504, in _communicate_with_select
    rlist, wlist, xlist = select.select(read_set, write_set, [])
  File "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 86, in select
    return hub.switch()
  File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
    return self.greenlet.switch()
Timeout: 5 seconds

2017-09-12 09:23:03.860 35 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=95
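For context on why the rootwrap daemon helps here: the plain root_helper pays a sudo + fork/exec + Python interpreter startup for every single command, while the daemon mode keeps one privileged helper alive and sends it requests over a pipe. The difference can be sketched with stand-in commands (`true` and `cat` below are stand-ins, not actual Neutron or rootwrap code):

```python
import subprocess
import time

N = 100

# Plain root_helper style: a fresh fork/exec per command.
t0 = time.perf_counter()
for _ in range(N):
    subprocess.run(["true"], check=True)
fork_exec_s = time.perf_counter() - t0

# root_helper_daemon style: one long-lived helper, requests over a pipe.
daemon = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE, text=True, bufsize=1)
t0 = time.perf_counter()
for _ in range(N):
    daemon.stdin.write("noop\n")   # stand-in for a rootwrap request
    daemon.stdin.flush()
    daemon.stdout.readline()       # stand-in for the reply
daemon_s = time.perf_counter() - t0
daemon.stdin.close()
daemon.wait()

print(f"fork/exec per command: {fork_exec_s:.3f}s; daemon round-trips: {daemon_s:.3f}s")
```

On a typical box the pipe round-trips are one to two orders of magnitude cheaper than the per-command spawns, which is the whole benefit of the daemon at conntrack-cleanup scale.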

Ajay

responded Sep 12, 2017 by Ajay_Kalambur_(akala (2,680 points)   7 14
0 votes

It looks like I need both lines. I now see conntrack deletes happening with the rootwrap daemon. Kevin, the performance is indeed much better, but there are still lots of entries to delete, so port binding still blocks. Is there any way we can perform conntrack cleanup in a separate thread?

Ajay
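The separate-thread idea could look roughly like the sketch below: the agent loop only enqueues entries, and a background worker drains the queue, so port binding is never blocked behind the deletes. This is purely illustrative; the names and structure are mine, not Neutron's:

```python
import queue
import threading

delete_q = queue.Queue()
deleted = []

def conntrack_worker():
    """Drain delete requests in the background, off the agent's main loop."""
    while True:
        entry = delete_q.get()
        if entry is None:          # sentinel: shut down the worker
            delete_q.task_done()
            break
        deleted.append(entry)      # a real agent would run: conntrack -D <entry>
        delete_q.task_done()

worker = threading.Thread(target=conntrack_worker, daemon=True)
worker.start()

# The main loop only enqueues, which returns immediately.
for entry in ["icmp src=10.0.0.1 dst=10.0.0.2", "icmp src=10.0.0.1 dst=10.0.0.3"]:
    delete_q.put(entry)

delete_q.put(None)
worker.join()
print(f"{len(deleted)} deletes handled off the main loop")
```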

responded Sep 12, 2017 by Ajay_Kalambur_(akala (2,680 points)   7 14
0 votes

Another odd part of this conntrack deletion: when I run conntrack -L to view the table, I see no entry for any of the entries it is trying to delete. Those entries appear to be removed anyway when the VMs are cleaned up, so it looks like all of those conntrack deletions were essentially no-ops.
Ajay

responded Sep 27, 2017 by Ajay_Kalambur_(akala (2,680 points)   7 14
0 votes

It looks like the conntrack deletion can be skipped for port deletion, no? On bulk deletes of a lot of VMs, the entries being deleted never existed in the conntrack table.

The patch below seems to go along those lines:
https://review.openstack.org/#/c/243994/

Is there a plan to distinguish between port deletes and port updates for conntrack rule deletions? In a scale scenario on OVS with VLANs, this is a real blocker for running back-to-back scale tests.
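The distinction being proposed could be sketched as a simple policy: targeted deletion only matters when rules change on a live port (stale entries could let now-blocked traffic keep flowing), whereas on port removal the VM is gone and its entries can simply age out. This is a hypothetical sketch of that policy, not the code in the patch above:

```python
def handle_port_event(event, port_id, clear_conntrack):
    """Hypothetical policy: only port *updates* need conntrack deletion."""
    if event == "updated":
        # Rule change on a live port: stale entries could keep now-blocked
        # traffic flowing, so clear them explicitly.
        clear_conntrack(port_id)
        return "cleared"
    if event == "deleted":
        # No VM is left behind the port; entries age out harmlessly.
        return "skipped"
    return "ignored"

cleared = []
print(handle_port_event("updated", "port-1", cleared.append))
print(handle_port_event("deleted", "port-2", cleared.append))
```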

From: Ajay Kalambur akalambu@cisco.com
Reply-To: "OpenStack Development Mailing List (not for usage questions)" openstack-dev@lists.openstack.org
Date: Wednesday, September 27, 2017 at 4:42 PM
To: "OpenStack Development Mailing List (not for usage questions)" openstack-dev@lists.openstack.org
Cc: "Ian Wells (iawells)" iawells@cisco.com
Subject: Re: [openstack-dev] [neutron]OVS connection tracking cleanup

Also, a weird part about this conntrack deletion: when I run conntrack -L to view the table, I see no entries for any of the ones it is trying to delete. Those entries are all removed anyway when the VMs are cleaned up, from the look of it. So it looks like all those conntrack deletions were pretty much no-ops.
Ajay

From: Ajay Kalambur akalambu@cisco.com
Date: Tuesday, September 12, 2017 at 9:30 AM
To: "OpenStack Development Mailing List (not for usage questions)" openstack-dev@lists.openstack.org
Cc: "Ian Wells (iawells)" iawells@cisco.com
Subject: Re: [openstack-dev] [neutron]OVS connection tracking cleanup

Hi Kevin
Sure will log a bug
Also, does the config change involve having both of these lines in the neutron.conf file?
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
root_helper_daemon = sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

If I have only the second line, I see the exception below when the neutron openvswitch agent comes up:

2017-09-12 09:23:03.633 35 DEBUG neutron.agent.linux.utils [req-0f8fe685-66bd-44d7-beac-bb4c24f0ccfa - - - - -] Running command: ['ps', '--ppid', '103', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2017-09-12 09:23:03.762 35 ERROR ryu.lib.hub [req-0f8fe685-66bd-44d7-beac-bb4c24f0ccfa - - - - -] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py", line 42, in agent_main_wrapper
    ovs_agent.main(bridge_classes)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2184, in main
    agent.daemon_loop()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 154, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2100, in daemon_loop
    self.ovsdb_monitor_respawn_interval) as pm:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/polling.py", line 35, in get_polling_manager
    pm.start()
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/polling.py", line 57, in start
    while not self.is_active():
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py", line 100, in is_active
    self.pid, self.cmd_without_namespace)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py", line 159, in pid
    run_as_root=self.run_as_root)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 297, in get_root_helper_child_pid
    pid = find_child_pids(pid)[0]
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 179, in find_child_pids
    log_fail_as_error=False)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 128, in execute
    _stdout, _stderr = obj.communicate(_process_input)
  File "/usr/lib64/python2.7/subprocess.py", line 800, in communicate
    return self._communicate(input)
  File "/usr/lib64/python2.7/subprocess.py", line 1403, in _communicate
    stdout, stderr = self._communicate_with_select(input)
  File "/usr/lib64/python2.7/subprocess.py", line 1504, in _communicate_with_select
    rlist, wlist, xlist = select.select(read_set, write_set, [])
  File "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 86, in select
    return hub.switch()
  File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
    return self.greenlet.switch()
Timeout: 5 seconds

2017-09-12 09:23:03.860 35 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=95

Ajay

From: Kevin Benton kevin@benton.pub
Reply-To: "OpenStack Development Mailing List (not for usage questions)" openstack-dev@lists.openstack.org
Date: Monday, September 11, 2017 at 1:12 PM
To: "OpenStack Development Mailing List (not for usage questions)" openstack-dev@lists.openstack.org
Cc: "Ian Wells (iawells)" iawells@cisco.com
Subject: Re: [openstack-dev] [neutron]OVS connection tracking cleanup

Can you start a bug on launchpad and upload the conntrack attachment to the bug?

Switching to the rootwrap daemon should also help significantly.

On Mon, Sep 11, 2017 at 12:32 PM, Ajay Kalambur (akalambu) akalambu@cisco.com wrote:
Hi Kevin
The information you asked for.
For 1 compute node with 45 VMs, here is the number of connection tracking entries getting deleted:
cat conntrack.file | wc -l
38528
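For context, a back-of-envelope calculation shows how ~38k individual deletions can add up to roughly the hour observed. The per-call overhead below is an assumed figure for illustration (fork/exec plus rootwrap round-trip), not a measured one:

```python
# Rough cost model for per-entry conntrack deletion via rootwrap.
# seconds_per_call is an assumption for illustration, not a measured value.

entries = 38528            # conntrack deletions observed on one compute node
seconds_per_call = 0.09    # assumed process-spawn + rootwrap overhead per call

total_minutes = entries * seconds_per_call / 60
print(round(total_minutes))  # about an hour at this assumed per-call cost
```

This is why both the rootwrap daemon (cheaper per call) and native netlink (no external call at all) attack the same bottleneck.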

The file with the output is 14MB, so I'll email it to Ian and he can share it if needed.

Security group rules
Direction  Ether Type  IP Protocol  Port Range  Remote IP Prefix  Remote Security Group
Egress     IPv4        Any          Any         0.0.0.0/0         -
Ingress    IPv6        Any          Any         -                 default
Egress     IPv6        Any          Any         ::/0              -
Ingress    IPv4        Any          Any         -

Please let me know if you need the dump of conntrack entries; if so, I can email it to an address of your choice.

Ajay

From: Ajay Kalambur akalambu@cisco.com
Reply-To: "OpenStack Development Mailing List (not for usage questions)" openstack-dev@lists.openstack.org
Date: Monday, September 11, 2017 at 10:02 AM
To: "OpenStack Development Mailing List (not for usage questions)" openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [neutron]OVS connection tracking cleanup

Hi Kevin
Thanks for your response. It was about 50 VMs.
Ajay

On Sep 11, 2017, at 9:49 AM, Kevin Benton kevin@benton.pub wrote:

The biggest improvement will be switching to native netlink calls: https://review.openstack.org/#/c/470912/

How many VMs were on a single compute node?

On Mon, Sep 11, 2017 at 9:15 AM, Ajay Kalambur (akalambu) akalambu@cisco.com wrote:
Hi
I am performing a scale test and I see that after creating 500 Vms with ping traffic between them it took almost 1 hr for the connection tracking
To clean up and ovs agent was busy doing this and unable to service any new port bind requests on some computes for almost an hr
It took that long for conntrack clean up to complete

I see the following bug
https://bugs.launchpad.net/neutron/+bug/1513765

And I also have the fix below
https://git.openstack.org/cgit/openstack/neutron/commit/?id=d7aeb8dd4b1d122e17eef8687192cd122b79fd6e

Still see really long times for conntrack cleanup

What is the solution to this problem in scale scenarios?
Ajay


responded Sep 28, 2017 by Ajay_Kalambur_(akala