Do you know where the packets are being dropped: on the physical interface, the tap device, the OVS port, or inside the VM?
We hit UDP packet loss at high pps. Here are the things you may want to double-check:
- Double-check whether your physical interface is dropping packets. If the RX ring size or the number of RX queues is left at the default value, it will usually drop UDP packets. In my experience it starts dropping at about 200 kpps per CPU core (RSS distributes traffic across cores, but each single core starts dropping at about 200 kpps).
You can usually read the statistics from ethtool -S <interface> to check whether packets are lost because the RX queue is full, and use ethtool to increase the ring size. In my environment, increasing the ring size from 512 to 4096 doubled the throughput from 200 kpps to 400 kpps on one CPU core. This may help in some cases.
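To make the ethtool check repeatable, here is a minimal Python sketch that parses the text printed by ethtool -S and keeps only the non-zero RX counters whose names look like drop counters. The function names, the list of name hints, and the sample output are assumptions for illustration; real counter names differ between NIC drivers.

```python
import re


def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` output into {counter_name: int}."""
    stats = {}
    for line in text.splitlines():
        # Counter lines look like "     rx_packets: 12345". The header line
        # ("NIC statistics:") has no number after the colon and is skipped.
        match = re.match(r"\s*(\S.*?):\s*(-?\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def rx_loss_counters(stats):
    """Keep non-zero RX counters whose names hint at dropped or missed packets."""
    hints = ("drop", "miss", "no_buf", "fifo")  # assumed hints, not exhaustive
    return {
        name: value
        for name, value in stats.items()
        if "rx" in name.lower()
        and any(hint in name.lower() for hint in hints)
        and value > 0
    }


# Hypothetical sample; counter names depend on the NIC driver.
SAMPLE = """NIC statistics:
     rx_packets: 1000000
     tx_packets: 900000
     rx_missed_errors: 4213
     rx_queue_0_drops: 0
     tx_dropped: 7
"""

if __name__ == "__main__":
    print(rx_loss_counters(parse_ethtool_stats(SAMPLE)))
```

On a real host you would feed it the output of ethtool -S for your interface, run it twice a few seconds apart, and compare the values. The current and maximum ring sizes are shown by ethtool -g <interface>, and ethtool -G <interface> rx 4096 sets the RX ring if the NIC supports that size.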
Double-check whether your TAP device is dropping packets. The default tx_queue length is 500 or 1000; increasing it to 10000 may help in some cases.
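A small sketch for reading the TAP device counters from sysfs, assuming the usual Linux layout under /sys/class/net. The device name tap0 and the function name are placeholders; on a compute node the tap devices normally have generated names.

```python
from pathlib import Path


def read_tap_counters(device, sysfs_root="/sys/class/net"):
    """Return the queue length and drop counters of one network device."""
    base = Path(sysfs_root) / device
    return {
        "tx_queue_len": int((base / "tx_queue_len").read_text()),
        # From the host's point of view, packets going to the VM are
        # transmitted on the tap device, so drops towards the guest are
        # expected to show up in tx_dropped.
        "tx_dropped": int((base / "statistics" / "tx_dropped").read_text()),
        "rx_dropped": int((base / "statistics" / "rx_dropped").read_text()),
    }


if __name__ == "__main__":
    # "tap0" is a hypothetical device name.
    print(read_tap_counters("tap0"))
```

If tx_dropped keeps growing under load, the queue length can be raised with ip link set dev <tap> txqueuelen 10000.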
Double-check nf_conntrack_max on the compute node and the network node. The default value is 65535; in our case the number of tracked connections usually reaches 500k-1M, so we raised the limit.
If you see something like “nf_conntrack: table full, dropping packet” in your /var/log/messages log, you have hit this limit.
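The log message only appears once the table is already full. As a rough sketch, the current fill level can be read from the nf_conntrack_count and nf_conntrack_max files under /proc/sys/net/netfilter; the function names and the 80% warning threshold below are my own assumptions.

```python
from pathlib import Path


def conntrack_usage(proc_root="/proc/sys/net/netfilter"):
    """Return (tracked connections, table limit) from the conntrack sysctls."""
    root = Path(proc_root)
    count = int((root / "nf_conntrack_count").read_text())
    limit = int((root / "nf_conntrack_max").read_text())
    return count, limit


def is_near_limit(count, limit, threshold=0.8):
    """True when the table is at least `threshold` full (assumed threshold)."""
    return count >= threshold * limit


if __name__ == "__main__":
    count, limit = conntrack_usage()
    print(f"{count}/{limit} entries, near limit: {is_near_limit(count, limit)}")
```

The limit itself is changed with sysctl -w net.netfilter.nf_conntrack_max=<value>; pick the value from your own peak connection count.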
You could also check whether the drops happen inside your VM. Increasing the following parameters may help in some cases:
net.core.rmem_max / net.core.rmem_default / net.core.wmem_max / net.core.wmem_default
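To confirm that the guest itself is dropping, the UDP counters in /proc/net/snmp are a good place to look: RcvbufErrors counts datagrams discarded because a socket receive buffer was full. A minimal parsing sketch follows; the function name and the sample numbers are illustrative only.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp as {name: int}."""
    # The file holds pairs of lines per protocol: one with the counter
    # names and one with the values, both prefixed with "Udp:".
    udp_lines = [
        line.split() for line in snmp_text.splitlines()
        if line.startswith("Udp:")
    ]
    if len(udp_lines) < 2:
        raise ValueError("no Udp section found")
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


# Hypothetical sample in the /proc/net/snmp layout.
SAMPLE = """Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors IgnoredMulti
Udp: 500000 12 3400 480000 3400 0 0 0
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors IgnoredMulti
UdpLite: 0 0 0 0 0 0 0 0
"""

if __name__ == "__main__":
    counters = parse_udp_counters(SAMPLE)
    print(counters["InErrors"], counters["RcvbufErrors"])
```

Inside the VM, pass it the contents of /proc/net/snmp. Keep in mind that rmem_default only sets the buffer size a socket starts with, while rmem_max is the ceiling an application can request with SO_RCVBUF, so a larger rmem_max alone does not help an application that never asks for more.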
- If you are using the default network driver (virtio-net), double-check whether the vhost thread of your VM is saturated with CPU softirq load. You can find it by its process name, vhost-$PID_OF_YOUR_VM. In that case you can try the following feature in “L”:
Multi-queue may help in some cases, but it will use more vhost threads and more CPU on your host.
- Sometimes CPU NUMA pinning can also help, but you need to reserve the CPUs and plan your CPU allocation statically.
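For the vhost check mentioned above, here is a rough sketch that scans /proc for tasks whose name starts with "vhost-" and reports the CPU time each has consumed so far. It assumes the vhost workers show up as separate entries in /proc, which depends on the kernel version, and the function name is my own.

```python
from pathlib import Path


def find_vhost_threads(proc_root="/proc"):
    """Return {pid: (name, cpu_ticks)} for tasks named 'vhost-<qemu pid>'."""
    found = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm = (entry / "comm").read_text().strip()
            if not comm.startswith("vhost-"):
                continue
            stat = (entry / "stat").read_text()
        except OSError:
            continue  # the task exited while we were scanning
        # Everything after the last ")" is space separated; the first item
        # is the state (field 3 in proc(5)), so utime and stime (fields 14
        # and 15) sit at indexes 11 and 12.
        fields = stat.rsplit(")", 1)[1].split()
        found[int(entry.name)] = (comm, int(fields[11]) + int(fields[12]))
    return found


if __name__ == "__main__":
    for pid, (name, ticks) in sorted(find_vhost_threads().items()):
        print(pid, name, ticks)
```

Sampling this twice and comparing the tick counts shows which vhost thread is busiest; a thread that burns close to one full core between samples is a likely bottleneck.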
I think we should figure out where the packets are lost and which part is the bottleneck. Hope this helps, John.
From: John Petrini email@example.com
Date: Friday, July 28, 2017 03:35
To: Pedro Sousa firstname.lastname@example.org, OpenStack Mailing List email@example.com, "firstname.lastname@example.org" email@example.com
Subject: Re: [Openstack] [Openstack-operators] UDP Buffer Filling
Thank you for the suggestion. I will look into this.
Platforms Engineer // CoreDial, LLC // coredial.com
751 Arbor Way, Hillcrest I, Suite 150, Blue Bell, PA 19422
P: 215.297.4400 x232 // F: 215.297.4401 // E: firstname.lastname@example.org
On Thu, Jul 27, 2017 at 12:25 PM, Pedro Sousa email@example.com wrote:
Have you considered implementing a network acceleration technique such as OVS-DPDK or SR-IOV?
In these kinds of workloads (voice, video), which have low latency requirements, you might need to use something like DPDK to avoid these issues.
On Thu, Jul 27, 2017 at 4:49 PM, John Petrini firstname.lastname@example.org wrote:
We are running Mitaka with VLAN provider networking. We've recently encountered a problem where the UDP receive queue on instances is filling up and we begin dropping packets. Moving instances out of OpenStack onto bare metal resolves the issue completely.
These instances are running asterisk which should be pulling these packets off the queue but it appears to be falling behind no matter the resources we give it.
We can't seem to pin down a reason why we would see this behavior in KVM but not on metal. I'm hoping someone on the list might have some insight or ideas.
OpenStack-operators mailing list
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : email@example.com
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack