
Re: [Openstack] soft lockup on Newton compute nodes


===== UPDATE 10/23 ======

We have been trying different things to get better debug output. We disabled
rate-limiting in order to get more information in /var/log/messages. For some
reason (maybe unrelated) we didn't get the soft lockup during this test, but
this time we got openvswitch, br_netfilter, etc. in the call trace in
/var/log/messages.

Please advise in any way! thx!!

Basically, we are running various types of SIP/RTP test traffic between 2
instances (on different compute nodes). This time, instead of one hypervisor
getting the errors, both hypervisors did, but neither got the soft lockup.

Log snippets are below; full logs are here:

www.jokken.com/downloads/node-68.txt

www.jokken.com/downloads/node-90.txt

node-68

2017-10-20T17:48:37.031741+00:00 node-68 rsyslogd-2177: imuxsock[pid 5085]:
40 messages lost due to rate-limiting

2017-10-20T17:58:36.281069+00:00 node-68 rsyslogd-2177: imuxsock[pid 5085]:
begin to drop messages due to rate-limiting

2017-10-20T17:58:37.548500+00:00 node-68 rsyslogd-2177: imuxsock[pid 5085]:
41 messages lost due to rate-limiting

2017-10-20T18:08:36.180377+00:00 node-68 rsyslogd-2177: imuxsock[pid 5085]:
begin to drop messages due to rate-limiting

2017-10-20T18:08:37.058861+00:00 node-68 rsyslogd-2177: imuxsock[pid 5085]:
40 messages lost due to rate-limiting

2017-10-20T18:18:36.175797+00:00 node-68 rsyslogd-2177: imuxsock[pid 5085]:
begin to drop messages due to rate-limiting

2017-10-20T18:18:37.583237+00:00 node-68 rsyslogd-2177: imuxsock[pid 5085]:
41 messages lost due to rate-limiting

2017-10-20T18:28:36.172090+00:00 node-68 rsyslogd-2177: imuxsock[pid 5085]:
begin to drop messages due to rate-limiting

2017-10-20T18:28:37.125346+00:00 node-68 rsyslogd-2177: imuxsock[pid 5085]:
40 messages lost due to rate-limiting

ps -aef | grep 5080

ceilome+ 5080 3502 0 Oct03 ? 01:32:57 ceilometer-polling - AgentManager(0)

2017-10-20T18:35:10.759230+00:00 node-68 rsyslogd: [origin
software="rsyslogd" swVersion="8.16.0" x-pid="3431" x-info="
http://www.rsyslog.com"] exiting on signal 15.

2017-10-20T18:35:10.790611+00:00 node-68 rsyslogd: [origin
software="rsyslogd" swVersion="8.16.0" x-pid="23851" x-info="
http://www.rsyslog.com"] start

2017-10-20T18:35:10.790395+00:00 node-68 rsyslogd: rsyslogd's groupid
changed to 108

2017-10-20T18:35:10.790455+00:00 node-68 rsyslogd: rsyslogd's userid
changed to 104

2017-10-20T18:35:10.790491+00:00 node-68 rsyslogd-2357: queue "action 0
queue": high water mark is set quite low at 8000. You should only set it
below 60% (600000) if you have a good reason for this. [v8.16.0 try
http://www.rsyslog.com/e/2357 ]

Test starts: Fri Oct 20 18:52:48 2017

2017-10-20T18:56:20.408532+00:00 node-68 kernel: [1458996.797708]
------------[ cut here ]------------

2017-10-20T18:56:20.408571+00:00 node-68 kernel: [1458996.797728] WARNING:
CPU: 27 PID: 0 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
skb_warn_bad_offload+0xd1/0x120()

2017-10-20T18:56:20.408574+00:00 node-68 kernel: [1458996.797732]
qvofd385f05-cb: caps=(0x00000184075b59e9, 0x0000000000000000) len=2636
data_len=2594 gso_size=1480 gso_type=6 ip_summed=0

2017-10-20T18:56:20.408576+00:00 node-68 kernel: [1458996.797735] Modules
linked in: bonding binfmt_misc nf_conntrack_netlink vhost_net vhost macvtap
macvlan xt_mac xt_tcpudp xt_physdev br_netfilter xt_set ip_set_hash_net
ip_set nfnetlink veth ip6table_raw ebtable_filter ebtables openvswitch
ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
xfs ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal intel_powerclamp
coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev 8021q
serio_raw input_leds garp mrp stp llc sb_edac edac_core hpilo ioatdma
lpc_ich shpchp dca 8250_fintek ipmi_si ipmi_msghandler acpi_power_meter
mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack autofs4 dm_round_robin raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor ses
enclosure raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage
hid_generic usbhid hid psmouse lpfc ahci libahci be2net vxlan
scsi_transport_fc ip6_udp_tunnel udp_tunnel wmi fjes scsi_dh_emc
scsi_dh_rdac scsi_dh_alua dm_multipath

2017-10-20T18:56:20.408580+00:00 node-68 kernel: [1458996.797828] CPU: 27
PID: 0 Comm: swapper/27 Tainted: G W 4.4.0-93-generic #116-Ubuntu

2017-10-20T18:56:20.408582+00:00 node-68 kernel: [1458996.797830] Hardware
name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017

2017-10-20T18:56:20.408583+00:00 node-68 kernel: [1458996.797832]
0000000000000286 0fc821d2ff4865f6 ffff88103fc437d0 ffffffff813f9f83

2017-10-20T18:56:20.408615+00:00 node-68 kernel: [1458996.797835]
ffff88103fc43818 ffffffff81d6f780 ffff88103fc43808 ffffffff810812f2

2017-10-20T18:56:20.408623+00:00 node-68 kernel: [1458996.797838]
ffff88203343f200 ffff880f8e0d1000 0000000000000006 0000000000000006

2017-10-20T18:56:20.408624+00:00 node-68 kernel: [1458996.797840] Call
Trace:

2017-10-20T18:56:20.408625+00:00 node-68 kernel: [1458996.797842]
[] dump_stack+0x63/0x90

2017-10-20T18:56:20.408626+00:00 node-68 kernel: [1458996.797859]
[] warn_slowpath_common+0x82/0xc0

2017-10-20T18:56:20.408627+00:00 node-68 kernel: [1458996.797861]
[] warn_slowpath_fmt+0x5c/0x80

2017-10-20T18:56:20.408632+00:00 node-68 kernel: [1458996.797865]
[] ? ___ratelimit+0xa2/0xe0

2017-10-20T18:56:20.408635+00:00 node-68 kernel: [1458996.797867]
[] skb_warn_bad_offload+0xd1/0x120

2017-10-20T18:56:20.408636+00:00 node-68 kernel: [1458996.797870]
[] __skb_gso_segment+0xfd/0x110

2017-10-20T18:56:20.408638+00:00 node-68 kernel: [1458996.797878]
[] queue_gso_packets+0x5b/0x150 [openvswitch]

2017-10-20T18:56:20.408639+00:00 node-68 kernel: [1458996.797881]
[] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]

2017-10-20T18:56:20.408640+00:00 node-68 kernel: [1458996.797884]
[] ? br_validate_ipv4.isra.23+0x200/0x200 [br_netfilter]

2017-10-20T18:56:20.408641+00:00 node-68 kernel: [1458996.797889]
[] ? nf_iterate+0x62/0x80

2017-10-20T18:56:20.408644+00:00 node-68 kernel: [1458996.797892]
[] ? nf_hook_slow+0x73/0xd0

2017-10-20T18:56:20.408645+00:00 node-68 kernel: [1458996.797901]
[] ? __br_forward+0x104/0x130 [bridge]

2017-10-20T18:56:20.408646+00:00 node-68 kernel: [1458996.797905]
[] ovs_dp_upcall+0x31/0x60 [openvswitch]

2017-10-20T18:56:20.408648+00:00 node-68 kernel: [1458996.797909]
[] ovs_dp_process_packet+0x10a/0x130 [openvswitch]

2017-10-20T18:56:20.408649+00:00 node-68 kernel: [1458996.797914]
[] ovs_vport_receive+0x6c/0xd0 [openvswitch]

2017-10-20T18:56:20.408650+00:00 node-68 kernel: [1458996.797917]
[] ? __skb_flow_dissect+0x6a6/0x9f0

2017-10-20T18:56:20.408653+00:00 node-68 kernel: [1458996.797920]
[] ? nf_iterate+0x62/0x80

2017-10-20T18:56:20.408654+00:00 node-68 kernel: [1458996.797922]
[] ? __skb_get_hash+0x9a/0x300

2017-10-20T18:56:20.408655+00:00 node-68 kernel: [1458996.797926]
[] ? __slab_free+0xcb/0x2c0

2017-10-20T18:56:20.408656+00:00 node-68 kernel: [1458996.797930]
[] ? skb_release_data+0xa7/0xd0

2017-10-20T18:56:20.408657+00:00 node-68 kernel: [1458996.797934]
[] netdev_frame_hook+0xe9/0x150 [openvswitch]

2017-10-20T18:56:20.408658+00:00 node-68 kernel: [1458996.797937]
[] __netif_receive_skb_core+0x364/0xa60

2017-10-20T18:56:20.408665+00:00 node-68 kernel: [1458996.797939]
[] ? skb_complete_wifi_ack+0xa0/0xe0

2017-10-20T18:56:20.408666+00:00 node-68 kernel: [1458996.797942]
[] ? __dev_kfree_skb_any+0x2f/0x40

2017-10-20T18:56:20.408679+00:00 node-68 kernel: [1458996.797947]
[] ? be_get_new_eqd.isra.63+0x124/0x1f0 [be2net]

2017-10-20T18:56:20.408680+00:00 node-68 kernel: [1458996.797949]
[] __netif_receive_skb+0x18/0x60

2017-10-20T18:56:20.408681+00:00 node-68 kernel: [1458996.797951]
[] process_backlog+0xa8/0x150

2017-10-20T18:56:20.408684+00:00 node-68 kernel: [1458996.797954]
[] net_rx_action+0x21e/0x360

2017-10-20T18:56:20.408685+00:00 node-68 kernel: [1458996.797957]
[] __do_softirq+0x101/0x290

2017-10-20T18:56:20.408686+00:00 node-68 kernel: [1458996.797959]
[] irq_exit+0xa3/0xb0

2017-10-20T18:56:20.408687+00:00 node-68 kernel: [1458996.797963]
[] smp_call_function_single_interrupt+0x33/0x40

2017-10-20T18:56:20.408687+00:00 node-68 kernel: [1458996.797967]
[] call_function_single_interrupt+0x82/0x90

2017-10-20T18:56:20.408688+00:00 node-68 kernel: [1458996.797968]
[] ? cpuidle_enter_state+0x111/0x2b0

2017-10-20T18:56:20.408691+00:00 node-68 kernel: [1458996.797973]
[] cpuidle_enter+0x17/0x20

2017-10-20T18:56:20.408692+00:00 node-68 kernel: [1458996.797977]
[] call_cpuidle+0x32/0x60

2017-10-20T18:56:20.408693+00:00 node-68 kernel: [1458996.797979]
[] ? cpuidle_select+0x13/0x20

2017-10-20T18:56:20.408694+00:00 node-68 kernel: [1458996.797982]
[] cpu_startup_entry+0x290/0x350

2017-10-20T18:56:20.408695+00:00 node-68 kernel: [1458996.797984]
[] start_secondary+0x154/0x190

2017-10-20T18:56:20.408695+00:00 node-68 kernel: [1458996.797989] ---[ end
trace d44d42b3ada78269 ]---

2017-10-20T19:00:19.679060+00:00 node-68 kernel: [1459236.052489]
------------[ cut here ]------------

2017-10-20T19:00:19.679080+00:00 node-68 kernel: [1459236.052509] WARNING:
CPU: 27 PID: 0 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
skb_warn_bad_offload+0xd1/0x120()

2017-10-20T19:00:19.679081+00:00 node-68 kernel: [1459236.052513]
qvofd385f05-cb: caps=(0x00000184075b59e9, 0x0000000000000000) len=2642
data_len=0 gso_size=1480 gso_type=6 ip_summed=0

2017-10-20T19:00:19.679082+00:00 node-68 kernel: [1459236.052515] Modules
linked in: bonding binfmt_misc nf_conntrack_netlink vhost_net vhost macvtap
macvlan xt_mac xt_tcpudp xt_physdev br_netfilter xt_set ip_set_hash_net
ip_set nfnetlink veth ip6table_raw ebtable_filter ebtables openvswitch
ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
xfs ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal intel_powerclamp
coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev 8021q
serio_raw input_leds garp mrp stp llc sb_edac edac_core hpilo ioatdma
lpc_ich shpchp dca 8250_fintek ipmi_si ipmi_msghandler acpi_power_meter
mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack autofs4 dm_round_robin raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor ses
enclosure raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage
hid_generic usbhid hid psmouse lpfc ahci libahci be2net vxlan
scsi_transport_fc ip6_udp_tunnel udp_tunnel wmi fjes scsi_dh_emc
scsi_dh_rdac scsi_dh_alua dm_multipath

2017-10-20T19:00:19.679084+00:00 node-68 kernel: [1459236.052606] CPU: 27
PID: 0 Comm: swapper/27 Tainted: G W 4.4.0-93-generic #116-Ubuntu

2017-10-20T19:00:19.679098+00:00 node-68 kernel: [1459236.052609] Hardware
name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017

2017-10-20T19:00:19.679099+00:00 node-68 kernel: [1459236.052611]
0000000000000286 0fc821d2ff4865f6 ffff88103fc437d0 ffffffff813f9f83

2017-10-20T19:00:19.679114+00:00 node-68 kernel: [1459236.052614]
ffff88103fc43818 ffffffff81d6f780 ffff88103fc43808 ffffffff810812f2

2017-10-20T19:00:19.679116+00:00 node-68 kernel: [1459236.052616]
ffff880f89469000 ffff880f8e0d1000 0000000000000006 0000000000000006

2017-10-20T19:00:19.679117+00:00 node-68 kernel: [1459236.052619] Call
Trace:

2017-10-20T19:00:19.679118+00:00 node-68 kernel: [1459236.052620]
[] dump_stack+0x63/0x90

2017-10-20T19:00:19.679119+00:00 node-68 kernel: [1459236.052629]
[] warn_slowpath_common+0x82/0xc0

2017-10-20T19:00:19.679119+00:00 node-68 kernel: [1459236.052633]
[] warn_slowpath_fmt+0x5c/0x80

2017-10-20T19:00:19.679120+00:00 node-68 kernel: [1459236.052637]
[] ? ___ratelimit+0xa2/0xe0

2017-10-20T19:00:19.679121+00:00 node-68 kernel: [1459236.052639]
[] skb_warn_bad_offload+0xd1/0x120

2017-10-20T19:00:19.679122+00:00 node-68 kernel: [1459236.052642]
[] __skb_gso_segment+0xfd/0x110

2017-10-20T19:00:19.679123+00:00 node-68 kernel: [1459236.052649]
[] queue_gso_packets+0x5b/0x150 [openvswitch]

2017-10-20T19:00:19.679124+00:00 node-68 kernel: [1459236.052653]
[] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]

2017-10-20T19:00:19.679124+00:00 node-68 kernel: [1459236.052659]
[] ? be_xmit_enqueue+0x5bd/0x630 [be2net]

2017-10-20T19:00:19.679125+00:00 node-68 kernel: [1459236.052662]
[] ? be_xmit_flush+0xfb/0x110 [be2net]

2017-10-20T19:00:19.679126+00:00 node-68 kernel: [1459236.052665]
[] ? be_xmit+0x2f0/0x730 [be2net]

2017-10-20T19:00:19.679127+00:00 node-68 kernel: [1459236.052670]
[] ovs_dp_upcall+0x31/0x60 [openvswitch]

2017-10-20T19:00:19.679129+00:00 node-68 kernel: [1459236.052673]
[] ovs_dp_process_packet+0x10a/0x130 [openvswitch]

2017-10-20T19:00:19.679129+00:00 node-68 kernel: [1459236.052678]
[] ovs_vport_receive+0x6c/0xd0 [openvswitch]

2017-10-20T19:00:19.679130+00:00 node-68 kernel: [1459236.052684]
[] ? br_fdb_external_learn_del+0x120/0x120 [bridge]

2017-10-20T19:00:19.679131+00:00 node-68 kernel: [1459236.052688]
[] ? __br_forward+0xa6/0x130 [bridge]

2017-10-20T19:00:19.679132+00:00 node-68 kernel: [1459236.052693]
[] ? deliver_clone+0x50/0x50 [bridge]

2017-10-20T19:00:19.679133+00:00 node-68 kernel: [1459236.052698]
[] ? br_forward+0x87/0x90 [bridge]

2017-10-20T19:00:19.679134+00:00 node-68 kernel: [1459236.052702]
[] ? br_handle_frame_finish+0x3a0/0x620 [bridge]

2017-10-20T19:00:19.679135+00:00 node-68 kernel: [1459236.052706]
[] ? __slab_free+0xcb/0x2c0

2017-10-20T19:00:19.679135+00:00 node-68 kernel: [1459236.052711]
[] ? br_handle_frame+0x174/0x2b0 [bridge]

2017-10-20T19:00:19.679136+00:00 node-68 kernel: [1459236.052715]
[] netdev_frame_hook+0xe9/0x150 [openvswitch]

2017-10-20T19:00:19.679137+00:00 node-68 kernel: [1459236.052717]
[] __netif_receive_skb_core+0x364/0xa60

2017-10-20T19:00:19.679147+00:00 node-68 kernel: [1459236.052721]
[] ? skb_complete_wifi_ack+0xa0/0xe0

2017-10-20T19:00:19.679148+00:00 node-68 kernel: [1459236.052722]
[] ? __dev_kfree_skb_any+0x2f/0x40

2017-10-20T19:00:19.679149+00:00 node-68 kernel: [1459236.052723]
[] __netif_receive_skb+0x18/0x60

2017-10-20T19:00:19.679149+00:00 node-68 kernel: [1459236.052725]
[] process_backlog+0xa8/0x150

2017-10-20T19:00:19.679150+00:00 node-68 kernel: [1459236.052726]
[] net_rx_action+0x21e/0x360

2017-10-20T19:00:19.679155+00:00 node-68 kernel: [1459236.052728]
[] __do_softirq+0x101/0x290

2017-10-20T19:00:19.679157+00:00 node-68 kernel: [1459236.052730]
[] irq_exit+0xa3/0xb0

2017-10-20T19:00:19.679158+00:00 node-68 kernel: [1459236.052733]
[] smp_call_function_single_interrupt+0x33/0x40

2017-10-20T19:00:19.679158+00:00 node-68 kernel: [1459236.052738]
[] call_function_single_interrupt+0x82/0x90

2017-10-20T19:00:19.679159+00:00 node-68 kernel: [1459236.052739]
[] ? cpuidle_enter_state+0x111/0x2b0

2017-10-20T19:00:19.679160+00:00 node-68 kernel: [1459236.052743]
[] cpuidle_enter+0x17/0x20

2017-10-20T19:00:19.679162+00:00 node-68 kernel: [1459236.052746]
[] call_cpuidle+0x32/0x60

2017-10-20T19:00:19.679163+00:00 node-68 kernel: [1459236.052747]
[] ? cpuidle_select+0x13/0x20

2017-10-20T19:00:19.679163+00:00 node-68 kernel: [1459236.052749]
[] cpu_startup_entry+0x290/0x350

2017-10-20T19:00:19.679164+00:00 node-68 kernel: [1459236.052750]
[] start_secondary+0x154/0x190

2017-10-20T19:00:19.679165+00:00 node-68 kernel: [1459236.052753] ---[ end
trace d44d42b3ada7826a ]---
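
A minimal sketch for reading the detail line that accompanies each
skb_warn_bad_offload warning above. It assumes the gso_type bit layout and
the ip_summed values from include/linux/skbuff.h in the 4.4 kernel series
(other kernel versions number the GSO bits differently). The helper name
decode_bad_offload is made up for illustration.

```python
import re

# Assumption: bit layout of gso_type in the 4.4 kernel series
# (include/linux/skbuff.h); other kernel versions may differ.
GSO_FLAGS_4_4 = {
    1 << 0: "SKB_GSO_TCPV4",
    1 << 1: "SKB_GSO_UDP",
    1 << 2: "SKB_GSO_DODGY",
    1 << 3: "SKB_GSO_TCP_ECN",
    1 << 4: "SKB_GSO_TCPV6",
    1 << 5: "SKB_GSO_FCOE",
    1 << 6: "SKB_GSO_GRE",
}

# ip_summed values from include/linux/skbuff.h
IP_SUMMED = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}

# Underscores are optional so that copies of the log which lost them still parse.
FIELD_RE = re.compile(r"\b(len|data_?len|gso_?size|gso_?type|ip_?summed)=(\d+)")


def decode_bad_offload(line):
    """Hypothetical helper: turn one warning detail line into readable fields."""
    raw = {key.replace("_", ""): int(value) for key, value in FIELD_RE.findall(line)}
    gso_type = raw.get("gsotype", 0)
    flags = [name for bit, name in sorted(GSO_FLAGS_4_4.items()) if gso_type & bit]
    return {
        "len": raw.get("len"),
        "data_len": raw.get("datalen"),
        "gso_size": raw.get("gsosize"),
        "gso_flags": flags,
        "ip_summed": IP_SUMMED.get(raw.get("ipsummed"), "unknown"),
    }


sample = (
    "qvofd385f05-cb: caps=(0x00000184075b59e9, 0x0000000000000000) "
    "len=2636 data_len=2594 gso_size=1480 gso_type=6 ip_summed=0"
)
print(decode_bad_offload(sample))
```

Under those assumptions, gso_type=6 reads as UDP plus DODGY (a GSO packet
from an untrusted source such as a guest) and ip_summed=0 as CHECKSUM_NONE.
That would fit the RTP test traffic, but this reading is not confirmed here.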

node-90

2017-10-20T18:04:40.933607+00:00 node-90 rsyslogd-2177: imuxsock[pid 5001]:
begin to drop messages due to rate-limiting

2017-10-20T18:04:42.868706+00:00 node-90 rsyslogd-2177: imuxsock[pid 5001]:
41 messages lost due to rate-limiting

2017-10-20T18:14:40.927790+00:00 node-90 rsyslogd-2177: imuxsock[pid 5001]:
begin to drop messages due to rate-limiting

2017-10-20T18:14:42.537996+00:00 node-90 rsyslogd-2177: imuxsock[pid 5001]:
41 messages lost due to rate-limiting

2017-10-20T18:24:40.921904+00:00 node-90 rsyslogd-2177: imuxsock[pid 5001]:
begin to drop messages due to rate-limiting

2017-10-20T18:24:42.091415+00:00 node-90 rsyslogd-2177: imuxsock[pid 5001]:
41 messages lost due to rate-limiting

ps -aef | grep 5001

ceilome+ 5001 3401 0 Oct19 ? 00:19:09 ceilometer-polling - AgentManager(0)

2017-10-20T18:30:37.734912+00:00 node-90 rsyslogd: [origin
software="rsyslogd" swVersion="8.16.0" x-pid="3305" x-info="
http://www.rsyslog.com"] exiting on signal 15.

2017-10-20T18:30:37.834236+00:00 node-90 rsyslogd: [origin
software="rsyslogd" swVersion="8.16.0" x-pid="21427" x-info="
http://www.rsyslog.com"] start

2017-10-20T18:30:37.833919+00:00 node-90 rsyslogd: rsyslogd's groupid
changed to 108

2017-10-20T18:30:37.833993+00:00 node-90 rsyslogd: rsyslogd's userid
changed to 104

2017-10-20T18:30:37.834050+00:00 node-90 rsyslogd-2357: queue "action 0
queue": high water mark is set quite low at 8000. You should only set it
below 60% (600000) if you have a good reason for this. [v8.16.0 try
http://www.rsyslog.com/e/2357 ]

Test starts: Fri Oct 20 18:52:48 2017

2017-10-20T18:56:20.421681+00:00 node-90 kernel: [97344.379555]
------------[ cut here ]------------

2017-10-20T18:56:20.421718+00:00 node-90 kernel: [97344.379563] WARNING:
CPU: 30 PID: 18870 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
skb_warn_bad_offload+0xd1/0x120()

2017-10-20T18:56:20.421719+00:00 node-90 kernel: [97344.379565]
qvo14d5a4ef-47: caps=(0x00000184075b59e9, 0x0000000000000000) len=2531
data_len=0 gso_size=1480 gso_type=6 ip_summed=0

2017-10-20T18:56:20.421720+00:00 node-90 kernel: [97344.379567] Modules
linked in: vhost_net vhost macvtap macvlan veth nf_conntrack_netlink
ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
ip_set_hash_net ip_set nfnetlink ebtable_filter ebtables openvswitch ocfs2
quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
xfs bridge 8021q garp mrp stp llc intel_rapl x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul hpilo input_leds joydev kvm_intel
glue_helper ipmi_ssif kvm ablk_helper cryptd irqbypass ipmi_si shpchp
8250_fintek ipmi_msghandler ioatdma serio_raw sb_edac lpc_ich edac_core dca
acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack autofs4 raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx dm_round_robin xor ses enclosure
raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid
psmouse lpfc ahci libahci be2net vxlan scsi_transport_fc ip6_udp_tunnel
udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath

2017-10-20T18:56:20.421723+00:00 node-90 kernel: [97344.379625] CPU: 30
PID: 18870 Comm: vhost-18868 Not tainted 4.4.0-93-generic #116-Ubuntu

2017-10-20T18:56:20.421871+00:00 node-90 kernel: [97344.379626] Hardware
name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017

2017-10-20T18:56:20.421876+00:00 node-90 kernel: [97344.379627]
0000000000000286 f1812d601dc61f3e ffff88203f2837f0 ffffffff813f9f83

2017-10-20T18:56:20.421877+00:00 node-90 kernel: [97344.379629]
ffff88203f283838 ffffffff81d6f780 ffff88203f283828 ffffffff810812f2

2017-10-20T18:56:20.421895+00:00 node-90 kernel: [97344.379630]
ffff881fb3dfa700 ffff88202d5d1000 0000000000000006 0000000000000006

2017-10-20T18:56:20.421899+00:00 node-90 kernel: [97344.379632] Call Trace:

2017-10-20T18:56:20.421900+00:00 node-90 kernel: [97344.379634]
[] dump_stack+0x63/0x90

2017-10-20T18:56:20.421901+00:00 node-90 kernel: [97344.379642]
[] warn_slowpath_common+0x82/0xc0

2017-10-20T18:56:20.421901+00:00 node-90 kernel: [97344.379643]
[] warn_slowpath_fmt+0x5c/0x80

2017-10-20T18:56:20.421902+00:00 node-90 kernel: [97344.379646]
[] ? ___ratelimit+0xa2/0xe0

2017-10-20T18:56:20.421904+00:00 node-90 kernel: [97344.379648]
[] skb_warn_bad_offload+0xd1/0x120

2017-10-20T18:56:20.421905+00:00 node-90 kernel: [97344.379650]
[] __skb_gso_segment+0xfd/0x110

2017-10-20T18:56:20.421905+00:00 node-90 kernel: [97344.379656]
[] queue_gso_packets+0x5b/0x150 [openvswitch]

2017-10-20T18:56:20.421906+00:00 node-90 kernel: [97344.379658]
[] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]

2017-10-20T18:56:20.421907+00:00 node-90 kernel: [97344.379660]
[] ? br_validate_ipv4.isra.23+0x200/0x200 [br_netfilter]

2017-10-20T18:56:20.421907+00:00 node-90 kernel: [97344.379666]
[] ? nf_iterate+0x62/0x80

2017-10-20T18:56:20.421909+00:00 node-90 kernel: [97344.379668]
[] ? nf_hook_slow+0x73/0xd0

2017-10-20T18:56:20.421910+00:00 node-90 kernel: [97344.379676]
[] ? __br_forward+0x104/0x130 [bridge]

2017-10-20T18:56:20.421911+00:00 node-90 kernel: [97344.379679]
[] ovs_dp_upcall+0x31/0x60 [openvswitch]

2017-10-20T18:56:20.421911+00:00 node-90 kernel: [97344.379681]
[] ovs_dp_process_packet+0x10a/0x130 [openvswitch]

2017-10-20T18:56:20.421912+00:00 node-90 kernel: [97344.379684]
[] ovs_vport_receive+0x6c/0xd0 [openvswitch]

2017-10-20T18:56:20.421912+00:00 node-90 kernel: [97344.379685]
[] ? br_nf_pre_routing_finish+0x1a9/0x350 [br_netfilter]

2017-10-20T18:56:20.421915+00:00 node-90 kernel: [97344.379688]
[] ? br_handle_local_finish+0xa0/0xa0 [bridge]

2017-10-20T18:56:20.421915+00:00 node-90 kernel: [97344.379690]
[] ? nf_iterate+0x62/0x80

2017-10-20T18:56:20.421916+00:00 node-90 kernel: [97344.379692]
[] ? br_nf_pre_routing+0x2e1/0x440 [br_netfilter]

2017-10-20T18:56:20.421916+00:00 node-90 kernel: [97344.379693]
[] ? br_nf_forward_ip+0x480/0x480 [br_netfilter]

2017-10-20T18:56:20.421917+00:00 node-90 kernel: [97344.379696]
[] ? br_handle_frame+0x1da/0x2b0 [bridge]

2017-10-20T18:56:20.421917+00:00 node-90 kernel: [97344.379699]
[] netdev_frame_hook+0xe9/0x150 [openvswitch]

2017-10-20T18:56:20.421920+00:00 node-90 kernel: [97344.379700]
[] __netif_receive_skb_core+0x364/0xa60

2017-10-20T18:56:20.421920+00:00 node-90 kernel: [97344.379702]
[] __netif_receive_skb+0x18/0x60

2017-10-20T18:56:20.421921+00:00 node-90 kernel: [97344.379703]
[] process_backlog+0xa8/0x150

2017-10-20T18:56:20.421928+00:00 node-90 kernel: [97344.379704]
[] net_rx_action+0x21e/0x360

2017-10-20T18:56:20.421929+00:00 node-90 kernel: [97344.379706]
[] __do_softirq+0x101/0x290

2017-10-20T18:56:20.421929+00:00 node-90 kernel: [97344.379709]
[] do_softirq_own_stack+0x1c/0x30

2017-10-20T18:56:20.421931+00:00 node-90 kernel: [97344.379710]
[] do_softirq.part.19+0x38/0x40

2017-10-20T18:56:20.421932+00:00 node-90 kernel: [97344.379713]
[] do_softirq+0x1d/0x20

2017-10-20T18:56:20.421932+00:00 node-90 kernel: [97344.379714]
[] netif_rx_ni+0x33/0x80

2017-10-20T18:56:20.421933+00:00 node-90 kernel: [97344.379718]
[] tun_get_user+0x506/0x880

2017-10-20T18:56:20.421933+00:00 node-90 kernel: [97344.379720]
[] tun_sendmsg+0x51/0x70

2017-10-20T18:56:20.421934+00:00 node-90 kernel: [97344.379723]
[] handle_tx+0x306/0x4e0 [vhost_net]

2017-10-20T18:56:20.421938+00:00 node-90 kernel: [97344.379726]
[] handle_tx_kick+0x15/0x20 [vhost_net]

2017-10-20T18:56:20.421938+00:00 node-90 kernel: [97344.379730]
[] vhost_worker+0xf3/0x190 [vhost]

2017-10-20T18:56:20.421939+00:00 node-90 kernel: [97344.379733]
[] ? vhost_poll_wakeup+0x30/0x30 [vhost]

2017-10-20T18:56:20.421939+00:00 node-90 kernel: [97344.379736]
[] kthread+0xe5/0x100

2017-10-20T18:56:20.421940+00:00 node-90 kernel: [97344.379738]
[] ? kthread_create_on_node+0x1e0/0x1e0

2017-10-20T18:56:20.421942+00:00 node-90 kernel: [97344.379740]
[] ret_from_fork+0x3f/0x70

2017-10-20T18:56:20.421943+00:00 node-90 kernel: [97344.379742]
[] ? kthread_create_on_node+0x1e0/0x1e0

2017-10-20T18:56:20.421943+00:00 node-90 kernel: [97344.379743] ---[ end
trace d7e73079b38e57b3 ]---

2017-10-20T19:00:19.698016+00:00 node-90 kernel: [97583.653007]
------------[ cut here ]------------

2017-10-20T19:00:19.698034+00:00 node-90 kernel: [97583.653016] WARNING:
CPU: 2 PID: 18870 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
skb_warn_bad_offload+0xd1/0x120()

2017-10-20T19:00:19.698036+00:00 node-90 kernel: [97583.653018]
qvo14d5a4ef-47: caps=(0x00000184075b59e9, 0x0000000000000000) len=2531
data_len=0 gso_size=1480 gso_type=6 ip_summed=0

2017-10-20T19:00:19.698037+00:00 node-90 kernel: [97583.653019] Modules
linked in: vhost_net vhost macvtap macvlan veth nf_conntrack_netlink
ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
ip_set_hash_net ip_set nfnetlink ebtable_filter ebtables openvswitch ocfs2
quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
xfs bridge 8021q garp mrp stp llc intel_rapl x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul hpilo input_leds joydev kvm_intel
glue_helper ipmi_ssif kvm ablk_helper cryptd irqbypass ipmi_si shpchp
8250_fintek ipmi_msghandler ioatdma serio_raw sb_edac lpc_ich edac_core dca
acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack autofs4 raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx dm_round_robin xor ses enclosure
raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid
psmouse lpfc ahci libahci be2net vxlan scsi_transport_fc ip6_udp_tunnel
udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath

2017-10-20T19:00:19.698046+00:00 node-90 kernel: [97583.653082] CPU: 2 PID:
18870 Comm: vhost-18868 Tainted: G W 4.4.0-93-generic #116-Ubuntu

2017-10-20T19:00:19.698048+00:00 node-90 kernel: [97583.653083] Hardware
name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017

2017-10-20T19:00:19.698049+00:00 node-90 kernel: [97583.653084]
0000000000000286 f1812d601dc61f3e ffff88103f8837f0 ffffffff813f9f83

2017-10-20T19:00:19.698064+00:00 node-90 kernel: [97583.653086]
ffff88103f883838 ffffffff81d6f780 ffff88103f883828 ffffffff810812f2

2017-10-20T19:00:19.698065+00:00 node-90 kernel: [97583.653088]
ffff881034e2fe00 ffff88202d5d1000 0000000000000006 0000000000000006

2017-10-20T19:00:19.698067+00:00 node-90 kernel: [97583.653090] Call Trace:

2017-10-20T19:00:19.698069+00:00 node-90 kernel: [97583.653091]
[] dump_stack+0x63/0x90

2017-10-20T19:00:19.698069+00:00 node-90 kernel: [97583.653098]
[] warn_slowpath_common+0x82/0xc0

2017-10-20T19:00:19.698070+00:00 node-90 kernel: [97583.653100]
[] warn_slowpath_fmt+0x5c/0x80

2017-10-20T19:00:19.698071+00:00 node-90 kernel: [97583.653102]
[] ? ___ratelimit+0xa2/0xe0

2017-10-20T19:00:19.698071+00:00 node-90 kernel: [97583.653103]
[] skb_warn_bad_offload+0xd1/0x120

2017-10-20T19:00:19.698073+00:00 node-90 kernel: [97583.653105]
[] __skb_gso_segment+0xfd/0x110

2017-10-20T19:00:19.698074+00:00 node-90 kernel: [97583.653111]
[] queue_gso_packets+0x5b/0x150 [openvswitch]

2017-10-20T19:00:19.698075+00:00 node-90 kernel: [97583.653114]
[] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]

2017-10-20T19:00:19.698076+00:00 node-90 kernel: [97583.653116]
[] ? br_validate_ipv4.isra.23+0x200/0x200 [br_netfilter]

2017-10-20T19:00:19.698076+00:00 node-90 kernel: [97583.653120]
[] ? nf_iterate+0x62/0x80

2017-10-20T19:00:19.698077+00:00 node-90 kernel: [97583.653122]
[] ? nf_hook_slow+0x73/0xd0

2017-10-20T19:00:19.698079+00:00 node-90 kernel: [97583.653128]
[] ? __br_forward+0x104/0x130 [bridge]

2017-10-20T19:00:19.698080+00:00 node-90 kernel: [97583.653131]
[] ovs_dp_upcall+0x31/0x60 [openvswitch]

2017-10-20T19:00:19.698081+00:00 node-90 kernel: [97583.653133]
[] ovs_dp_process_packet+0x10a/0x130 [openvswitch]

2017-10-20T19:00:19.698081+00:00 node-90 kernel: [97583.653136]
[] ovs_vport_receive+0x6c/0xd0 [openvswitch]

2017-10-20T19:00:19.698082+00:00 node-90 kernel: [97583.653138]
[] ? br_nf_pre_routing_finish+0x1a9/0x350 [br_netfilter]

2017-10-20T19:00:19.698083+00:00 node-90 kernel: [97583.653141]
[] ? br_handle_local_finish+0xa0/0xa0 [bridge]

2017-10-20T19:00:19.698084+00:00 node-90 kernel: [97583.653143]
[] ? nf_iterate+0x62/0x80

2017-10-20T19:00:19.698085+00:00 node-90 kernel: [97583.653144]
[] ? br_nf_pre_routing+0x2e1/0x440 [br_netfilter]

2017-10-20T19:00:19.698086+00:00 node-90 kernel: [97583.653146]
[] ? br_nf_forward_ip+0x480/0x480 [br_netfilter]

2017-10-20T19:00:19.698086+00:00 node-90 kernel: [97583.653149]
[] ? br_handle_frame+0x1da/0x2b0 [bridge]

2017-10-20T19:00:19.698087+00:00 node-90 kernel: [97583.653152]
[] netdev_frame_hook+0xe9/0x150 [openvswitch]

2017-10-20T19:00:19.698088+00:00 node-90 kernel: [97583.653154]
[] __netif_receive_skb_core+0x364/0xa60

2017-10-20T19:00:19.698099+00:00 node-90 kernel: [97583.653156]
[] ? x2apic_send_IPI_mask+0x13/0x20

2017-10-20T19:00:19.698099+00:00 node-90 kernel: [97583.653159]
[] ? native_send_call_func_single_ipi+0x3a/0x40

2017-10-20T19:00:19.698100+00:00 node-90 kernel: [97583.653163]
[] ? generic_exec_single+0x85/0x120

2017-10-20T19:00:19.698101+00:00 node-90 kernel: [97583.653167]
[] ? be_eq_notify+0x60/0x70 [be2net]

2017-10-20T19:00:19.698101+00:00 node-90 kernel: [97583.653168]
[] __netif_receive_skb+0x18/0x60

2017-10-20T19:00:19.698102+00:00 node-90 kernel: [97583.653170]
[] process_backlog+0xa8/0x150

2017-10-20T19:00:19.698104+00:00 node-90 kernel: [97583.653171]
[] net_rx_action+0x21e/0x360

2017-10-20T19:00:19.698105+00:00 node-90 kernel: [97583.653173]
[] __do_softirq+0x101/0x290

2017-10-20T19:00:19.698106+00:00 node-90 kernel: [97583.653175]
[] do_softirq_own_stack+0x1c/0x30

2017-10-20T19:00:19.698107+00:00 node-90 kernel: [97583.653176]
[] do_softirq.part.19+0x38/0x40

2017-10-20T19:00:19.698108+00:00 node-90 kernel: [97583.653179]
[] do_softirq+0x1d/0x20

2017-10-20T19:00:19.698110+00:00 node-90 kernel: [97583.653181]
[] netif_rx_ni+0x33/0x80

2017-10-20T19:00:19.698111+00:00 node-90 kernel: [97583.653184]
[] tun_get_user+0x506/0x880

2017-10-20T19:00:19.698112+00:00 node-90 kernel: [97583.653185]
[] tun_sendmsg+0x51/0x70

2017-10-20T19:00:19.698112+00:00 node-90 kernel: [97583.653188]
[] handle_tx+0x306/0x4e0 [vhost_net]

2017-10-20T19:00:19.698113+00:00 node-90 kernel: [97583.653190]
[] handle_tx_kick+0x15/0x20 [vhost_net]

2017-10-20T19:00:19.698113+00:00 node-90 kernel: [97583.653193]
[] vhost_worker+0xf3/0x190 [vhost]

2017-10-20T19:00:19.698115+00:00 node-90 kernel: [97583.653195]
[] ? vhost_poll_wakeup+0x30/0x30 [vhost]

2017-10-20T19:00:19.698116+00:00 node-90 kernel: [97583.653198]
[] kthread+0xe5/0x100

2017-10-20T19:00:19.698117+00:00 node-90 kernel: [97583.653199]
[] ? kthread_create_on_node+0x1e0/0x1e0

2017-10-20T19:00:19.698117+00:00 node-90 kernel: [97583.653203]
[] ret_from_fork+0x3f/0x70

2017-10-20T19:00:19.698118+00:00 node-90 kernel: [97583.653204]
[] ? kthread_create_on_node+0x1e0/0x1e0

2017-10-20T19:00:19.698123+00:00 node-90 kernel: [97583.653206] ---[ end
trace d7e73079b38e57b4 ]---
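
To see which modules the traces above actually pass through, here is a rough
sketch that tallies the bracketed module names per frame. In kernel stack
dumps a leading "?" marks an address found on the stack that the unwinder
could not confirm as part of the call chain, so those frames are counted
separately. The function name modules_in_trace and the "vmlinux" label for
frames without a module are assumptions made for this example, and the
sample frames are a hand-picked subset of one trace.

```python
import re
from collections import Counter

# Matches "[...] [? ]symbol+0xOFF/0xLEN [module]" as printed in a call trace.
FRAME_RE = re.compile(
    r"\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def modules_in_trace(lines):
    """Count frames per module, split into reliable and '?' (unreliable) frames."""
    reliable, unreliable = Counter(), Counter()
    for line in lines:
        match = FRAME_RE.search(line)
        if not match:
            continue
        # Frames without a trailing [module] belong to the core kernel image.
        module = match.group(3) or "vmlinux"
        if match.group(1):
            unreliable[module] += 1
        else:
            reliable[module] += 1
    return reliable, unreliable


# Hand-picked frames from one of the traces, for illustration only.
sample = """\
[] skb_warn_bad_offload+0xd1/0x120
[] __skb_gso_segment+0xfd/0x110
[] queue_gso_packets+0x5b/0x150 [openvswitch]
[] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]
[] ? __br_forward+0x104/0x130 [bridge]
[] ovs_dp_upcall+0x31/0x60 [openvswitch]
""".splitlines()

reliable, unreliable = modules_in_trace(sample)
print("reliable:", dict(reliable))
print("unreliable:", dict(unreliable))
```

On the sample frames the openvswitch entries come out as reliable, while the
br_netfilter and bridge entries are "?" frames. Feeding the full node-68.txt
and node-90.txt files to the same function would show whether that holds for
the complete traces.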

-- Jim

On Wed, Oct 18, 2017 at 11:37 PM, Jim Okken jim@jokken.com wrote:

hi all,

Please help us out with an issue we are seeing on multiple compute nodes
running Newton (Ubuntu 16.04.3, kernel 4.4.0). After about 1 hour of running
our VoIP test application, the instances become non-responsive and can't be
pinged, and the same goes for the compute nodes.

Messages appear on the compute node console screens. A screenshot of that
is hosted here:

http://www.jokken.com/downloads/console.png

i'll try to attach it also.

The first compute node this was seen on was running 2 instances; the
second was running only 1 instance. They were using only a portion of the
total 40 vCPUs available, and the load was moderate. Cold boot these nodes
and all is well again, until we run our application for about 1 hour.

please let us know what you think thanks!

not a lot is shown in DEBUG logging of Nova and Neutron on the compute node

these logs are here:

http://www.jokken.com/downloads/logs.zip

i'll try to attach them too.

https://ask.openstack.org/en/question/110748/soft-lockup-on-newton-compute-nodes/

/var/log/messages on the compute node shows many repeats of these messages:

2017-10-18T20:49:26.462309+00:00 node-58 kernel: [1297007.624935] Modules
linked in: binfmt_misc nf_conntrack_netlink vhost_net vhost macvtap macvlan
ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
ip_set_hash_net ip_set nfnetlink veth ebtable_filter ebtables openvswitch
ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
xfs ipmi_ssif 8021q garp mrp intel_rapl x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd
serio_raw bridge stp llc sb_edac edac_core hpilo ioatdma lpc_ich shpchp dca
ipmi_si 8250_fintek ipmi_msghandler acpi_power_meter mac_hid kvm_intel kvm
irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr
iscsi_tcp libiscsi_tcp nf_conntrack_proto_gre nf_conntrack_ipv6
nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4 raid10
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c raid1 raid0 multipath linear dm_round_robin ses
enclosure uas usb_storage psmouse ahci lpfc be2iscsi libahci be2net
iscsi_boot_sysfs libiscsi vxlan scsi_transport_fc ip6_udp_tunnel
scsi_transport_iscsi udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac
scsi_dh_alua dm_multipath

2017-10-18T20:49:26.462311+00:00 node-58 kernel: [1297007.625008] CPU: 27
PID: 860 Comm: qemu-system-x86 Not tainted 4.4.0-93-generic #116-Ubuntu

2017-10-18T20:49:26.462313+00:00 node-58 kernel: [1297007.625009]
Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017

2017-10-18T20:49:26.462314+00:00 node-58 kernel: [1297007.625010] task:
ffff881faaaa7000 ti: ffff881fa3a34000 task.ti: ffff881fa3a34000

2017-10-18T20:49:26.462315+00:00 node-58 kernel: [1297007.625011] RIP:
0010:[] [] native_queued_spin_lock_slowpath+0x15c/0x170

2017-10-18T20:49:26.462316+00:00 node-58 kernel: [1297007.625018] RSP:
0018:ffff883fff143c30 EFLAGS: 00000202

2017-10-18T20:49:26.462317+00:00 node-58 kernel: [1297007.625019] RAX:
0000000000000101 RBX: ffff881f677603f0 RCX: 0000000000000001

2017-10-18T20:49:26.462337+00:00 node-58 kernel: [1297007.625020] RDX:
0000000000000101 RSI: 0000000000000001 RDI: ffff881f677603ec

2017-10-18T20:49:26.462340+00:00 node-58 kernel: [1297007.625020] RBP:
ffff883fff143c30 R08: 0000000000000101 R09: ffffffff81191e27

2017-10-18T20:49:26.462341+00:00 node-58 kernel: [1297007.625021] R10:
ffffea00ffb09780 R11: 0000000000000a00 R12: ffff881f677603ec

2017-10-18T20:49:26.462342+00:00 node-58 kernel: [1297007.625022] R13:
0000000000000a00 R14: 00000000000a5000 R15: 0000000000000a00

2017-10-18T20:49:26.462343+00:00 node-58 kernel: [1297007.625023] FS:
00007f0c53fb3c00(0000) GS:ffff883fff140000(0000) knlGS:0000000000000000

2017-10-18T20:49:26.462343+00:00 node-58 kernel: [1297007.625024] CS:
0010 DS: 0000 ES: 0000 CR0: 0000000080050033

2017-10-18T20:49:26.462344+00:00 node-58 kernel: [1297007.625025] CR2:
00007fe018e2547e CR3: 0000003ec0b75000 CR4: 00000000001426e0

2017-10-18T20:49:26.462345+00:00 node-58 kernel: [1297007.625026] Stack:

2017-10-18T20:49:26.462347+00:00 node-58 kernel: [1297007.625026]
ffff883fff143c40 ffffffff81842f71 ffff883fff143c60 ffffffff81841085

2017-10-18T20:49:26.462348+00:00 node-58 kernel: [1297007.625028]
ffff881dc609ac00 ffff881f677604b0 ffff883fff143c70 ffffffff818410cb

2017-10-18T20:49:26.462349+00:00 node-58 kernel: [1297007.625029]
ffff883fff143ca0 ffffffffc08c658d ffff883feff9d500 0000000000000a00

2017-10-18T20:49:26.462351+00:00 node-58 kernel: [1297007.625031] Call
Trace:

2017-10-18T20:49:26.462353+00:00 node-58 kernel: [1297007.625032]

2017-10-18T20:49:26.462354+00:00 node-58 kernel: [1297007.625039]
[] _raw_spin_lock+0x21/0x30

2017-10-18T20:49:26.462356+00:00 node-58 kernel: [1297007.625041]
[] __mutex_unlock_slowpath+0x25/0x50

2017-10-18T20:49:26.462356+00:00 node-58 kernel: [1297007.625042]
[] mutex_unlock+0x1b/0x20

2017-10-18T20:49:26.462357+00:00 node-58 kernel: [1297007.625076]
[] ocfs2_dio_end_io+0x6d/0x80 [ocfs2]

2017-10-18T20:49:26.462358+00:00 node-58 kernel: [1297007.625080]
[] dio_complete+0x11c/0x1c0

2017-10-18T20:49:26.462359+00:00 node-58 kernel: [1297007.625081]
[] dio_bio_end_aio+0x73/0x100

2017-10-18T20:49:26.462361+00:00 node-58 kernel: [1297007.625085]
[] bio_endio+0x3f/0x60

2017-10-18T20:49:26.462362+00:00 node-58 kernel: [1297007.625087]
[] blk_update_request+0x87/0x310

2017-10-18T20:49:26.462363+00:00 node-58 kernel: [1297007.625091]
[] end_clone_bio+0x46/0x70

2017-10-18T20:49:26.462363+00:00 node-58 kernel: [1297007.625092]
[] bio_endio+0x3f/0x60

2017-10-18T20:49:26.462364+00:00 node-58 kernel: [1297007.625093]
[] blk_update_request+0x87/0x310

2017-10-18T20:49:26.462365+00:00 node-58 kernel: [1297007.625097]
[] scsi_end_request+0x33/0x1d0

2017-10-18T20:49:26.462367+00:00 node-58 kernel: [1297007.625100]
[] scsi_io_completion+0x1b6/0x690

2017-10-18T20:49:26.462368+00:00 node-58 kernel: [1297007.625104]
[] ? rebalance_domains+0x166/0x2d0

2017-10-18T20:49:26.462368+00:00 node-58 kernel: [1297007.625107]
[] scsi_finish_command+0xcf/0x120

2017-10-18T20:49:26.462377+00:00 node-58 kernel: [1297007.625109]
[] scsi_softirq_done+0x124/0x150

2017-10-18T20:49:26.462378+00:00 node-58 kernel: [1297007.625112]
[] blk_done_softirq+0x87/0xb0

2017-10-18T20:49:26.462379+00:00 node-58 kernel: [1297007.625116]
[] __do_softirq+0x101/0x290

2017-10-18T20:49:26.462381+00:00 node-58 kernel: [1297007.625118]
[] irq_exit+0xa3/0xb0

2017-10-18T20:49:26.462382+00:00 node-58 kernel: [1297007.625121]
[] smp_call_function_single_interrupt+0x33/0x40

2017-10-18T20:49:26.462382+00:00 node-58 kernel: [1297007.625124]
[] call_function_single_interrupt+0x82/0x90

2017-10-18T20:49:26.462383+00:00 node-58 kernel: [1297007.625125]

2017-10-18T20:49:26.462383+00:00 node-58 kernel: [1297007.625127]
[] ? _raw_spin_lock+0x14/0x30

2017-10-18T20:49:26.462385+00:00 node-58 kernel: [1297007.625129]
[] __mutex_lock_slowpath+0x72/0x130

2017-10-18T20:49:26.462387+00:00 node-58 kernel: [1297007.625142]
[] ? ocfs2_inode_unlock+0x119/0x120 [ocfs2]

2017-10-18T20:49:26.462387+00:00 node-58 kernel: [1297007.625143]
[] mutex_lock+0x1f/0x30

2017-10-18T20:49:26.462388+00:00 node-58 kernel: [1297007.625155]
[] ocfs2_file_write_iter+0x95a/0xdf0 [ocfs2]

2017-10-18T20:49:26.462388+00:00 node-58 kernel: [1297007.625158]
[] ? poll_select_copy_remaining+0x140/0x140

2017-10-18T20:49:26.462389+00:00 node-58 kernel: [1297007.625169]
[] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]

2017-10-18T20:49:26.462391+00:00 node-58 kernel: [1297007.625171]
[] aio_run_iocb+0x26a/0x2d0

2017-10-18T20:49:26.462392+00:00 node-58 kernel: [1297007.625174]
[] ? __fget_light+0x25/0x60

2017-10-18T20:49:26.462394+00:00 node-58 kernel: [1297007.625175]
[] ? __fdget+0x13/0x20

2017-10-18T20:49:26.462395+00:00 node-58 kernel: [1297007.625177]
[] do_io_submit+0x25f/0x500

2017-10-18T20:49:26.462396+00:00 node-58 kernel: [1297007.625178]
[] SyS_io_submit+0x10/0x20

2017-10-18T20:49:26.462398+00:00 node-58 kernel: [1297007.625181]
[] entry_SYSCALL_64_fastpath+0x16/0x71

2017-10-18T20:49:26.462399+00:00 node-58 kernel: [1297007.625181] Code:
01 48 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00
00 e9 63 ff ff ff 83 fa 01 75 07 e9 c4 fe ff ff f3 90 <8b> 07 84 c0 75 f8
b8 01 00 00 00 66 89 07 5d c3 0f 1f 40 00 0f
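
When comparing traces like the one above, it can help to tally which kernel modules the call-trace frames belong to. The following is only a minimal sketch: the helper name `tally_modules` is made up for illustration, and the pattern assumes the usual `symbol+0xoff/0xlen [module]` frame layout, with the address inside the brackets optional because archived copies often lose it.

```python
import re
from collections import Counter

# One call-trace frame, e.g.
#   "[<ffffffffc08c658d>] ocfs2_dio_end_io+0x6d/0x80 [ocfs2]"
# The address may be missing ("[]"), a leading "?" marks an unreliable
# frame, and the trailing "[module]" is absent for built-in kernel code.
FRAME_RE = re.compile(
    r"\[(?:<[0-9a-f]+>)?\][ \t]+(\?[ \t]+)?([\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def tally_modules(log_text):
    """Count reliable call-trace frames per module ('kernel' = built-in)."""
    counts = Counter()
    for unreliable, _symbol, module in FRAME_RE.findall(log_text):
        if unreliable:
            continue  # skip "?" frames, they may be stale stack contents
        counts[module or "kernel"] += 1
    return counts
```

Feeding it the text of /var/log/messages (or a saved excerpt) gives a quick per-module frame count, which makes it easier to see whether, say, ocfs2 or openvswitch frames dominate a given trace.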


Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
asked Oct 24, 2017 in openstack by Jim_Okken (480 points)

1 Response

0 votes

===== UPDATE 11/10 ======
hi again,

based on some advice from a member of this mailing list we've been looking
into kernel and driver versions of our compute nodes

We also have plain non openstack "KVM on Ubuntu" servers for testing.

I looked at driver and kernel differences between these Ubuntu 16 w/ KVM
systems and our openstack compute nodes. I found Ubuntu 16 w/ KVM was at
kernel version 4.4.0-87 and that the openstack compute nodes were at
4.4.0-93. So I upgraded the Ubuntu 16 w/ KVM to 4.4.0-93 and was able to
reproduce this problem (but only on the exact HP hardware that is our
openstack compute nodes, and not on other hardware).
Next I updated these Ubuntu 16 w/ KVM to 4.4.0-98 and the problem no longer
occurred!

I need to upgrade a few openstack compute nodes to 4.4.0-98 and test. Does
anyone think this kernel change could break openstack?

In the kernel change log I found a fix for a specific HP server in 4.4.0-98
(not the same as our server but somewhat similar)
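
To check which nodes are still below the 4.4.0-98 kernel mentioned above, something like the following could be run on each node. It is a rough sketch only: `ubuntu_abi` and `at_least` are hypothetical helper names, and it assumes the Ubuntu-style release string `X.Y.Z-ABI-flavour` (for example `4.4.0-93-generic`).

```python
import re


def ubuntu_abi(release):
    """Split a release like '4.4.0-93-generic' into ((4, 4, 0), 93)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = map(int, m.groups())
    return (major, minor, patch), abi


def at_least(release, wanted="4.4.0-98"):
    """True if `release` is the same as or newer than `wanted`."""
    return ubuntu_abi(release) >= ubuntu_abi(wanted)
```

On a node, `at_least(platform.release())` (after `import platform`) reports whether the running kernel is at or past 4.4.0-98. It says nothing about whether that kernel actually fixes the problem on a given node.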

thanks!

-- Jim

On Mon, Oct 23, 2017 at 10:25 PM, Jim Okken jim@jokken.com wrote:

===== UPDATE 10/23 ======

we have been trying different things to get better debug output. we disabled
rate-limiting in order to get better info in /var/log/messages. for some
reason (maybe unrelated) we didn't get the soft lockup during this test, but
this time we got openvswitch, br_netfilter, etc in the call trace in
/var/log/messages

Please advise in any way! thx!!

basically we are running various types of SIP/RTP test traffic between 2
instances (on different compute nodes). This time instead of one hypervisor
getting the errors both hypervisors did, but neither got the soft lockup.

log snippets below, full logs here:

www.jokken.com/downloads/node-68.txt

www.jokken.com/downloads/node-90.txt

node-68

2017-10-20T17:48:37.031741+00:00 node-68 rsyslogd-2177: imuxsock[pid
5085]: 40 messages lost due to rate-limiting

2017-10-20T17:58:36.281069+00:00 node-68 rsyslogd-2177: imuxsock[pid
5085]: begin to drop messages due to rate-limiting

2017-10-20T17:58:37.548500+00:00 node-68 rsyslogd-2177: imuxsock[pid
5085]: 41 messages lost due to rate-limiting

2017-10-20T18:08:36.180377+00:00 node-68 rsyslogd-2177: imuxsock[pid
5085]: begin to drop messages due to rate-limiting

2017-10-20T18:08:37.058861+00:00 node-68 rsyslogd-2177: imuxsock[pid
5085]: 40 messages lost due to rate-limiting

2017-10-20T18:18:36.175797+00:00 node-68 rsyslogd-2177: imuxsock[pid
5085]: begin to drop messages due to rate-limiting

2017-10-20T18:18:37.583237+00:00 node-68 rsyslogd-2177: imuxsock[pid
5085]: 41 messages lost due to rate-limiting

2017-10-20T18:28:36.172090+00:00 node-68 rsyslogd-2177: imuxsock[pid
5085]: begin to drop messages due to rate-limiting

2017-10-20T18:28:37.125346+00:00 node-68 rsyslogd-2177: imuxsock[pid
5085]: 40 messages lost due to rate-limiting

ps -aef | grep 5080

ceilome+ 5080 3502 0 Oct03 ? 01:32:57 ceilometer-polling - AgentManager(0)
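
To see how much each process was losing to rsyslog rate-limiting, the "messages lost" lines above can be summed per PID. This is a small sketch under the assumption that the lines keep the `imuxsock[pid N]: M messages lost due to rate-limiting` wording shown in these logs; `lost_per_pid` is a made-up name.

```python
import re
from collections import defaultdict

# \s+ after "pid" also tolerates the line wrap seen in mail archives.
LOST_RE = re.compile(
    r"imuxsock\[pid\s+(\d+)\]:\s+(\d+) messages lost due to rate-limiting"
)


def lost_per_pid(log_text):
    """Return {pid: total messages reported lost} for a chunk of syslog."""
    totals = defaultdict(int)
    for pid, lost in LOST_RE.findall(log_text):
        totals[int(pid)] += int(lost)
    return dict(totals)
```

The resulting PIDs can then be looked up with ps, as done above.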

2017-10-20T18:35:10.759230+00:00 node-68 rsyslogd: [origin
software="rsyslogd" swVersion="8.16.0" x-pid="3431" x-info="
http://www.rsyslog.com"] exiting on signal 15.

2017-10-20T18:35:10.790611+00:00 node-68 rsyslogd: [origin
software="rsyslogd" swVersion="8.16.0" x-pid="23851" x-info="
http://www.rsyslog.com"] start

2017-10-20T18:35:10.790395+00:00 node-68 rsyslogd: rsyslogd's groupid
changed to 108

2017-10-20T18:35:10.790455+00:00 node-68 rsyslogd: rsyslogd's userid
changed to 104

2017-10-20T18:35:10.790491+00:00 node-68 rsyslogd-2357: queue "action 0
queue": high water mark is set quite low at 8000. You should only set it
below 60% (600000) if you have a good reason for this. [v8.16.0 try
http://www.rsyslog.com/e/2357 ]

Test starts: Fri Oct 20 18:52:48 2017

2017-10-20T18:56:20.408532+00:00 node-68 kernel: [1458996.797708]
------------[ cut here ]------------

2017-10-20T18:56:20.408571+00:00 node-68 kernel: [1458996.797728]
WARNING: CPU: 27 PID: 0 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
skb_warn_bad_offload+0xd1/0x120()

2017-10-20T18:56:20.408574+00:00 node-68 kernel: [1458996.797732]
qvofd385f05-cb: caps=(0x00000184075b59e9, 0x0000000000000000) len=2636
data_len=2594 gso_size=1480 gso_type=6 ip_summed=0

2017-10-20T18:56:20.408576+00:00 node-68 kernel: [1458996.797735] Modules
linked in: bonding binfmt_misc nf_conntrack_netlink vhost_net vhost macvtap
macvlan xt_mac xt_tcpudp xt_physdev br_netfilter xt_set ip_set_hash_net
ip_set nfnetlink veth ip6table_raw ebtable_filter ebtables openvswitch
ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
xfs ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal intel_powerclamp
coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev 8021q
serio_raw input_leds garp mrp stp llc sb_edac edac_core hpilo ioatdma
lpc_ich shpchp dca 8250_fintek ipmi_si ipmi_msghandler acpi_power_meter
mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack autofs4 dm_round_robin raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor ses
enclosure raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage
hid_generic usbhid hid psmouse lpfc ahci libahci be2net vxlan
scsi_transport_fc ip6_udp_tunnel udp_tunnel wmi fjes scsi_dh_emc
scsi_dh_rdac scsi_dh_alua dm_multipath

2017-10-20T18:56:20.408580+00:00 node-68 kernel: [1458996.797828] CPU: 27
PID: 0 Comm: swapper/27 Tainted: G W 4.4.0-93-generic #116-Ubuntu

2017-10-20T18:56:20.408582+00:00 node-68 kernel: [1458996.797830]
Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017

2017-10-20T18:56:20.408583+00:00 node-68 kernel: [1458996.797832]
0000000000000286 0fc821d2ff4865f6 ffff88103fc437d0 ffffffff813f9f83

2017-10-20T18:56:20.408615+00:00 node-68 kernel: [1458996.797835]
ffff88103fc43818 ffffffff81d6f780 ffff88103fc43808 ffffffff810812f2

2017-10-20T18:56:20.408623+00:00 node-68 kernel: [1458996.797838]
ffff88203343f200 ffff880f8e0d1000 0000000000000006 0000000000000006

2017-10-20T18:56:20.408624+00:00 node-68 kernel: [1458996.797840] Call
Trace:

2017-10-20T18:56:20.408625+00:00 node-68 kernel: [1458996.797842]
[] dump_stack+0x63/0x90

2017-10-20T18:56:20.408626+00:00 node-68 kernel: [1458996.797859]
[] warn_slowpath_common+0x82/0xc0

2017-10-20T18:56:20.408627+00:00 node-68 kernel: [1458996.797861]
[] warn_slowpath_fmt+0x5c/0x80

2017-10-20T18:56:20.408632+00:00 node-68 kernel: [1458996.797865]
[] ? ___ratelimit+0xa2/0xe0

2017-10-20T18:56:20.408635+00:00 node-68 kernel: [1458996.797867]
[] skb_warn_bad_offload+0xd1/0x120

2017-10-20T18:56:20.408636+00:00 node-68 kernel: [1458996.797870]
[] __skb_gso_segment+0xfd/0x110

2017-10-20T18:56:20.408638+00:00 node-68 kernel: [1458996.797878]
[] queue_gso_packets+0x5b/0x150 [openvswitch]

2017-10-20T18:56:20.408639+00:00 node-68 kernel: [1458996.797881]
[] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]

2017-10-20T18:56:20.408640+00:00 node-68 kernel: [1458996.797884]
[] ? br_validate_ipv4.isra.23+0x200/0x200 [br_netfilter]

2017-10-20T18:56:20.408641+00:00 node-68 kernel: [1458996.797889]
[] ? nf_iterate+0x62/0x80

2017-10-20T18:56:20.408644+00:00 node-68 kernel: [1458996.797892]
[] ? nf_hook_slow+0x73/0xd0

2017-10-20T18:56:20.408645+00:00 node-68 kernel: [1458996.797901]
[] ? __br_forward+0x104/0x130 [bridge]

2017-10-20T18:56:20.408646+00:00 node-68 kernel: [1458996.797905]
[] ovs_dp_upcall+0x31/0x60 [openvswitch]

2017-10-20T18:56:20.408648+00:00 node-68 kernel: [1458996.797909]
[] ovs_dp_process_packet+0x10a/0x130 [openvswitch]

2017-10-20T18:56:20.408649+00:00 node-68 kernel: [1458996.797914]
[] ovs_vport_receive+0x6c/0xd0 [openvswitch]

2017-10-20T18:56:20.408650+00:00 node-68 kernel: [1458996.797917]
[] ? __skb_flow_dissect+0x6a6/0x9f0

2017-10-20T18:56:20.408653+00:00 node-68 kernel: [1458996.797920]
[] ? nf_iterate+0x62/0x80

2017-10-20T18:56:20.408654+00:00 node-68 kernel: [1458996.797922]
[] ? __skb_get_hash+0x9a/0x300

2017-10-20T18:56:20.408655+00:00 node-68 kernel: [1458996.797926]
[] ? __slab_free+0xcb/0x2c0

2017-10-20T18:56:20.408656+00:00 node-68 kernel: [1458996.797930]
[] ? skb_release_data+0xa7/0xd0

2017-10-20T18:56:20.408657+00:00 node-68 kernel: [1458996.797934]
[] netdev_frame_hook+0xe9/0x150 [openvswitch]

2017-10-20T18:56:20.408658+00:00 node-68 kernel: [1458996.797937]
[] __netif_receive_skb_core+0x364/0xa60

2017-10-20T18:56:20.408665+00:00 node-68 kernel: [1458996.797939]
[] ? skb_complete_wifi_ack+0xa0/0xe0

2017-10-20T18:56:20.408666+00:00 node-68 kernel: [1458996.797942]
[] ? __dev_kfree_skb_any+0x2f/0x40

2017-10-20T18:56:20.408679+00:00 node-68 kernel: [1458996.797947]
[] ? be_get_new_eqd.isra.63+0x124/0x1f0 [be2net]

2017-10-20T18:56:20.408680+00:00 node-68 kernel: [1458996.797949]
[] __netif_receive_skb+0x18/0x60

2017-10-20T18:56:20.408681+00:00 node-68 kernel: [1458996.797951]
[] process_backlog+0xa8/0x150

2017-10-20T18:56:20.408684+00:00 node-68 kernel: [1458996.797954]
[] net_rx_action+0x21e/0x360

2017-10-20T18:56:20.408685+00:00 node-68 kernel: [1458996.797957]
[] __do_softirq+0x101/0x290

2017-10-20T18:56:20.408686+00:00 node-68 kernel: [1458996.797959]
[] irq_exit+0xa3/0xb0

2017-10-20T18:56:20.408687+00:00 node-68 kernel: [1458996.797963]
[] smp_call_function_single_interrupt+0x33/0x40

2017-10-20T18:56:20.408687+00:00 node-68 kernel: [1458996.797967]
[] call_function_single_interrupt+0x82/0x90

2017-10-20T18:56:20.408688+00:00 node-68 kernel: [1458996.797968]
[] ? cpuidle_enter_state+0x111/0x2b0

2017-10-20T18:56:20.408691+00:00 node-68 kernel: [1458996.797973]
[] cpuidle_enter+0x17/0x20

2017-10-20T18:56:20.408692+00:00 node-68 kernel: [1458996.797977]
[] call_cpuidle+0x32/0x60

2017-10-20T18:56:20.408693+00:00 node-68 kernel: [1458996.797979]
[] ? cpuidle_select+0x13/0x20

2017-10-20T18:56:20.408694+00:00 node-68 kernel: [1458996.797982]
[] cpu_startup_entry+0x290/0x350

2017-10-20T18:56:20.408695+00:00 node-68 kernel: [1458996.797984]
[] start_secondary+0x154/0x190

2017-10-20T18:56:20.408695+00:00 node-68 kernel: [1458996.797989] ---[
end trace d44d42b3ada78269 ]---

2017-10-20T19:00:19.679060+00:00 node-68 kernel: [1459236.052489]
------------[ cut here ]------------

2017-10-20T19:00:19.679080+00:00 node-68 kernel: [1459236.052509]
WARNING: CPU: 27 PID: 0 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
skb_warn_bad_offload+0xd1/0x120()

2017-10-20T19:00:19.679081+00:00 node-68 kernel: [1459236.052513]
qvofd385f05-cb: caps=(0x00000184075b59e9, 0x0000000000000000) len=2642
data_len=0 gso_size=1480 gso_type=6 ip_summed=0

2017-10-20T19:00:19.679082+00:00 node-68 kernel: [1459236.052515] Modules
linked in: bonding binfmt_misc nf_conntrack_netlink vhost_net vhost macvtap
macvlan xt_mac xt_tcpudp xt_physdev br_netfilter xt_set ip_set_hash_net
ip_set nfnetlink veth ip6table_raw ebtable_filter ebtables openvswitch
ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
xfs ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal intel_powerclamp
coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev 8021q
serio_raw input_leds garp mrp stp llc sb_edac edac_core hpilo ioatdma
lpc_ich shpchp dca 8250_fintek ipmi_si ipmi_msghandler acpi_power_meter
mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack autofs4 dm_round_robin raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor ses
enclosure raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage
hid_generic usbhid hid psmouse lpfc ahci libahci be2net vxlan
scsi_transport_fc ip6_udp_tunnel udp_tunnel wmi fjes scsi_dh_emc
scsi_dh_rdac scsi_dh_alua dm_multipath

2017-10-20T19:00:19.679084+00:00 node-68 kernel: [1459236.052606] CPU: 27
PID: 0 Comm: swapper/27 Tainted: G W 4.4.0-93-generic #116-Ubuntu

2017-10-20T19:00:19.679098+00:00 node-68 kernel: [1459236.052609]
Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017

2017-10-20T19:00:19.679099+00:00 node-68 kernel: [1459236.052611]
0000000000000286 0fc821d2ff4865f6 ffff88103fc437d0 ffffffff813f9f83

2017-10-20T19:00:19.679114+00:00 node-68 kernel: [1459236.052614]
ffff88103fc43818 ffffffff81d6f780 ffff88103fc43808 ffffffff810812f2

2017-10-20T19:00:19.679116+00:00 node-68 kernel: [1459236.052616]
ffff880f89469000 ffff880f8e0d1000 0000000000000006 0000000000000006

2017-10-20T19:00:19.679117+00:00 node-68 kernel: [1459236.052619] Call
Trace:

2017-10-20T19:00:19.679118+00:00 node-68 kernel: [1459236.052620]
[] dump_stack+0x63/0x90

2017-10-20T19:00:19.679119+00:00 node-68 kernel: [1459236.052629]
[] warn_slowpath_common+0x82/0xc0

2017-10-20T19:00:19.679119+00:00 node-68 kernel: [1459236.052633]
[] warn_slowpath_fmt+0x5c/0x80

2017-10-20T19:00:19.679120+00:00 node-68 kernel: [1459236.052637]
[] ? ___ratelimit+0xa2/0xe0

2017-10-20T19:00:19.679121+00:00 node-68 kernel: [1459236.052639]
[] skb_warn_bad_offload+0xd1/0x120

2017-10-20T19:00:19.679122+00:00 node-68 kernel: [1459236.052642]
[] __skb_gso_segment+0xfd/0x110

2017-10-20T19:00:19.679123+00:00 node-68 kernel: [1459236.052649]
[] queue_gso_packets+0x5b/0x150 [openvswitch]

2017-10-20T19:00:19.679124+00:00 node-68 kernel: [1459236.052653]
[] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]

2017-10-20T19:00:19.679124+00:00 node-68 kernel: [1459236.052659]
[] ? be_xmit_enqueue+0x5bd/0x630 [be2net]

2017-10-20T19:00:19.679125+00:00 node-68 kernel: [1459236.052662]
[] ? be_xmit_flush+0xfb/0x110 [be2net]

2017-10-20T19:00:19.679126+00:00 node-68 kernel: [1459236.052665]
[] ? be_xmit+0x2f0/0x730 [be2net]

2017-10-20T19:00:19.679127+00:00 node-68 kernel: [1459236.052670]
[] ovs_dp_upcall+0x31/0x60 [openvswitch]

2017-10-20T19:00:19.679129+00:00 node-68 kernel: [1459236.052673]
[] ovs_dp_process_packet+0x10a/0x130 [openvswitch]

2017-10-20T19:00:19.679129+00:00 node-68 kernel: [1459236.052678]
[] ovs_vport_receive+0x6c/0xd0 [openvswitch]

2017-10-20T19:00:19.679130+00:00 node-68 kernel: [1459236.052684]
[] ? br_fdb_external_learn_del+0x120/0x120 [bridge]

2017-10-20T19:00:19.679131+00:00 node-68 kernel: [1459236.052688]
[] ? __br_forward+0xa6/0x130 [bridge]

2017-10-20T19:00:19.679132+00:00 node-68 kernel: [1459236.052693]
[] ? deliver_clone+0x50/0x50 [bridge]

2017-10-20T19:00:19.679133+00:00 node-68 kernel: [1459236.052698]
[] ? br_forward+0x87/0x90 [bridge]

2017-10-20T19:00:19.679134+00:00 node-68 kernel: [1459236.052702]
[] ? br_handle_frame_finish+0x3a0/0x620 [bridge]

2017-10-20T19:00:19.679135+00:00 node-68 kernel: [1459236.052706]
[] ? __slab_free+0xcb/0x2c0

2017-10-20T19:00:19.679135+00:00 node-68 kernel: [1459236.052711]
[] ? br_handle_frame+0x174/0x2b0 [bridge]

2017-10-20T19:00:19.679136+00:00 node-68 kernel: [1459236.052715]
[] netdev_frame_hook+0xe9/0x150 [openvswitch]

2017-10-20T19:00:19.679137+00:00 node-68 kernel: [1459236.052717]
[] __netif_receive_skb_core+0x364/0xa60

2017-10-20T19:00:19.679147+00:00 node-68 kernel: [1459236.052721]
[] ? skb_complete_wifi_ack+0xa0/0xe0

2017-10-20T19:00:19.679148+00:00 node-68 kernel: [1459236.052722]
[] ? __dev_kfree_skb_any+0x2f/0x40

2017-10-20T19:00:19.679149+00:00 node-68 kernel: [1459236.052723]
[] __netif_receive_skb+0x18/0x60

2017-10-20T19:00:19.679149+00:00 node-68 kernel: [1459236.052725]
[] process_backlog+0xa8/0x150

2017-10-20T19:00:19.679150+00:00 node-68 kernel: [1459236.052726]
[] net_rx_action+0x21e/0x360

2017-10-20T19:00:19.679155+00:00 node-68 kernel: [1459236.052728]
[] __do_softirq+0x101/0x290

2017-10-20T19:00:19.679157+00:00 node-68 kernel: [1459236.052730]
[] irq_exit+0xa3/0xb0

2017-10-20T19:00:19.679158+00:00 node-68 kernel: [1459236.052733]
[] smp_call_function_single_interrupt+0x33/0x40

2017-10-20T19:00:19.679158+00:00 node-68 kernel: [1459236.052738]
[] call_function_single_interrupt+0x82/0x90

2017-10-20T19:00:19.679159+00:00 node-68 kernel: [1459236.052739]
[] ? cpuidle_enter_state+0x111/0x2b0

2017-10-20T19:00:19.679160+00:00 node-68 kernel: [1459236.052743]
[] cpuidle_enter+0x17/0x20

2017-10-20T19:00:19.679162+00:00 node-68 kernel: [1459236.052746]
[] call_cpuidle+0x32/0x60

2017-10-20T19:00:19.679163+00:00 node-68 kernel: [1459236.052747]
[] ? cpuidle_select+0x13/0x20

2017-10-20T19:00:19.679163+00:00 node-68 kernel: [1459236.052749]
[] cpu_startup_entry+0x290/0x350

2017-10-20T19:00:19.679164+00:00 node-68 kernel: [1459236.052750]
[] start_secondary+0x154/0x190

2017-10-20T19:00:19.679165+00:00 node-68 kernel: [1459236.052753] ---[
end trace d44d42b3ada7826a ]---
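
The `skb_warn_bad_offload` warnings above each carry a line with the device name, capability masks, and GSO fields. Pulling those into records makes it easier to compare occurrences across nodes. The sketch below is illustrative only: `parse_bad_offload` is a made-up name, the field names `dev_features` and `sk_caps` reflect my reading of what the two `caps=` values are and should be treated as an assumption, and the underscores in the field labels are optional because archived copies of these logs sometimes lose them.

```python
import re

# Matches e.g.
#   "qvofd385f05-cb: caps=(0x00000184075b59e9, 0x0000000000000000) len=2636
#    data_len=2594 gso_size=1480 gso_type=6 ip_summed=0"
# \s+ between fields tolerates line wraps introduced by mail archives.
OFFLOAD_RE = re.compile(
    r"(?P<dev>\S+): caps=\((?P<dev_features>0x[0-9a-f]+), "
    r"(?P<sk_caps>0x[0-9a-f]+)\)\s+"
    r"len=(?P<len>\d+)\s+data_?len=(?P<data_len>\d+)\s+"
    r"gso_?size=(?P<gso_size>\d+)\s+gso_?type=(?P<gso_type>\d+)\s+"
    r"ip_?summed=(?P<ip_summed>\d+)"
)

INT_FIELDS = ("len", "data_len", "gso_size", "gso_type", "ip_summed")


def parse_bad_offload(log_text):
    """Return one dict per bad-offload detail line found in log_text."""
    rows = []
    for m in OFFLOAD_RE.finditer(log_text):
        row = m.groupdict()
        for key in INT_FIELDS:
            row[key] = int(row[key])  # numeric fields as ints
        rows.append(row)
    return rows
```

Running this over the full node-68 and node-90 logs would show, for instance, whether the same `qvo...` device and the same `gso_size`/`gso_type` values recur in every warning.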

node-90

2017-10-20T18:04:40.933607+00:00 node-90 rsyslogd-2177: imuxsock[pid
5001]: begin to drop messages due to rate-limiting

2017-10-20T18:04:42.868706+00:00 node-90 rsyslogd-2177: imuxsock[pid
5001]: 41 messages lost due to rate-limiting

2017-10-20T18:14:40.927790+00:00 node-90 rsyslogd-2177: imuxsock[pid
5001]: begin to drop messages due to rate-limiting

2017-10-20T18:14:42.537996+00:00 node-90 rsyslogd-2177: imuxsock[pid
5001]: 41 messages lost due to rate-limiting

2017-10-20T18:24:40.921904+00:00 node-90 rsyslogd-2177: imuxsock[pid
5001]: begin to drop messages due to rate-limiting

2017-10-20T18:24:42.091415+00:00 node-90 rsyslogd-2177: imuxsock[pid
5001]: 41 messages lost due to rate-limiting

ps -aef | grep 5001

ceilome+ 5001 3401 0 Oct19 ? 00:19:09 ceilometer-polling - AgentManager(0)

2017-10-20T18:30:37.734912+00:00 node-90 rsyslogd: [origin
software="rsyslogd" swVersion="8.16.0" x-pid="3305" x-info="
http://www.rsyslog.com"] exiting on signal 15.

2017-10-20T18:30:37.834236+00:00 node-90 rsyslogd: [origin
software="rsyslogd" swVersion="8.16.0" x-pid="21427" x-info="
http://www.rsyslog.com"] start

2017-10-20T18:30:37.833919+00:00 node-90 rsyslogd: rsyslogd's groupid
changed to 108

2017-10-20T18:30:37.833993+00:00 node-90 rsyslogd: rsyslogd's userid
changed to 104

2017-10-20T18:30:37.834050+00:00 node-90 rsyslogd-2357: queue "action 0
queue": high water mark is set quite low at 8000. You should only set it
below 60% (600000) if you have a good reason for this. [v8.16.0 try
http://www.rsyslog.com/e/2357 ]

Test starts: Fri Oct 20 18:52:48 2017

2017-10-20T18:56:20.421681+00:00 node-90 kernel: [97344.379555]
------------[ cut here ]------------

2017-10-20T18:56:20.421718+00:00 node-90 kernel: [97344.379563] WARNING:
CPU: 30 PID: 18870 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
skb_warn_bad_offload+0xd1/0x120()

2017-10-20T18:56:20.421719+00:00 node-90 kernel: [97344.379565]
qvo14d5a4ef-47: caps=(0x00000184075b59e9, 0x0000000000000000) len=2531
data_len=0 gso_size=1480 gso_type=6 ip_summed=0

2017-10-20T18:56:20.421720+00:00 node-90 kernel: [97344.379567] Modules
linked in: vhost_net vhost macvtap macvlan veth nf_conntrack_netlink
ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
ip_set_hash_net ip_set nfnetlink ebtable_filter ebtables openvswitch ocfs2
quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
xfs bridge 8021q garp mrp stp llc intel_rapl x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul hpilo input_leds joydev kvm_intel
glue_helper ipmi_ssif kvm ablk_helper cryptd irqbypass ipmi_si shpchp
8250_fintek ipmi_msghandler ioatdma serio_raw sb_edac lpc_ich edac_core dca
acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack autofs4 raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx dm_round_robin xor ses enclosure
raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid
psmouse lpfc ahci libahci be2net vxlan scsi_transport_fc ip6_udp_tunnel
udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath

2017-10-20T18:56:20.421723+00:00 node-90 kernel: [97344.379625] CPU: 30
PID: 18870 Comm: vhost-18868 Not tainted 4.4.0-93-generic #116-Ubuntu

2017-10-20T18:56:20.421871+00:00 node-90 kernel: [97344.379626] Hardware
name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017

2017-10-20T18:56:20.421876+00:00 node-90 kernel: [97344.379627]
0000000000000286 f1812d601dc61f3e ffff88203f2837f0 ffffffff813f9f83

2017-10-20T18:56:20.421877+00:00 node-90 kernel: [97344.379629]
ffff88203f283838 ffffffff81d6f780 ffff88203f283828 ffffffff810812f2

2017-10-20T18:56:20.421895+00:00 node-90 kernel: [97344.379630]
ffff881fb3dfa700 ffff88202d5d1000 0000000000000006 0000000000000006

2017-10-20T18:56:20.421899+00:00 node-90 kernel: [97344.379632] Call
Trace:

2017-10-20T18:56:20.421900+00:00 node-90 kernel: [97344.379634]
[] dump_stack+0x63/0x90

2017-10-20T18:56:20.421901+00:00 node-90 kernel: [97344.379642]
[] warn_slowpath_common+0x82/0xc0

2017-10-20T18:56:20.421901+00:00 node-90 kernel: [97344.379643]
[] warn_slowpath_fmt+0x5c/0x80

2017-10-20T18:56:20.421902+00:00 node-90 kernel: [97344.379646]
[] ? ___ratelimit+0xa2/0xe0

2017-10-20T18:56:20.421904+00:00 node-90 kernel: [97344.379648]
[] skb_warn_bad_offload+0xd1/0x120

2017-10-20T18:56:20.421905+00:00 node-90 kernel: [97344.379650]
[] __skb_gso_segment+0xfd/0x110

2017-10-20T18:56:20.421905+00:00 node-90 kernel: [97344.379656]
[] queue_gso_packets+0x5b/0x150 [openvswitch]

2017-10-20T18:56:20.421906+00:00 node-90 kernel: [97344.379658]
[] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]

2017-10-20T18:56:20.421907+00:00 node-90 kernel: [97344.379660]
[] ? br_validate_ipv4.isra.23+0x200/0x200 [br_netfilter]

2017-10-20T18:56:20.421907+00:00 node-90 kernel: [97344.379666]
[] ? nf_iterate+0x62/0x80

2017-10-20T18:56:20.421909+00:00 node-90 kernel: [97344.379668]
[] ? nf_hook_slow+0x73/0xd0

2017-10-20T18:56:20.421910+00:00 node-90 kernel: [97344.379676]
[] ? __br_forward+0x104/0x130 [bridge]

2017-10-20T18:56:20.421911+00:00 node-90 kernel: [97344.379679]
[] ovs_dp_upcall+0x31/0x60 [openvswitch]

2017-10-20T18:56:20.421911+00:00 node-90 kernel: [97344.379681]
[] ovs_dp_process_packet+0x10a/0x130 [openvswitch]

2017-10-20T18:56:20.421912+00:00 node-90 kernel: [97344.379684]
[] ovs_vport_receive+0x6c/0xd0 [openvswitch]

2017-10-20T18:56:20.421912+00:00 node-90 kernel: [97344.379685]
[] ? br_nf_pre_routing_finish+0x1a9/0x350 [br_netfilter]

2017-10-20T18:56:20.421915+00:00 node-90 kernel: [97344.379688]
[] ? br_handle_local_finish+0xa0/0xa0 [bridge]

2017-10-20T18:56:20.421915+00:00 node-90 kernel: [97344.379690]
[] ? nf_iterate+0x62/0x80

2017-10-20T18:56:20.421916+00:00 node-90 kernel: [97344.379692]
[] ? br_nf_pre_routing+0x2e1/0x440 [br_netfilter]

2017-10-20T18:56:20.421916+00:00 node-90 kernel: [97344.379693]
[] ? br_nf_forward_ip+0x480/0x480 [br_netfilter]

2017-10-20T18:56:20.421917+00:00 node-90 kernel: [97344.379696]
[] ? br_handle_frame+0x1da/0x2b0 [bridge]

2017-10-20T18:56:20.421917+00:00 node-90 kernel: [97344.379699]
[] netdev_frame_hook+0xe9/0x150 [openvswitch]

2017-10-20T18:56:20.421920+00:00 node-90 kernel: [97344.379700]
[] __netif_receive_skb_core+0x364/0xa60

2017-10-20T18:56:20.421920+00:00 node-90 kernel: [97344.379702]
[] __netif_receive_skb+0x18/0x60

2017-10-20T18:56:20.421921+00:00 node-90 kernel: [97344.379703]
[] process_backlog+0xa8/0x150

2017-10-20T18:56:20.421928+00:00 node-90 kernel: [97344.379704]
[] net_rx_action+0x21e/0x360

2017-10-20T18:56:20.421929+00:00 node-90 kernel: [97344.379706]
[] __do_softirq+0x101/0x290

2017-10-20T18:56:20.421929+00:00 node-90 kernel: [97344.379709]
[] do_softirq_own_stack+0x1c/0x30

2017-10-20T18:56:20.421931+00:00 node-90 kernel: [97344.379710]
[] do_softirq.part.19+0x38/0x40

2017-10-20T18:56:20.421932+00:00 node-90 kernel: [97344.379713]
[] do_softirq+0x1d/0x20

2017-10-20T18:56:20.421932+00:00 node-90 kernel: [97344.379714]
[] netif_rx_ni+0x33/0x80

2017-10-20T18:56:20.421933+00:00 node-90 kernel: [97344.379718]
[] tun_get_user+0x506/0x880

2017-10-20T18:56:20.421933+00:00 node-90 kernel: [97344.379720]
[] tun_sendmsg+0x51/0x70

2017-10-20T18:56:20.421934+00:00 node-90 kernel: [97344.379723]
[] handle_tx+0x306/0x4e0 [vhost_net]

2017-10-20T18:56:20.421938+00:00 node-90 kernel: [97344.379726]
[] handle_tx_kick+0x15/0x20 [vhost_net]

2017-10-20T18:56:20.421938+00:00 node-90 kernel: [97344.379730]
[] vhost_worker+0xf3/0x190 [vhost]

2017-10-20T18:56:20.421939+00:00 node-90 kernel: [97344.379733]
[] ? vhost_poll_wakeup+0x30/0x30 [vhost]

2017-10-20T18:56:20.421939+00:00 node-90 kernel: [97344.379736]
[] kthread+0xe5/0x100

2017-10-20T18:56:20.421940+00:00 node-90 kernel: [97344.379738]
[] ? kthread_create_on_node+0x1e0/0x1e0

2017-10-20T18:56:20.421942+00:00 node-90 kernel: [97344.379740]
[] ret_from_fork+0x3f/0x70

2017-10-20T18:56:20.421943+00:00 node-90 kernel: [97344.379742]
[] ? kthread_create_on_node+0x1e0/0x1e0

2017-10-20T18:56:20.421943+00:00 node-90 kernel: [97344.379743] ---[ end
trace d7e73079b38e57b3 ]---

2017-10-20T19:00:19.698016+00:00 node-90 kernel: [97583.653007]
------------[ cut here ]------------

2017-10-20T19:00:19.698034+00:00 node-90 kernel: [97583.653016] WARNING:
CPU: 2 PID: 18870 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
skb_warn_bad_offload+0xd1/0x120()

2017-10-20T19:00:19.698036+00:00 node-90 kernel: [97583.653018]
qvo14d5a4ef-47: caps=(0x00000184075b59e9, 0x0000000000000000) len=2531
data_len=0 gso_size=1480 gso_type=6 ip_summed=0

2017-10-20T19:00:19.698037+00:00 node-90 kernel: [97583.653019] Modules
linked in: vhost_net vhost macvtap macvlan veth nf_conntrack_netlink
ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
ip_set_hash_net ip_set nfnetlink ebtable_filter ebtables openvswitch ocfs2
quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
xfs bridge 8021q garp mrp stp llc intel_rapl x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul hpilo input_leds joydev kvm_intel
glue_helper ipmi_ssif kvm ablk_helper cryptd irqbypass ipmi_si shpchp
8250_fintek ipmi_msghandler ioatdma serio_raw sb_edac lpc_ich edac_core dca
acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack autofs4 raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx dm_round_robin xor ses enclosure
raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid
psmouse lpfc ahci libahci be2net vxlan scsi_transport_fc ip6_udp_tunnel
udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath

2017-10-20T19:00:19.698046+00:00 node-90 kernel: [97583.653082] CPU: 2
PID: 18870 Comm: vhost-18868 Tainted: G W 4.4.0-93-generic #116-Ubuntu

2017-10-20T19:00:19.698048+00:00 node-90 kernel: [97583.653083] Hardware
name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017

2017-10-20T19:00:19.698049+00:00 node-90 kernel: [97583.653084]
0000000000000286 f1812d601dc61f3e ffff88103f8837f0 ffffffff813f9f83

2017-10-20T19:00:19.698064+00:00 node-90 kernel: [97583.653086]
ffff88103f883838 ffffffff81d6f780 ffff88103f883828 ffffffff810812f2

2017-10-20T19:00:19.698065+00:00 node-90 kernel: [97583.653088]
ffff881034e2fe00 ffff88202d5d1000 0000000000000006 0000000000000006

2017-10-20T19:00:19.698067+00:00 node-90 kernel: [97583.653090] Call
Trace:

2017-10-20T19:00:19.698069+00:00 node-90 kernel: [97583.653091]
[] dump_stack+0x63/0x90

2017-10-20T19:00:19.698069+00:00 node-90 kernel: [97583.653098]
[] warn_slowpath_common+0x82/0xc0

2017-10-20T19:00:19.698070+00:00 node-90 kernel: [97583.653100]
[] warn_slowpath_fmt+0x5c/0x80

2017-10-20T19:00:19.698071+00:00 node-90 kernel: [97583.653102]
[] ? ___ratelimit+0xa2/0xe0

2017-10-20T19:00:19.698071+00:00 node-90 kernel: [97583.653103]
[] skb_warn_bad_offload+0xd1/0x120

2017-10-20T19:00:19.698073+00:00 node-90 kernel: [97583.653105]
[] __skb_gso_segment+0xfd/0x110

2017-10-20T19:00:19.698074+00:00 node-90 kernel: [97583.653111]
[] queue_gso_packets+0x5b/0x150 [openvswitch]

2017-10-20T19:00:19.698075+00:00 node-90 kernel: [97583.653114]
[] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]

2017-10-20T19:00:19.698076+00:00 node-90 kernel: [97583.653116]
[] ? br_validate_ipv4.isra.23+0x200/0x200 [br_netfilter]

2017-10-20T19:00:19.698076+00:00 node-90 kernel: [97583.653120]
[] ? nf_iterate+0x62/0x80

2017-10-20T19:00:19.698077+00:00 node-90 kernel: [97583.653122]
[] ? nf_hook_slow+0x73/0xd0

2017-10-20T19:00:19.698079+00:00 node-90 kernel: [97583.653128]
[] ? __br_forward+0x104/0x130 [bridge]

2017-10-20T19:00:19.698080+00:00 node-90 kernel: [97583.653131]
[] ovs_dp_upcall+0x31/0x60 [openvswitch]

2017-10-20T19:00:19.698081+00:00 node-90 kernel: [97583.653133]
[] ovs_dp_process_packet+0x10a/0x130 [openvswitch]

2017-10-20T19:00:19.698081+00:00 node-90 kernel: [97583.653136]
[] ovs_vport_receive+0x6c/0xd0 [openvswitch]

2017-10-20T19:00:19.698082+00:00 node-90 kernel: [97583.653138]
[] ? br_nf_pre_routing_finish+0x1a9/0x350 [br_netfilter]

2017-10-20T19:00:19.698083+00:00 node-90 kernel: [97583.653141]
[] ? br_handle_local_finish+0xa0/0xa0 [bridge]

2017-10-20T19:00:19.698084+00:00 node-90 kernel: [97583.653143]
[] ? nf_iterate+0x62/0x80

2017-10-20T19:00:19.698085+00:00 node-90 kernel: [97583.653144]
[] ? br_nf_pre_routing+0x2e1/0x440 [br_netfilter]

2017-10-20T19:00:19.698086+00:00 node-90 kernel: [97583.653146]
[] ? br_nf_forward_ip+0x480/0x480 [br_netfilter]

2017-10-20T19:00:19.698086+00:00 node-90 kernel: [97583.653149]
[] ? br_handle_frame+0x1da/0x2b0 [bridge]

2017-10-20T19:00:19.698087+00:00 node-90 kernel: [97583.653152]
[] netdev_frame_hook+0xe9/0x150 [openvswitch]

2017-10-20T19:00:19.698088+00:00 node-90 kernel: [97583.653154]
[] __netif_receive_skb_core+0x364/0xa60

2017-10-20T19:00:19.698099+00:00 node-90 kernel: [97583.653156]
[] ? x2apic_send_IPI_mask+0x13/0x20

2017-10-20T19:00:19.698099+00:00 node-90 kernel: [97583.653159]
[] ? native_send_call_func_single_ipi+0x3a/0x40

2017-10-20T19:00:19.698100+00:00 node-90 kernel: [97583.653163]
[] ? generic_exec_single+0x85/0x120

2017-10-20T19:00:19.698101+00:00 node-90 kernel: [97583.653167]
[] ? be_eq_notify+0x60/0x70 [be2net]

2017-10-20T19:00:19.698101+00:00 node-90 kernel: [97583.653168]
[] __netif_receive_skb+0x18/0x60

2017-10-20T19:00:19.698102+00:00 node-90 kernel: [97583.653170]
[] process_backlog+0xa8/0x150

2017-10-20T19:00:19.698104+00:00 node-90 kernel: [97583.653171]
[] net_rx_action+0x21e/0x360

2017-10-20T19:00:19.698105+00:00 node-90 kernel: [97583.653173]
[] __do_softirq+0x101/0x290

2017-10-20T19:00:19.698106+00:00 node-90 kernel: [97583.653175]
[] do_softirq_own_stack+0x1c/0x30

2017-10-20T19:00:19.698107+00:00 node-90 kernel: [97583.653176]
[] do_softirq.part.19+0x38/0x40

2017-10-20T19:00:19.698108+00:00 node-90 kernel: [97583.653179]
[] do_softirq+0x1d/0x20

2017-10-20T19:00:19.698110+00:00 node-90 kernel: [97583.653181]
[] netif_rx_ni+0x33/0x80

2017-10-20T19:00:19.698111+00:00 node-90 kernel: [97583.653184]
[] tun_get_user+0x506/0x880

2017-10-20T19:00:19.698112+00:00 node-90 kernel: [97583.653185]
[] tun_sendmsg+0x51/0x70

2017-10-20T19:00:19.698112+00:00 node-90 kernel: [97583.653188]
[] handle_tx+0x306/0x4e0 [vhost_net]

2017-10-20T19:00:19.698113+00:00 node-90 kernel: [97583.653190]
[] handle_tx_kick+0x15/0x20 [vhost_net]

2017-10-20T19:00:19.698113+00:00 node-90 kernel: [97583.653193]
[] vhost_worker+0xf3/0x190 [vhost]

2017-10-20T19:00:19.698115+00:00 node-90 kernel: [97583.653195]
[] ? vhost_poll_wakeup+0x30/0x30 [vhost]

2017-10-20T19:00:19.698116+00:00 node-90 kernel: [97583.653198]
[] kthread+0xe5/0x100

2017-10-20T19:00:19.698117+00:00 node-90 kernel: [97583.653199]
[] ? kthread_create_on_node+0x1e0/0x1e0

2017-10-20T19:00:19.698117+00:00 node-90 kernel: [97583.653203]
[] ret_from_fork+0x3f/0x70

2017-10-20T19:00:19.698118+00:00 node-90 kernel: [97583.653204]
[] ? kthread_create_on_node+0x1e0/0x1e0

2017-10-20T19:00:19.698123+00:00 node-90 kernel: [97583.653206] ---[ end
trace d7e73079b38e57b4 ]---
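For anyone reading the skb_warn_bad_offload detail line in the second trace, here is a minimal Python sketch that splits it into fields and decodes the GSO type value. The helper names `parse_offload_warning` and `decode_gso_type` are made up for illustration. The flag table is an assumption based on how the SKB_GSO_* bits are laid out in include/linux/skbuff.h for 4.4-series kernels, so please check it against the source of the running kernel before relying on it.

```python
import re

# ASSUMPTION: SKB_GSO_* bit values as laid out in include/linux/skbuff.h
# for 4.4-series kernels. Verify against the running kernel's source.
GSO_FLAGS = {
    1 << 0: "SKB_GSO_TCPV4",
    1 << 1: "SKB_GSO_UDP",
    1 << 2: "SKB_GSO_DODGY",
    1 << 3: "SKB_GSO_TCP_ECN",
    1 << 4: "SKB_GSO_TCPV6",
}


def parse_offload_warning(line):
    """Split '<dev>: caps=(...) len=N ...' into (device, {field: int})."""
    device, _, rest = line.partition(":")
    fields = {}
    # Only plain decimal key=value pairs are collected; caps=(...) is skipped.
    for key, value in re.findall(r"(\w+)=(\d+)", rest):
        # Drop underscores so 'gso_type' and 'gsotype' land on the same key.
        fields[key.replace("_", "")] = int(value)
    return device.strip(), fields


def decode_gso_type(value):
    """Return the names of the assumed flags set in value, plus any unknown bits."""
    names = [name for bit, name in GSO_FLAGS.items() if value & bit]
    unknown = value & ~sum(GSO_FLAGS)
    if unknown:
        names.append(hex(unknown))
    return names


if __name__ == "__main__":
    sample = ("qvo14d5a4ef-47: caps=(0x00000184075b59e9, 0x0000000000000000) "
              "len=2531 datalen=0 gsosize=1480 gsotype=6 ipsummed=0")
    dev, fields = parse_offload_warning(sample)
    print(dev, fields, decode_gso_type(fields["gsotype"]))
```

If the assumed table is right, a GSO type of 6 would be the UDP and DODGY bits together, but that reading is only as good as the table.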

-- Jim

On Wed, Oct 18, 2017 at 11:37 PM, Jim Okken jim@jokken.com wrote:

hi all,

Please help us out with an issue we are seeing on multiple compute nodes
running Newton (Ubuntu 16.04.3, kernel 4.4.0). After about 1 hour of running
our VoIP test application, the instances become non-responsive and can't be
pinged, and neither can the compute nodes.

Messages appear on the compute node console screens. A screenshot of
them is hosted here:

http://www.jokken.com/downloads/console.png

i'll try to attach it also.

The first compute node this was seen on was running 2 instances; the
second was running only 1 instance. They were using only a portion of the
total 40 vCPUs available, and the load was moderate. After a cold boot these
nodes are fine again, until we run our application for about 1 hour.

Please let us know what you think. Thanks!

The DEBUG logging of Nova and Neutron on the compute node does not show
much. These logs are here:

http://www.jokken.com/downloads/logs.zip

i'll try to attach them too.

https://ask.openstack.org/en/question/110748/soft-lockup-on-newton-compute-nodes/

/var/log/messages on the compute node shows many repeats of these
messages:

2017-10-18T20:49:26.462309+00:00 node-58 kernel: [1297007.624935]
Modules linked in: binfmt_misc nf_conntrack_netlink vhost_net vhost macvtap
macvlan ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
ip_set_hash_net ip_set nfnetlink veth ebtable_filter ebtables openvswitch
ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
xfs ipmi_ssif 8021q garp mrp intel_rapl x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd
serio_raw bridge stp llc sb_edac edac_core hpilo ioatdma lpc_ich shpchp dca
ipmi_si 8250_fintek ipmi_msghandler acpi_power_meter mac_hid kvm_intel kvm
irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr
iscsi_tcp libiscsi_tcp nf_conntrack_proto_gre nf_conntrack_ipv6
nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4 raid10
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c raid1 raid0 multipath linear dm_round_robin ses
enclosure uas usb_storage psmouse ahci lpfc be2iscsi libahci be2net
iscsi_boot_sysfs libiscsi vxlan scsi_transport_fc ip6_udp_tunnel
scsi_transport_iscsi udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac
scsi_dh_alua dm_multipath

2017-10-18T20:49:26.462311+00:00 node-58 kernel: [1297007.625008] CPU:
27 PID: 860 Comm: qemu-system-x86 Not tainted 4.4.0-93-generic #116-Ubuntu

2017-10-18T20:49:26.462313+00:00 node-58 kernel: [1297007.625009]
Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017

2017-10-18T20:49:26.462314+00:00 node-58 kernel: [1297007.625010] task:
ffff881faaaa7000 ti: ffff881fa3a34000 task.ti: ffff881fa3a34000

2017-10-18T20:49:26.462315+00:00 node-58 kernel: [1297007.625011] RIP:
0010:[] []
native_queued_spin_lock_slowpath+0x15c/0x170

2017-10-18T20:49:26.462316+00:00 node-58 kernel: [1297007.625018] RSP:
0018:ffff883fff143c30 EFLAGS: 00000202

2017-10-18T20:49:26.462317+00:00 node-58 kernel: [1297007.625019] RAX:
0000000000000101 RBX: ffff881f677603f0 RCX: 0000000000000001

2017-10-18T20:49:26.462337+00:00 node-58 kernel: [1297007.625020] RDX:
0000000000000101 RSI: 0000000000000001 RDI: ffff881f677603ec

2017-10-18T20:49:26.462340+00:00 node-58 kernel: [1297007.625020] RBP:
ffff883fff143c30 R08: 0000000000000101 R09: ffffffff81191e27

2017-10-18T20:49:26.462341+00:00 node-58 kernel: [1297007.625021] R10:
ffffea00ffb09780 R11: 0000000000000a00 R12: ffff881f677603ec

2017-10-18T20:49:26.462342+00:00 node-58 kernel: [1297007.625022] R13:
0000000000000a00 R14: 00000000000a5000 R15: 0000000000000a00

2017-10-18T20:49:26.462343+00:00 node-58 kernel: [1297007.625023] FS:
00007f0c53fb3c00(0000) GS:ffff883fff140000(0000) knlGS:0000000000000000

2017-10-18T20:49:26.462343+00:00 node-58 kernel: [1297007.625024] CS:
0010 DS: 0000 ES: 0000 CR0: 0000000080050033

2017-10-18T20:49:26.462344+00:00 node-58 kernel: [1297007.625025] CR2:
00007fe018e2547e CR3: 0000003ec0b75000 CR4: 00000000001426e0

2017-10-18T20:49:26.462345+00:00 node-58 kernel: [1297007.625026] Stack:

2017-10-18T20:49:26.462347+00:00 node-58 kernel: [1297007.625026]
ffff883fff143c40 ffffffff81842f71 ffff883fff143c60 ffffffff81841085

2017-10-18T20:49:26.462348+00:00 node-58 kernel: [1297007.625028]
ffff881dc609ac00 ffff881f677604b0 ffff883fff143c70 ffffffff818410cb

2017-10-18T20:49:26.462349+00:00 node-58 kernel: [1297007.625029]
ffff883fff143ca0 ffffffffc08c658d ffff883feff9d500 0000000000000a00

2017-10-18T20:49:26.462351+00:00 node-58 kernel: [1297007.625031] Call
Trace:

2017-10-18T20:49:26.462353+00:00 node-58 kernel: [1297007.625032]

2017-10-18T20:49:26.462354+00:00 node-58 kernel: [1297007.625039]
[] _raw_spin_lock+0x21/0x30

2017-10-18T20:49:26.462356+00:00 node-58 kernel: [1297007.625041]
[] __mutex_unlock_slowpath+0x25/0x50

2017-10-18T20:49:26.462356+00:00 node-58 kernel: [1297007.625042]
[] mutex_unlock+0x1b/0x20

2017-10-18T20:49:26.462357+00:00 node-58 kernel: [1297007.625076]
[] ocfs2_dio_end_io+0x6d/0x80 [ocfs2]

2017-10-18T20:49:26.462358+00:00 node-58 kernel: [1297007.625080]
[] dio_complete+0x11c/0x1c0

2017-10-18T20:49:26.462359+00:00 node-58 kernel: [1297007.625081]
[] dio_bio_end_aio+0x73/0x100

2017-10-18T20:49:26.462361+00:00 node-58 kernel: [1297007.625085]
[] bio_endio+0x3f/0x60

2017-10-18T20:49:26.462362+00:00 node-58 kernel: [1297007.625087]
[] blk_update_request+0x87/0x310

2017-10-18T20:49:26.462363+00:00 node-58 kernel: [1297007.625091]
[] end_clone_bio+0x46/0x70

2017-10-18T20:49:26.462363+00:00 node-58 kernel: [1297007.625092]
[] bio_endio+0x3f/0x60

2017-10-18T20:49:26.462364+00:00 node-58 kernel: [1297007.625093]
[] blk_update_request+0x87/0x310

2017-10-18T20:49:26.462365+00:00 node-58 kernel: [1297007.625097]
[] scsi_end_request+0x33/0x1d0

2017-10-18T20:49:26.462367+00:00 node-58 kernel: [1297007.625100]
[] scsi_io_completion+0x1b6/0x690

2017-10-18T20:49:26.462368+00:00 node-58 kernel: [1297007.625104]
[] ? rebalance_domains+0x166/0x2d0

2017-10-18T20:49:26.462368+00:00 node-58 kernel: [1297007.625107]
[] scsi_finish_command+0xcf/0x120

2017-10-18T20:49:26.462377+00:00 node-58 kernel: [1297007.625109]
[] scsi_softirq_done+0x124/0x150

2017-10-18T20:49:26.462378+00:00 node-58 kernel: [1297007.625112]
[] blk_done_softirq+0x87/0xb0

2017-10-18T20:49:26.462379+00:00 node-58 kernel: [1297007.625116]
[] __do_softirq+0x101/0x290

2017-10-18T20:49:26.462381+00:00 node-58 kernel: [1297007.625118]
[] irq_exit+0xa3/0xb0

2017-10-18T20:49:26.462382+00:00 node-58 kernel: [1297007.625121]
[] smp_call_function_single_interrupt+0x33/0x40

2017-10-18T20:49:26.462382+00:00 node-58 kernel: [1297007.625124]
[] call_function_single_interrupt+0x82/0x90

2017-10-18T20:49:26.462383+00:00 node-58 kernel: [1297007.625125]

2017-10-18T20:49:26.462383+00:00 node-58 kernel: [1297007.625127]
[] ? _raw_spin_lock+0x14/0x30

2017-10-18T20:49:26.462385+00:00 node-58 kernel: [1297007.625129]
[] __mutex_lock_slowpath+0x72/0x130

2017-10-18T20:49:26.462387+00:00 node-58 kernel: [1297007.625142]
[] ? ocfs2_inode_unlock+0x119/0x120 [ocfs2]

2017-10-18T20:49:26.462387+00:00 node-58 kernel: [1297007.625143]
[] mutex_lock+0x1f/0x30

2017-10-18T20:49:26.462388+00:00 node-58 kernel: [1297007.625155]
[] ocfs2_file_write_iter+0x95a/0xdf0 [ocfs2]

2017-10-18T20:49:26.462388+00:00 node-58 kernel: [1297007.625158]
[] ? poll_select_copy_remaining+0x140/0x140

2017-10-18T20:49:26.462389+00:00 node-58 kernel: [1297007.625169]
[] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]

2017-10-18T20:49:26.462391+00:00 node-58 kernel: [1297007.625171]
[] aio_run_iocb+0x26a/0x2d0

2017-10-18T20:49:26.462392+00:00 node-58 kernel: [1297007.625174]
[] ? __fget_light+0x25/0x60

2017-10-18T20:49:26.462394+00:00 node-58 kernel: [1297007.625175]
[] ? __fdget+0x13/0x20

2017-10-18T20:49:26.462395+00:00 node-58 kernel: [1297007.625177]
[] do_io_submit+0x25f/0x500

2017-10-18T20:49:26.462396+00:00 node-58 kernel: [1297007.625178]
[] SyS_io_submit+0x10/0x20

2017-10-18T20:49:26.462398+00:00 node-58 kernel: [1297007.625181]
[] entry_SYSCALL_64_fastpath+0x16/0x71

2017-10-18T20:49:26.462399+00:00 node-58 kernel: [1297007.625181] Code:
01 48 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00
00 e9 63 ff ff ff 83 fa 01 75 07 e9 c4 fe ff ff f3 90 <8b> 07 84 c0 75 f8
b8 01 00 00 00 66 89 07 5d c3 0f 1f 40 00 0f
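Since /var/log/messages shows many repeats of these messages, a tally by host and RIP function can show whether the CPUs are always stuck in the same place. Below is a minimal Python sketch, assuming the real file has one syslog record per line in the layout shown above (the wrapping here comes from the mail client). The name `count_rip_functions` and both regexes are illustrative only, not part of any existing tool.

```python
import re
from collections import Counter

# '<timestamp> <host> kernel: [<uptime>] <message>' as in the snippets above.
LINE_RE = re.compile(r"^\S+ (\S+) kernel: \[\s*[\d.]+\]\s+(.*)$")
# Function name following the bracketed address fields on a 'RIP:' line.
RIP_RE = re.compile(r"RIP: .*\]\s+([\w.]+)\+0x")


def count_rip_functions(lines):
    """Count (host, function) pairs over all kernel 'RIP:' records."""
    counts = Counter()
    for line in lines:
        record = LINE_RE.match(line.rstrip("\n"))
        if not record:
            continue  # not a kernel record in the expected layout
        host, message = record.groups()
        rip = RIP_RE.search(message)
        if rip:
            counts[(host, rip.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the post; adjust for the node being inspected.
    with open("/var/log/messages", errors="replace") as log:
        for (host, func), n in count_rip_functions(log).most_common():
            print(f"{n:6d}  {host}  {func}")
```

If one function dominates the tally on every affected node, that points at a single contended lock rather than several unrelated stalls.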


responded Nov 10, 2017 by Jim_Okken