[ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

Klaus Wenninger kwenning@redhat.com
Tue Feb 19 12:02:08 EST 2019


On 02/19/2019 05:41 PM, Edwin Török wrote:
> On 19/02/2019 16:26, Edwin Török wrote:
>> On 18/02/2019 18:27, Edwin Török wrote:
>>> Did a test today with CentOS 7.6 and an upstream kernel,
>>> 4.20.10-1.el7.elrepo.x86_64 (tested both with upstream SBD and with our
>>> patched [1] SBD), and was not able to reproduce the issue yet.
>> I was finally able to reproduce this using only upstream components
>> (it seems to be easier to reproduce with our patched SBD, but I
>> reproduced it using only upstream packages, unpatched by us):

Just out of curiosity: What did you patch in SBD?
Sorry if I missed the answer in the previous communication.

> I was also able to get a corosync blackbox from one of the stuck VMs
> that showed something interesting:
> https://clbin.com/d76Ha
>
> It is looping on:
> debug   Feb 19 16:37:24 mcast_sendmsg(408):12: sendmsg(mcast) failed
> (non-critical): Resource temporarily unavailable (11)

Hmm ... something like the tx-queue of the device being full, or no
buffers being available anymore and the kernel thread doing the cleanup
not getting scheduled ...
Does the kernel log anything in that situation?
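
For example, on a node that is stuck, something along these lines might
show whether packets are piling up (the interface name eth0 is an
assumption here, adjust to your setup):

# dmesg -T | tail -n 50            # any driver/netdev warnings?
# ip -s link show eth0             # TX errors/drops on the interface
# ss -uampn | grep -i corosync     # send-queue depth and socket buffer usage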

>
> Also noticed this:
> [ 5390.361861] crmd[12620]: segfault at 0 ip 00007f221c5e03b1 sp
> 00007ffcf9cf9d88 error 4 in libc-2.17.so[7f221c554000+1c2000]
> [ 5390.361918] Code: b8 00 00 00 04 00 00 00 74 07 48 8d 05 f8 f2 0d 00
> c3 0f 1f 80 00 00 00 00 48 31 c0 89 f9 83 e1 3f 66 0f ef c0 83 f9 30 77
> 19 <f3> 0f 6f 0f 66 0f 74 c1 66 0f d7 d0 85 d2 75 7a 48 89 f8 48 83 e0
>
>
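
As a side note, the faulting ip and the libc base from that crmd
segfault line give an offset into libc-2.17.so that could be resolved
with addr2line (the path below is the usual CentOS 7 location, and
glibc debuginfo would be needed for file/line information); a sketch:

offset = 0x7f221c5e03b1 - 0x7f221c554000 = 0x8c3b1
# addr2line -Cfe /usr/lib64/libc-2.17.so 0x8c3b1
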
>> CentOS 7.6 vmlinuz-3.10.0-957.el7.x86_64: OK
>> CentOS 7.6 vmlinuz-4.19.16-200.fc28.x86_64: 100% CPU usage corosync
>> CentOS 7.6 vmlinuz-4.19-xen (XenServer): 100% CPU usage corosync
>> CentOS 7.6 vmlinuz-4.20.10-1.el7.elrepo.x86_64: OK
>>
>> I got the 4.19.16 kernel from:
>> https://koji.fedoraproject.org/koji/buildinfo?buildID=1180301
>>
>> Setup: 16 CentOS 7.6 VMs, 4 vCPUs, 4GiB RAM running on XenServer 7.6
>> (Xen 4.7.6)
>> Host is a Dell PowerEdge R430, Xeon E5-2630 v3.
>>
>> On each VM:
>> # yum install -y corosync dlm pcs pacemaker fence-agents-all sbd
>> # echo mypassword | passwd hacluster --stdin
>> # systemctl enable --now pcsd
>> # echo xen_wdt >/etc/modules-load.d/watchdog.conf
>> # modprobe xen_wdt
>> # hostnamectl set-hostname host-<ip-address>
>>
>> On one host:
>> # pcs cluster auth -u hacluster -p xenroot <allips>
>> # pcs cluster setup --name cluster --auto_tie_breaker=1 <allips>
>>
>> # pcs stonith sbd enable
>> # pcs cluster enable --all
>> # pcs cluster start --all
>> # pcs property set no-quorum-policy=freeze
>> # pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
>> # pcs property set stonith-watchdog-timeout=10s
>>
>> In a loop on this host:
>> # while true; do pcs cluster stop; pcs cluster start; corosync-cfgtool -R; done
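
While that loop is running it might also be worth polling the ring and
quorum state on one of the other nodes, e.g. (just a suggestion, not
part of the original reproduction steps):

# watch -n1 'corosync-cfgtool -s; corosync-quorumtool -s'
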
>>
>> # rpm -q corosync pacemaker sbd libqb
>> corosync-2.4.3-4.el7.x86_64
>> pacemaker-1.1.19-8.el7.x86_64
>> sbd-1.3.1-8.2.el7.x86_64
>> libqb-1.0.1-7.el7.x86_64
>>
>> Watch the other VMs: if the bug happens you will lose SSH, see
>> corosync using 100% CPU, or notice that the pane with that VM simply
>> stops updating.
>> For watching the other VMs I used this script inside tmux with 'setw
>> synchronize-panes on':
>> https://github.com/xapi-project/testarossa/blob/master/scripts/tmuxmulti.sh
>> # scripts/tmuxmulti.sh 'ssh root@{}' 10.62.98.34 10.62.98.38 10.62.98.23
>> 10.62.98.30 10.62.98.40 10.62.98.36 10.62.98.29 10.62.98.35 10.62.98.28
>> 10.62.98.37 10.62.98.27 10.62.98.39 10.62.98.32 10.62.98.26 10.62.98.31
>> 10.62.98.33
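
If you can still get a shell on an affected VM (e.g. over the serial
console), a few quick checks on the spinning corosync could be
interesting; a sketch, looking at the main corosync process only:

# pid=$(pidof corosync)
# chrt -p "$pid"                  # confirm SCHED_RR and the RT priority
# grep ctxt /proc/"$pid"/status   # voluntary vs. nonvoluntary context switches
# cat /proc/"$pid"/stack          # kernel-side stack, in case it is stuck in the kernel
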
>>
>> Some VMs sometimes fence, while others just lock up and do not fence
>> (I think it depends on how many VMs lock up; if too many do, the
>> other ones lose quorum and fence correctly).
>>
>> Another observation: after reproducing the problem, even if I stop the
>> pcs cluster start/stop loop and reboot all VMs, they sometimes still
>> end up in the bad 100% CPU usage state.
>>
>> P.S.: taking a disk+memory snapshot of the VM is also enough to get
>> corosync out of the bad state; when the VM is resumed, its CPU usage
>> goes down to 0.3%.
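
If it happens again it might be worth saving the stuck guest state from
the host before resuming or snapshotting it, so the memory image can be
inspected later; a sketch (domain name and output file are placeholders):

# xl dump-core <domain> /tmp/stuck-vm.core
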
>>
>> Here is what a frozen VM looks like (logged in via serial using `xl
>> console`):
>> top - 16:25:23 up  1:25,  3 users,  load average: 3.86, 1.96, 0.78
>> Tasks: 133 total,   4 running,  68 sleeping,   0 stopped,   0 zombie
>> %Cpu(s): 11.1 us, 14.3 sy,  0.0 ni, 49.2 id, 22.2 wa,  1.6 hi,  1.6 si,  0.0 st
>> KiB Mem :  4005228 total,  3507244 free,   264960 used,   233024 buff/cache
>> KiB Swap:  1048572 total,  1048572 free,        0 used.  3452980 avail Mem
>>
>>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>   4975 root      rt   0  216460 114152  84732 R 100.0  2.9   4:08.29 corosync
>>      1 root      20   0  191036   5300   3884 S   0.0  0.1   0:02.14 systemd
>>      2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
>>      3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
>>      4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
>>      6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0+
>>      7 root      20   0       0      0      0 I   0.0  0.0   0:00.00 kworker/u+
>>      8 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu+
>>      9 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd+
>>     10 root      20   0       0      0      0 I   0.0  0.0   0:01.00 rcu_sched
>>     11 root      20   0       0      0      0 I   0.0  0.0   0:00.00 rcu_bh
>>     12 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration+
>>     14 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/0
>>     15 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/1
>>     16 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration+
>>     17 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd+
>>     19 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/1+
>>
>> [root@host-10 ~]# uname -a
>> Linux host-10.62.98.36 4.19.16-200.fc28.x86_64 #1 SMP Thu Jan 17
>> 00:16:20 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Best regards,
>> --Edwin
>> _______________________________________________
>> Users mailing list: Users@clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>



