[ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

Edwin Török edvin.torok at citrix.com
Tue Feb 19 16:41:57 UTC 2019


On 19/02/2019 16:26, Edwin Török wrote:
> On 18/02/2019 18:27, Edwin Török wrote:
>> Did a test today with CentOS 7.6 with upstream kernel and with
>> 4.20.10-1.el7.elrepo.x86_64 (tested both with upstream SBD, and our
>> patched [1] SBD) and was not able to reproduce the issue yet.
> 
> I was finally able to reproduce this using only upstream components
> (it seems easier to reproduce with our patched SBD, but I could also
> reproduce it with upstream packages unpatched by us):

I was also able to get a corosync blackbox from one of the stuck VMs
that showed something interesting:
https://clbin.com/d76Ha
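For reference, the blackbox can be dumped with the stock tooling; the
fdata path below is the default used by the CentOS corosync package.
corosync-blackbox triggers a live dump, qb-blackbox only decodes an
already-written fdata file:

# corosync-blackbox
# qb-blackbox /var/lib/corosync/fdata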

It is looping on:
debug   Feb 19 16:37:24 mcast_sendmsg(408):12: sendmsg(mcast) failed
(non-critical): Resource temporarily unavailable (11)
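EAGAIN from a non-blocking sendmsg() just means the kernel could not
queue the datagram at that moment; if corosync retries the send in a
tight loop from its SCHED_RR thread, lower-priority work never gets a
chance to drain the backlog, which would fit the livelock suspected in
the subject. A quick check on a stuck VM (assuming the default totem
port 5405) would be:

# chrt -p $(pidof corosync)    # confirm the SCHED_RR policy/priority
# ss -upnm 'sport = :5405'     # send/recv queue sizes and skmem counters
# top -H -p $(pidof corosync)  # which thread is spinning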

Also noticed this:
[ 5390.361861] crmd[12620]: segfault at 0 ip 00007f221c5e03b1 sp
00007ffcf9cf9d88 error 4 in libc-2.17.so[7f221c554000+1c2000]
[ 5390.361918] Code: b8 00 00 00 04 00 00 00 74 07 48 8d 05 f8 f2 0d 00
c3 0f 1f 80 00 00 00 00 48 31 c0 89 f9 83 e1 3f 66 0f ef c0 83 f9 30 77
19 <f3> 0f 6f 0f 66 0f 74 c1 66 0f d7 d0 85 d2 75 7a 48 89 f8 48 83 e0
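The crmd crash looks like a separate issue, but for the record: it is a
segfault at address 0 inside libc, and the faulting bytes
(pxor/movdqu/pcmpeqb/pmovmskb) look like one of glibc's SSE string
routines being handed a NULL pointer. The instruction pointer can be
mapped to a libc symbol from the offset ip - base = 0x7f221c5e03b1 -
0x7f221c554000 = 0x8c3b1, e.g.:

# gdb -batch -ex 'info symbol 0x8c3b1' /usr/lib64/libc-2.17.so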


> 
> CentOS 7.6 vmlinuz-3.10.0-957.el7.x86_64: OK
> CentOS 7.6 vmlinuz-4.19.16-200.fc28.x86_64: 100% CPU usage corosync
> CentOS 7.6 vmlinuz-4.19-xen (XenServer): 100% CPU usage corosync
> CentOS 7.6 vmlinuz-4.20.10-1.el7.elrepo.x86_64: OK
> 
> I got the 4.19.16 kernel from:
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1180301
> 
> Setup: 16 CentOS 7.6 VMs, 4 vCPUs, 4 GiB RAM each, running on XenServer
> 7.6 (Xen 4.7.6).
> Host is a Dell PowerEdge R430, Xeon E5-2630 v3.
> 
> On each VM:
> # yum install -y corosync dlm pcs pacemaker fence-agents-all sbd
> # echo mypassword | passwd hacluster --stdin
> # systemctl enable --now pcsd
> # echo xen_wdt >/etc/modules-load.d/watchdog.conf
> # modprobe xen_wdt
> # hostnamectl set-hostname host-<ip-address>
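> 
> After the modprobe above it is worth checking that a watchdog device
> actually appeared, otherwise sbd has nothing to use (wdctl is part of
> util-linux):
> # ls -l /dev/watchdog
> # wdctl /dev/watchdog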
> 
> On one host:
> # pcs cluster auth -u hacluster -p mypassword <allips>
> # pcs cluster setup --name cluster --auto_tie_breaker=1 <allips>
> 
> # pcs stonith sbd enable
> # pcs cluster enable --all
> # pcs cluster start --all
> # pcs property set no-quorum-policy=freeze
> # pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s
> on-fail=fence clone interleave=true ordered=true
> # pcs property set stonith-watchdog-timeout=10s
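> 
> Before starting the loop below, the cluster state can be sanity-checked
> with the stock tools:
> # corosync-quorumtool -s
> # sbd query-watchdog
> # pcs status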
> 
> In a loop on this host:
> # while true; do pcs cluster stop; pcs cluster start; corosync-cfgtool
> -R; done
> 
> # rpm -q corosync pacemaker sbd libqb
> corosync-2.4.3-4.el7.x86_64
> pacemaker-1.1.19-8.el7.x86_64
> sbd-1.3.1-8.2.el7.x86_64
> libqb-1.0.1-7.el7.x86_64
> 
> Watch the other VMs: if the bug happens you will lose SSH, see
> corosync using 100% CPU, or notice that the pane with that VM simply
> stops updating.
> To watch the other VMs I used this script inside tmux with 'setw
> synchronize-panes on':
> https://github.com/xapi-project/testarossa/blob/master/scripts/tmuxmulti.sh
> # scripts/tmuxmulti.sh 'ssh root@{}' 10.62.98.34 10.62.98.38 10.62.98.23
> 10.62.98.30 10.62.98.40 10.62.98.36 10.62.98.29 10.62.98.35 10.62.98.28
> 10.62.98.37 10.62.98.27 10.62.98.39 10.62.98.32 10.62.98.26 10.62.98.31
> 10.62.98.33
> 
> Some VMs fence, while others just lock up and never fence (I think it
> depends on how many VMs lock up: if too many do, the remaining ones
> lose quorum and fence correctly).
> 
> Another observation: after reproducing the problem, even if I stop the
> pcs cluster start/stop loop and reboot all the VMs, they sometimes
> still end up in the bad 100% CPU usage state.
> 
> P.S.: taking a disk+memory snapshot of the VM is also enough to get
> corosync out of the bad state; when the VM is resumed, its CPU usage
> drops to 0.3%.
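> 
> For reference, a disk+memory snapshot can be taken on the XenServer
> host roughly like this ("stuck-vm" is a placeholder VM name):
> # xe vm-checkpoint vm=stuck-vm new-name-label=corosync-debug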
> 
> Here is what a frozen VM looks like (logged in via serial using `xl
> console`):
> top - 16:25:23 up  1:25,  3 users,  load average: 3.86, 1.96, 0.78
> Tasks: 133 total,   4 running,  68 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 11.1 us, 14.3 sy,  0.0 ni, 49.2 id, 22.2 wa,  1.6 hi,  1.6 si,  0.0 st
> KiB Mem :  4005228 total,  3507244 free,   264960 used,   233024 buff/cache
> KiB Swap:  1048572 total,  1048572 free,        0 used.  3452980 avail Mem
> 
>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>   4975 root      rt   0  216460 114152  84732 R 100.0  2.9   4:08.29 corosync
>      1 root      20   0  191036   5300   3884 S   0.0  0.1   0:02.14 systemd
>      2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
>      3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
>      4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
>      6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0+
>      7 root      20   0       0      0      0 I   0.0  0.0   0:00.00 kworker/u+
>      8 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu+
>      9 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd+
>     10 root      20   0       0      0      0 I   0.0  0.0   0:01.00 rcu_sched
>     11 root      20   0       0      0      0 I   0.0  0.0   0:00.00 rcu_bh
>     12 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration+
>     14 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/0
>     15 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/1
>     16 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration+
>     17 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd+
>     19 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/1+
> 
> [root@host-10 ~]# uname -a
> Linux host-10.62.98.36 4.19.16-200.fc28.x86_64 #1 SMP Thu Jan 17
> 00:16:20 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> 
> Best regards,
> --Edwin