[ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?
Edwin Török
edvin.torok at citrix.com
Tue Feb 19 11:41:57 EST 2019
On 19/02/2019 16:26, Edwin Török wrote:
> On 18/02/2019 18:27, Edwin Török wrote:
>> Did a test today with CentOS 7.6 with an upstream kernel,
>> 4.20.10-1.el7.elrepo.x86_64 (tested both with upstream SBD and our
>> patched [1] SBD), and was not able to reproduce the issue yet.
>
> I was finally able to reproduce this using only upstream components
> (it seems easier to reproduce with our patched SBD, but I could also
> reproduce it with upstream packages unpatched by us):
I was also able to get a corosync blackbox from one of the stuck VMs,
and it shows something interesting:
https://clbin.com/d76Ha
It is looping on:
debug Feb 19 16:37:24 mcast_sendmsg(408):12: sendmsg(mcast) failed
(non-critical): Resource temporarily unavailable (11)
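That error is a non-blocking sendmsg() on the multicast socket returning
EAGAIN. To make the livelock theory from the subject line concrete, here
is a minimal sketch of the suspected pattern (my illustration, not
corosync's actual code): if a SCHED_RR task keeps retrying the send
without ever yielding, the softirq/kworker threads that would drain the
socket buffer never get the CPU, so the EAGAIN never clears.

/* Hypothetical sketch, not corosync source: a SCHED_RR task retrying a
 * non-blocking sendmsg() on EAGAIN without ever blocking. If clearing
 * the EAGAIN condition needs softirq/kworker time on the same CPU, the
 * RT task starves it and spins forever at 100% CPU. */
#include <errno.h>
#include <sched.h>
#include <sys/socket.h>
#include <sys/types.h>

static void spin_on_eagain(int fd, const struct msghdr *msg)
{
    struct sched_param sp = { .sched_priority = 99 }; /* illustrative RT priority */
    sched_setscheduler(0, SCHED_RR, &sp);

    for (;;) {
        ssize_t rc = sendmsg(fd, msg, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (rc >= 0)
            return;                  /* datagram queued */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return;                  /* the "non-critical" failure logged above */
        /* EAGAIN: retry immediately, never sleep or yield -> the kernel
         * threads that would free send-buffer space never run. */
    }
}

If something like this is going on, the kernel matrix quoted below (3.10
and 4.20 OK, 4.19 stuck) would point at a scheduling behaviour difference
rather than a corosync change.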
Also noticed this:
[ 5390.361861] crmd[12620]: segfault at 0 ip 00007f221c5e03b1 sp
00007ffcf9cf9d88 error 4 in libc-2.17.so[7f221c554000+1c2000]
[ 5390.361918] Code: b8 00 00 00 04 00 00 00 74 07 48 8d 05 f8 f2 0d 00
c3 0f 1f 80 00 00 00 00 48 31 c0 89 f9 83 e1 3f 66 0f ef c0 83 f9 30 77
19 <f3> 0f 6f 0f 66 0f 74 c1 66 0f d7 d0 85 d2 75 7a 48 89 f8 48 83 e0
>
> CentOS 7.6 vmlinuz-3.10.0-957.el7.x86_64: OK
> CentOS 7.6 vmlinuz-4.19.16-200.fc28.x86_64: 100% CPU usage corosync
> CentOS 7.6 vmlinuz-4.19-xen (XenServer): 100% CPU usage corosync
> CentOS 7.6 vmlinuz-4.20.10-1.el7.elrepo.x86_64: OK
>
> I got the 4.19.16 kernel from:
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1180301
>
> Setup: 16 CentOS 7.6 VMs (4 vCPUs, 4 GiB RAM each) running on XenServer
> 7.6 (Xen 4.7.6).
> The host is a Dell PowerEdge R430 with a Xeon E5-2630 v3.
>
> On each VM:
> # yum install -y corosync dlm pcs pacemaker fence-agents-all sbd
> # echo mypassword | passwd hacluster --stdin
> # systemctl enable --now pcsd
> # echo xen_wdt >/etc/modules-load.d/watchdog.conf
> # modprobe xen_wdt
> # hostnamectl set-hostname host-<ip-address>
>
> On one host:
> # pcs cluster auth -u hacluster -p mypassword <allips>
> # pcs cluster setup --name cluster --auto_tie_breaker=1 <allips>
>
> # pcs stonith sbd enable
> # pcs cluster enable --all
> # pcs cluster start --all
> # pcs property set no-quorum-policy=freeze
> # pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s
> on-fail=fence clone interleave=true ordered=true
> # pcs property set stonith-watchdog-timeout=10s
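(Aside on the watchdog pieces above: xen_wdt provides the standard Linux
watchdog character device, and stonith-watchdog-timeout relies on the node
rebooting itself if that device stops being petted. A rough sketch of the
interface, assuming the usual /dev/watchdog node and the standard WDIOC
ioctls; this is not sbd's actual code:

/* Minimal illustration of the Linux watchdog chardev API that
 * watchdog-based fencing builds on; not sbd's implementation.
 * If nothing pets the device within the timeout, the (here
 * Xen-provided) watchdog reboots the VM. */
#include <fcntl.h>
#include <linux/watchdog.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int timeout = 10;                  /* illustrative, in seconds */
    int fd = open("/dev/watchdog", O_WRONLY);
    if (fd < 0) {
        perror("open /dev/watchdog");
        return 1;
    }
    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);

    for (int i = 0; i < 5; i++) {
        ioctl(fd, WDIOC_KEEPALIVE, 0); /* "pet" the watchdog */
        sleep(timeout / 2);            /* a starved daemon would miss this */
    }

    write(fd, "V", 1);                 /* magic close: disarm on clean exit */
    close(fd);
    return 0;
}

So if sbd itself gets starved and cannot issue the keepalive in time, the
VM should self-fence; that seems relevant to the fence-vs-lockup split
noted further down.)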
>
> In a loop on this host:
> # while true; do pcs cluster stop; pcs cluster start; corosync-cfgtool
> -R; done
>
> # rpm -q corosync pacemaker sbd libqb
> corosync-2.4.3-4.el7.x86_64
> pacemaker-1.1.19-8.el7.x86_64
> sbd-1.3.1-8.2.el7.x86_64
> libqb-1.0.1-7.el7.x86_64
>
> Watch the other VMs: if the bug happens you will lose SSH, see
> corosync using 100% CPU, or simply notice that the pane with that VM is
> no longer updating.
> To watch the other VMs I used this script inside tmux with 'setw
> synchronize-panes on':
> https://github.com/xapi-project/testarossa/blob/master/scripts/tmuxmulti.sh
> # scripts/tmuxmulti.sh 'ssh root@{}' 10.62.98.34 10.62.98.38 10.62.98.23
> 10.62.98.30 10.62.98.40 10.62.98.36 10.62.98.29 10.62.98.35 10.62.98.28
> 10.62.98.37 10.62.98.27 10.62.98.39 10.62.98.32 10.62.98.26 10.62.98.31
> 10.62.98.33
>
> Some VMs sometimes fence, while others just lock up and do not fence (I
> think it depends on how many VMs lock up; if too many do, the remaining
> ones lose quorum and fence correctly).
>
> Another observation: after reproducing the problem, even if I stop the
> pcs cluster start/stop loop and reboot all the VMs, they sometimes still
> end up in the bad 100% CPU usage state.
>
> P.S.: taking a disk+memory snapshot of the VM is also enough to get
> corosync out of the bad state; when the VM is resumed, its CPU usage
> goes down to 0.3%.
>
> Here is what a frozen VM looks like (logged in over serial using `xl
> console`):
> top - 16:25:23 up 1:25, 3 users, load average: 3.86, 1.96, 0.78
> Tasks: 133 total, 4 running, 68 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 11.1 us, 14.3 sy, 0.0 ni, 49.2 id, 22.2 wa, 1.6 hi, 1.6 si, 0.0 st
> KiB Mem : 4005228 total, 3507244 free, 264960 used, 233024 buff/cache
> KiB Swap: 1048572 total, 1048572 free, 0 used. 3452980 avail Mem
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>  4975 root      rt   0  216460 114152  84732 R 100.0  2.9   4:08.29 corosync
>     1 root      20   0  191036   5300   3884 S   0.0  0.1   0:02.14 systemd
>     2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
>     3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
>     4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
>     6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0+
>     7 root      20   0       0      0      0 I   0.0  0.0   0:00.00 kworker/u+
>     8 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu+
>     9 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd+
>    10 root      20   0       0      0      0 I   0.0  0.0   0:01.00 rcu_sched
>    11 root      20   0       0      0      0 I   0.0  0.0   0:00.00 rcu_bh
>    12 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration+
>    14 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/0
>    15 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/1
>    16 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration+
>    17 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd+
>    19 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/1+
>
> [root@host-10 ~]# uname -a
> Linux host-10.62.98.36 4.19.16-200.fc28.x86_64 #1 SMP Thu Jan 17
> 00:16:20 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> Best regards,
> --Edwin