[ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

Edwin Török edvin.torok at citrix.com
Tue Feb 19 16:26:12 UTC 2019


On 18/02/2019 18:27, Edwin Török wrote:
> Did a test today with CentOS 7.6 with upstream kernel and with
> 4.20.10-1.el7.elrepo.x86_64 (tested both with upstream SBD, and our
> patched [1] SBD) and was not able to reproduce the issue yet.

I was finally able to reproduce this using only upstream components
(it seems to be easier to reproduce with our patched SBD, but the
reproduction below uses only upstream packages, unpatched by us):

CentOS 7.6 vmlinuz-3.10.0-957.el7.x86_64: OK
CentOS 7.6 vmlinuz-4.19.16-200.fc28.x86_64: 100% CPU usage corosync
CentOS 7.6 vmlinuz-4.19-xen (XenServer): 100% CPU usage corosync
CentOS 7.6 vmlinuz-4.20.10-1.el7.elrepo.x86_64: OK

I got the 4.19.16 kernel from:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1180301
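
Roughly, the build can be fetched and installed with the koji CLI
(available from EPEL); this is an untested sketch from memory, with the
package names as they appear on the koji page above:
# yum install -y epel-release && yum install -y koji
# koji download-build --arch=x86_64 kernel-4.19.16-200.fc28
# yum localinstall -y kernel-core-4.19.16-200.fc28.x86_64.rpm \
    kernel-modules-4.19.16-200.fc28.x86_64.rpm
and then pick the new kernel entry in grub on reboot.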

Setup: 16 CentOS 7.6 VMs, 4 vCPUs, 4GiB RAM running on XenServer 7.6
(Xen 4.7.6)
Host is a Dell Poweredge R430, Xeon E5-2630 v3.

On each VM:
# yum install -y corosync dlm pcs pacemaker fence-agents-all sbd
# echo mypassword | passwd hacluster --stdin
# systemctl enable --now pcsd
# echo xen_wdt >/etc/modules-load.d/watchdog.conf
# modprobe xen_wdt
# hostnamectl set-hostname host-<ip-address>
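
Since sbd relies on the watchdog, it is worth double-checking that the
xen_wdt device actually came up before going further; a quick check is
something like (wdctl is part of util-linux):
# lsmod | grep xen_wdt      # module loaded?
# ls -l /dev/watchdog       # device node present?
# wdctl /dev/watchdog       # driver identity and timeout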

On one host:
# pcs cluster auth -u hacluster -p mypassword <allips>
# pcs cluster setup --name cluster --auto_tie_breaker=1 <allips>

# pcs stonith sbd enable
# pcs cluster enable --all
# pcs cluster start --all
# pcs property set no-quorum-policy=freeze
# pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s \
    on-fail=fence clone interleave=true ordered=true
# pcs property set stonith-watchdog-timeout=10s
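
Before starting the stop/start loop below, a quick sanity check that the
cluster actually formed as intended, roughly:
# pcs status                  # all 16 nodes online, dlm clone started
# corosync-quorumtool -s      # membership and quorum votes
# pcs property list           # no-quorum-policy / stonith-watchdog-timeout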

In a loop on this host:
# while true; do pcs cluster stop; pcs cluster start; corosync-cfgtool -R; done
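
Since the subject here is corosync spinning under SCHED_RR, it is also
worth confirming on the nodes which scheduling class and priority
corosync and sbd actually run with; something like the following (plain
ps/chrt, nothing cluster-specific) shows it:
# ps -eo pid,cls,rtprio,pri,pcpu,comm | grep -E 'corosync|sbd'
# chrt -p $(pidof corosync)   # prints the policy (SCHED_RR) and rtprio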

# rpm -q corosync pacemaker sbd libqb
corosync-2.4.3-4.el7.x86_64
pacemaker-1.1.19-8.el7.x86_64
sbd-1.3.1-8.2.el7.x86_64
libqb-1.0.1-7.el7.x86_64

Watch the other VMs: if the bug happens you will lose SSH, see corosync
using 100% CPU, or simply notice that the pane with that VM is no
longer updating.
For watching the other VMs I used this script inside tmux and used 'setw
synchronize-panes on':
https://github.com/xapi-project/testarossa/blob/master/scripts/tmuxmulti.sh
# scripts/tmuxmulti.sh 'ssh root@{}' 10.62.98.34 10.62.98.38 10.62.98.23 \
    10.62.98.30 10.62.98.40 10.62.98.36 10.62.98.29 10.62.98.35 10.62.98.28 \
    10.62.98.37 10.62.98.27 10.62.98.39 10.62.98.32 10.62.98.26 10.62.98.31 \
    10.62.98.33

Some VMs fence, while others just lock up and do not fence (I think it
depends on how many VMs lock up: if too many do, the remaining ones
lose quorum and fence correctly).
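
When a node does lock up, what I would try to grab from the `xl console`
session (assuming it still accepts input) is roughly:
# top -b -n 1 | head -20             # is corosync the only task spinning?
# cat /proc/$(pidof corosync)/stack  # kernel-side stack of corosync
# echo l > /proc/sysrq-trigger       # backtrace of all active CPUs, into dmesg
# echo w > /proc/sysrq-trigger       # list of blocked tasks, into dmesg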

Another observation: after reproducing the problem, even if I stop the
pcs cluster start/stop loop and reboot all VMs, they sometimes still end
up in the bad 100% CPU usage state.

P.S.: taking a disk+memory snapshot of the VM is also enough to get
corosync out of the bad state; when the VM is resumed its CPU usage
drops back to 0.3%.

Here is what a frozen VM looks like (logged in via serial using `xl
console`):
top - 16:25:23 up  1:25,  3 users,  load average: 3.86, 1.96, 0.78
Tasks: 133 total,   4 running,  68 sleeping,   0 stopped,   0 zombie
%Cpu(s): 11.1 us, 14.3 sy,  0.0 ni, 49.2 id, 22.2 wa,  1.6 hi,  1.6 si,  0.0 st
KiB Mem :  4005228 total,  3507244 free,   264960 used,   233024 buff/cache
KiB Swap:  1048572 total,  1048572 free,        0 used.  3452980 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  4975 root      rt   0  216460 114152  84732 R 100.0  2.9   4:08.29 corosync
     1 root      20   0  191036   5300   3884 S   0.0  0.1   0:02.14 systemd
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
     3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
     4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
     6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0+
     7 root      20   0       0      0      0 I   0.0  0.0   0:00.00 kworker/u+
     8 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu+
     9 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd+
    10 root      20   0       0      0      0 I   0.0  0.0   0:01.00 rcu_sched
    11 root      20   0       0      0      0 I   0.0  0.0   0:00.00 rcu_bh
    12 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration+
    14 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/0
    15 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/1
    16 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration+
    17 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd+
    19 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/1+

[root@host-10 ~]# uname -a
Linux host-10.62.98.36 4.19.16-200.fc28.x86_64 #1 SMP Thu Jan 17
00:16:20 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Best regards,
--Edwin

