[ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?
Edwin Török
edvin.torok at citrix.com
Wed Feb 20 14:03:03 UTC 2019
On 20/02/2019 12:44, Jan Pokorný wrote:
> On 19/02/19 16:41 +0000, Edwin Török wrote:
>> Also noticed this:
>> [ 5390.361861] crmd[12620]: segfault at 0 ip 00007f221c5e03b1 sp 00007ffcf9cf9d88 error 4 in libc-2.17.so[7f221c554000+1c2000]
>> [ 5390.361918] Code: b8 00 00 00 04 00 00 00 74 07 48 8d 05 f8 f2 0d 00 c3 0f 1f 80 00 00 00 00 48 31 c0 89 f9 83 e1 3f 66 0f ef c0 83 f9 30 77 19 <f3> 0f 6f 0f 66 0f 74 c1 66 0f d7 d0 85 d2 75 7a 48 89 f8 48 83 e0
>
> By any chance, is this an unmodified pacemaker package as
> obtainable from some public repo together with debug symbols?
I haven't modified pacemaker; here are the package versions:
rpm -q pacemaker
pacemaker-1.1.19-8.el7.x86_64
rpm -q glibc
glibc-2.17-260.el7_6.3.x86_64
Offset of the faulting IP within the libc mapping:
0x00007f221c5e03b1 - 0x7f221c554000 = 0x8c3b1
addr2line -fie /lib64/libc.so.6 0x8c3b1
__GI_strlen
:?
Feb 19 16:22:04 host-10 crmd[12620]: notice: Additional logging available in /var/log/cluster/corosync.log
Feb 19 16:22:05 host-10 crmd[12620]: notice: Connecting to cluster infrastructure: corosync
Feb 19 16:29:50 host-10 crmd[12620]: error: Could not join the CPG group 'crmd': 6
Feb 19 16:29:50 host-10 kernel: crmd[12620]: segfault at 0 ip 00007f221c5e03b1 sp 00007ffcf9cf9d88 error 4 in libc-2.17.so[7f221c554000+1c2000]
Feb 19 16:38:28 host-10 pacemakerd[12614]: error: Managed process 12620 (crmd) dumped core
Feb 19 16:38:28 host-10 pacemakerd[12614]: error: The crmd process (12620) terminated with signal 11 (core=1)
I found a core file in /var/lib/pacemaker/cores
(gdb) bt
#0 0x00007f221c5e03b1 in __strlen_sse2 () from /lib64/libc.so.6
#1 0x00007f221c5e00be in strdup () from /lib64/libc.so.6
#2 0x00007f221f1a05cd in election_init (name=name@entry=0x0, uname=0x0, period_ms=period_ms@entry=60000, cb=cb@entry=0x55ea42cb2790 <election_timeout_popped>) at election.c:78
#3 0x000055ea42cb3d4c in do_ha_control (action=4, cause=<optimized out>, cur_state=<optimized out>, current_input=<optimized out>, msg_data=0x55ea4464fec0) at control.c:139
#4 0x000055ea42cb0524 in s_crmd_fsa_actions (fsa_data=fsa_data@entry=0x55ea4464fec0) at fsa.c:305
#5 0x000055ea42cb216a in s_crmd_fsa (cause=cause@entry=C_STARTUP) at fsa.c:237
#6 0x000055ea42cad707 in crmd_init () at main.c:173
#7 0x000055ea42cad510 in main (argc=1, argv=0x7ffcf9cfa078) at main.c:122
Best regards,
--Edwin
More information about the Users mailing list