[ClusterLabs] Antw: Re: corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

Wed Feb 20 07:07:52 UTC 2019

>>> Klaus Wenninger <kwenning at redhat.com> wrote on 19.02.2019 at 18:02 in
message <7b626ca1-4f59-6257-bfb5-ef5d0d823469 at redhat.com>:
[...]
>>
>> It is looping on:
>> debug   Feb 19 16:37:24 mcast_sendmsg(408):12: sendmsg(mcast) failed
>> (non-critical): Resource temporarily unavailable (11)

I wonder whether this is the cause of the looping or a consequence of sending in a loop. I think it is worth trying sched_yield() in this situation; then the other tasks might get a chance to drain the send queue.
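
To make the idea concrete, here is a minimal sketch of yielding when sendmsg() reports EAGAIN. This is not corosync code: the helper name sendmsg_yield_on_eagain() and the max_tries parameter are made up for illustration. One caveat: under SCHED_RR, sched_yield() only lets other runnable tasks of the same priority run, so if the task that has to drain the queue runs at a lower priority (for example SCHED_OTHER), a blocking poll() for POLLOUT or a short sleep would probably be needed instead.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of corosync): try to send one message and,
 * if the send queue is full (EAGAIN/EWOULDBLOCK), give up the CPU before
 * retrying instead of spinning.
 * Returns the sendmsg() result; after max_tries failed attempts it returns
 * -1 with errno still set to EAGAIN/EWOULDBLOCK.
 */
static ssize_t sendmsg_yield_on_eagain(int fd, const struct msghdr *msg,
                                       int flags, int max_tries)
{
    ssize_t res = -1;

    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        res = sendmsg(fd, msg, flags);
        if (res >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            break;              /* success, or an error yielding cannot fix */

        int saved_errno = errno;
        sched_yield();          /* let other tasks of equal priority run */
        errno = saved_errno;    /* keep the EAGAIN visible to the caller */
    }
    return res;
}
```

The retry count is bounded on purpose, so a permanently full queue is reported to the caller rather than turning into another busy loop.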

> 
> Hmm ... something like tx-queue of the device full, or no buffers
> available anymore and kernel-thread doing the cleanup isn't
> scheduled ...
> Does the kernel log anything in that situation?
> 
>>
>> Also noticed this:
>> [ 5390.361861] crmd[12620]: segfault at 0 ip 00007f221c5e03b1 sp
>> 00007ffcf9cf9d88 error 4 in libc-2.17.so[7f221c554000+1c2000]
>> [ 5390.361918] Code: b8 00 00 00 04 00 00 00 74 07 48 8d 05 f8 f2 0d 00
>> c3 0f 1f 80 00 00 00 00 48 31 c0 89 f9 83 e1 3f 66 0f ef c0 83 f9 30 77
>> 19 <f3> 0f 6f 0f 66 0f 74 c1 66 0f d7 d0 85 d2 75 7a 48 89 f8 48 83 e0

Maybe time to enable core dumps...
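
For a process that can raise its own limit, a minimal sketch could look like the following. The function name enable_core_dumps() is made up for illustration, and I am not claiming crmd does anything like this; for a daemon the limit is more commonly set from outside (for example via ulimit -c in the start script). Note that this only raises the soft limit up to the hard limit, so it has no effect if the hard limit is 0, and where the core file ends up depends on kernel.core_pattern.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft core-file size limit of the calling
 * process to its hard limit. Returns 0 on success, -1 on error (errno set).
 */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;  /* soft limit may not exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```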

[...]

Regards,
Ulrich Windl