[ClusterLabs] Antw: Re: corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

Wed Feb 20 09:13:55 UTC 2019

On 02/20/2019 08:07 AM, Ulrich Windl wrote:
>>>> Klaus Wenninger <kwenning at redhat.com> wrote on 19.02.2019 at 18:02 in
> message <7b626ca1-4f59-6257-bfb5-ef5d0d823469 at redhat.com>:
> [...]
>>> It is looping on:
>>> debug   Feb 19 16:37:24 mcast_sendmsg(408):12: sendmsg(mcast) failed
>>> (non-critical): Resource temporarily unavailable (11)
> I wonder whether this is the reason for looping or the consequence of loop-sending. To me it looks like a good idea to try sched_yield() in this situation. Maybe then the other tasks have a chance to empty the send queue.

Doesn't that just trigger RR? So if the other threads aren't SCHED_RR at the
same prio, would it help?
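
To make the question concrete, here is a minimal sketch of the "yield on
EAGAIN" idea, written in Python purely for illustration. It is not corosync
code: the function name send_with_yield and the max_attempts parameter are
hypothetical, and the retry limit is an assumption added so the loop cannot
spin forever.

```python
import os


def send_with_yield(sock, data, addr, max_attempts=100):
    """Try a non-blocking send; on EAGAIN, yield the CPU and retry.

    Returns (bytes_sent, attempts). Raises BlockingIOError if the send
    still fails after max_attempts tries, so the caller never spins forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return sock.sendto(data, addr), attempt
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: the send queue is currently full.
            # Under SCHED_RR, sched_yield() only moves the caller behind
            # other runnable threads of the SAME static priority; threads
            # with lower priority still do not get to run.
            os.sched_yield()
    raise BlockingIOError(
        "send queue still full after %d attempts" % max_attempts
    )
```

If whatever has to drain the queue runs at a lower priority than the
SCHED_RR sender, the yield returns immediately and the loop stays busy,
which is the concern raised above. Blocking in poll() until the socket is
writable would let lower-priority tasks run, but that is a different
approach from the one suggested.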

>
>> Hmm ... something like tx-queue of the device full, or no buffers
>> available anymore and kernel-thread doing the cleanup isn't
>> scheduled ...
>> Does the kernel log anything in that situation?
>>
>>> Also noticed this:
>>> [ 5390.361861] crmd[12620]: segfault at 0 ip 00007f221c5e03b1 sp
>>> 00007ffcf9cf9d88 error 4 in libc-2.17.so[7f221c554000+1c2000]
>>> [ 5390.361918] Code: b8 00 00 00 04 00 00 00 74 07 48 8d 05 f8 f2 0d 00
>>> c3 0f 1f 80 00 00 00 00 48 31 c0 89 f9 83 e1 3f 66 0f ef c0 83 f9 30 77
>>> 19 <f3> 0f 6f 0f 66 0f 74 c1 66 0f d7 d0 85 d2 75 7a 48 89 f8 48 83 e0
> Maybe time to enable core dumps...
>
> [...]
>
> Regards,
> Ulrich Windl
>
>