[ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?
Jan Friesse
jfriesse at redhat.com
Wed Feb 20 07:57:05 UTC 2019
Edwin,
>
>
> On 19/02/2019 17:02, Klaus Wenninger wrote:
>> On 02/19/2019 05:41 PM, Edwin Török wrote:
>>> On 19/02/2019 16:26, Edwin Török wrote:
>>>> On 18/02/2019 18:27, Edwin Török wrote:
>>>>> Did a test today on CentOS 7.6 with the upstream kernel
>>>>> 4.20.10-1.el7.elrepo.x86_64 (tested with both upstream SBD and our
>>>>> patched [1] SBD), and was not able to reproduce the issue yet.
>>>> I was finally able to reproduce this using only upstream components
>>>> (it seems easier to reproduce with our patched SBD, but I was able to
>>>> reproduce it using only upstream packages, unpatched by us):
>>
>> Just out of curiosity: What did you patch in SBD?
>> Sorry if I missed the answer in the previous communication.
>
> It is mostly this PR, which calls getquorate quite often (a more
> efficient implementation would be to use the quorum notification API as
> dlm/pacemaker do, although see the concerns in
> https://lists.clusterlabs.org/pipermail/users/2019-February/016249.html):
> https://github.com/ClusterLabs/sbd/pull/27
>
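For reference, a minimal sketch of the notification-style approach
mentioned above, assuming the corosync 2.x libquorum API (simplified,
with error handling mostly omitted; this is not SBD's actual code):

/* Register for quorum change callbacks instead of polling getquorate. */
#include <stdint.h>
#include <stdio.h>
#include <corosync/corotypes.h>
#include <corosync/quorum.h>

static void notify_fn(quorum_handle_t h, uint32_t quorate,
                      uint64_t ring_seq, uint32_t n_members,
                      uint32_t *members)
{
	/* Invoked from quorum_dispatch() on every quorum change; a
	 * consumer can cache 'quorate' instead of re-querying it. */
	printf("quorate=%u ring_seq=%llu members=%u\n",
	       quorate, (unsigned long long)ring_seq, n_members);
}

int main(void)
{
	quorum_handle_t h;
	uint32_t qtype;
	quorum_callbacks_t cb = { .quorum_notify_fn = notify_fn };

	if (quorum_initialize(&h, &cb, &qtype) != CS_OK)
		return 1;
	quorum_trackstart(h, CS_TRACK_CHANGES);
	/* In a real daemon, quorum_fd_get() would hand back an fd for
	 * the main poll loop; here we just block and dispatch. */
	quorum_dispatch(h, CS_DISPATCH_BLOCKING);
	quorum_finalize(h);
	return 0;
}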
> We have also added our own servant for watching the health of our
> control plane, but that is not relevant to this bug (it reproduces with
> that watcher turned off too).
>
>>
>>> I was also able to get a corosync blackbox from one of the stuck VMs
>>> that showed something interesting:
>>> https://clbin.com/d76Ha
>>>
>>> It is looping on:
>>> debug Feb 19 16:37:24 mcast_sendmsg(408):12: sendmsg(mcast) failed
>>> (non-critical): Resource temporarily unavailable (11)
>>
>> Hmm ... something like the tx-queue of the device being full, or no
>> buffers available anymore and the kernel thread doing the cleanup not
>> getting scheduled ...
>
> Yes, that is very plausible. Perhaps it'd be nicer if corosync went back
> to the epoll_wait loop when it gets too many EAGAINs from sendmsg.
But this is exactly what happens. Corosync calls sendmsg to all active
udpu members and then returns to the main loop -> epoll_wait.
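Roughly, as a simplified sketch (placeholder types, not the actual
totemudpu code):

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

struct member { struct sockaddr_storage addr; socklen_t addrlen; };

static void mcast_sendmsg_all(int fd, const struct member *m, int n,
                              const void *buf, size_t len)
{
	for (int i = 0; i < n; i++) {
		ssize_t rc = sendto(fd, buf, len, MSG_DONTWAIT,
		                    (const struct sockaddr *)&m[i].addr,
		                    m[i].addrlen);
		if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
			/* Non-critical: log and continue; after this
			 * function returns, control goes back to the
			 * main loop and its epoll_wait(). */
			fprintf(stderr,
			        "sendmsg(mcast) failed (non-critical)\n");
		}
	}
}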
> (although this seems different from the original bug where it got stuck
> in epoll_wait)
I'm pretty sure it is.
Anyway, let's try the "sched_yield" idea. Could you please try the
included patch and see if it makes any difference (it is only for udpu)?
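The idea, roughly (an illustration of the approach, not the attached
patch itself):

#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/socket.h>

static ssize_t send_one(int fd, const void *buf, size_t len)
{
	ssize_t rc = send(fd, buf, len, MSG_DONTWAIT);

	if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
		/* Queue full: voluntarily give up the CPU before the
		 * caller returns to the main loop. */
		sched_yield();
	}
	return rc;
}

Note that for a SCHED_RR thread, sched_yield() only puts the caller
behind other runnable threads of the same priority, so whether it lets
ksoftirqd run at all is exactly what needs testing.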
Regards,
Honza
>
>> Does the kernel log anything in that situation?
>
> Other than the crmd segfault, no.
> From previous observations on XenServer, the softirqs were all stuck on
> the CPU that corosync was hogging at 100% (I'll check this on upstream,
> but I'm fairly sure it'll be the same). softirqs do not run at realtime
> priority (if we raise the priority of ksoftirqd to realtime, everything
> gets unstuck), but they seem to be essential for whatever corosync is
> stuck waiting on, in this case likely the sending/receiving of network
> packets.
>
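A sketch of that ksoftirqd workaround in C, equivalent to running
"chrt -f -p 1 <pid>" (the pid below is a placeholder; the per-CPU
ksoftirqd pids would have to be looked up first):

#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
	pid_t ksoftirqd_pid = 1234;  /* placeholder: look up the real pid */
	struct sched_param sp = { .sched_priority = 1 };

	/* Give the ksoftirqd thread a realtime priority so it is no
	 * longer starved behind a SCHED_RR task spinning at 100%. */
	if (sched_setscheduler(ksoftirqd_pid, SCHED_FIFO, &sp) != 0) {
		perror("sched_setscheduler");
		return 1;
	}
	return 0;
}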
> I'm trying to narrow down the kernel versions between 4.19.16 and
> 4.20.10 to see why this has only been reproducible on 4.19 so far.
>
> Best regards,
> --Edwin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sched-yield.patch
Type: text/x-patch
Size: 749 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20190220/161a5819/attachment-0001.bin>