[ClusterLabs] Re: Re: Corosync: 100% CPU (corosync 2.3.5, libqb 0.17.1, pacemaker 1.1.13)

Fri Aug 7 06:01:58 UTC 2015

I know that corosync runs at "moderate real-time priority". Although I wonder whether that is a workaround for some bug in corosync, have you tried running DRBD with real-time priority as well? I have never tried to change the priority of a kernel thread, though...
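For anyone who wants to try that: below is a minimal Python sketch, assuming Linux and root (or CAP_SYS_NICE), that reports a thread's current scheduling policy and switches it to SCHED_RR. The PID of the DRBD kernel thread has to be looked up separately (for example with ps), and the helper names `describe_policy` and `make_round_robin` are hypothetical. Whether raising a DRBD kernel thread's priority actually helps with the PingAck timeouts is untested.

```python
import os

# Hypothetical helpers for inspecting/changing a thread's scheduling policy.
# Linux only; changing the policy needs root or CAP_SYS_NICE.

def describe_policy(pid: int = 0) -> str:
    """Return e.g. 'SCHED_OTHER priority 0' for `pid` (0 = calling process)."""
    names = {
        os.SCHED_OTHER: "SCHED_OTHER",
        os.SCHED_FIFO: "SCHED_FIFO",
        os.SCHED_RR: "SCHED_RR",
    }
    policy = os.sched_getscheduler(pid)
    prio = os.sched_getparam(pid).sched_priority
    return f"{names.get(policy, str(policy))} priority {prio}"

def make_round_robin(pid: int, priority: int) -> None:
    """Switch `pid` to SCHED_RR at `priority` after validating the range."""
    lo = os.sched_get_priority_min(os.SCHED_RR)
    hi = os.sched_get_priority_max(os.SCHED_RR)
    if not lo <= priority <= hi:
        raise ValueError(f"SCHED_RR priority must be in [{lo}, {hi}]")
    # Raises PermissionError when run without sufficient privileges.
    os.sched_setscheduler(pid, os.SCHED_RR, os.sched_param(priority))
```

On an ordinary process, `describe_policy(0)` would typically report `SCHED_OTHER priority 0`. The `chrt` utility from util-linux does the same job from the shell.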

>>> Pallai Roland <pallair at magex.hu> wrote on 06.08.2015 at 15:54 in message
<CALj=1whCfZhjF97Dg+ykde7Kgdj79GHgYbJ2NyMyGd8XyxFwDA at mail.gmail.com>:
> 2015-08-06 15:24 GMT+02:00 Pallai Roland <pallair at magex.hu>:
> 
>>>>   drbdtest1 corosync[4734]:   [MAIN  ] Corosync main process was not
>>>> scheduled for 2590.4512 ms (threshold is 2400.0000 ms). Consider token
>>>> timeout increase.
>>>>
>>>> and even drbd:
>>>>   drbdtest1 kernel: drbd p1: PingAck did not arrive in time.
>>>>
>>>
>>> Kernel module blocked by unrelated userspace app?
>>
>>
>> There is a chance that the nodes are blocking each other because they run on
>> the same host, and that this is the reason for the DRBD timeout, but it is
>> also weird: how can one guest block another entirely when there are idle
>> cores on the host?
>>
>> All in all, the DRBD timeout went away when a node got more than one
>> logical core.
>>
> 
> I have to correct myself:
> 
> The DRBD timeout is not fixed if only one node has more cores. In that case
> the other node still reports a PingAck timeout periodically. I think the
> simplest explanation is that a spinning corosync can block even kernel
> threads.
> 
> The DRBD timeout is fixed if both nodes have more than one logical core.
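The "not scheduled for 2590.4512 ms (threshold is 2400.0000 ms)" warning quoted above is corosync reporting that its main process went longer than a threshold without running. A similar check can be run inside a guest to see whether it is being starved of CPU. This is a minimal Python sketch of that idea, not corosync's implementation; `scheduling_gaps` and its parameters are hypothetical.

```python
import time

# Hypothetical starvation check, loosely modeled on corosync's
# "main process was not scheduled" warning. NOT corosync's code.
def scheduling_gaps(interval_s: float, threshold_s: float,
                    rounds: int) -> list[tuple[int, float]]:
    """Sleep `interval_s` seconds `rounds` times and return (round, overrun_s)
    for every wakeup that arrived more than `threshold_s` seconds late."""
    late = []
    for i in range(rounds):
        start = time.monotonic()  # monotonic clock: immune to wall-clock jumps
        time.sleep(interval_s)
        overrun = (time.monotonic() - start) - interval_s
        if overrun > threshold_s:
            late.append((i, overrun))
    return late
```

For example, `scheduling_gaps(0.1, 2.4, 600)` samples for about a minute and reports any wakeup that came more than 2.4 s late, mirroring the 2400 ms threshold in the log. An empty list means no such stall was observed during that window.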