[ClusterLabs] Corosync: 100% cpu (corosync 2.3.5, libqb 0.17.1, pacemaker 1.1.13)

Pallai Roland pallair at magex.hu
Thu Aug 6 13:54:38 UTC 2015


2015-08-06 15:24 GMT+02:00 Pallai Roland <pallair at magex.hu>:

>>>   drbdtest1 corosync[4734]:   [MAIN  ] Corosync main process was not
>>> scheduled for 2590.4512 ms (threshold is 2400.0000 ms). Consider token
>>> timeout increase.
>>>
>>> and even drbd:
>>>   drbdtest1 kernel: drbd p1: PingAck did not arrive in time.
>>>
>>
>> Kernel module blocked by unrelated userspace app?
>
>
> There is a chance that the nodes are blocking each other, as they are on
> the same host, and that this is the reason for the DRBD timeout. But it's
> also weird: how can one guest block another entirely when there are idle
> cores on the host?
>
> All in all, the DRBD timeout was eliminated when a node got more than one
> logical core.
>

I have to correct myself:

The DRBD timeout is not fixed if only one node has more than one logical
core. In that case the other node still reports PingAck timeouts
periodically. I think the simplest explanation is that a spinning corosync
can block even kernel threads.

The DRBD timeout is fixed only if both nodes have more than one logical core.
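
For anyone who wants to check whether the two symptoms show up together in
their own logs, here is a minimal Python sketch. It only matches the two
messages quoted in this thread. The names (summarize, SCHED_RE, PINGACK_RE)
are made up for illustration, and it assumes each syslog message sits on a
single, unwrapped line.

```python
import re

# Patterns for the two messages quoted in this thread (names are illustrative).
SCHED_RE = re.compile(
    r"Corosync main process was not scheduled for "
    r"(?P<delay>[\d.]+) ms \(threshold is (?P<threshold>[\d.]+) ms\)"
)
PINGACK_RE = re.compile(r"drbd (?P<resource>\S+): PingAck did not arrive in time")


def summarize(lines):
    """Collect corosync scheduling stalls and DRBD PingAck timeouts.

    Returns (stalls, pingacks): stalls is a list of (delay_ms, threshold_ms)
    tuples, pingacks maps a DRBD resource name to its timeout count.
    """
    stalls = []
    pingacks = {}
    for line in lines:
        m = SCHED_RE.search(line)
        if m:
            stalls.append((float(m.group("delay")), float(m.group("threshold"))))
            continue
        m = PINGACK_RE.search(line)
        if m:
            res = m.group("resource")
            pingacks[res] = pingacks.get(res, 0) + 1
    return stalls, pingacks


if __name__ == "__main__":
    # Sample lines taken from the messages quoted above, unwrapped.
    sample = [
        "drbdtest1 corosync[4734]:   [MAIN  ] Corosync main process was not "
        "scheduled for 2590.4512 ms (threshold is 2400.0000 ms). "
        "Consider token timeout increase.",
        "drbdtest1 kernel: drbd p1: PingAck did not arrive in time.",
    ]
    stalls, pingacks = summarize(sample)
    print("scheduling stalls:", stalls)
    print("PingAck timeouts per resource:", pingacks)
```

Running it over the logs of both nodes should show whether the PingAck
timeouts on one node go together with scheduling stalls on the other. It
does not prove the cause, it only makes the pattern easier to see.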