[ClusterLabs] Corosync: 100% cpu (corosync 2.3.5, libqb 0.17.1, pacemaker 1.1.13)

Thu Aug 6 09:24:45 EDT 2015

2015-08-06 8:53 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:

> Pallai Roland wrote:
>
>> hi,
>>
>> I've built a recent cluster stack from sources on Debian Jessie and I
>> can't get rid of CPU spikes. Corosync blocks the entire system for
>> seconds on every simple transition, even itself:
>>
>
> How many cores do you have? Since 2.0, Corosync uses only two threads (and
> one of them is only for logging), so it's virtually impossible for it to
> block the ENTIRE system as long as you have more than one core.

I forgot to mention that my test nodes are KVM guests on the same host. The
host has 2x4 cores, but only one was allocated to each VM.

You were right.

The problem went away completely once I allocated more CPU cores to the
guest. Now I run "drbdtest1" on 1 logical core and "drbdtest2" on 2 logical
cores. Corosync on drbdtest1 spins the CPU, but there is no spinning on
drbdtest2.
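The "Corosync main process was not scheduled for ... ms" warning in this thread behaves like a scheduling watchdog: a periodic timer compares the wall-clock gap between two of its own runs against a threshold, so on a single busy vCPU the gap grows and the warning fires. Below is a minimal Python sketch of that idea, not corosync's actual code. The function name `find_sched_misses` is hypothetical, and the assumption that the threshold is 80% of the totem token timeout (which would make the 2400 ms threshold correspond to a 3000 ms token) is an illustration, not something stated in this thread.

```python
from typing import Iterable, List, Optional


def find_sched_misses(timestamps_ms: Iterable[float],
                      token_timeout_ms: float) -> List[str]:
    """Return one warning per gap between consecutive timer runs that
    exceeds the threshold.

    ASSUMPTION (hypothetical model, not corosync's code): the threshold
    is 80% of the totem token timeout.
    """
    threshold_ms = token_timeout_ms * 0.8
    warnings: List[str] = []
    prev: Optional[float] = None
    for now in timestamps_ms:
        if prev is not None:
            gap = now - prev  # how long the process actually waited
            if gap > threshold_ms:
                warnings.append(
                    f"main process was not scheduled for {gap:.4f} ms "
                    f"(threshold is {threshold_ms:.4f} ms)"
                )
        prev = now
    return warnings


if __name__ == "__main__":
    # Hypothetical run times (ms) of a timer meant to fire about every
    # 1000 ms; the last run was delayed because the only vCPU was busy.
    runs = [0.0, 1000.0, 2000.0, 4590.4512]
    for w in find_sched_misses(runs, token_timeout_ms=3000.0):
        print(w)
```

With these made-up timestamps the only late run produces one warning with a 2590.4512 ms gap against a 2400.0000 ms threshold, mirroring the log line. Adding a second core does not change the threshold; it just lets the timer run on time.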

>>   drbdtest1 corosync[4734]:   [MAIN  ] Corosync main process was not
>> scheduled for 2590.4512 ms (threshold is 2400.0000 ms). Consider token
>> timeout increase.
>>
>> and even DRBD:
>>   drbdtest1 kernel: drbd p1: PingAck did not arrive in time.
>>
>
> Kernel module blocked by unrelated userspace app?

There is a chance that the nodes are blocking each other because they are on
the same host, and that this is the reason for the DRBD timeout. But it's
also weird: how can one guest block another entirely when there are idle
cores on the host?

All in all, the DRBD timeouts disappeared once a node got more than one
logical core.

Is this a known behaviour of corosync?