[ClusterLabs] Antw: Re: corosync eats the whole CPU core in epoll_wait() on one node in cluster
jfriesse at redhat.com
Tue Jun 2 04:03:50 EDT 2015
Vladislav Bogdanov wrote:
> 02.06.2015 09:17, Ulrich Windl wrote:
>>>>> Jan Friesse <jfriesse at redhat.com> wrote on 01.06.2015 at 16:20 in
>> <556C6A19.5080007 at redhat.com>:
>> [...you cut the part where it seems like polling an invalid file
>>> strace of pacemakerd looks absolutely normal.
>> If you wait for I/O on an invalid file descriptor, you can busy the
>> CPU quite easily. Usually this is the case where querying errno to
>> quit a loop helps ;-)
> Yep, invalid/disabled fd could be the root of the issue, but I'd like to
> make sure that either I hit #147 (pe->state == QB_POLL_ENTRY_DELETED) or
> it is a completely different issue.
> There is no reproducer code/path available in #147, so I'm unable to
> compare strace outputs with it.
AFAIK Dave was talking about creating too many connections (not in
parallel, just open/close). So you can try something simple like "while
true;do corosync-cmapctl;done". In theory, the bug should reproduce (not
on the cpg socket but on the cmap socket).
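
A bounded variant of that loop, as a minimal sketch only. The helper name
churn_connections and the iteration count are made up for illustration, and
it assumes corosync-cmapctl is on PATH on the affected node. It opens and
closes one connection per iteration, never several in parallel.

```shell
#!/bin/sh
# Hypothetical helper (not part of corosync): run a command COUNT times,
# strictly one after another, so each run opens and closes one connection.
churn_connections() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        # Output and exit status are ignored; only the open/close matters.
        "$@" >/dev/null 2>&1 || true
        i=$((i + 1))
    done
    # Report how many iterations were executed.
    echo "$i"
}

# Example (the count is arbitrary):
#   churn_connections 10000 corosync-cmapctl
# Afterwards, look at corosync CPU usage, for instance with:
#   top -b -n 1 -p "$(pidof corosync)"
```

If corosync stays at 100% of one core after the loop has finished, an
strace of it can then be compared with the one from the affected node.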
>> Not saying I diagnosed the problem correctly, but that was my first
>> Users mailing list: Users at clusterlabs.org
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org