[ClusterLabs] Corosync main process was not scheduled for 115935.2266 ms (threshold is 800.0000 ms). Consider token timeout increase.

Jan Friesse jfriesse at redhat.com
Wed Feb 17 11:47:03 EST 2016


Kostiantyn Ponomarenko wrote:
> Thank you for the suggestion.
> The OS is Debian 8. All packages were built by me:
> libqb-0.17.2
> corosync-2.3.5
> cluster-glue-1.0.12
> pacemaker-1.1.13
>
> It is really important for me to understand what is happening with the
> cluster under the high load.

For Corosync it's really simple. Corosync has to be scheduled by the OS 
regularly (more often than its current token timeout) to be able to 
detect membership changes and send/receive messages (cpg). If it's not 
scheduled, membership is not kept up to date, and when it finally is 
scheduled again it logs the "process was not scheduled for ... ms" 
message (a warning for the user). If corosync was not scheduled for 
more than the token timeout, the "Process pause detected for ..." 
message is displayed and a new membership is formed. Other nodes (if 
scheduled regularly) see the irregularly scheduled node as dead.
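The detection logic above can be sketched roughly as follows. This is a 
simplified illustration, not corosync's actual C code; the timeout values 
are hypothetical, and the warning threshold is assumed to be derived from 
the token timeout, as the logged values (800 ms threshold for a 1000 ms 
default token) suggest:

```python
# Simplified sketch (NOT corosync's actual code) of scheduling-pause
# detection: the main loop records when it last ran; a gap larger than
# the warning threshold logs the "not scheduled" message, and a gap
# larger than the token timeout itself counts as a process pause
# (after which a new membership would be formed).

TOKEN_TIMEOUT_MS = 1000.0               # hypothetical token timeout
PAUSE_WARN_MS = TOKEN_TIMEOUT_MS * 0.8  # warn before the hard limit

def check_scheduling_gap(last_run_ms, now_ms,
                         warn_ms=PAUSE_WARN_MS,
                         token_ms=TOKEN_TIMEOUT_MS):
    """Return a log-style message if the loop was off-CPU too long."""
    gap = now_ms - last_run_ms
    if gap > token_ms:
        return "Process pause detected for %.4f ms" % gap
    if gap > warn_ms:
        return ("Corosync main process was not scheduled for %.4f ms "
                "(threshold is %.4f ms). Consider token timeout increase."
                % (gap, warn_ms))
    return None
```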

> So I would appreciate any help here =)

There is really no workaround. It's best to make sure corosync is 
scheduled regularly.
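For completeness, the log message itself suggests raising the token 
timeout, which only widens the tolerance rather than fixing the 
scheduling problem. In corosync 2.x the token timeout is set in the 
totem section of /etc/corosync/corosync.conf; the value below is 
illustrative (the 2.x default is 1000 ms):

```
totem {
    version: 2
    # token timeout in milliseconds (illustrative value)
    token: 10000
}
```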

>
>
> Thank you,
> Kostia
>
> On Wed, Feb 17, 2016 at 5:02 PM, Greg Woods <woods at ucar.edu> wrote:
>
>>
>> On Wed, Feb 17, 2016 at 3:30 AM, Kostiantyn Ponomarenko <
>> konstantin.ponomarenko at gmail.com> wrote:
>>
>>> Jan 29 07:00:43 B5-2U-205-LS corosync[2742]: [MAIN  ] Corosync main
>>> process was not scheduled for 12483.7363 ms (threshold is 800.0000 ms).
>>> Consider token timeout increase.
>>
>>
>> I was having this problem as well. You don't say which version of corosync
>> you are running or on what OS, but on CentOS 7, there is an available

This update sets round-robin realtime scheduling for corosync by 
default. The same can be achieved without the update by editing 
/etc/sysconfig/corosync and changing the COROSYNC_OPTIONS line to 
something like COROSYNC_OPTIONS="-r"
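That is, the relevant line in /etc/sysconfig/corosync would look like:

```
# /etc/sysconfig/corosync
# -r asks corosync to request round-robin realtime (SCHED_RR) scheduling
COROSYNC_OPTIONS="-r"
```

After restarting corosync, `chrt -p $(pidof corosync)` should report a 
SCHED_RR policy if the option took effect.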

Regards,
   Honza

>> update that looks like it might address this (it has to do with
>> scheduling). We haven't gotten around to actually applying it yet because
>> it will require some down time on production services (we do have a few
>> node-locked VMs in our cluster), and it only happens when the system is
>> under very high load, so I can't say for sure the update will fix the
>> issue, but it might be worth looking into.
>>
>> --Greg
>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>>