[ClusterLabs] Corosync main process was not scheduled for 2889.8477 ms (threshold is 800.0000 ms), though it runs with realtime priority and there was not much load on the node

Jeevan Patnaik g1patnaik at gmail.com
Fri Aug 30 09:53:41 EDT 2019


Hi,

We see the following messages almost every day in our two-node cluster, and
resources get migrated when it happens:

[16187] node1 corosyncwarning [MAIN  ] Corosync main process was not
scheduled for 2889.8477 ms (threshold is 800.0000 ms). Consider token
timeout increase.
[16187] node1 corosyncnotice  [TOTEM ] c.
[16187] node1 corosyncnotice  [TOTEM ] A new membership
(192.168.0.1:1268) was formed. Members joined: 2 left: 2
[16187] node1 corosyncnotice  [TOTEM ] Failed to receive the leave
message. failed: 2


After setting the token timeout to 6000 ms, at least the "Failed to receive
the leave message" notice no longer appears. But we still see the corosync
scheduling warning:
[16395] node1 corosyncwarning [MAIN  ] Corosync main process was not
scheduled for 6660.9043 ms (threshold is 4800.0000 ms). Consider token
timeout increase.

1. Why is the configured timeout not in effect? The threshold is reported as
4800 ms instead of 6000 ms.
2. How can we fix this? There is not much load on the nodes, and corosync is
already running with RT priority.
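
A possible reading of question 1, offered as an assumption and not confirmed anywhere in this message: the threshold in the warning may be a fixed fraction (80%) of the token timeout, not the token timeout itself. Both logged thresholds fit that pattern, since 800 ms is 80% of 1000 ms (assumed here to be the token timeout in effect before the change) and 4800 ms is 80% of 6000 ms. A minimal Python sketch of that arithmetic, with the hypothetical names `ASSUMED_RATIO` and `expected_threshold_ms`:

```python
# Assumption (not stated in the original message): the scheduling-pause
# warning threshold is 80% of the effective totem token timeout.
ASSUMED_RATIO = 0.8


def expected_threshold_ms(token_ms: float, ratio: float = ASSUMED_RATIO) -> float:
    """Return the warning threshold implied by a token timeout under the assumed ratio."""
    return token_ms * ratio


# 1000 ms is assumed to be the token timeout before the change;
# 6000 ms is the value set in the message above.
for token_ms in (1000.0, 6000.0):
    threshold = expected_threshold_ms(token_ms)
    print(f"token {token_ms:.0f} ms -> threshold {threshold:.0f} ms")
```

If this assumption holds, the 4800 ms threshold would indicate that the 6000 ms setting is in effect, and the remaining problem is the 6660 ms scheduling pause itself.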

The OS and package details are as follows:

Kernel: 3.10.0-957.el7.x86_64
OS: Oracle Linux Server 7.6

corosync-2.4.3-4.el7.x86_64
corosynclib-2.4.3-4.el7.x86_64

Thanks in advance.

-- 
Regards,
Jeevan.