[ClusterLabs] Corosync main process was not scheduled for 2889.8477 ms (threshold is 800.0000 ms), though it runs with realtime priority and there was not much load on the node

wferi at niif.hu wferi at niif.hu
Mon Sep 9 08:21:59 EDT 2019

Andrei Borzenkov <arvidjaar at gmail.com> writes:

> 04.09.2019 0:27, wferi at niif.hu wrote:
>> Jeevan Patnaik <g1patnaik at gmail.com> writes:
>>> [16187] node1 corosyncwarning [MAIN  ] Corosync main process was not
>>> scheduled for 2889.8477 ms (threshold is 800.0000 ms). Consider token
>>> timeout increase.
>>> [...]
>>> 2. How to fix this? We have not much load on the nodes, the corosync is
>>> already running with RT priority.
>> Does your corosync daemon use a watchdog device?  (See in the startup
>> logs.)  Watchdog interaction can be *slow*.
> Can you elaborate? This is the first time I have seen that corosync has
> anything to do with a watchdog. How exactly does corosync interact with
> the watchdog? Where in the corosync configuration is the watchdog device
> defined?

Inside the resources directive you can specify a watchdog_device, which
Corosync will "pet" from its main loop.  From corosync.conf(5):

| In a cluster with properly configured power fencing a watchdog
| provides no additional value.  On the other hand, slow watchdog
| communication may incur multi-second delays in the Corosync main loop,
| potentially breaking down membership.  IPMI watchdogs are particularly
| notorious in this regard: read about kipmid_max_busy_us in IPMI.txt in
| the Linux kernel documentation.
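
To rule the watchdog out as the source of the scheduling delays, it can
be switched off in corosync.conf.  The fragment below is only a sketch:
it assumes Corosync was built with watchdog support and that power
fencing is already configured, and it is not taken from the original
poster's configuration.

```
# corosync.conf (illustrative fragment, not the poster's actual file)
resources {
    # "off" disables watchdog use, so the main loop no longer pets it.
    # To keep the watchdog instead, name the device, e.g. /dev/watchdog.
    watchdog_device: off
}
```

If the "not scheduled" warnings stop after restarting Corosync with this
setting, slow watchdog communication was the likely cause.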
