[ClusterLabs] SBD Latency Warnings

Jorge Fábregas jorge.fabregas at gmail.com
Wed Dec 30 11:53:53 EST 2015


Hi,

We're having some issues with a particular hypervisor that is
oversubscribed (CPU-wise), on which we run SLES 11 SP4 guests.  To cope
with this I had to increase several timeouts on the cluster:

- Corosync's token timeout (from the default of 5 seconds to 30 seconds)
- SBD's watchdog & msgwait (from 15/30 to 30/60 respectively)
- Pacemaker's resource-monitoring timeouts

I know the consequence of doing all this will be *slow reaction times*,
but it's all I can do in the meantime.

However, when the hypervisor is at 100% CPU utilization I still get
these messages:

sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
logd: WARN: G_CH_prepare_int: working on IPC channel took 220 ms (> 100 ms)
sbd: WARN: Pacemaker state outdated (age: 4)
sbd: info: Pacemaker health check: OK
sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
logd: WARN: G_CH_check_int: working on IPC channel took 150 ms (> 100 ms)
sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
sbd: WARN: Servant for /dev/mapper/clustersbd outdated (age: 5)
sbd: WARN: Majority of devices lost - surviving on pacemaker
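
To see how often and by how much the threshold is exceeded, the latency
warnings above could be pulled out of the log with something like the
following.  This is only a rough sketch: the regular expression is
inferred from the lines pasted here, and the function name is made up
for illustration.  It does not assume any particular unit for the
numbers.

```python
import re

# Pattern inferred from the sbd warnings quoted above (an assumption,
# not taken from the sbd source code).
LATENCY_RE = re.compile(
    r"Latency:\s*(?P<latency>\d+)\s+exceeded threshold\s+"
    r"(?P<threshold>\d+)\s+on disk\s+(?P<disk>\S+)"
)

def parse_latency_warnings(lines):
    """Return one (disk, latency, threshold) tuple per latency warning."""
    results = []
    for line in lines:
        match = LATENCY_RE.search(line)
        if match:
            results.append((
                match.group("disk"),
                int(match.group("latency")),
                int(match.group("threshold")),
            ))
    return results

sample_log = [
    "sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd",
    "logd: WARN: G_CH_prepare_int: working on IPC channel took 220 ms (> 100 ms)",
    "sbd: WARN: Pacemaker state outdated (age: 4)",
    "sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd",
]

warnings = parse_latency_warnings(sample_log)
for disk, latency, threshold in warnings:
    print(f"{disk}: latency {latency} over threshold {threshold}")
```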

Is this latency configurable? It keeps mentioning "threshold 3". Is that
3 seconds? How does it relate to the following parameters?

==Dumping header on disk /dev/mapper/clustersbd
Header version     : 2.1
UUID               : 54597871-2392-475f-ba2d-71bdf92c36b5
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 30
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 60
==Header on disk /dev/mapper/clustersbd is dumped
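
For comparing these values across nodes or over time, the dump above
can be turned into a dictionary with a few lines of Python.  This is a
minimal sketch that assumes the "Name : value" layout shown in the
dump; parse_sbd_header is a hypothetical helper, not part of sbd.

```python
def parse_sbd_header(text):
    """Parse 'Name : value' lines from a header dump into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Skip the '==' banner lines and anything without a separator.
        if line.startswith("==") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

dump = """\
==Dumping header on disk /dev/mapper/clustersbd
Header version     : 2.1
UUID               : 54597871-2392-475f-ba2d-71bdf92c36b5
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 30
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 60
==Header on disk /dev/mapper/clustersbd is dumped
"""

header = parse_sbd_header(dump)
watchdog = int(header["Timeout (watchdog)"])
msgwait = int(header["Timeout (msgwait)"])
print(f"watchdog={watchdog} msgwait={msgwait}")
```

With the values above, msgwait comes out as twice the watchdog
timeout, the same ratio as the 15/30 settings I started from.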

I'm using the -P option with sbd, so I know it will not fence the system
as long as the node's health is OK (as reported by Pacemaker).  I'd
still like to find out whether the latency threshold is configurable, or
whether these warnings are safe to ignore.

Thanks!

Regards,
Jorge



