[ClusterLabs] SBD Latency Warnings
emmanuel segura
emi2fast at gmail.com
Wed Dec 30 18:14:08 CET 2015
I'm not an sbd expert, but I'll try to explain one of these warnings:
sbd: WARN: Pacemaker state outdated (age: 4)
In the sbd source code, "./src/sbd-md.c":
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
good_servants = 0;
for (s = servants_leader; s; s = s->next) {
    int age = t_now.tv_sec - s->t_last.tv_sec;

    if (!s->t_last.tv_sec)
        continue;

    if (age < (int)(timeout_io+timeout_loop)) {
        /* ##### age is below timeout_io + timeout_loop (4 seconds
         * here): the servant was scheduled in time, nothing is printed */
        if (strcmp(s->devname, "pcmk") != 0) {
            good_servants++;
        }
        s->outdated = 0;
    } else if (!s->outdated) {
        if (strcmp(s->devname, "pcmk") == 0) {
            /* If the state is outdated, we
             * override the last reported
             * state */
            pcmk_healthy = 0;
            /* ##### the servant was not scheduled for 4 seconds
             * or more, so the warning is printed */
            cl_log(LOG_WARNING, "Pacemaker state outdated (age: %d)",
                   age);
        } else if (!s->restart_blocked) {
            cl_log(LOG_WARNING, "Servant for %s outdated (age: %d)",
                   s->devname, age);
        }
        s->outdated = 1;
    }
}
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
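To make the branch logic of the excerpt above easier to follow, here is a minimal standalone sketch of the same age check. The struct, enum and function names are hypothetical and not part of sbd; the restart_blocked handling is left out; and timeout_io = 3 is only an assumption inferred from the 4-second limit together with "Timeout (loop) : 1".

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, simplified stand-in for sbd's servant record. */
struct servant_sketch {
    const char *devname;  /* "pcmk" for the Pacemaker servant, else a device */
    time_t      t_last;   /* last time the servant reported; 0 = never */
    int         outdated; /* set once the "outdated" warning was issued */
};

enum check_result {
    CHECK_SKIPPED,        /* servant has never reported */
    CHECK_FRESH,          /* age below the limit, nothing is logged */
    CHECK_WARN,           /* first check at or past the limit: warn once */
    CHECK_ALREADY_WARNED  /* still outdated, but the warning was already issued */
};

/* Same branch structure as the excerpt: the limit is
 * timeout_io + timeout_loop, and the warning fires only on the
 * transition from fresh to outdated. */
enum check_result check_servant(struct servant_sketch *s, time_t now,
                                int timeout_io, int timeout_loop)
{
    assert(s != NULL);

    if (s->t_last == 0)
        return CHECK_SKIPPED;

    int age = (int)(now - s->t_last);

    if (age < timeout_io + timeout_loop) {
        s->outdated = 0;
        return CHECK_FRESH;
    }
    if (!s->outdated) {
        s->outdated = 1;
        return CHECK_WARN;
    }
    return CHECK_ALREADY_WARNED;
}
```

With timeout_io = 3 and timeout_loop = 1, a servant that last reported 3 seconds ago is still fresh, while one that last reported 4 seconds ago triggers the warning once. That would match the "age: 4" in the log.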
sbd --help
-5 <N> Warn if loop latency exceeds threshold (optional, watch only)
(default is 3, set to 0 to disable)
sbd -d /dev/mydevicepath dump | grep -i loop
Timeout (loop) : 1
This is only a warning, and you can ignore it if you want. Looking at
the source code, though, it means that the sbd process wasn't scheduled
for at least 4 seconds.
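If you want to see such scheduling stalls independently of sbd, one option is to measure how much longer than requested a short sleep takes inside the guest. The following is only a rough sketch using standard POSIX calls; the function names are mine, and it does not reproduce what sbd itself measures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Difference end - start in nanoseconds. */
long long ts_diff_ns(struct timespec start, struct timespec end)
{
    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

/* Sleep for `request` and return by how many nanoseconds the sleep
 * overshot the requested duration. A large value means the process
 * was not put back on a CPU in time. Returns -1 if the monotonic
 * clock cannot be read. */
long long oversleep_ns(struct timespec request)
{
    struct timespec start, end, left = request;
    struct timespec zero = { 0, 0 };

    assert(request.tv_nsec >= 0 && request.tv_nsec < 1000000000L);

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;

    /* Resume the sleep if a signal interrupts it. */
    while (nanosleep(&left, &left) == -1 && errno == EINTR)
        ;

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    return ts_diff_ns(start, end) - ts_diff_ns(zero, request);
}
```

Calling oversleep_ns() in a loop with a one-second request and logging every result above a few hundred milliseconds should show whether the guest is being starved at the times the sbd warnings appear.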
From what I know, the sbd processes run with the realtime scheduling class:
ps -eo pid,class,comm | grep sbd
6639 RR sbd
6640 RR sbd
6641 RR sbd
So the problem is clear: your host isn't giving your guest enough CPU time.
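One way to check this from inside the guest is the "steal" counter on the aggregate "cpu" line of /proc/stat, which counts ticks during which the hypervisor ran something else while the guest wanted to run. The sketch below is my own illustration, not something from sbd. It assumes a Linux guest and a hypervisor that actually reports steal time, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the aggregate "cpu" line of /proc/stat. The first eight
 * fields are user, nice, system, idle, iowait, irq, softirq, steal.
 * Stores the steal ticks and the sum of those eight fields.
 * Returns 0 on success, -1 if the line is not the aggregate line. */
int parse_cpu_line(const char *line, unsigned long long *steal,
                   unsigned long long *total)
{
    unsigned long long v[8];
    int i;

    assert(line != NULL && steal != NULL && total != NULL);

    /* Reject per-CPU lines such as "cpu0 ...". */
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;

    if (sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
               &v[0], &v[1], &v[2], &v[3],
               &v[4], &v[5], &v[6], &v[7]) != 8)
        return -1;

    *steal = v[7];
    *total = 0;
    for (i = 0; i < 8; i++)
        *total += v[i];
    return 0;
}

/* Read the current counters from /proc/stat (Linux only). */
int read_cpu_steal(unsigned long long *steal, unsigned long long *total)
{
    char line[256];
    char *ok;
    FILE *f = fopen("/proc/stat", "r");

    if (f == NULL)
        return -1;
    ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? parse_cpu_line(line, steal, total) : -1;
}
```

Taking two samples a few seconds apart and dividing the steal delta by the total delta gives the share of time the guest was kept off the CPU. A high share while the warnings are being logged would support the conclusion above.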
2015-12-30 17:53 GMT+01:00 Jorge Fábregas <jorge.fabregas at gmail.com>:
> Hi,
>
> We're having some issues with a particular oversubscribed hypervisor
> (cpu-wise) where we run SLES 11 SP4 guests. I had to increase many
> timeouts on the cluster to cope with this:
>
> - Corosync's token timeout (from the default of 5 secs to 30 seconds)
> - SBD's watchdog & msgwait (from 15/30 to 30/60 respectively)
> - Pacemaker's resource-monitoring timeouts
>
> I know the consequence of doing all this will be *slow reaction times*
> but it's all I can do in the meantime.
>
> However, when the hypervisor is at 100% full CPU utilization I still get
> these messages:
>
> sbd: :WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> logd: WARN: G_CH_prepare_int: working on IPC channel took 220 ms (> 100 ms)
> sbd: WARN: Pacemaker state outdated (age: 4)
> sbd: info: Pacemaker health check: OK
> sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> logd: WARN: G_CH_check_int: working on IPC channel took 150 ms (> 100 ms)
> sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> sbd: WARN: Servant for /dev/mapper/clustersbd outdated (age: 5)
> sbd: WARN: Majority of devices lost - surviving on pacemaker
>
> Is this latency configurable? It keeps mentioning "threshold 3". Is that
> 3 seconds? How does it relate to the following parameters?
>
> ==Dumping header on disk /dev/mapper/clustersbd
> Header version : 2.1
> UUID : 54597871-2392-475f-ba2d-71bdf92c36b5
> Number of slots : 255
> Sector size : 512
> Timeout (watchdog) : 30
> Timeout (allocate) : 2
> Timeout (loop) : 1
> Timeout (msgwait) : 60
> ==Header on disk /dev/mapper/clustersbd is dumped
>
> I'm using the -P option with sbd so I know it will not fence the system
> as long as the node's health is ok (as reported by Pacemaker). I'd
> still like to find out if the latency mentioned is configurable or is it
> safe to ignore.
>
> Thanks!
>
> Regards,
> Jorge
>