[ClusterLabs] SBD Latency Warnings

Wed Dec 30 17:14:08 UTC 2015

I'm not an sbd expert, but I'll try to explain one of these warnings:

sbd: WARN: Pacemaker state outdated (age: 4)

In the sbd source code, "./src/sbd-md.c":

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

    good_servants = 0;
    for (s = servants_leader; s; s = s->next) {
      int age = t_now.tv_sec - s->t_last.tv_sec;

      if (!s->t_last.tv_sec)
        continue;

      /* my note: age < 4 here, sbd was scheduled in time and
       * nothing is printed */
      if (age < (int)(timeout_io+timeout_loop)) {
        if (strcmp(s->devname, "pcmk") != 0) {
          good_servants++;
        }
        s->outdated = 0;
      } else if (!s->outdated) {
        if (strcmp(s->devname, "pcmk") == 0) {
          /* If the state is outdated, we
           * override the last reported
           * state */
          pcmk_healthy = 0;
          /* my note: age >= 4 here, sbd was not scheduled for
           * 4 seconds or more */
          cl_log(LOG_WARNING, "Pacemaker state outdated (age: %d)",
            age);
        } else if (!s->restart_blocked) {
          cl_log(LOG_WARNING, "Servant for %s outdated (age: %d)",
            s->devname, age);
        }
        s->outdated = 1;
      }
    }

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

sbd --help

-5 <N>        Warn if loop latency exceeds threshold (optional, watch only)
            (default is 3, set to 0 to disable)

sbd -d /dev/mydevicepath dump | grep -i loop
Timeout (loop)     : 1

This is only a warning, and you can ignore it if you want. Looking at
the source code, though, it means that the sbd process was not
scheduled for 4 seconds or more.
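
To make the arithmetic concrete, here is a minimal sketch of the same
comparison as a standalone function. The name servant_is_outdated is
hypothetical and not part of sbd. I am also assuming timeout_io is 3
(I have not checked that value on your system); timeout_loop is the 1
shown by the dump above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of sbd. It mirrors the check
 *   age < (int)(timeout_io + timeout_loop)
 * from the loop quoted above, inverted: true means "outdated". */
bool servant_is_outdated(int age, int timeout_io, int timeout_loop)
{
    /* age is a difference of two timestamps in seconds */
    assert(age >= 0);
    return age >= timeout_io + timeout_loop;
}
```

With timeout_io = 3 and timeout_loop = 1, an age of 3 is still fine,
while an age of 4 (as in your log) already counts as outdated.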

From what I know, the sbd processes run with the realtime attribute:

 ps -eo pid,class,comm | grep sbd
 6639 RR  sbd
 6640 RR  sbd
 6641 RR  sbd

So the problem is clear: your host isn't giving your guest enough CPU time.
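
If you want to check the scheduling class from a program instead of
ps, something like the following should work. It is only a sketch:
sched_getscheduler() is the standard POSIX call, but the helper names
are mine, and the mapping covers only the three POSIX policies (ps
prints TS for SCHED_OTHER, FF for SCHED_FIFO and RR for SCHED_RR).

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Map a POSIX scheduling policy to the abbreviation that ps shows
 * in its CLASS column. Unknown policies are reported as "?". */
const char *sched_class_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "TS";
    case SCHED_FIFO:  return "FF";
    case SCHED_RR:    return "RR";
    default:          return "?";
    }
}

/* True for the two realtime classes (round-robin and FIFO). */
bool is_realtime_class(const char *cls)
{
    assert(cls != NULL);
    return strcmp(cls, "RR") == 0 || strcmp(cls, "FF") == 0;
}

/* Class name for a pid (0 means the calling process),
 * or NULL if the policy cannot be read. */
const char *sched_class_of(pid_t pid)
{
    int policy = sched_getscheduler(pid);
    return policy < 0 ? NULL : sched_class_name(policy);
}
```

Calling sched_class_of() with one of the sbd pids from the ps output
should return "RR" on a setup like the one above. Keep in mind that a
realtime class inside the guest does not help when the hypervisor
itself does not schedule the guest's vCPUs.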

2015-12-30 17:53 GMT+01:00 Jorge Fábregas <jorge.fabregas at gmail.com>:
> Hi,
>
> We're having some issues with a particular oversubscribed hypervisor
> (cpu-wise) where we run SLES 11 SP4 guests.  I had to increase many
> timeouts on the cluster to cope with this:
>
> - Corosync's token timeout (from the default of 5 secs to 30 seconds)
> - SBD's watchdog & msgwait (from 15/30 to 30/60 respectively)
> - Pacemaker's resource-monitoring timeouts
>
> I know the consequence for doing all this will be *slow reaction times*
>  but it's all I can do in the meantime.
>
> However, when the hypervisor is at 100% full CPU utilization I still get
> these messages:
>
> sbd: :WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> logd: WARN: G_CH_prepare_int: working on IPC channel took 220 ms (> 100 ms)
> sbd: WARN: Pacemaker state outdated (age: 4)
> sbd: info: Pacemaker health check: OK
> sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> logd: WARN: G_CH_check_int: working on IPC channel took 150 ms (> 100 ms)
> sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> sbd: WARN: Servant for /dev/mapper/clustersbd outdated (age: 5)
> sbd: WARN: Majority of devices lost - surviving on pacemaker
>
> Is this latency configurable? It keeps mentioning "threshold 3". Is that
> 3 seconds? How does it relates to the following parameters ?
>
> ==Dumping header on disk /dev/mapper/clustersbd
> Header version     : 2.1
> UUID               : 54597871-2392-475f-ba2d-71bdf92c36b5
> Number of slots    : 255
> Sector size        : 512
> Timeout (watchdog) : 30
> Timeout (allocate) : 2
> Timeout (loop)     : 1
> Timeout (msgwait)  : 60
> ==Header on disk /dev/mapper/clustersbd is dumped
>
> I'm using the -P option with sbd so I know it will not fence the system
> as long as the node's health is ok (as reported by Pacemaker).  I'd
> still like to find out if the latency mentioned is configurable or is it
> safe to ignore.
>
> Thanks!
>
> Regards,
> Jorge

-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^