[Pacemaker] Frequent SBD triggered server reboots

emmanuel segura emi2fast at gmail.com
Fri May 3 07:52:19 UTC 2013


Hello Andrea

When you use multipath -v3, look at these parameters:
fast_io_fail_tmo and dev_loss_tmo. Maybe your watchdog timeout is too low and
your I/O is stalling for more than 10 seconds.
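A rough way to check this is to add up the timeouts Andrea lists in the quoted message below and compare the total with the sbd watchdog timeout. This is only a simplified, pessimistic model, not how sbd, the HBA driver or multipathd actually schedule I/O: it assumes the delays simply add up, fast_io_fail_tmo is a hypothetical input you would fill in from the multipath -v3 output, and dev_loss_tmo is not modeled at all.

```python
# Simplified, pessimistic SBD timeout-budget check for the values in this thread.
# Assumption: the delays below simply add up; real kernel/multipathd timing overlaps.
from dataclasses import dataclass


@dataclass
class Timeouts:
    # All values in seconds; defaults are the ones Andrea quotes.
    hba_port_down_retry: float = 1.0         # qlport_down_retry
    multipath_polling_interval: float = 5.0  # multipathd polling_interval
    sbd_loop: float = 1.0                    # sbd "Timeout (loop)"
    sbd_watchdog: float = 10.0               # sbd "Timeout (watchdog)"
    # Hypothetical input: use the value multipath -v3 reports for your paths.
    fast_io_fail_tmo: float = 0.0


def worst_case_stall(t: Timeouts) -> float:
    """Estimated longest gap between two successful sbd reads during a path failure."""
    path_detection = t.hba_port_down_retry + t.multipath_polling_interval
    # Assumed: I/O queued on the failed path may also wait up to fast_io_fail_tmo.
    return path_detection + t.fast_io_fail_tmo + t.sbd_loop


def margin(t: Timeouts) -> float:
    """Seconds left before the watchdog fires; negative means a reboot is likely under this model."""
    return t.sbd_watchdog - worst_case_stall(t)


if __name__ == "__main__":
    for fio in (0.0, 5.0):
        t = Timeouts(fast_io_fail_tmo=fio)
        verdict = "OK" if margin(t) > 0 else "watchdog may fire"
        print(f"fast_io_fail_tmo={fio}s stall={worst_case_stall(t)}s "
              f"margin={margin(t)}s -> {verdict}")
```

With the thread's values the model leaves about 3 seconds of margin; adding a 5 second fast_io_fail_tmo pushes the estimated stall past the 10 second watchdog timeout, which is the kind of case to look for in the multipath -v3 output.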

Thanks


2013/5/3 andrea cuozzo <andrea.cuozzo at sysma.it>

> Thanks Emmanuel,
>
> using the -v3 switch I can see that multipath is querying my local /dev/sda
> disk too, which is unneeded, so I blacklisted /dev/sda in multipath.conf and
> reloaded multipath; it might not be related to my problem, but it was a
> mistake indeed. Thanks. My understanding of all the timeouts involved in my
> scenario is:
>
> - each HBA has 1 second (qlport_down_retry=1) to handle a port down event
> and report it upwards
> - multipathd has at most 5 seconds (polling_interval 5) to notice path
> failures and do its job
> - sbd attempts to read its partition every second (Timeout (loop): 1) and
> has 10 seconds (Timeout (watchdog): 10) before the watchdog reboots the
> server (I'm assuming SBD doesn't feed the watchdog unless the read attempt
> is successful).
>
> I can see this working as expected when I use the multipathd -k interactive
> console to fail and reinstate paths, and I can read the corresponding
> multipathd messages about lost and reinstated paths in syslog. What makes me
> think my problem might not be multipath related is that there's no sign of
> port down or path lost messages in syslog when the problem happens; there's
> just the sbd delay countdown.
>
> andrea
>
>
>
> Date: Thu, 2 May 2013 20:18:04 +0200
> From: emmanuel segura <emi2fast at gmail.com>
> To: The Pacemaker cluster resource manager
>         <pacemaker at oss.clusterlabs.org>
> Subject: Re: [Pacemaker] Frequent SBD triggered server reboots
>
> if you think your problems are related to multipath timeouts, try
> multipath -v3 and check the sbd timeout carefully
>
> Thanks
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
this is my life and I live it for as long as God wants

