<div dir="ltr"><div><div>Hello Andrea<br><br></div>When you use multipath -v3 look this parameters fast_io_fail_tmo,dev_loss_tmo maybe your watchdog timeout is too low and you took for more the 10 seconds<br><br></div>Thanks<br>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/5/3 andrea cuozzo <span dir="ltr"><<a href="mailto:andrea.cuozzo@sysma.it" target="_blank">andrea.cuozzo@sysma.it</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Thanks Emmanuel,

using the -v3 switch I can see that multipath is querying my local /dev/sda
disk too, which is unneeded, so I blacklisted /dev/sda in multipath.conf and
reloaded multipath; it might not be related to my problem, but it was a
mistake indeed. Thanks.
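A minimal blacklist stanza of that kind, assuming the local disk really is
/dev/sda (blacklisting by WWID is more robust when device names can change):

    blacklist {
        devnode "^sda$"
    }

followed by multipath -F and multipath -r (or a multipathd reload) to flush
and rebuild the maps.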
My view of all the timeouts involved in my scenario is:

- each HBA has 1 second (qlport_down_retry=1) to handle a port-down event
and report it upwards
- multipathd has at most 5 seconds (polling_interval 5) to notice path
failures and do its job
- sbd attempts to read its partition every second (Timeout (loop) : 1) and
has 10 seconds (Timeout (watchdog) : 10) before the watchdog reboots the
server (I'm assuming sbd doesn't feed the watchdog unless the read attempt
is successful); one way to check each of these values is sketched below.
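These commands show where each of those values can be verified on a QLogic
HBA setup like this one (the sbd device path is just a placeholder):

    # HBA port-down retry (qla2xxx module parameter)
    cat /sys/module/qla2xxx/parameters/qlport_down_retry

    # multipathd's effective polling interval
    multipathd -k"show config" | grep polling_interval

    # sbd on-disk timeouts: Timeout (loop), Timeout (watchdog), ...
    sbd -d /dev/<sbd-partition> dump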
I can see this working as expected by using the multipathd -k interactive
console to fail and reinstate paths, and by reading the corresponding
multipathd messages in the syslog about paths being lost and reinstated.
What makes me think my problem might not be multipath-related is that there
is no sign of port-down or path-lost messages in the syslog when the
problem happens; there is just the sbd delay countdown.
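For anyone reproducing this, the interactive-console test looks roughly
like the following (sda is just an example path name):

    multipathd -k
    multipathd> fail path sda        # simulate a path failure
    multipathd> reinstate path sda   # bring the path back
    multipathd> show paths           # verify the path states

with the matching path-failed and reinstated messages then appearing in the
syslog.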
andrea
Date: Thu, 2 May 2013 20:18:04 +0200
From: emmanuel segura <emi2fast@gmail.com>
To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] Frequent SBD triggered server reboots
if you think your problems are related to the multipath timeouts, try
running multipath -v3 and compare what you see against the sbd timeouts.

Thanks
</div><div class="HOEnZb"><div class="h5">_______________________________________________<br>
Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
<br>
Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
-- 
this is my life, and I live it for as long as God wills