<div dir="ltr"><div><div>Hello Andrea<br><br></div>When you use multipath -v3 look this parameters fast_io_fail_tmo,dev_loss_tmo maybe your watchdog timeout is too low and you took for more the 10 seconds<br><br></div>Thanks<br>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/5/3 andrea cuozzo <span dir="ltr"><<a href="mailto:andrea.cuozzo@sysma.it" target="_blank">andrea.cuozzo@sysma.it</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Thanks Emmanuel,

using the -v3 switch I can see that multipath is querying my local /dev/sda
disk too, which is unneeded, so I blacklisted /dev/sda in multipath.conf and
reloaded multipath; it might not be related to my problem, but it was a
mistake indeed. Thanks.
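A minimal blacklist stanza of that kind, assuming the local disk really is
/dev/sda (blacklisting by WWID is more robust when device names can change):

    blacklist {
        devnode "^sda$"
    }

followed by multipath -F and multipath -r (or a multipathd reload) to flush
and rebuild the maps.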
My view of all the timeouts involved in my scenario is:

- each HBA has 1 second (qlport_down_retry=1) to handle a port-down event
and report it upwards
- multipathd has at most 5 seconds (polling_interval 5) to notice path
failures and do its job
- sbd attempts to read its partition every second (Timeout (loop) : 1) and
has 10 seconds (Timeout (watchdog) : 10) before the watchdog reboots the
server (I'm assuming sbd doesn't feed the watchdog unless the read attempt
is successful); one way to check each of these values is sketched below.
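These commands show where each of those values can be verified on a QLogic
HBA setup like this one (the sbd device path is just a placeholder):

    # HBA port-down retry (qla2xxx module parameter)
    cat /sys/module/qla2xxx/parameters/qlport_down_retry

    # multipathd's effective polling interval
    multipathd -k"show config" | grep polling_interval

    # sbd on-disk timeouts: Timeout (loop), Timeout (watchdog), ...
    sbd -d /dev/<sbd-partition> dump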
I can see this working as expected by using the multipathd -k interactive
console to fail and reinstate paths, and by reading the corresponding
multipathd messages in the syslog about paths being lost and reinstated.
What makes me think my problem might not be multipath-related is that there
is no sign of port-down or path-lost messages in the syslog when the
problem happens; there is just the sbd delay countdown.
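For anyone reproducing this, the interactive-console test looks roughly
like the following (sda is just an example path name):

    multipathd -k
    multipathd> fail path sda        # simulate a path failure
    multipathd> reinstate path sda   # bring the path back
    multipathd> show paths           # verify the path states

with the matching path-failed and reinstated messages then appearing in the
syslog.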
andrea
Date: Thu, 2 May 2013 20:18:04 +0200
From: emmanuel segura <emi2fast@gmail.com>
To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] Frequent SBD triggered server reboots
if you think your problems are related to the multipath timeouts, try
running multipath -v3 and compare what you see against the sbd timeouts.

Thanks
</div><div class="HOEnZb"><div class="h5">_______________________________________________<br>
Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
<br>
Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
-- 
this is my life, and I live it for as long as God wills