[Pacemaker] Frequent SBD triggered server reboots

Fri May 3 00:15:54 UTC 2013

Thanks Emmanuel,

using the -v3 switch I can see that multipath is querying my local /dev/sda
disk too, which is unneeded, so I blacklisted /dev/sda in multipath.conf and
reloaded multipath; it might not be related to my problem, but it was a
mistake indeed. Thanks. My understanding of all the timeouts involved in my
scenario is:

- each HBA has 1 second (qlport_down_retry=1) to handle a port down event
and report it upwards
- multipathd has at most 5 seconds (polling_interval 5) to notice path
failures and do its job
- sbd attempts to read its partition every second (Timeout (loop): 1) and has
10 seconds (Timeout (watchdog): 10) before the watchdog reboots the server
(I'm assuming SBD doesn't feed the watchdog unless the read attempt is
successful).
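
As a rough sanity check of the numbers above, here is a minimal sketch of the
budget in Python. It assumes the three delays simply add up one after the
other, which is my simplification and may not match how the layers really
interact; the function and variable names are made up for illustration.

```python
# Rough worst-case budget for the timeouts listed above (all values in seconds).
# Assumption: the delays add up serially, which is a simplification.
QLPORT_DOWN_RETRY = 1      # HBA reports a port down event (qlport_down_retry=1)
POLLING_INTERVAL = 5       # multipathd path check interval (polling_interval 5)
SBD_LOOP_TIMEOUT = 1       # sbd read loop (Timeout (loop): 1)
SBD_WATCHDOG_TIMEOUT = 10  # watchdog reboot (Timeout (watchdog): 10)


def failover_budget(hba, multipath, sbd_loop):
    """Return the assumed worst-case time before sbd can read again."""
    return hba + multipath + sbd_loop


worst_case = failover_budget(QLPORT_DOWN_RETRY, POLLING_INTERVAL, SBD_LOOP_TIMEOUT)
margin = SBD_WATCHDOG_TIMEOUT - worst_case
print(f"worst case {worst_case}s, margin before watchdog {margin}s")
```

Under that assumption a single path failure should be handled with a few
seconds to spare before the watchdog fires.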

I can see this working as expected when I use the multipathd -k interactive
console to fail and reinstate paths and read the corresponding multipathd
messages in the syslog about paths lost and reinstated. What makes me think
my problem might not be multipath related is that there's no sign of port
down or path lost messages in the syslog when the problem happens; there's
just the sbd delay countdown.

andrea

Date: Thu, 2 May 2013 20:18:04 +0200
From: emmanuel segura <emi2fast at gmail.com>
To: The Pacemaker cluster resource manager
	<pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] Frequent SBD triggered server reboots
Message-ID:
	<CAE7pJ3DGq4KsOVr8svFk0YpreGc67YG7prrjtvQOp2BA8vcc5Q at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

If you think your problems are related to the multipath timeout, try using
multipath -v3 and take a careful look at the sbd timeout

Thanks