[Pacemaker] Frequent SBD triggered server reboots

Lars Marowsky-Bree lmb at suse.com
Thu May 2 17:36:42 EDT 2013

On 2013-05-02T16:11:11, andrea cuozzo <andrea.cuozzo at sysma.it> wrote:

> external/sbd as the STONITH device. The operating system is SLES 11 SP1; the cluster
> components come from the SLES 11 SP1 HA package and are these versions:

SP1? That's no longer supported, and the overlapping support period with
SP2 has long since expired. You really want to update to SP2 plus the maintenance updates.

> Each of the three clusters works fine for a couple of days; then
> both servers of one of the clusters start, at the same time, the SBD
> "WARN: Latency: No liveness for" countdown and reboot. It happens at
> different hours and under varying server load (even at night, when the
> servers are close to 0% load). No two clusters have ever gone down at the
> same time. Their syslogs are spotless; the only warnings before the
> reboots are the SBD liveness countdown messages. The SAN department
> can't see anything wrong on their side; the SAN is used by many other
> servers, and no one else seems to be experiencing similar problems.

That's really strange.

Newer SBD versions cope much better with IO that gets stuck in the
multipath layer forever - they'll timeout, abort and most of the time
recover. You really want to upgrade.

In case your single SBD partition goes bad, you can also configure three
of them, which obviously improves resilience (if they are on different
disks/channels, or connected via iSCSI/FCoE, etc.).

> - Is the 1 MB size for the SBD partition strictly mandatory? The SLES 11 SP1
> HA documentation says: "In an environment where all nodes have
> access to shared storage, a small partition (1MB) is formatted for use
> with SBD".

No, this is just the minimum size that SBD needs. You can make it larger
if you want to.

> http://www.gossamer-threads.com/lists/linuxha/pacemaker/84951 Mr. Lars
> Marowsky-Bree says: "The new SBD versions will not become stuck on IO
> anymore". Is the SBD version I'm using one that can become stuck on IO?
> I've looked for SLES HA packages newer than the one I'm using, without luck,
> but SBD getting stuck on IO really seems like something that would apply to my
> case.

Yes. You really want to update, see the first paragraph. There are no
newer SBD versions for SP1. (If you have LTSS, the story may be
different, but in that case, kindly contact our support directly.)


Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
