[Pacemaker] Frequent SBD triggered server reboots

emmanuel segura emi2fast at gmail.com
Thu May 2 10:25:22 EDT 2013


Hello Andrea

Can you show me your multipath.conf?

Thanks


2013/5/2 andrea cuozzo <andrea.cuozzo at sysma.it>

> Hi,
>
> It's my first try at asking for help on a mailing list, and I hope I won't
> make any netiquette mistakes. I could really use some help with SBD; here's
> my scenario:
>
> I have three clusters with a similar configuration: two physical servers
> with a Fibre Channel shared storage, 4 resources (IP address, ext3
> filesystem, Oracle listener, Oracle database) configured in a group, and
> external/sbd as the stonith device. The operating system is SLES 11 SP1;
> the cluster components come from the SLES SP1 HA package, in these versions:
>
> openais: 1.1.4-5.6.3
> pacemaker: 1.1.5-5.9.11.1
> resource-agents: 3.9.3-0.4.26.1
> cluster-glue: 1.0.8-0.4.4.1
> corosync: 1.3.3-0.3.1
> csync2: 1.34-0.2.39
>
> Each of the three clusters will work fine for a couple of days, then both
> servers of one of the clusters will start the SBD "WARN: Latency: No
> liveness for" countdown at the same time and restart. It happens at
> different hours and under different server loads (even at night, when the
> servers are close to 0% load). No two clusters have ever gone down at the
> same time. Their syslog is completely clean; the only warning messages
> before the reboots are the ones showing the SBD liveness countdown. The SAN
> department can't see anything wrong on their side; the SAN is used by many
> other servers, and no one seems to be experiencing similar problems.
>
> Hardware
>
> Cluster 1 and Cluster 2: two IBM blades, QLogic QMI2582 (one card, two
> ports), Brocade BladeCenter FC switch, SAN switch, HP P9500 SAN
>
> Cluster 3: two IBM x3650, QLogic QLE2560 (two cards per server), SAN
> switch, HP P9500 SAN
>
> Each cluster has a 50 GB LUN on the HP P9500 SAN (the SAN is shared, the
> LUNs are different): partition 1 (7.8 MB) for SBD, partition 2 (49.99 GB)
> for Oracle on ext3.
>
> What I have done so far:
>
> - added "options qla2xxx ql2xmaxqdepth=16 qlport_down_retry=1
> ql2xloginretrycount=5 ql2xextended_error_logging=1" to
> /etc/modprobe.conf.local (then ran mkinitrd and restarted the servers)
>
> - verified with the SAN department that the QLogic firmware of my HBAs
> meets their requirements
>
> - configured multipath.conf as per the HP specifications for the OPEN-V
> device type
>
> - verified multipathd works as expected: shutting down one port at a time,
> the links stay up on the other port; shutting down both, the cluster
> switches to the other node
>
> - configured SBD to use the watchdog device (softdog) and the first
> partition of the LUN; all the relevant tests confirm SBD works as expected
> (list, dump, message test, message exit, and killing the SBD process
> reboots the server). Here's my /etc/sysconfig/sbd:
>
> server1:~ # cat /etc/sysconfig/sbd
> SBD_DEVICE="/dev/mapper/san_part1"
> SBD_OPTS="-W"
>
> - doubled the default values for Timeout (watchdog) and Timeout (msgwait),
> setting them to 10 and 20, while the stonith-timeout is 60s (the commands
> are sketched after the dump below):
>
> server1:~ # sbd -d /dev/mapper/san_part1 dump
> ==Dumping header on disk /dev/mapper/san_part1
> Header version     : 2
> Number of slots    : 255
> Sector size        : 512
> Timeout (watchdog) : 10
> Timeout (allocate) : 2
> Timeout (loop)     : 1
> Timeout (msgwait)  : 20
> ==Header on disk /dev/mapper/san_part1 is dumped
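>
> (For reference, the timeouts in the header above were set roughly as
> follows; this is a sketch rather than the exact shell history, so treat the
> invocation as illustrative:)
>
>     # rewrite the SBD header with a 10s watchdog (-1) and 20s msgwait (-4) timeout
>     sbd -d /dev/mapper/san_part1 -1 10 -4 20 create
>
>     # cluster-wide stonith timeout, kept well above msgwait
>     crm configure property stonith-timeout="60s"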
>
> I've even tested with 60 and 120 for Timeout (watchdog) and Timeout
> (msgwait); when the problem happened again, the servers went all the way
> through the 60-second countdown and rebooted.
>
> Borrowing the idea from
> http://www.gossamer-threads.com/lists/linuxha/users/79213 , I'm monitoring
> access time to the SBD partition on the three clusters: the average time to
> execute the dump command is 30 ms, and it spikes over 100 ms a couple of
> times an hour. There's no gradual rise from the average when the problem
> comes, though. Here's what it looked like the last time; the dump command
> runs every 2 seconds (the monitoring loop itself is sketched after the
> output below):
>
> ...
> real    0m0.031s
> real    0m0.031s
> real    0m0.030s
> real    0m0.030s
> real    0m0.030s
> real    0m0.030s
> real    0m0.031s    <-- last record in the file, no more logging; the
> server will reboot after the watchdog timeout period
> ...
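>
> (For completeness, the monitoring loop is roughly the following; the log
> path here is just an example:)
>
>     # time an "sbd dump" every 2 seconds and keep only the "real" line
>     while true; do
>         ( time sbd -d /dev/mapper/san_part1 dump >/dev/null 2>&1 ) 2>&1 \
>             | grep '^real' >> /var/log/sbd-latency.log
>         sleep 2
>     done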
>
> Right before the last cluster reboot I was monitoring Oracle I/O towards
> its datafiles, to verify whether Oracle could still access its partition
> (on the same LUN as the SBD one) while the SBD countdown ran, and so tell
> whether it's an SBD-only problem or a LUN access problem. There was no sign
> of Oracle I/O problems during the countdown; Oracle seems to have stopped
> interacting with the I/O monitoring software the very moment the servers
> rebooted (all servers involved use a common time server, but I can't be
> 100% sure they were in sync when I checked).
>
> I'm in close contact with the SAN department; the problem might well be
> the servers losing access to the LUN for some Fibre Channel issue they
> still can't see in their SAN logs, but I'd like to be 100% certain the
> cluster configuration is good. Here are my SBD-related questions:
>
> - Is the 1 MB size for the SBD partition strictly mandatory? The SLES 11
> SP1 HA documentation says: "In an environment where all nodes have access
> to shared storage, a small partition (1MB) is formatted for the use with
> SBD", while here http://linux-ha.org/wiki/SBD_Fencing no size is suggested
> for it. At OS setup the SLES partitioner didn't allow us to create a 1 MB
> partition (too small); the smallest size available was 7.8 MB. Can this
> difference in size introduce the random problem we're experiencing?
>
> - I've read here
> http://www.gossamer-threads.com/lists/linuxha/pacemaker/84951 that Mr. Lars
> Marowsky-Bree says: "The new SBD versions will not become stuck on IO
> anymore". Is the SBD version I'm using one that can become stuck on IO?
> I've checked, without luck, for SLES HA packages newer than the ones I'm
> using, but SBD getting stuck on IO really seems like something that would
> apply to my case.
>
> Thanks and best regards.
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>


-- 
this is my life and I live it for as long as God wills

