[Pacemaker] Best stonith method to avoid split brain on a drbd cluster

Johannes Freygner hannes at freygner.at
Wed Jan 5 04:27:00 EST 2011


Hi Devin,

see my answers marked with *) below.

-----Original Message-----
From: Devin Reade [mailto:gdr at gno.org] 
Sent: Tuesday, 04 January 2011 19:34
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Best stonith method to avoid split brain on a drbd cluster

--On Monday, January 03, 2011 09:14:29 PM +0100 hannes at freygner.at wrote:

> As I have tested, it's not a problem with the shutdown order. On a 
> regular shutdown everything is working fine until I pull the power cable.

So just before pulling the power cable, the running node reports itself as online with all resources migrated, and the other node as offline?

*) Yes, and I found the wrong setting:
          <op id="fence_ilo_1_mon" interval="60s" name="monitor" on-fail="fence"/>
It should be:
          <op id="fence_ilo_1_mon" interval="60s" name="monitor" on-fail="stop"/>
After that change it worked fine with a regular shutdown.
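
To spot this kind of setting quickly, something like the following could be used. It is only a minimal sketch: it assumes the CIB has been dumped to XML (for example with "cibadmin -Q"), and the function name, the primitive id and the class attribute in the sample are made up for illustration. Only the <op> line is taken from my config.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the <op> element is from the real configuration.
SAMPLE_CIB_FRAGMENT = """
<primitive id="fence_ilo_1" class="stonith">
  <operations>
    <op id="fence_ilo_1_mon" interval="60s" name="monitor" on-fail="fence"/>
  </operations>
</primitive>
"""

def monitor_ops_with_on_fail(cib_xml, value="fence"):
    """Return the ids of all monitor operations whose on-fail equals `value`."""
    root = ET.fromstring(cib_xml)
    return [
        op.get("id")
        for op in root.iter("op")
        if op.get("name") == "monitor" and op.get("on-fail") == value
    ]

print(monitor_ops_with_on_fail(SAMPLE_CIB_FRAGMENT))
```

Any id it reports is a monitor operation that still has on-fail="fence" and may need to be changed to "stop" as shown above.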

But if I pull the power cable without a regular shutdown, the powerless node gets the status "UNCLEAN (offline)" and the resources remain stopped.
I found and tested a workaround: I use "meatware" as a second fencing device and wrote a script resource which starts on all nodes and checks the status of the nodes. If a node's status goes to "UNCLEAN (offline)" and its ILO can't be pinged, the script calls "meatclient -c <node>". The status of the powerless node is then changed and the resources get started on the online node.
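
The check logic of that script can be sketched roughly as follows. This is not the actual script, only a minimal Python illustration under assumptions: the node status is read from one-shot "crm_mon -1" output, node lines look like "Node <name>: UNCLEAN (offline)", the ping options are the Linux iputils ones, and the node names and ILO addresses in ILO_HOSTS are placeholders. As far as I know meatclient asks for confirmation on the terminal, so how that prompt gets answered is left open here.

```python
import re
import subprocess

# Placeholder mapping of cluster node name -> ILO address (hypothetical values).
ILO_HOSTS = {"node1": "ilo-node1.example", "node2": "ilo-node2.example"}

# Assumed crm_mon line format: "Node node2: UNCLEAN (offline)",
# optionally with an id in parentheses after the node name.
UNCLEAN_RE = re.compile(
    r"^Node\s+(\S+?)(?:\s+\(\S+\))?:\s+UNCLEAN \(offline\)[ \t]*$",
    re.MULTILINE,
)

def unclean_offline_nodes(crm_mon_text):
    """Return the names of all nodes reported as UNCLEAN (offline)."""
    return UNCLEAN_RE.findall(crm_mon_text)

def ilo_reachable(ilo_host):
    """Send one ping with a 2 second timeout; True if the ILO answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", ilo_host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def nodes_to_confirm(crm_mon_text, ilo_hosts, reachable=ilo_reachable):
    """Nodes that are UNCLEAN (offline) and whose ILO does not answer ping."""
    return [
        node
        for node in unclean_offline_nodes(crm_mon_text)
        if node in ilo_hosts and not reachable(ilo_hosts[node])
    ]

def main():
    # One-shot cluster status, then confirm the fencing of each dead node.
    status = subprocess.run(
        ["crm_mon", "-1"], capture_output=True, text=True, check=True
    ).stdout
    for node in nodes_to_confirm(status, ILO_HOSTS):
        subprocess.run(["meatclient", "-c", node])
```

The script resource would call main() periodically. Note that an unreachable ILO is only taken as a hint that the node has lost power; it does not prove it, so this check is weaker than a real power fence.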

I have two networks: a client network, which is bonded across two network interfaces connected to two different switches, and a communication network on a third network interface for heartbeat and drbd.

Pulling the cable of the communication network also works fine, because one of the nodes gets fenced automatically.

Regards,
Hannes


> After losing the ilo communication the status of the online node 
> changes to "online UNCLEAN". The other node which is turned off and 
> without any power gets "offline UNCLEAN".

Well, that's certainly not polite ...

Would you be able to post that portion of your config relating to stonith, including constraints et al?  Feel free to scrub your passwords and hostnames.

Devin


_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker





