[Pacemaker] Best stonith method to avoid split brain on a drbd cluster

Devin Reade gdr at gno.org
Mon Jan 3 11:53:53 EST 2011


Johannes Freygner <hannes at freygner.at> wrote:

> could somebody give me an idea what will be the best stonith solution on a drbd cluster to avoid split brain if the network between the nodes is lost.
> 
> I have already tried to use stonith with ILO, but if the power cable is removed from the node (because we have to service the hardware) the resource will not start on the remaining node, because the remaining node can't fence the removed node.

There should be nothing wrong with using ILO, but please read on.
(In fact, ILO/ALOM/DRAC based fencing is the cleanest solution for
enterprise grade hardware with multiple power sources.)

If you're bringing something down for maintenance, fencing shouldn't
occur.  If you do a 'shutdown -r now' on one node, does that
node normally get fenced by the other?  If so, does doing a
'service corosync stop' allow that node to cleanly leave the cluster
without being fenced?  If the answer to both is 'yes', then you
probably have an rc script sequencing problem that you should deal
with first.

If you're running RHEL/CentOS or a derivative, have a look at your
corosync rc script.  If it has
	# chkconfig: - 20 20
then change it to
	# chkconfig: - 75 25

and do a:
	chkconfig corosync resetpriorities

Remember to do this on all nodes.
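
As a rough sketch of how the header check could be scripted across
several nodes: the two function names below are hypothetical, and the
init script path is assumed to be /etc/init.d/corosync as on a stock
RHEL/CentOS install.  It only reads the script; it changes nothing.

```shell
# Hypothetical helper: print the start and stop priorities taken from
# the "# chkconfig:" header line of an init script.
# $1 = path to the init script (assumed: /etc/init.d/corosync)
chkconfig_priorities() {
    awk '/^#[ \t]*chkconfig:/ { print $(NF-1), $NF; exit }' "$1"
}

# Hypothetical helper: succeed if the script still carries the
# "20 20" priorities mentioned above.
needs_priority_fix() {
    [ "$(chkconfig_priorities "$1")" = "20 20" ]
}
```

On a node you could then run, for example:
	needs_priority_fix /etc/init.d/corosync && echo "header needs editing"
Once the priorities have been changed and reapplied, the links under
/etc/rc?.d/ should be named S75corosync and K25corosync.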

If you now do a 'shutdown -r now' (or -h) on one node, it should
not get fenced, and your resources should all be nicely moved to
the remaining node before the first node is down.

Devin
-- 
If it's sinful, it's more fun.




