[Pacemaker] Best stonith method to avoid split brain on a drbd cluster
Johannes Freygner
hannes at freygner.at
Mon Jan 3 12:40:00 EST 2011
Devin,
You mean it will work fine with corosync? I am using heartbeat instead. Whether I run "shutdown ...." and then pull the power cable, or pull the power cable directly without shutting down, the result is always the same: the resource isn't started by the other node, because it can't fence the missing node while the ILO has no power.
Hannes
Devin Reade wrote:
> Could somebody give me an idea what the best stonith solution would be on a drbd cluster to avoid split brain if the network between the nodes is lost?
>
> I have already tried to use stonith with ILO, but if the power cable is removed from the node (because we have to service the hardware), the resource will not start on the remaining node, because the remaining node can't fence the removed node.
There should be nothing wrong with using ILO, but please read on.
(In fact, ILO/ALOM/DRAC based fencing is the cleanest solution for
enterprise grade hardware with multiple power sources.)
If you're bringing something down for maintenance, fencing shouldn't
occur. If you do a 'shutdown -r now' on one node, does that
node normally get fenced by the other? If so, does doing a
'service corosync stop' allow that node to cleanly leave the cluster
without being fenced? If the answer to both is 'yes', then you
probably have an rc script sequencing problem that you should deal
with first.
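The sequencing matters because, at shutdown or reboot, the SysV rc script runs the K* (kill) links for the target runlevel in sorted order, so a lower kill number means that service is stopped earlier. A minimal sketch of that ordering, using made-up link names in a scratch directory (every name except corosync, and every priority, is hypothetical):

```shell
#!/bin/sh
# Sketch: show the order in which SysV rc would run kill (K*) links.
# rc globs K* in the runlevel directory, so sorted glob expansion decides
# the order. The link names below are hypothetical examples.
set -eu
LC_ALL=C; export LC_ALL   # byte-wise sorting for a predictable glob order

rcdir=$(mktemp -d)
trap 'rm -rf "$rcdir"' EXIT

# Create empty stand-ins for kill links; only the two-digit number matters.
for link in K90network K25corosync K10someapp K60otherdaemon; do
    : > "$rcdir/$link"
done

# Collect the links in the order rc would stop them (ascending priority).
stop_order=""
for script in "$rcdir"/K*; do
    stop_order="${stop_order:+$stop_order }$(basename "$script")"
done
echo "stop order: $stop_order"
```

Here the hypothetical K10someapp would be stopped before corosync, and K60otherdaemon and K90network after it.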
If you're running RHEL/CentOS or a derivative, have a look at your
corosync rc script. If it has
# chkconfig: - 20 20
then change it to
# chkconfig: - 75 25
and do a:
chkconfig corosync resetpriorities
Remember to do this on all nodes.
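The header edit can also be scripted so it is applied identically on every node. A minimal sketch, assuming GNU sed (as shipped on RHEL/CentOS); it works on a scratch copy containing only a stand-in header, so it is safe to try, and the real target path /etc/init.d/corosync is an assumption to check on your distribution:

```shell
#!/bin/sh
# Sketch: rewrite the chkconfig header of a corosync init script.
# Uses a scratch file with a stand-in header instead of the real script.
set -eu

initscript=$(mktemp)
trap 'rm -f "$initscript"' EXIT

# Stand-in for the top of the distribution's corosync init script.
printf '#!/bin/sh\n# chkconfig: - 20 20\n' > "$initscript"

# Start priority 20 -> 75, stop priority 20 -> 25 (GNU sed in-place edit).
sed -i 's/^# chkconfig: - 20 20$/# chkconfig: - 75 25/' "$initscript"

header=$(grep '^# chkconfig:' "$initscript")
echo "$header"

# On a real node, regenerate the rc links afterwards, e.g.:
#   chkconfig corosync resetpriorities
```

The sed pattern only matches the exact old header, so running it a second time leaves an already-updated script unchanged.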
Now if you do a 'shutdown -r now' (or -h) on one node, it should
not get fenced, and your resources should all be nicely moved to
the remaining node before the first node is down.
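To confirm the change took effect, you can work out which rc links chkconfig should now have created for corosync from the header values. A small sketch, assuming the new header is '# chkconfig: - 75 25'; the printed names are what you would then look for in the /etc/rc.d/rc*.d directories:

```shell
#!/bin/sh
# Sketch: derive the expected rc link names from a chkconfig header line.
header='# chkconfig: - 75 25'

# Word-split the header: "#" "chkconfig:" "-" "75" "25"
set -- $header
start_prio=$4
stop_prio=$5

start_link="S${start_prio}corosync"
kill_link="K${stop_prio}corosync"
echo "expect $start_link where corosync is enabled, $kill_link elsewhere (e.g. rc0.d, rc6.d)"
```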
Devin
--
If it's sinful, it's more fun.