[Pacemaker] Two node cluster and no hardware device for stonith.

Andrea a.bacchi at codices.com
Fri Feb 6 04:08:52 EST 2015

 <lists at ...> writes:

> > If the ping node becomes unreachable from node2, I see the pingd
> > attribute on node2 set to 0 and the dummy resources stop on node2.
> > If I cut off the entire network on node2, I see the pingd attribute on
> > node2 set to 0, but the dummy resources never stop.
> > During the network failure, the stonith agent is active and tries to
> > fence node1, without success.
> > Why? Is it the failed fence action that blocks the location constraint?
> > 
> > 
> > Andrea
> When you disable stonith, pacemaker just assumes that "no contact" == 
> "peer dead", so recovery happens. This is a very false sense of security 
> though, because most people test by actually crashing a node, so there 
> is no risk of a split-brain. The problem is, in the real world, this
> cannot be assured. A node can be running just fine, but the connection
> fails. If you disable stonith, you get a split-brain.
> So when you enable stonith, and you really must, then pacemaker will 
> never make an assumption about the state of the peer. So when the peer 
> stops responding, pacemaker blocks and calls a fence. It will then sit 
> there and wait for the fence to succeed. If the fence *doesn't* succeed, 
> it ends up staying blocked. This is the proper behaviour!
> Now, if you enable stonith *and* it is configured properly, then you 
> will see that recovery proceeds as expected *after* the fence action 
> completes successfully. So, set up stonith! :)
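
The behaviour described in the quoted reply can be summarised as a toy
decision model. This is only a sketch of the logic as explained above, not
Pacemaker code; the names Outcome and on_peer_lost are made up for
illustration and do not correspond to anything in Pacemaker.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical outcomes for the surviving node (illustrative names)."""
    RECOVER = "recover resources"
    BLOCKED = "stay blocked and wait for a successful fence"


def on_peer_lost(stonith_enabled: bool, fence_succeeded: bool) -> Outcome:
    """Toy model of what a node does when its peer stops responding."""
    if not stonith_enabled:
        # "no contact" is treated as "peer dead": recovery proceeds even
        # if the peer is actually alive, which is the split-brain risk.
        return Outcome.RECOVER
    if fence_succeeded:
        # The peer is known to be fenced, so recovery is safe.
        return Outcome.RECOVER
    # The fence failed: no assumption is made about the peer's state.
    return Outcome.BLOCKED
```

In this model, a node that has lost its whole network corresponds to
on_peer_lost(True, False): its fence attempts cannot succeed, so it stays
blocked.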


I don't want to disable stonith; I have stonith enabled and I use it.
The problem is that during a network failure on node2, the fence action is
triggered on this node but fails, because node2 is the disconnected node.
The node also doesn't reboot, because the watchdog can't check for key
registration.
Is there a way to stop the resources on this node?
Maybe fence_scsi with a remote iSCSI target isn't a good solution?

