[ClusterLabs] What is the logic when two nodes are down at the same time and need to be fenced

Ken Gaillot kgaillot at redhat.com
Wed Nov 2 13:02:48 EDT 2016


On 11/02/2016 11:17 AM, Niu Sibo wrote:
> Hi all,
> 
> I have a general question regarding the fencing logic in pacemaker.
> 
> I have set up a three-node cluster with Pacemaker 1.1.13 and the
> cluster property no_quorum_policy set to ignore. When two nodes lose
> the NIC that corosync runs on at the same time, it looks like the two
> nodes get fenced one by one, even though I have a fence device
> defined for each of the three nodes.
> 
> What should I be expecting in this case?

It's probably coincidence that the fencing happens serially; there is
nothing enforcing that for separate fence devices. A fencing request
involves many steps, so requests can easily take different amounts of
time to complete.
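If you want to check the exact timing of each action, recent builds
let you query the fencer's history. A minimal sketch (the node name
is just a placeholder):

    # Show recorded fencing actions, with timestamps, for all nodes:
    stonith_admin --history '*' --verbose

    # Or limit the history to a single node:
    stonith_admin --history node1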

> I noticed that if a node rejoins the cluster before the cluster
> starts the fence actions, some resources will get activated on two
> nodes at the same time. This is really not good if the resource
> happens to be a VirtualGuest. Thanks for any suggestions.

Since you're ignoring quorum, there's nothing stopping the disconnected
node from starting all resources on its own. It can even fence the other
nodes, unless the downed NIC is used for fencing. From that node's point
of view, it's the other two nodes that are lost.

Quorum is the only solution I know of to prevent that. Fencing will
correct the situation, but it won't prevent it.
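So the first step I'd take is to stop ignoring quorum. A minimal
sketch using pcs (the crm shell equivalent is similar; adjust to
whatever tool you use):

    # Make the partition without quorum stop its resources
    # instead of carrying on as if nothing happened:
    pcs property set no-quorum-policy=stop

    # Keep fencing enabled so the quorate partition can
    # safely recover the lost resources:
    pcs property set stonith-enabled=true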

See the votequorum(5) man page for various options that can affect how
quorum is calculated. Also, the very latest version of corosync supports
qdevice (a lightweight daemon that runs on a host outside the cluster
strictly for the purpose of quorum).
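As a rough sketch of what that can look like in the quorum section of
corosync.conf (option names are documented in votequorum(5) and
corosync-qdevice(8); treat the values, and the qnetd-host name, as
illustrative only):

    quorum {
        provider: corosync_votequorum

        # Require all nodes to be seen once before the cluster
        # becomes quorate after a cold start:
        wait_for_all: 1

        # Optional external arbiter; needs corosync-qdevice
        # running on the cluster nodes and corosync-qnetd on
        # a host outside the cluster:
        device {
            model: net
            votes: 1
            net {
                host: qnetd-host
                algorithm: ffsplit
            }
        }
    }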



