[ClusterLabs] Two node cluster and extended distance/site failure

Wed Jun 24 02:09:10 EDT 2020

Two-node clusters are what I deal with almost exclusively. They work
reasonably well in one location, where fencing failures are rare and can
be mitigated by configuring two different fencing methods. Usually SBD is
reliable enough, as failure of the shared storage also implies failure of
the whole cluster.

When the two nodes are located on separate sites (not necessarily
Asia/America, two buildings across the street are already enough) we have
the issue of complete site isolation (power outage, network outage etc),
where the other node goes missing and normal fencing of it becomes
impossible at the same time.

The usual recommendation is a third site which functions as a witness.
This works fine until the third site itself fails. Unavailability of
the witness makes normal maintenance of either of the two nodes
impossible. If the witness is not available and (pacemaker on) one of the
two nodes needs to be restarted, the remaining node goes out of quorum or
commits suicide. At most we can statically designate one node as
tiebreaker (and this is already incompatible with qdevice).
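
To make the vote arithmetic behind this concrete, here is a minimal
sketch. It assumes each node and the witness (qdevice) contribute one
vote each and that quorum is a strict majority of the expected votes;
the function names are made up for illustration and are not part of any
cluster tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently visible reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Assumption: two nodes plus one witness vote, so three expected votes.
EXPECTED_VOTES = 3

# One node down for maintenance, witness reachable: 2 of 3 votes remain.
print(is_quorate(2, EXPECTED_VOTES))

# Same maintenance while the witness is unreachable: 1 of 3 votes remains.
print(is_quorate(1, EXPECTED_VOTES))
```

With the witness reachable the surviving node keeps quorum; without it,
the single remaining vote is below the threshold of two, which is
exactly the maintenance problem described above.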

I think I can finally formulate what I am missing. The behavior that I
would really want is

- if (pacemaker on) one node performs a normal shutdown, the remaining
node continues managing services, independently of witness state or
availability. Usually this is achieved either by two_node or by
no-quorum-policy=ignore, but that absolutely requires successful
fencing, so it cannot be used alone. Such a feature likely mandates WFA
(wait_for_all), but that is probably unavoidable.

- if the other node is lost unexpectedly, first try normal fencing
between the two nodes, independently of witness state or availability.
If fencing succeeds, we can continue managing services.

- if normal fencing fails (due to isolation of the other site), consult
the witness and follow the normal procedure. If the witness is not
available or does not grant us quorum - suicide/go out of quorum; if the
witness is available and grants us quorum - continue managing services.
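
The three rules above can be written down as a small decision function.
This is only a sketch of the desired policy, not something existing
tools implement as far as I know; the names survivor_action and Action
and the three inputs are hypothetical, and an unreachable witness is
modelled as None.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue managing services"
    STOP = "go out of quorum or self-fence"


def survivor_action(peer_clean_shutdown: bool,
                    fencing_succeeded: bool,
                    witness_grants_quorum: Optional[bool]) -> Action:
    """Decide what the surviving node does once its peer is gone.

    witness_grants_quorum is None when the witness cannot be reached.
    """
    # Rule 1: a normal shutdown of the peer needs neither fencing
    # nor the witness.
    if peer_clean_shutdown:
        return Action.CONTINUE

    # Rule 2: the peer was lost unexpectedly, but node-to-node fencing
    # worked, so the witness is still not consulted.
    if fencing_succeeded:
        return Action.CONTINUE

    # Rule 3: fencing failed (for example the other site is isolated),
    # so only now does the witness decide. None and False both mean
    # that we were not granted quorum.
    if witness_grants_quorum:
        return Action.CONTINUE
    return Action.STOP


# Peer site isolated, fencing failed, witness unreachable.
print(survivor_action(False, False, None).value)
```

The point of ordering the checks this way is that the witness only
matters in the last case, so losing the witness alone never blocks
maintenance or a successful fencing operation.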

Any potential issues with this? If it is possible to implement this
using current tools, I have not found how.

And note that this is not actually limited to two-node clusters - we
have more or less the same issue with any 50-50 split cluster with a
witness on a third site.