[ClusterLabs] Re: [EXT] Two node cluster and extended distance/site failure

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Jun 24 03:28:46 EDT 2020


>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 24.06.2020 at 08:09 in message
<15769_1592978959_5EF2EE0F_15769_691_1_0a00df27-a249-5c57-4055-f378108941f4 at gmail.com>:
> Two node is what I almost exclusively deal with. It works reasonably
> well in one location where failures to perform fencing are rare and can
> be mitigated by two different fencing methods. Usually SBD is reliable
> enough, as failure of shared storage also implies failure of the whole
> cluster.

You could have two shared storage systems replicating data to each other,
making the setup "more interesting".
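
For example (just a sketch, the device paths are placeholders), SBD accepts
more than one device, so you could put one poison-pill device on each array
and the loss of a single array does not take the fencing path down with it:

    # /etc/sysconfig/sbd (illustrative device paths)
    SBD_DEVICE="/dev/disk/by-id/array1-sbd;/dev/disk/by-id/array2-sbd"
    SBD_WATCHDOG_DEV="/dev/watchdog"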

> 
> When two nodes are located on separate sites (not necessarily Asia/America,
> two buildings across the street is already enough) we have the issue of
> complete site isolation, where, along with the missing node, normal fencing
> becomes impossible (power outage, network outage, etc.).
> 
> The usual recommendation is a third site which functions as a witness. This
> works fine up to failure of that third site itself. Unavailability of the
> witness makes normal maintenance of either of the two nodes impossible.

That's a problem of pacemaker:
Assume you have two nodes and shared storage: if one node announces via the
shared storage that it is going to leave the cluster, there won't be any issue.
Likewise, if one node crashes or the network splits (the surviving node cannot
tell the difference), both nodes try to atomically "leave their mark" on the
shared storage (the way most locking works). Each node then sees which node
(if any) was fastest to claim the lock; all the other nodes commit suicide
(self-fence) or freeze until the network is up again.

(I'm sorry if I repeat myself from time to time, but this was how a two-node
cluster in HP-UX Service Guard worked, and it worked quite well)
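
One way to realize that "claim the lock" idea on today's Linux would be a
SCSI-3 persistent reservation on a small shared LUN (a hand-rolled sketch,
not something pacemaker does out of the box; device path and key are
placeholders):

    LOCK_DEV=/dev/disk/by-id/lock-lun   # placeholder lock disk
    MY_KEY=0x1                          # unique reservation key per node

    # Register our key, then race for the reservation; only one node can win.
    sg_persist --out --register --param-sark=$MY_KEY "$LOCK_DEV"
    if sg_persist --out --reserve --param-rk=$MY_KEY --prout-type=6 "$LOCK_DEV"
    then
        echo "won the lock disk - keep running services"
    else
        echo "lost the race - self-fence or freeze until the network is back"
    fi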

> If the witness is not available and (pacemaker on) one of the two nodes needs
> to be restarted, the remaining node goes out of quorum or commits suicide.
> At most we can statically designate one node as tiebreaker (and this is
> already incompatible with qdevice).

So shared storage actually could play the "witness role".
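
For reference (a rough corosync.conf sketch, the host name is a placeholder),
the two "witness" variants mentioned above - the qdevice on a third site and
the statically designated tiebreaker - would look roughly like this:

    quorum {
        provider: corosync_votequorum

        # variant 1: qdevice/qnetd witness on a third site
        device {
            votes: 1
            model: net
            net {
                host: qnetd.example.com   # the third-site witness
                algorithm: ffsplit
            }
        }

        # variant 2 (instead of a qdevice): statically designated tiebreaker
        # auto_tie_breaker: 1
        # auto_tie_breaker_node: 1
    }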

> 
> I think I can finally formulate what I miss. The behavior that I would
> really want is:
> 
> - if (pacemaker on) one node performs a normal shutdown, the remaining node
> continues managing services, independently of witness state or
> availability. Usually this is achieved either by two_node or by
> no-quorum-policy=ignore, but that absolutely requires successful
> fencing, so it cannot be used alone. Such a feature likely mandates WFA, but
> that is probably unavoidable.
> 
> - if the other node is lost unexpectedly, first try normal fencing between
> the two nodes, independently of witness state or availability. If fencing
> succeeds, we can continue managing services.
> 
> - if normal fencing fails (due to isolation of the other site), consult the
> witness and follow the normal procedure. If the witness is not available or
> does not grant us quorum, suicide/go out of quorum; if the witness is
> available and grants us quorum, continue managing services.
> 
> Any potential issues with this? If it is possible to implement using
> current tools, I did not find it.
> 
> And note that this is not actually limited to a two-node cluster - we
> have more or less the same issue with any 50-50 split cluster and a
> witness on a third site.
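
For reference, the two_node/WFA combination mentioned in the first point is
roughly this corosync.conf fragment:

    quorum {
        provider: corosync_votequorum
        two_node: 1
        wait_for_all: 1   # implied by two_node anyway
    }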




