[ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

Ken Gaillot kgaillot at redhat.com
Tue Apr 18 10:14:23 EDT 2017

On 04/18/2017 02:47 AM, Ulrich Windl wrote:
>>>> Digimer <lists at alteeve.ca> wrote on 16.04.2017 at 20:17 in message
> <12cde13f-8bad-a2f1-6834-960ff3afce6c at alteeve.ca>:
>> On 16/04/17 01:53 PM, Eric Robinson wrote:
>>> I was reading in "Clusters from Scratch" where Beekhof states, "Some would
>> argue that two-node clusters are always pointless, but that is an argument 
>> for another time." Is there a page or thread where this argument has been 
>> fleshed out? Most of my dozen clusters are 2 nodes. I hate to think they're
>> pointless.  
>>> --
>>> Eric Robinson
>> There is a belief that you can't build a reliable cluster without
>> quorum. I am of the mind that you *can* build a very reliable 2-node
>> cluster. In fact, every cluster our company has deployed, going back
>> over five years, has been 2-node and has had exceptional uptime.
>> The confusion comes from the belief that quorum is required and stonith
>> is optional. The reality is the opposite. I'll come back to this in a minute.
>> In a two-node cluster, you have two concerns;
>> 1. If communication between the nodes fails, but both nodes are alive,
>> how do you avoid a split brain?
> By killing one of the two parties.
>> 2. If you have a two node cluster and enable cluster startup on boot,
>> how do you avoid a fence loop?
> I think the problem in the question is using "you" instead of "it" ;-)
> Pacemaker assumes all problems that cause STONITH will be solved by STONITH.
> That's not always true (e.g. configuration errors). Maybe a node's failcount
> should not be reset if the node was fenced.
> So you'll avoid a fencing loop, but might end in a state where no resources
> are running. IMHO I'd prefer that over a fencing loop.
>> Many answer #1 by saying "you need a quorum node to break the tie". In
>> some cases, this works, but only when all nodes are behaving in a
>> predictable manner.
> All software relies on the fact that it behaves in a predictable manner, BTW.
> The problem is not "the predictable manner for all nodes", but the predictable
> manner for the cluster.
>> Many answer #2 by saying "well, with three nodes, if a node boots and
>> can't talk to either other node, it is inquorate and won't do anything".
> "won't do anything" is also wrong: I must go offline without killing others,
> preferably.
>> This is a valid mechanism, but it is not the only one.
>> So let me answer these from a 2-node perspective;
>> 1. You use stonith and the faster node lives, the slower node dies. From
> Isn't there a possibility that both nodes shoot each other? Is there a
> guarantee that there will always be one faster node?
>> the moment of comms failure, the cluster blocks (needed with quorum,
>> too) and doesn't restore operation until the (slower) peer is in a known
>> state; Off. You can bias this by setting a fence delay against your
>> preferred node. So say node 1 is the node that normally hosts your
>> services, then you add 'delay="15"' to node 1's fence method. This tells
>> node 2 to wait 15 seconds before fencing node 1. If both nodes are
>> alive, node 2 will be fenced before the timer expires.
> Can only the DC issue fencing?

No, any cluster node can initiate fencing. Fencing can also be requested
from a remote node (e.g. via stonith_admin), but the remote node will
ask a cluster node to initiate the fencing.
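
Since either node can initiate fencing, a comms failure in a two-node
cluster is effectively a race, and the delay quoted above is what biases
it. As a rough illustration only, here is a toy Python model of that race.
The FenceAttempt class, the fence_race function and the 2-second device
latency are hypothetical, not anything Pacemaker exposes, and the model
breaks exact ties arbitrarily, which real hardware will not do for you.

```python
from dataclasses import dataclass


@dataclass
class FenceAttempt:
    initiator: str   # node that asks for the fence action
    target: str      # node to be powered off
    delay: float     # seconds to wait first (the delay set against the target)
    latency: float   # assumed time the fence device needs to act


def fence_race(attempts):
    """Return the set of nodes still alive after the race.

    Attempts complete in order of delay + latency. An attempt is dropped
    if its initiator has already been fenced by then.
    """
    alive = {a.initiator for a in attempts} | {a.target for a in attempts}
    for attempt in sorted(attempts, key=lambda a: a.delay + a.latency):
        if attempt.initiator in alive and attempt.target in alive:
            alive.discard(attempt.target)
    return alive


# delay="15" on node 1's fence method: node 2 waits before shooting node 1.
attempts = [
    FenceAttempt(initiator="node2", target="node1", delay=15.0, latency=2.0),
    FenceAttempt(initiator="node1", target="node2", delay=0.0, latency=2.0),
]
print(fence_race(attempts))
```

With both nodes alive, node 1's attempt completes first, so node 2's
delayed attempt never runs. If node 1 is really dead, it makes no attempt
and node 2 fences it once the delay expires.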

Also, fence device resources do not need to be "running" in order to be
used. If they are intentionally disabled (target-role=Stopped), they
will not be used, but if they are simply not running, the cluster will
still use the device when needed. "Running" is used solely to determine
whether recurring monitor actions are done.
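
To put that rule in code form, here is a minimal Python sketch. The
FenceDevice class and its method names are made up for illustration and
are not Pacemaker internals; the only thing taken from the description
above is which flag affects which decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FenceDevice:
    name: str
    target_role: Optional[str] = None  # "Stopped" means intentionally disabled
    running: bool = False              # whether the resource is currently started

    def usable_for_fencing(self) -> bool:
        # Only an intentional disable excludes the device; `running` is ignored.
        return self.target_role != "Stopped"

    def recurring_monitor_active(self) -> bool:
        # "Running" only decides whether recurring monitors are performed.
        return self.running


idle = FenceDevice("fence_node1")                       # configured, not started
disabled = FenceDevice("fence_node2", target_role="Stopped")

print(idle.usable_for_fencing(), idle.recurring_monitor_active())
print(disabled.usable_for_fencing())
```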

This design ensures that fencing requires a bare minimum to be
functional (stonith daemon running, and fence devices configured), so it
can be used even at startup before resources are running, and even if
the DC is the node that needs to be fenced or a DC has not yet been elected.

>> 2. In Corosync v2+, there is a 'wait_for_all' option that tells a node
>> to not do anything until it is able to talk to the peer node. So in the
>> case of a fence after a comms break, the node that reboots will come up,
>> fail to reach the survivor node and do nothing more. Perfect.
> Does "do nothing more" mean continuously polling for other nodes?
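
The wait_for_all behaviour described above can be modelled roughly as a
one-way gate: a freshly started node stays inquorate until it has seen its
peer once, and after that a single vote is enough. The sketch below is a
simplified Python model under that assumption. The TwoNodeQuorum class is
hypothetical and says nothing about how Corosync actually detects the peer.

```python
class TwoNodeQuorum:
    """Toy model of quorum on one node of a two-node cluster."""

    def __init__(self, wait_for_all: bool = True):
        self.wait_for_all = wait_for_all
        self.seen_peer = False  # has the peer been seen since this node started?

    def update(self, peer_visible: bool) -> bool:
        """Record the current membership view and return whether we are quorate."""
        if peer_visible:
            self.seen_peer = True
        if self.wait_for_all and not self.seen_peer:
            # Freshly booted and alone: stay inquorate, start nothing, fence nobody.
            return False
        # Gate is open: in two-node mode one vote is enough.
        return True


rebooted = TwoNodeQuorum()
print(rebooted.update(peer_visible=False))  # fenced node comes back up alone
print(rebooted.update(peer_visible=True))   # comms restored, peer seen
print(rebooted.update(peer_visible=False))  # peer lost later on
```

The first call models the fenced node rebooting into a still-broken
network: it never becomes quorate, so it cannot start a fence loop.
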
>> Now let me come back to quorum vs. stonith;
>> Said simply; Quorum is a tool for when everything is working. Fencing is
>> a tool for when things go wrong.
> I'd say: Quorum is the tool to decide who'll be alive and who's going to die,
> and STONITH is the tool to make nodes die. If everything is working you need
> neither quorum nor STONITH.
>> Let's assume that your cluster is working fine, then for whatever reason,
>> node 1 hangs hard. At the time of the freeze, it was hosting a virtual
>> IP and an NFS service. Node 2 declares node 1 lost after a period of
>> time and decides it needs to take over;
> In case node 1 is DC, isn't a selection for a new DC coming first, and the new
> DC doing the STONITH?
>> In the 3-node scenario, without stonith, node 2 reforms a cluster with
>> node 3 (quorum node), decides that it is quorate, starts its NFS server
>> and takes over the virtual IP. So far, so good... Until node 1 comes out
> Again if node 1 was DC, it's not that simple.
>> of its hang. At that moment, node 1 has no idea time has passed. It has
> You assume no fencing was done...
>> no reason to think "am I still quorate? Are my locks still valid?" It
>> just finishes whatever it was in the middle of doing and bam,
>> split-brain. At the least, you have two nodes claiming the same IP at
>> the same time. At worst, you had uncoordinated writes to shared storage
>> and you've corrupted your data.
> But that's no cluster; that's a mess ;-)
>> In the 2-node scenario, with stonith, node 2 is always quorate, so after
>> declaring node 1 lost, it moves to fence node 1. Once node 1 is fenced,
>> *then* it starts NFS, takes over the virtual IP and restores services.
> So you compare "2 nodes + fencing" to "3 nodes without fencing"?
>> In this case, no split-brain is possible because node 1 has rebooted and
>> comes up with a fresh state (or it's on fire and never coming back anyway).
>> This is why quorum is optional and stonith/fencing is not.
> You did not convince me how only one node has the ability to fence the other
> without a quorum: Wouldn't both nodes shoot at each other? (I quoted this so
> many times, but once again: In HP-UX Service Guard, a lock disk was used as a
> tie-breaker: Only one node succeeded in getting the lock, and the other committed
> suicide (via kernel watchdog timeout)).
>> Now, with this said, I won't say that 3+ node clusters are bad. They're
>> fine if they suit your use-case, but even with 3+ nodes you still must
>> use stonith.
>> My *personal* argument in favour of 2-node clusters over 3+ nodes is this;
> Again: You compare "2 nodes with fencing" to "3 nodes without fencing". My
> personal vote would be "3 nodes with fencing" if there is enough work for two
> nodes.
>> A cluster is not beautiful when there is nothing left to add. It is
>> beautiful when there is nothing left to take away.
>> In availability clustering, nothing should ever be more important than
>> availability, and availability is a product of simplicity. So in my
>> view, a 3-node cluster adds complexity that is avoidable, and so is
>> sub-optimal.
> IMHO: valid cluster software works starting at 1 node, and then by induction
> also for n+1 nodes. Complexity should grow only linearly with the number of
> nodes. Of course you shouldn't add nodes just for the number of nodes, but for
> the actual need.
> Regards,
> Ulrich
>> I'm happy to answer any questions you have on my comments/point of view
>> on this.
>> -- 
>> Digimer
>> Papers and Projects: https://alteeve.com/w/ 
>> "I am, somehow, less interested in the weight and convolutions of
>> Einstein’s brain than in the near certainty that people of equal talent
>> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
