[ClusterLabs] Re: 2-Node Cluster Pointless?

Digimer lists at alteeve.ca
Tue Apr 18 16:00:15 CEST 2017


On 18/04/17 03:47 AM, Ulrich Windl wrote:
>>>> Digimer <lists at alteeve.ca> wrote on 16.04.2017 at 20:17 in message
> <12cde13f-8bad-a2f1-6834-960ff3afce6c at alteeve.ca>:
>> On 16/04/17 01:53 PM, Eric Robinson wrote:
>>> I was reading in "Clusters from Scratch" where Beekhof states, "Some would
>>> argue that two-node clusters are always pointless, but that is an argument
>>> for another time." Is there a page or thread where this argument has been
>>> fleshed out? Most of my dozen clusters are 2 nodes. I hate to think they're
>>> pointless.
>>>
>>> --
>>> Eric Robinson
>>
>> There is a belief that you can't build a reliable cluster without
>> quorum. I am of the mind that you *can* build a very reliable 2-node
>> cluster. In fact, every cluster our company has deployed, going back
>> over five years, has been 2-node and has had exceptional uptime.
>>
>> The confusion comes from the belief that quorum is required and stonith
>> is optional. The reality is the opposite. I'll come back to this in a minute.
>>
>> In a two-node cluster, you have two concerns:
>>
>> 1. If communication between the nodes fails, but both nodes are alive,
>> how do you avoid a split-brain?
> 
> By killing one of the two parties.
> 
>>
>> 2. If you have a two-node cluster and enable cluster startup on boot,
>> how do you avoid a fence loop?
> 
> I think the problem in the question is using "you" instead of "it" ;-)
> Pacemaker assumes all problems that cause STONITH will be solved by STONITH.
> That's not always true (e.g. configuration errors). Maybe a node's failcount
> should not be reset if the node was fenced.
> So you'll avoid a fencing loop, but might end up in a state where no
> resources are running. IMHO I'd prefer that over a fencing loop.
> 
>>
>> Many answer #1 by saying "you need a quorum node to break the tie". In
>> some cases, this works, but only when all nodes are behaving in a
>> predictable manner.
> 
> All software relies on the fact that it behaves in a predictable manner, BTW.
> The problem is not "the predictable manner for all nodes", but the predictable
> manner for the cluster.
> 
>>
>> Many answer #2 by saying "well, with three nodes, if a node boots and
>> can't talk to either other node, it is inquorate and won't do anything".
> 
> "wan't do anything" is also wrong: I must go offline without killing others,
> preferrably.
> 
>> This is a valid mechanism, but it is not the only one.
>>
>> So let me answer these from a 2-node perspective:
>>
>> 1. You use stonith: the faster node lives, the slower node dies. From
> 
> Isn't there a possibility that both nodes shoot each other? Is there a
> guarantee that there will always be one faster node?
> 
>> the moment of comms failure, the cluster blocks (needed with quorum,
>> too) and doesn't restore operation until the (slower) peer is in a known
>> state; Off. You can bias this by setting a fence delay against your
>> preferred node. So say node 1 is the node that normally hosts your
>> services, then you add 'delay="15"' to node 1's fence method. This tells
>> node 2 to wait 15 seconds before fencing node 1. If both nodes are
>> alive, node 2 will be fenced before the timer expires.
> 
> Can only the DC issue fencing?
> 
>>
>> 2. In Corosync v2+, there is a 'wait_for_all' option that tells a node
>> to not do anything until it is able to talk to the peer node. So in the
>> case of a fence after a comms break, the node that reboots will come up,
>> fail to reach the survivor node and do nothing more. Perfect.
> 
> Does "do nothing more" mean continuously polling for other nodes?
> 
>>
>> Now let me come back to quorum vs. stonith:
>>
>> Said simply: quorum is a tool for when everything is working. Fencing is
>> a tool for when things go wrong.
> 
> I'd say: Quorum is the tool to decide who'll be alive and who's going to die,
> and STONITH is the tool to make nodes die. If everything is working you need
> neither quorum nor STONITH.
> 
>>
>> Let's assume that your cluster is working fine, then for whatever reason,
>> node 1 hangs hard. At the time of the freeze, it was hosting a virtual
>> IP and an NFS service. Node 2 declares node 1 lost after a period of
>> time and decides it needs to take over:
> 
> If node 1 is the DC, doesn't the election of a new DC come first, with the
> new DC then performing the STONITH?
> 
> 
>>
>> In the 3-node scenario, without stonith, node 2 reforms a cluster with
>> node 3 (quorum node), decides that it is quorate, starts its NFS server
>> and takes over the virtual IP. So far, so good... Until node 1 comes out
> 
> Again, if node 1 was the DC, it's not that simple.
> 
>> of its hang. At that moment, node 1 has no idea time has passed. It has
> 
> You assume no fencing was done...
> 
>> no reason to think "am I still quorate? Are my locks still valid?" It
>> just finishes whatever it was in the middle of doing and bam,
>> split-brain. At the least, you have two nodes claiming the same IP at
>> the same time. At worst, you had uncoordinated writes to shared storage
>> and you've corrupted your data.
> 
> But that's no cluster; that's a mess ;-)
> 
>>
>> In the 2-node scenario, with stonith, node 2 is always quorate, so after
>> declaring node 1 lost, it moves to fence node 1. Once node 1 is fenced,
>> *then* it starts NFS, takes over the virtual IP and restores services.
> 
> So you compare "2 nodes + fencing" to "3 nodes without fencing"?
> 
>> In this case, no split-brain is possible because node 1 has rebooted and
>> comes up with a fresh state (or it's on fire and never coming back anyway).
>>
>> This is why quorum is optional and stonith/fencing is not.
> 
> You did not convince me how only one node has the ability to fence the other
> without a quorum: Wouldn't both nodes shoot at each other? (I quoted this so
> many times, but once again: In HP-UX Service Guard, a lock disk was used as a
> tie-breaker: only one node succeeded in getting the lock, and the other
> committed suicide (via kernel watchdog timeout)).
> 
>>
>> Now, with this said, I won't say that 3+ node clusters are bad. They're
>> fine if they suit your use-case, but even with 3+ nodes you still must
>> use stonith.
>>
>> My *personal* argument in favour of 2-node clusters over 3+ nodes is this:
> 
> Again: You compare "2 nodes with fencing" to "3 nodes without fencing". My
> personal vote would be "3 nodes with fencing" if there is enough work for two
> nodes.
> 
>>
>> A cluster is not beautiful when there is nothing left to add. It is
>> beautiful when there is nothing left to take away.
>>
>> In availability clustering, nothing should ever be more important than
>> availability, and availability is a product of simplicity. So in my
>> view, a 3-node cluster adds complexity that is avoidable, and so is
>> sub-optimal.
> 
> IMHO valid cluster software works starting at one node, and then by
> induction also for n+1 nodes. Complexity should grow only linearly with the
> number of nodes. Of course you shouldn't add nodes just for the sake of
> having more nodes, but for the actual need.
> 
> Regards,
> Ulrich

I was addressing the misconception that fencing is optional and quorum
is not. I wrote a longer reply as an article to follow up on this further
down the thread.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould


