[ClusterLabs] 2-Node Cluster Pointless?

Sun Apr 16 21:50:48 UTC 2017

On 16/04/17 04:04 PM, Eric Robinson wrote:
>> -----Original Message-----
>> From: Digimer [mailto:lists at alteeve.ca]
>> Sent: Sunday, April 16, 2017 11:17 AM
>> To: Cluster Labs - All topics related to open-source clustering welcomed
>> <users at clusterlabs.org>; Eric Robinson <eric.robinson at psmnv.com>
>> Subject: Re: [ClusterLabs] 2-Node Cluster Pointless?
>>
>> On 16/04/17 01:53 PM, Eric Robinson wrote:
>>> I was reading in "Clusters from Scratch" where Beekhof states, "Some
>> would argue that two-node clusters are always pointless, but that is an
>> argument for another time." Is there a page or thread where this argument
>> has been fleshed out? Most of my dozen clusters are 2 nodes. I hate to think
>> they're pointless.
>>>
>>> --
>>> Eric Robinson
>>
>> There is a belief that you can't build a reliable cluster without quorum. I am of
>> the mind that you *can* build a very reliable 2-node cluster. In fact, every
>> cluster our company has deployed, going back over five years, has been
>> 2-node and has had exceptional uptime.
>>
>> The confusion comes from the belief that quorum is required and stonith is
>> optional. The reality is the opposite. I'll come back to this in a minute.
>>
>> In a two-node cluster, you have two concerns;
>>
>> 1. If communication between the nodes fails, but both nodes are alive, how
>> do you avoid a split brain?
>>
>> 2. If you have a two node cluster and enable cluster startup on boot, how do
>> you avoid a fence loop?
>>
>> Many answer #1 by saying "you need a quorum node to break the tie". In
>> some cases, this works, but only when all nodes are behaving in a predictable
>> manner.
>>
>> Many answer #2 by saying "well, with three nodes, if a node boots and can't
>> talk to either other node, it is inquorate and won't do anything".
>> This is a valid mechanism, but it is not the only one.
>>
>> So let me answer these from a 2-node perspective;
>>
>> 1. You use stonith and the faster node lives, the slower node dies. From the
>> moment of comms failure, the cluster blocks (needed with quorum,
>> too) and doesn't restore operation until the (slower) peer is in a known
>> state; Off. You can bias this by setting a fence delay against your preferred
>> node. So say node 1 is the node that normally hosts your services, then you
>> add 'delay="15"' to node 1's fence method. This tells node 2 to wait 15
>> seconds before fencing node 1. If both nodes are alive, node 2 will be fenced
>> before the timer expires.
>>
>> 2. In Corosync v2+, there is a 'wait_for_all' option that tells a node to not do
>> anything until it is able to talk to the peer node. So in the case of a fence after
>> a comms break, the node that reboots will come up, fail to reach the survivor
>> node and do nothing more. Perfect.
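
The two answers above can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not how the fence agents or Corosync are implemented: `Node`, `fence_race`, `may_start_services` and the 2-second `fence_duration` are hypothetical names and values invented for the example. Only the 15-second delay against node 1 comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    # Seconds the peer must wait before it may fence this node
    # (the 'delay="15"' idea from the text; 0 means no bias).
    delay_against: float = 0.0


def fence_race(a: Node, b: Node, fence_duration: float = 2.0) -> str:
    """Return the survivor when comms fail while both nodes are alive.

    fence_duration is an assumed, identical time for a fence call to finish.
    """
    # Each node waits out the delay configured against its target, then shoots.
    a_fires_at = b.delay_against + fence_duration  # a fencing b
    b_fires_at = a.delay_against + fence_duration  # b fencing a
    if a_fires_at == b_fires_at:
        raise ValueError("no delay bias, so the winner is not predictable")
    return a.name if a_fires_at < b_fires_at else b.name


def may_start_services(wait_for_all: bool, has_seen_peer: bool) -> bool:
    """Boot-time gate: with wait_for_all set, a freshly booted node
    does nothing until it has been able to talk to its peer."""
    return has_seen_peer or not wait_for_all


node1 = Node("node1", delay_against=15.0)  # preferred node, normally hosts services
node2 = Node("node2")

print(fence_race(node1, node2))
print(may_start_services(wait_for_all=True, has_seen_peer=False))
```

With the delay set against node 1, node 1 fires first and survives. The fenced node then reboots, cannot reach the survivor, and the gate keeps it idle, which is what prevents the fence loop.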
>>
>> Now let me come back to quorum vs. stonith;
>>
>> Said simply; Quorum is a tool for when everything is working. Fencing is a tool
>> for when things go wrong.
>>
>> Let's assume that your cluster is working fine, then for whatever reason,
>> node 1 hangs hard. At the time of the freeze, it was hosting a virtual IP and
>> an NFS service. Node 2 declares node 1 lost after a period of time and
>> decides it needs to take over;
>>
>> In the 3-node scenario, without stonith, node 2 reforms a cluster with node 3
>> (quorum node), decides that it is quorate, starts its NFS server and takes
>> over the virtual IP. So far, so good... Until node 1 comes out of its hang. At
>> that moment, node 1 has no idea time has passed. It has no reason to think
>> "am I still quorate? Are my locks still valid?" It just finishes whatever it was in
>> the middle of doing and bam, split-brain. At the least, you have two nodes
>> claiming the same IP at the same time. At worst, you have uncoordinated
>> writes to shared storage and you've corrupted your data.
>>
>> In the 2-node scenario, with stonith, node 2 is always quorate, so after
>> declaring node 1 lost, it moves to fence node 1. Once node 1 is fenced,
>> *then* it starts NFS, takes over the virtual IP and restores services.
>> In this case, no split-brain is possible because node 1 has rebooted and
>> comes up with a fresh state (or it's on fire and never coming back anyway).
>>
>> This is why quorum is optional and stonith/fencing is not.
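
The hang scenario above can be reduced to a small Python sketch. It is a deliberately simplified model, assuming the only state that matters is which nodes believe they hold the virtual IP. The function name `vip_holders_after_hang` is hypothetical, and the quorum node is left out because it never hosts the service.

```python
def vip_holders_after_hang(use_stonith: bool) -> set:
    """Return the nodes that hold the virtual IP once node1 wakes up."""
    # node1 hosts the virtual IP, then hangs hard.
    holders = {"node1"}

    if use_stonith:
        # The survivor fences node1 before recovering anything.
        # A power-cycled node comes back with fresh state and holds nothing.
        holders.discard("node1")

    # node2 (quorate in both scenarios) takes over the virtual IP.
    holders.add("node2")

    # node1 comes out of its hang. If it was never fenced, it has no idea
    # time has passed and still acts as the owner of the IP.
    return holders


print(sorted(vip_holders_after_hang(use_stonith=False)))
print(sorted(vip_holders_after_hang(use_stonith=True)))
```

Without fencing, the result contains both nodes, which is the split-brain. With fencing, only node2 is left holding the IP, regardless of how many nodes voted.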
>>
>> Now, with this said, I won't say that 3+ node clusters are bad. They're fine if
>> they suit your use-case, but even with 3+ nodes you still must use stonith.
>>
>> My *personal* argument in favour of 2-node clusters over 3+ nodes is this;
>>
>> A cluster is not beautiful when there is nothing left to add. It is beautiful
>> when there is nothing left to take away.
>>
>> In availability clustering, nothing should ever be more important than
>> availability, and availability is a product of simplicity. So in my view, a 3-node
>> cluster adds complexity that is avoidable, and so is sub-optimal.
>>
>> I'm happy to answer any questions you have on my comments/point of view
>> on this.
>>
>> --
>> Digimer
> 
> That is a very thoughtful response and it will take me some time to digest it. I appreciate the feedback very much and will get back to you later today.
> 
> --
> Eric Robinson

This isn't the first time this has come up, so I decided to elaborate on
this email by writing an article on the topic.

It's a first draft, so there are likely spelling/grammar mistakes.
However, the body is done.

https://www.alteeve.com/w/The_2-Node_Myth

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould