[ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

Mon Oct 10 00:02:38 EDT 2016

On 09/10/16 11:58 PM, Andrei Borzenkov wrote:
> 10.10.2016 00:42, Eric Robinson wrote:
>> Digimer, thanks for your thoughts. Booth is one of the solutions I
>> looked at, but I don't like it because it is complex and difficult to
>> implement
> 
> HA is complex. There is no way around it.
> 
>> (and perhaps costly in terms of AWS services or something
>> similar). As I read through your comments, I returned again and
>> again to the feeling that the troubles you described do not apply to
>> the DeadDrop scenario. Your observations are correct in that you
>> cannot make assumptions about the state of the other node when all
>> coms are down.  You cannot count on the other node being in a
>> predictable state. That is certainly true, and it is the very problem
>> that I hope to address with DeadDrop. It provides a last-resort "back
>> channel" for coms between the cluster nodes when all other coms are
>> down, removing the element of assumption.
>>
>> Consider a few scenarios.
>>
>> 1. Data center A is primary, B is secondary. Coms are lost between A
>> and B, but both of them can still reach the Internet. Node A notices
>> loss of coms with B, but it is already primary so it cares not. Node
>> B sees loss of normal cluster communication, and it might normally
>> think of switching to primary, but first it checks the DeadDrop and
>> it sees a note from A saying, "I'm fine and serving pages for
>> customers." B aborts its plan to become primary. Later, after normal
>> links are restored, B rejoins the cluster still as secondary. There
>> is no element of assumption here.
>>
>> 2.  Data center A is primary, B is secondary. A loses communication
>> with the Internet, but not with B. B can still talk to the Internet.
>> B initiates a graceful failover. Again no assumptions.
>>
>> 3. Data center A is primary, B is secondary. Data center A goes
>> completely dark. No communication to anything, not to B, and not to
>> the outside world. B wants to go primary, but first it checks
>> DeadDrop, and it finds that A is not leaving messages there either.
>> It therefore KNOWS that A cannot reach the Internet and is not
>> reachable by customers.
> 
> Depending on your application it still may have active consumers or
> providers on site A so data on site A and site B can diverge. You need
> some steps to ensure that site A is really dead. I.e. site A in this
> case probably needs to commit suicide. This returns us to the same
> question: to what extent can we trust the other side? In practice there
> are quite a few HA solutions that rely on suicide in case of
> communication loss, so it appears to work in real life.

Not really. It's just that it fails rarely enough that people don't hit
it often. That doesn't mean the danger isn't there.

Consider this: a node stops responding; its peer waits, then assumes it's
dead (failed or suicided) and takes over. Meanwhile, the node is hung, not
dead. It finally recovers and, being a machine, doesn't realize time has
passed (at least not for a short bit). It has no reason to check its locks or
other states, and proceeds as it was before it hung. Depending on what
it was doing, this could be very bad. Had this been a booth setup, the
hung node would have been fenced, and the remote side can actually trust
that this would happen so wouldn't need direct confirmation.
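
As a rough illustration of the hung-node case, here is a minimal Python
sketch. The Lease class, its ttl_seconds parameter, and
write_if_still_primary() are hypothetical names made up for this example;
they are not part of booth, Pacemaker, or any other real tool. It assumes
the node holds a time-limited claim on the primary role and has a clock
that keeps advancing while the node is hung.

```python
import time


class Lease:
    """Hypothetical time-limited claim on the primary role."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._clock = clock
        self._ttl = ttl_seconds
        self._renewed_at = clock()

    def renew(self):
        # Called periodically while the node is healthy.
        self._renewed_at = self._clock()

    def is_valid(self):
        # False once more than ttl_seconds have passed since the last renew.
        return self._clock() - self._renewed_at < self._ttl


def write_if_still_primary(lease, do_write):
    """Re-check the lease before acting; return True if the write ran.

    This narrows the window but does not close it: the node can still
    hang between the check and the write.
    """
    if not lease.is_valid():
        return False
    do_write()
    return True
```

Even with the check in place, a hang between is_valid() and do_write()
lets a stale write through, so a check like this reduces the risk without
replacing fencing.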

There are other scenarios, this is just the first one to come to mind.

>> No assumptions there. B assumes primary role
>> and customers are happy. When A comes back online, it detects
>> split-brain and refuses to join the cluster, notifying operators.
>> Later, operators manually resolve the split brain.
>>
>> There is no perfect solution, of course, but it seems to me that this
>> simple approach provides a level of availability beyond what you
>> would normally get with a 2-node cluster. What am I missing?
>>
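
To make the three scenarios concrete, here is a minimal Python sketch of
the decision the secondary node (B) would make. Everything in it is an
assumption for illustration: the Action enum, secondary_decision(), and
its three boolean inputs are hypothetical names, and the thread does not
say how notes are written to or read from the dead drop. It also assumes
B itself can still reach the dead drop.

```python
from enum import Enum


class Action(Enum):
    NO_CHANGE = "no change"
    STAY_SECONDARY = "stay secondary"
    GRACEFUL_FAILOVER = "graceful failover"
    PROMOTE = "promote to primary"


def secondary_decision(peer_link_up, peer_has_internet, peer_note_fresh):
    """Decide what secondary node B does, following the three scenarios.

    peer_link_up:      B can still reach A over the normal cluster links.
    peer_has_internet: A reports over the cluster link that it can reach
                       the Internet (only meaningful if peer_link_up).
    peer_note_fresh:   B found a recent "I'm fine" note from A in the
                       dead drop (hypothetical freshness check).
    """
    if peer_link_up:
        # Scenario 2: A is reachable by B but cut off from customers.
        if peer_has_internet:
            return Action.NO_CHANGE
        return Action.GRACEFUL_FAILOVER
    if peer_note_fresh:
        # Scenario 1: A is alive and serving, so B stands down.
        return Action.STAY_SECONDARY
    # Scenario 3: A is silent on every channel, so B promotes itself.
    # Silence is still not proof that A has stopped running.
    return Action.PROMOTE
```

For example, secondary_decision(False, False, True) corresponds to
scenario 1 and returns Action.STAY_SECONDARY. The last branch is the one
the replies in this thread question: it treats A's silence as knowledge
about A's state.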
> 
> Note that a tie-breaker solution answers a single question: is it safe
> to take over from another node? But much more flows over the cluster
> interconnect, so your cluster is basically frozen: no state change may
> be allowed. This means you cannot do anything on either site, and it is
> absolutely unclear how the HA monitor should behave when it needs to
> initiate a state change, e.g. in response to external events.
> 
> Unless you again trust the other side to stop all services (i.e. go to
> a known state).
> 

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?