[ClusterLabs] Recovering after split-brain

Digimer lists at alteeve.ca
Wed Jun 22 05:14:45 UTC 2016


On 22/06/16 01:07 AM, Nikhil Utane wrote:
> I don't get it. Pacemaker + Corosync is providing me with so much
> functionality.
> For example, if we set aside the split-brain condition for a moment,
> it provides:
> 1) Discovery and cluster formation
> 2) Synchronization of data
> 3) Heartbeat mechanism
> 4) Swift failover of the resource
> 5) Guarantee that one resource will be started on only 1 node
> 
> So in the case of a normal failover, we need the basic functionality of
> the resource being migrated to a standby node.
> And it is giving me all that.
> So I don't agree that it needs to be as black and white as you say. Our
> solution has different requirements than a typical HA solution. But that
> is only now. In the future we might have to implement all the things. So
> in that sense Pacemaker gives us a good framework that we can extend.
> 
> BTW, we are not even using a virtual IP resource which again I believe
> is something that everyone employs.
> Because of the nature of the service a small glitch is going to happen.
> Using virtual IPs is not giving any real benefit for us.
> And with regard to the question, why even have a standby and let it be
> active all the time: a two-node cluster is one of the possible
> configurations, but the main requirement is to support N + 1. So the
> standby node doesn't know which active node it has to take over from
> until a failover occurs.
> 
> Your comments, however, have made me reconsider using fencing. It was
> not that we didn't want to do it.
> Just that I felt it may not be needed. So I'll definitely explore this
> further.

It is needed, and it is that black and white. Ask yourself, for your
particular installation: can I run X in two places at the same time
without coordination?

If the answer is "yes", then just do that and be done with it.

If the answer is "no", then you need fencing to allow pacemaker to know
the state of all nodes (otherwise, the ability to coordinate is lost).
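
To make that concrete, here is a toy sketch in Python of one active and
one standby node after the link between them is cut. It is not how
pacemaker is implemented; the Node class, the function name, and the
"first node wins the fence race" tie-break are all assumptions made up
for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    running: bool = False  # is resource X active on this node?
    powered: bool = True   # becomes False once the node has been fenced


def recover_after_partition(a: Node, b: Node, fencing: bool) -> list[str]:
    """Return the names of nodes running X after the a<->b link is cut.

    Hypothetical model: neither node can tell "peer is dead" from
    "link is dead" by heartbeat alone.
    """
    if fencing:
        # Assumption for this sketch: node `a` wins the fence race.
        # The loser is powered off, so its state is now known for certain
        # and the winner can safely run X.
        winner, loser = a, b
        loser.powered = False
        loser.running = False
        winner.running = True
    else:
        # No way to confirm the peer's state: each side assumes the other
        # is gone and runs X itself. That is a split-brain.
        a.running = True
        b.running = True
    return [n.name for n in (a, b) if n.powered and n.running]


if __name__ == "__main__":
    print(recover_after_partition(Node("node1", running=True), Node("node2"), fencing=False))
    print(recover_after_partition(Node("node1", running=True), Node("node2"), fencing=True))
```

Without fencing, both names come back (X is running in two places
without coordination). With fencing, only one does.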

I've never once seen a valid HA setup where fencing was not needed. I
don't claim to be the best by any means, but I've been around long
enough to say this with some confidence.

digimer

> Thanks everyone for the comments. 
> 
> -Regards 
> Nikhil
> 
> On Tue, Jun 21, 2016 at 10:17 PM, Digimer <lists at alteeve.ca
> <mailto:lists at alteeve.ca>> wrote:
> 
>     On 21/06/16 10:57 AM, Dmitri Maziuk wrote:
>     > On 2016-06-20 17:19, Digimer wrote:
>     >
>     >> Nikhil indicated that they could switch where traffic went up-stream
>     >> without issue, if I understood properly.
>     >
>     > They have some interesting setup, but that notwithstanding: if split
>     > brain happens, some clients will connect to the "old master" and some
>     > to the "new master", depending on the ARP update. If there's a shared
>     > resource unavailable on one node, clients going there will error out.
>     > The other ones will not. It will work for some clients.
>     >
>     > Cf. both nodes going into a stonith deathmatch and killing each other:
>     > the service is now unavailable for all clients. What I don't get is the
>     > blanket assertion that this is "more highly" available than option #1.
>     >
>     > Dimitri
> 
>     As I've explained many times (here and on IRC):
> 
>     If you don't need to coordinate services/access, you don't need HA.
> 
>     If you do need to coordinate services/access, you need fencing.
> 
>     So if Nikhil really believes s/he doesn't need fencing and that
>     split-brains are OK, then drop HA. If that is not the case, then s/he
>     needs to implement fencing in pacemaker. It's pretty much that simple.
> 
>     --
>     Digimer
>     Papers and Projects: https://alteeve.ca/w/
>     What if the cure for cancer is trapped in the mind of a person without
>     access to education?
> 
>     _______________________________________________
>     Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org>
>     http://clusterlabs.org/mailman/listinfo/users
> 
>     Project Home: http://www.clusterlabs.org
>     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>     Bugs: http://bugs.clusterlabs.org
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



