[ClusterLabs] Recovering after split-brain

Nikhil Utane nikhil.subscribed at gmail.com
Wed Jun 22 05:40:43 UTC 2016


Hmm. I will then work towards bringing this in. Thanks for your input.

On Wed, Jun 22, 2016 at 10:44 AM, Digimer <lists at alteeve.ca> wrote:

> On 22/06/16 01:07 AM, Nikhil Utane wrote:
> > I don't get it.  Pacemaker + Corosync is providing me with so much
> > functionality.
> > For example, if we leave out the condition of split-brain for a while, then
> > it provides:
> > 1) Discovery and cluster formation
> > 2) Synchronization of data
> > 3) Heartbeat mechanism
> > 4) Swift failover of the resource
> > 5) Guarantee that one resource will be started on only 1 node
> >
> > So in the case of a normal fail-over we need the basic functionality of
> > the resource being migrated to a standby node.
> > And it is giving me all that.
> > So I don't agree that it needs to be as black and white as you say. Our
> > solution has different requirements than a typical HA solution. But that
> > is only now. In the future we might have to implement all the things. So
> > in that sense Pacemaker gives us a good framework that we can extend.
> >
> > BTW, we are not even using a virtual IP resource which again I believe
> > is something that everyone employs.
> > Because of the nature of the service a small glitch is going to happen.
> > Using virtual IPs is not giving any real benefit for us.
> > And with regard to the question of why we even have a standby instead of
> > letting it be active all the time: a two-node cluster is one of the
> > possible configurations, but the main requirement is to support N + 1. So
> > the standby node doesn't know which active node it has to take over from
> > until a failover occurs.
> >
> > Your comments, however, have made me reconsider using fencing. It was not
> > that we didn't want to do it.
> > I just felt it might not be needed. So I'll definitely explore this
> > further.
>
> It is needed, and it is that black and white. Ask yourself, for your
> particular installation: can I run X in two places at the same time
> without coordination?
>
> If the answer is "yes", then just do that and be done with it.
>
> If the answer is "no", then you need fencing to allow pacemaker to know
> the state of all nodes (otherwise, the ability to coordinate is lost).
>
> I've never once seen a valid HA setup where fencing was not needed. I
> don't claim to be the best by any means, but I've been around long
> enough to say this with some confidence.
>
> digimer
>
> > Thanks everyone for the comments.
> >
> > -Regards
> > Nikhil
> >
> > On Tue, Jun 21, 2016 at 10:17 PM, Digimer <lists at alteeve.ca> wrote:
> >
> >     On 21/06/16 10:57 AM, Dmitri Maziuk wrote:
> >     > On 2016-06-20 17:19, Digimer wrote:
> >     >
> >     >> Nikhil indicated that they could switch where traffic went
> >     >> up-stream without issue, if I understood properly.
> >     >
> >     > They have some interesting setup, but that notwithstanding: if
> >     > split-brain happens, some clients will connect to the "old master"
> >     > and some to the "new master", depending on the ARP update. If there's
> >     > a shared resource unavailable on one node, clients going there will
> >     > error out. The other ones will not. It will work for some clients.
> >     >
> >     > Cf. both nodes going into a stonith deathmatch and killing each
> >     > other: the service is now unavailable for all clients. What I don't
> >     > get is the blanket assertion that this is "more highly" available
> >     > than option #1.
> >     > Dimitri
> >
> >     As I've explained many times (here and on IRC):
> >
> >     If you don't need to coordinate services/access, you don't need HA.
> >
> >     If you do need to coordinate services/access, you need fencing.
> >
> >     So if Nikhil really believes s/he doesn't need fencing and that
> >     split-brains are OK, then drop HA. If that is not the case, then s/he
> >     needs to implement fencing in pacemaker. It's pretty much that simple.
> >
> >     --
> >     Digimer
> >     Papers and Projects: https://alteeve.ca/w/
> >     What if the cure for cancer is trapped in the mind of a person without
> >     access to education?
> >
> >     _______________________________________________
> >     Users mailing list: Users at clusterlabs.org
> >     http://clusterlabs.org/mailman/listinfo/users
> >
> >     Project Home: http://www.clusterlabs.org
> >     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >     Bugs: http://bugs.clusterlabs.org
> >
>
>
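
The rule argued in the quoted thread (if X cannot safely run in two places
at once, a peer must be confirmed off before its resource is started
elsewhere) can be illustrated with a toy model. This is only a sketch under
stated assumptions: the Node class and the fence(), take_over(),
active_copies() and simulate() helpers are hypothetical names made up for
illustration, and none of this is Pacemaker's or Corosync's actual API or
algorithm. It models a partition in which the standby stops hearing an
active node that is in fact still running.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node for this toy model."""
    name: str
    powered_on: bool = True
    running: set = field(default_factory=set)


def fence(target: Node) -> bool:
    """Hypothetical fence action: power the target off and confirm it."""
    target.powered_on = False
    target.running.clear()
    return not target.powered_on


def take_over(survivor: Node, peer: Node, resource: str, use_fencing: bool) -> None:
    """Run on `survivor` when it can no longer hear `peer`."""
    if use_fencing:
        # A lost heartbeat does not say whether the peer is dead or merely
        # unreachable, so only proceed once the peer is confirmed off.
        if not fence(peer):
            return
    survivor.running.add(resource)


def active_copies(nodes, resource: str) -> int:
    """Count powered-on nodes that currently run `resource`."""
    return sum(1 for n in nodes if n.powered_on and resource in n.running)


def simulate(use_fencing: bool) -> int:
    """Partition scenario: the active node is alive but unreachable."""
    active = Node("active", running={"X"})
    standby = Node("standby")
    take_over(standby, active, "X", use_fencing)
    return active_copies([active, standby], "X")


if __name__ == "__main__":
    print("copies of X without fencing:", simulate(use_fencing=False))
    print("copies of X with fencing:   ", simulate(use_fencing=True))
```

In this model, simulate(False) ends with two active copies of X (a
split-brain), while simulate(True) ends with exactly one, because the
standby starts X only after the fence action has confirmed the peer is off.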