[ClusterLabs] Recovering after split-brain

Wed Jun 22 05:07:05 UTC 2016

I don't get it. Pacemaker + Corosync is providing me with so much
functionality.
For example, if we set the split-brain case aside for a moment, it
provides:
1) Discovery and cluster formation
2) Synchronization of data
3) Heartbeat mechanism
4) Swift failover of the resource
5) Guarantee that a resource will be started on only one node

So in the case of a normal failover, we need the basic functionality of
migrating a resource to a standby node, and it is giving me all of that.
So I don't agree that it has to be as black and white as you say. Our
solution has different requirements from a typical HA solution, at least
for now. In the future we might have to implement everything, so in that
sense Pacemaker gives us a good framework that we can extend.

BTW, we are not even using a virtual IP resource, which again I believe is
something that everyone employs. Because of the nature of the service, a
small glitch is going to happen during failover anyway, so virtual IPs
don't give us any real benefit.
As for the question of why we even have a standby instead of keeping it
active all the time: a two-node cluster is only one of the possible
configurations, but the main requirement is to support N + 1. So the
standby node doesn't know which active node it has to take over until a
failover occurs.

Your comments, however, have made me reconsider using fencing. It was not
that we didn't want to do it; I just felt it might not be needed. So I'll
definitely explore this further.
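For reference, here is a minimal Python sketch of how a split behaves under a simplified majority-quorum model, for a two-node cluster and two N + 1 layouts. It assumes one vote per node and a quorum of "50% of the votes + 1". The `two_node` flag only mimics corosync votequorum's `two_node: 1` option, which artificially lowers quorum to 1 in a two-node cluster. This is not how corosync actually computes quorum, and the function names are hypothetical.

```python
"""Simplified majority-quorum model for a cluster split into two partitions.

Assumptions (NOT corosync's real algorithm): every node has exactly one vote,
and the two_node flag only mimics votequorum's `two_node: 1` behaviour of
lowering quorum to 1 in a two-node cluster.
"""
from typing import List, Tuple


def quorum_threshold(total_votes: int, two_node: bool = False) -> int:
    """Votes a partition needs to be quorate: 50% of the votes + 1."""
    if two_node and total_votes == 2:
        # Mimics votequorum's two_node option: a lone surviving node stays quorate.
        return 1
    return total_votes // 2 + 1


def split_outcomes(total_nodes: int, two_node: bool = False) -> List[Tuple[int, int, bool, bool]]:
    """For each way to cut the cluster into two partitions (smaller side first),
    return (left_size, right_size, left_quorate, right_quorate)."""
    threshold = quorum_threshold(total_nodes, two_node)
    outcomes = []
    for left in range(1, total_nodes // 2 + 1):
        right = total_nodes - left
        outcomes.append((left, right, left >= threshold, right >= threshold))
    return outcomes


def describe(left_quorate: bool, right_quorate: bool) -> str:
    """Human-readable verdict for one split, within this simplified model."""
    if left_quorate and right_quorate:
        return "both sides quorate: split-brain unless one side is fenced"
    if left_quorate or right_quorate:
        return "exactly one side keeps quorum and may run the resources"
    return "no side is quorate: the service stops everywhere"


if __name__ == "__main__":
    # 2 = two-node cluster, 3 = 2 + 1, 4 = 3 + 1 (N + 1 layouts).
    for nodes, two_node in [(2, False), (2, True), (3, False), (4, False)]:
        for left, right, lq, rq in split_outcomes(nodes, two_node):
            print(f"{nodes} nodes (two_node={two_node}), split {left}/{right}: {describe(lq, rq)}")
```

In this model, the 1/1 split with `two_node=True` reports both sides as quorate. In that case quorum cannot pick a winner, and only fencing decides which node survives. The 2/2 split of a four-node cluster shows that a plain majority rule can also leave no side running at all.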

Thanks everyone for the comments.

-Regards
Nikhil

On Tue, Jun 21, 2016 at 10:17 PM, Digimer <lists at alteeve.ca> wrote:

> On 21/06/16 10:57 AM, Dmitri Maziuk wrote:
> > On 2016-06-20 17:19, Digimer wrote:
> >
> >> Nikhil indicated that they could switch where traffic went up-stream
> >> without issue, if I understood properly.
> >
> > They have an interesting setup, but that notwithstanding: if split
> > brain happens, some clients will connect to "old master" and others to
> > "new master", depending on ARP updates. If a shared resource is
> > unavailable on one node, clients going there will error out; the other
> > ones will not. It will work for some clients.
> >
> > Cf. both nodes going into a stonith deathmatch and killing each other: the
> > service is now unavailable to all clients. What I don't get is the
> > blanket assertion that this is "more highly" available than option #1.
> >
> > Dimitri
>
> As I've explained many times (here and on IRC):
>
> If you don't need to coordinate services/access, you don't need HA.
>
> If you do need to coordinate services/access, you need fencing.
>
> So if Nikhil really believes s/he doesn't need fencing and that
> split-brains are OK, then drop HA. If that is not the case, then s/he
> needs to implement fencing in pacemaker. It's pretty much that simple.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>