[Pacemaker] A/P Corosync, PGSQL and Split Brains questions

Stephan-Frank Henry Frank.Henry at gmx.net
Thu Feb 10 03:48:55 EST 2011


> On: Thu, 10 Feb 2011 09:25:22 +0100, Andrew Beekhof wrote:
> On Thu, Feb 10, 2011 at 9:09 AM, Stephan-Frank Henry
> <Frank.Henry at gmx.net> wrote:
> >
> >> You forgot
> >> 0) Configure stonith
> >>
> >> If data is being written to both sides, one of the sets is always
> >> going to be lost.
> >
> > Agreed and acceptable, it is more a question of who survives.
> > And that is maybe where my confusion lies, I always thought stonith would actually shut down the 'zombie' node (as STONITH would imply).
> > Thus I would lose the zombie node.
> 
> The 64-million dollar question: Which one is the zombie?
> There is no way to tell from inside the cluster.
> 
> And before you say "the one that can't connect to the outside", what if
> it's a switch failure?
> To the node that you pulled the plug from it would look identical, and
> it can't talk to the other node to check what it sees.

Yes, that is naturally a use case where 'who is the zombie' is kind of moot.
Though in that case it would not really matter because no node would be losing data.

> >
> > I would like that when the split brain is detected (not only on drbd) the one that has the latest data will retain the master role, and the slave syncs up.
> > Am I asking the impossible?
> 
> I'm sure drbd has some heuristics for this, but it would require only
> one node to be getting new data.

Yes, from what I have read, this is the case.
In a split-brain-due-to-cable-disconnect scenario this would also be the case, as only one node is still accessible from the outside, so only its data might change.

My assumption was that corosync might somehow be able to use this information as well.
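
To make the "only one node is getting new data" heuristic concrete, here is a minimal Python sketch. It is a toy model for discussion, not DRBD's or corosync's actual implementation; the function name and the per-node change counters are hypothetical.

```python
from typing import Optional


def pick_survivor(changes_a: int, changes_b: int) -> Optional[str]:
    """Decide whose data to keep after a split brain.

    changes_a / changes_b are hypothetical counters of the writes each
    node accepted while the two nodes could not see each other.
    Returns "a" or "b" for the node whose data survives, or None when
    both sides changed and no automatic choice is safe.
    """
    if changes_a < 0 or changes_b < 0:
        raise ValueError("change counters must be non-negative")
    if changes_a > 0 and changes_b > 0:
        # Both sides diverged: one set of writes has to be thrown away,
        # so this case is left to a human (or to a stricter policy).
        return None
    if changes_b > 0:
        return "b"
    # Only "a" changed, or neither did (then the choice is arbitrary).
    return "a"
```

The sketch only works when at most one counter is non-zero, which is exactly the restriction mentioned above.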

> >
> > If this is not possible, also no problem! I am just seeking clarification.
> >
> > I'll check out the black magic called STONITH now. :D
> 
> A third node would help - that would make quorum a useful input.
> (Ie. the one failed node would not try to run resources)

Sadly I am currently still limited to 2 nodes, thanks to the historical battle of customers-wanting-ha-but-not-wanting-to-pay-for-it.
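
For reference, the reason a third node helps can be shown with a small Python sketch of a strict-majority quorum check. This is only the arithmetic, assuming one vote per node; the function name is made up and it is not how corosync exposes quorum.

```python
def has_quorum(votes_seen: int, total_votes: int) -> bool:
    """Return True if this partition holds a strict majority of votes.

    votes_seen counts the nodes in this partition (including itself),
    total_votes is the configured cluster size, one vote per node.
    """
    if not 0 <= votes_seen <= total_votes:
        raise ValueError("votes_seen must be between 0 and total_votes")
    return 2 * votes_seen > total_votes
```

With two nodes, has_quorum(1, 2) is False on both sides of a split, so quorum cannot pick a winner. With three nodes, has_quorum(2, 3) is True for the surviving pair and has_quorum(1, 3) is False for the isolated node.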



Maybe I could ask a counter-question:
What is the 'standard' way such a situation is handled?

I have read up a little and have contemplated how I might use the ping resource to ensure that the disconnected node cannot become a master.
And I will also see how I could use STONITH for my evil plans.
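
A rough sketch of the ping idea, written in Python rather than as cluster configuration: a node may only be promoted while it can still reach at least one ping target. All names here are hypothetical, and using minus infinity as a veto is an assumption made for illustration; this is not Pacemaker's API.

```python
import math
from typing import Dict, Optional


def promotion_score(base_score: float, reachable_targets: Optional[int]) -> float:
    """Score a node for the master role.

    reachable_targets is the number of ping targets the node can reach,
    or None if connectivity has not been measured yet. A node with no
    known connectivity is vetoed with minus infinity.
    """
    if reachable_targets is None or reachable_targets <= 0:
        return -math.inf
    return base_score


def choose_master(scores: Dict[str, float]) -> Optional[str]:
    """Pick the highest-scoring node that has not been vetoed, if any."""
    eligible = {node: s for node, s in scores.items() if s != -math.inf}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

Note that if both nodes lose their ping targets (the switch-failure case), neither is eligible and choose_master returns None, so this only helps when exactly one node is cut off.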

Any other tips?

thanks for the feedback

Frank