[Pacemaker] A/P Corosync, PGSQL and Split Brains questions

Andrew Beekhof andrew at beekhof.net
Thu Feb 10 03:25:22 EST 2011


On Thu, Feb 10, 2011 at 9:09 AM, Stephan-Frank Henry
<Frank.Henry at gmx.net> wrote:
>
>> On Thu, 10 Feb 2011 08:51:01 +0100, Andrew Beekhof wrote:
>> On Wed, Feb 9, 2011 at 2:48 PM, Stephan-Frank Henry <Frank.Henry at gmx.net>
>> wrote:
>> > Hello again,
>> >
>> > After fixing up my VirtualIP problem, I have been doing some split-brain
>> > tests, and while everything 'returns to normal', it is not quite what I
>> > had desired.
>> >
>> > My scenario:
>> > Active/Passive 2-node cluster (serverA & serverB) with Corosync, DRBD &
>> > PGSQL.
>> > The resources are configured as Master/Slave, and so far it is fine.
>> >
>> > Since bullet points speak more than words: ;)
>> > Test:
>> >  1) Pull the plug on the master (serverA)
>> >  2) Then reattach it
>>
>> You forgot
>> 0) Configure stonith
>>
>> If data is being written to both sides, one of the sets is always
>> going to be lost.
>
> Agreed and acceptable; it is more a question of who survives.
> And that is maybe where my confusion lies: I always thought STONITH would actually shut down the 'zombie' node (as the acronym implies).
> Thus I would lose the zombie node.

The 64-million-dollar question: which one is the zombie?
There is no way to tell from inside the cluster.

And before you say "the one that can't connect to the outside", what if
it's a switch failure?
To the node whose plug you pulled, the two cases look identical, and it
can't talk to the other node to check what that node sees.
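
Fencing is how the cluster breaks that tie: whichever node can still
reach the fencing device shoots the other and then takes over with a
clear conscience. A minimal sketch using the external/ipmi plugin
(assuming IPMI-capable boards; the IPs and credentials below are made
up, and the right device depends entirely on your hardware):

    crm configure primitive st-serverA stonith:external/ipmi \
          params hostname=serverA ipaddr=192.168.122.10 \
                 userid=admin passwd=secret interface=lan
    crm configure primitive st-serverB stonith:external/ipmi \
          params hostname=serverB ipaddr=192.168.122.11 \
                 userid=admin passwd=secret interface=lan
    # never let a node run its own fencing device
    crm configure location l-st-A st-serverA -inf: serverA
    crm configure location l-st-B st-serverB -inf: serverB
    crm configure property stonith-enabled=true

Note that this still doesn't tell you which node is the zombie - it
just guarantees that only one of them survives to write to the data.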

>
> I would like it so that when a split brain is detected (not only by DRBD), the node that has the latest data retains the master role, and the slave syncs up.
> Am I asking the impossible?

I'm sure DRBD has some heuristics for this, but they would require that
only one node was getting new data.
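
For reference, those heuristics live in the net section of drbd.conf
as the after-sb-* handlers. A sketch (DRBD 8.3 syntax; 'r0' stands in
for whatever your resource is called):

    resource r0 {
      net {
        # policies keyed on how many primaries exist when the
        # split brain is detected (0, 1 or 2)
        after-sb-0pri discard-zero-changes;  # auto-heal only if one side wrote nothing
        after-sb-1pri discard-secondary;     # the current primary's data wins
        after-sb-2pri disconnect;            # two primaries: refuse to guess
      }
    }

'discard-zero-changes' is exactly the "only one node got new data"
case; there is no safe policy for "the one with the latest data wins"
once both sides have written.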

>
> If this is not possible, also no problem! I am just seeking clarification.
>
> I'll check out the black magic called STONITH now. :D

A third node would help - that would make quorum a useful input.
(I.e. the failed node would not try to run resources.)
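
With a third voting node you can let quorum do its job instead of
ignoring it, which is what two-node clusters are usually forced to do:

    # two-node clusters typically need this, which is exactly why
    # they can't use quorum to break ties:
    #   crm configure property no-quorum-policy=ignore
    # with a third corosync node, let the minority side stop everything:
    crm configure property no-quorum-policy=stop

The node on the wrong side of the break loses quorum and stops its
resources, so it never becomes the second master in the first place.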

>
>> > Expected results:
>> >  1) serverB becomes Master
>>
>> You mean master for the drbd resource right?
>> Actually I'd expect both sides would be promoted - there is no way for
>> either server to know whether it or its peer is dead.
>
> Yes, naturally (sorry). Both become master (from the outside it is only the one that remains), and it all works fine.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



