[Pacemaker] crm_mon/pacemaker split brain

Andrew Beekhof beekhof at gmail.com
Fri Nov 28 06:32:09 EST 2008


There are a few things wrong here.

For starters, the stonith resources appear to be badly configured.
This means that when we try to shoot the node (because the extip_ftp
resource can't be stopped), stonithd fails.

At that point the cluster can't do anything.
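The chain of failures above can be written as a deliberately simplified
model in Python. It only encodes what is described here (a failed stop
escalates to shooting the node, and broken stonith leaves nothing else to
try); the names Outcome and recovery_outcome are hypothetical, and the model
ignores on-fail settings and everything else the real policy engine weighs.

```python
from enum import Enum


class Outcome(Enum):
    """Simplified results of trying to recover a failed resource."""
    STOPPED = "resource stopped cleanly; it can be started elsewhere"
    FENCED = "node shot; the resource can be recovered elsewhere"
    BLOCKED = "fencing failed; the cluster cannot safely do anything"


def recovery_outcome(stop_succeeded: bool, fencing_works: bool) -> Outcome:
    # A clean stop needs no escalation.
    if stop_succeeded:
        return Outcome.STOPPED
    # A failed stop escalates to shooting the node that ran the resource.
    if fencing_works:
        return Outcome.FENCED
    # With badly configured stonith resources the escalation itself fails.
    return Outcome.BLOCKED


# The case in this thread: extip_ftp won't stop and stonith is broken.
print(recovery_outcome(stop_succeeded=False, fencing_works=False).name)  # BLOCKED
```

Fixing either link in the chain (making the resource stoppable, or making
stonith work) gets the cluster out of the BLOCKED state in this model.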

Moving on, you're using underscores instead of dashes in the option names
of a 1.0 configuration (e.g. target_role instead of target-role).
So all the meta options are being ignored, and that is what's causing the
cluster to explode.

My guess is that you loaded an XML fragment from a 0.6 cluster into a blank
1.0 configuration, instead of leaving it in place when you upgraded
and letting cibadmin do the conversion (which would have fixed the
underscores).
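To illustrate just the underscore-to-dash part of that conversion, here is a
small Python sketch. It is a hypothetical helper (dash_meta_names and the
example_rsc resource are made up), not how cibadmin does the upgrade, and on
a real cluster the conversion should be left to cibadmin. It only renames
nvpair names inside meta_attributes; instance attribute names are defined by
each resource agent and may legitimately contain underscores, and any other
parts of a 0.6 CIB that need converting are ignored.

```python
import xml.etree.ElementTree as ET


def dash_meta_names(cib_xml: str) -> tuple[str, list[str]]:
    """Rename underscore-style nvpair names inside meta_attributes blocks.

    Hypothetical illustration only; returns the rewritten XML and a list
    of "old -> new" strings describing each rename.
    """
    root = ET.fromstring(cib_xml)
    renamed = []
    for meta in root.iter("meta_attributes"):
        # iter() also finds nvpairs nested deeper than one level.
        for nvpair in meta.iter("nvpair"):
            name = nvpair.get("name", "")
            if "_" in name:
                new_name = name.replace("_", "-")
                nvpair.set("name", new_name)  # ids are left untouched
                renamed.append(f"{name} -> {new_name}")
    return ET.tostring(root, encoding="unicode"), renamed


# Hypothetical 0.6-style fragment pasted into a 1.0 configuration.
fragment = """
<primitive id="example_rsc" class="ocf" provider="heartbeat" type="Dummy">
  <meta_attributes id="example-meta">
    <nvpair id="example-target_role" name="target_role" value="Started"/>
    <nvpair id="example-stickiness" name="resource_stickiness" value="100"/>
  </meta_attributes>
</primitive>
""".strip()

xml_out, changes = dash_meta_names(fragment)
print(changes)
```

Only the name attribute is rewritten; ids such as example-target_role keep
their underscores, since ids are arbitrary identifiers rather than option
names.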

On Wed, Nov 19, 2008 at 15:09, Raoul Bhatia [IPAX] <r.bhatia at ipax.at> wrote:
> hi,
>
> crm_mon shows me a kind of split-brain view of my cluster:
>
> common lines on my two nodes:
>
>> ============
>> Last updated: Wed Nov 19 15:02:22 2008
>> Current DC: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396)
>> 2 Nodes configured.
>> 9 Resources configured.
>> ============
>>
>> Node: wc01 (31de4ab3-2d05-476e-8f9a-627ad6cd94ca): standby
>> Node: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396): online
>> ...
>
> wc01's view:
>> Clone Set: clone_nfs-common
>>     Resource Group: group_nfs-common:0
>>         nfs-common:0    (lsb:nfs-common):       Started wc01
>>     Resource Group: group_nfs-common:1
>>         nfs-common:1    (lsb:nfs-common) Started [      wc01    wc02 ]
>
> wc02's view:
>> Clone Set: clone_nfs-common
>>     Resource Group: group_nfs-common:0
>>         nfs-common:0    (lsb:nfs-common) Started [      wc01    wc02 ]
>>     Resource Group: group_nfs-common:1
>>         nfs-common:1    (lsb:nfs-common):       Started wc01
>
> the information basically is the same, but the two instances of the
> clone "group_nfs-common:0" and "group_nfs-common:1" are swapped.
>
> the configuration is:
> wc01: pacemaker 1.0.1; heartbeat 2.99.2
> wc02: pacemaker 1.0.0; heartbeat 2.99.1
>
> hb_report available at [1]
>
> cheers,
> raoul
>
> ps: regarding the logfiles, please note that i had different system
> times and just updated the clocks:
>> wc01: 19 Nov 15:03:18 ntpdate[3455]: adjust time server 81.223.14.147 offset 0.002208 sec
>> wc02: 19 Nov 15:03:22 ntpdate[22517]: step time server 81.223.14.147 offset -4.483140 sec
>
> [1] http://ip52.ipax.at/~raoul/cluster/hb_report_splitbrain.tar.bz2
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc.          email.          r.bhatia at ipax.at
> Technical Director
>
> IPAX - Aloy Bhatia Hava OEG         web.          http://www.ipax.at
> Barawitzkagasse 10/2/2/11           email.            office at ipax.at
> 1190 Wien                           tel.               +43 1 3670030
> FN 277995t HG Wien                  fax.            +43 1 3670030 15
> ____________________________________________________________________
