[Pacemaker] [DRBD-user] examples of dual primary DRBD
    Florian Haas 
    florian at hastexo.com
    Mon Oct 10 10:12:43 UTC 2011
  
On 2011-10-08 15:55, Bart Coninckx wrote:
> On 10/08/11 00:25, Lars Ellenberg wrote:
>> On Fri, Oct 07, 2011 at 10:21:08PM +0200, Bart Coninckx wrote:
>>> On 10/06/11 22:03, Florian Haas wrote:
>>>> On 2011-10-06 21:43, Bart Coninckx wrote:
>>>>> Hi all,
>>>>>
>>>>> would you mind sending me examples of your crm config for a dual
>>>>> primary DRBD resource?
>>>>>
>>>>> I used the one on
>>>>>
>>>>> http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html
>>>>>
>>>>> and on
>>>>>
>>>>> http://www.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2
>>>>>
>>>>> and they both result in split brain, except when I start DRBD
>>>>> manually first.
>>>>
>>>> They clearly should not. Rather than soliciting other people's
>>>> configurations and then trying to adapt yours based on that, why don't
>>>> you upload _your_ CIB (not just a "crm configure dump", but a full
>>>> "cibadmin -Q") and your DRBD configuration to your
>>>> pastebin/pastie/fpaste and let people tell you where your problem is?
>>>
>>> OK, I posted the drbd.conf on http://pastebin.com/SQe9YxhY
>>>
>>> cibadmin -Q is on http://pastebin.com/gTZqsACq
>>>
>>> The split brain logging is on http://pastebin.com/7unKKkdi .
>>
>> I somehow think you added some "--force" or "--overwrite-data-of-peer"
>> to some drbdadm/drbdsetup primary invocation?
>>
>>> Could this be some sort of timing issue? Manually things are fine,
>>> but there are some seconds in between the primary promotions.
>>
> 
> OK, seems to be some sort of timing issue. I "fixed" this by adding a
> "sleep 1" in the RA right before the "do_drbdadm primary $DRBD_RESOURCE"
> line.
> 
> I'm surprised though that I'm the first one to run into this.

Er, wait. I'm cross-posting this to the Pacemaker list on a hunch.
Andrew, in Boston last year you mentioned you were planning to implement
a change to Master/Slave sets in which, iirc, startup and promotion
would happen in one fell swoop (I believe the NTT folks made a
compelling case for this). Has that change ever been implemented? And if
so, at which Pacemaker version? Is there a configuration option to
revert to the old behavior where the resource would be started
first, and then promotion would occur some time after that?

Cheers,
Florian
-- 
Need help with High Availability?
http://www.hastexo.com/now
    
    