[Pacemaker] Dual primary drbd resource not promoted on one host

Jürgen Herrmann Juergen.Herrmann at XLhost.de
Tue Feb 5 16:00:48 EST 2013


On 05.02.2013 16:32, Jake Smith wrote:
> ----- Original Message -----
>> From: "Jürgen Herrmann" <Juergen.Herrmann at XLhost.de>
>> To: pacemaker at oss.clusterlabs.org
>> Sent: Tuesday, February 5, 2013 7:04:26 AM
>> Subject: [Pacemaker] Dual primary drbd resource not promoted on one host
>>
>> Hi there!
>>
>> I have the following problem:
>>
>> I have a 2 node cluster with a dual primary drbd resource. On top
>> of it sits an OCFS2 file system. nodes: app1a, app1b
>>
>> Now today I had the following scenario (occurred several times now):
>> - crm node standby app1a
>> - poweroff app1a for hdd replacement (hw raid controller)
>> - poweron app1a
>> - crm node online app1a
>>
>> All the other resources come back up as expected, except the
>> master/slave set for the dual primary drbd.
>>
>> here's the relevant portion of my cluster config:
>>
>> node app1a.xlhost.de \
>>          attributes standby="off"
>> node app1b.xlhost.de \
>>          attributes standby="off"
>> primitive resDLM ocf:pacemaker:controld \
>>          op start interval="0" timeout="90s" \
>>          op stop interval="0" timeout="100s" \
>>          op monitor interval="120s"
>> primitive resDRBD0 ocf:linbit:drbd \
>>          op monitor interval="23" role="Slave" timeout="30" \
>>          op monitor interval="13" role="Master" timeout="20" \
>>          op start interval="0" timeout="240s" \
>>          op promote interval="0" timeout="240s" \
>>          op demote interval="0" timeout="100s" \
>>          op stop interval="0" timeout="100s" \
>>          params drbd_resource="drbd0"
>> primitive resFSDRBD0 ocf:heartbeat:Filesystem \
>>          params device="/dev/drbd0" directory="/mnt/drbd0" fstype="ocfs2" options="noatime,intr,nodiratime,heartbeat=none" \
>>          op monitor interval="120s" timeout="50s" \
>>          op start interval="0" timeout="70s" \
>>          op stop interval="0" timeout="70s"
>> primitive resO2CB ocf:pacemaker:o2cb \
>>          op start interval="0" timeout="90s" \
>>          op stop interval="0" timeout="100s" \
>>          op monitor interval="120s"
>> ms msDRBD0 resDRBD0 \
>>          meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
>> clone cloneDLM resDLM \
>>          meta globally-unique="false" interleave="true" target-role="Started"
>> clone cloneFSDRBD0 resFSDRBD0 \
>>          meta interleave="true" globally-unique="false" target-role="Started"
>> clone cloneO2CB resO2CB \
>>          meta globally-unique="false" interleave="true" target-role="Started"
>> colocation colFSDRBD0_DRBD0 inf: cloneFSDRBD0 msDRBD0:Master
>
> ^^^ This colocation should be cloneDLM on msDRBD0.
>
>> colocation colFSDRBD0_O2CB inf: cloneFSDRBD0 cloneO2CB
>> colocation colO2CB_DLM inf: cloneO2CB cloneDLM
>> order ordDLM_FSDRBD0 inf: cloneDLM cloneFSDRBD0
>
> ^^^ This order statement is not needed.
>
>> order ordDLM_O2CB inf: cloneDLM cloneO2CB
>> order ordDRBD0_FSDRBD0 inf: msDRBD0:promote cloneFSDRBD0
>
> ^^^ This order should be msDRBD0:promote then cloneDLM:start
>
> If you explicitly define the action for one resource in an order
> statement, the same action is implied for the remaining resources.
> So your statement is going to try to promote cloneFSDRBD0.
> You should define both actions explicitly, like this:
>
> order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
>
>> order ordO2CB_FSDRBD0 inf: cloneO2CB cloneFSDRBD0
>>
>
>
>> If I take down both nodes and fire them up again, everything goes
>> back to normal and msDRBD0 is promoted to master on both nodes.
>>
>> I suspect this has something to do with ordering or colocation
>> constraints, but I'm not sure. I've stared at this problem dozens
>> of times now, and a vast amount of googling did not turn up my
>> specific problem either.
>
> I'm pretty sure you are correct.  I haven't used/tested OCFS on
> Pacemaker in a while, but I believe this is the correct
> ordering/colocation you're looking for (same as my notes above):
>
> Order - DRBD:promote then DLM:start then O2CB:start then FS:start
> Colocation - FS on O2CB on DLM on DRBD:master
>

Hi Jake!

Thanks very much for your comments!

To sum it up, I rewrote all six order/colocation statements here:

colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
colocation colO2CB_DLM inf: cloneO2CB cloneDLM
colocation colFSDRBD0_O2CB inf: cloneFSDRBD0 cloneO2CB

order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
order ordDLM_O2CB inf: cloneDLM cloneO2CB
order ordO2CB_FSDRBD0 inf: cloneO2CB cloneFSDRBD0

I will try this sometime in the upcoming nights and will report back.
Maybe in the meantime you could have a look at the statements again
to double-check? Thanks in advance.
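
For comparison, a variant of the three order statements with every
action spelled out, following the note above about implied actions.
This is only an untested sketch: the explicit ":start" on the last two
constraints is an assumption about what the defaults resolve to, and
the resource and constraint names are the ones used in this thread.

```
# Untested sketch: same ordering chain, but with each action named explicitly.
# The ":start" suffixes on the last two lines are an assumption (start is
# expected to be the default action when none is given).
order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
order ordDLM_O2CB inf: cloneDLM:start cloneO2CB:start
order ordO2CB_FSDRBD0 inf: cloneO2CB:start cloneFSDRBD0:start
```

If the defaults behave as assumed, this should be equivalent to the
shorter form; the only gain is that no action is left implied.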

best regards,
Jürgen Herrmann

-- 
>> XLhost.de ® - Web hosting from supersmall to eXtra Large <<

XLhost.de GmbH
Jürgen Herrmann, Managing Director
Boelckestrasse 21, 93051 Regensburg, Germany

Managing Director: Jürgen Herrmann
Registered under: HRB9918
VAT identification number: DE245931218

Phone: +49 (0)800 XLHOSTDE [0800 95467833]
Fax:  +49 (0)800 95467830
Web:  http://www.XLhost.de



