[Pacemaker] Dual primary drbd resource not promoted on one host

Jake Smith jsmith at argotec.com
Tue Feb 5 10:32:28 EST 2013


----- Original Message -----
> From: "Jürgen Herrmann" <Juergen.Herrmann at XLhost.de>
> To: pacemaker at oss.clusterlabs.org
> Sent: Tuesday, February 5, 2013 7:04:26 AM
> Subject: [Pacemaker] Dual primary drbd resource not promoted on one host
> 
> Hi there!
> 
> I have the following problem:
> 
> I have a 2-node cluster with a dual-primary DRBD resource. On top
> of it sits an OCFS2 file system. Nodes: app1a, app1b
> 
> Today I had the following scenario (it has occurred several times now):
> - crm node standby app1a
> - poweroff app1a for hdd replacement (hw raid controller)
> - poweron app1a
> - crm node online app1a
> 
> All the other resources come back up as expected, except the
> master/slave set for the dual-primary DRBD resource.
> 
> Here's the relevant portion of my cluster config:
> 
> node app1a.xlhost.de \
>          attributes standby="off"
> node app1b.xlhost.de \
>          attributes standby="off"
> primitive resDLM ocf:pacemaker:controld \
>          op start interval="0" timeout="90s" \
>          op stop interval="0" timeout="100s" \
>          op monitor interval="120s"
> primitive resDRBD0 ocf:linbit:drbd \
>          op monitor interval="23" role="Slave" timeout="30" \
>          op monitor interval="13" role="Master" timeout="20" \
>          op start interval="0" timeout="240s" \
>          op promote interval="0" timeout="240s" \
>          op demote interval="0" timeout="100s" \
>          op stop interval="0" timeout="100s" \
>          params drbd_resource="drbd0"
> primitive resFSDRBD0 ocf:heartbeat:Filesystem \
>          params device="/dev/drbd0" directory="/mnt/drbd0" \
>              fstype="ocfs2" options="noatime,intr,nodiratime,heartbeat=none" \
>          op monitor interval="120s" timeout="50s" \
>          op start interval="0" timeout="70s" \
>          op stop interval="0" timeout="70s"
> primitive resO2CB ocf:pacemaker:o2cb \
>          op start interval="0" timeout="90s" \
>          op stop interval="0" timeout="100s" \
>          op monitor interval="120s"
> ms msDRBD0 resDRBD0 \
>          meta master-max="2" master-node-max="1" clone-max="2" \
>              clone-node-max="1" notify="true" target-role="Master"
> clone cloneDLM resDLM \
>          meta globally-unique="false" interleave="true" \
>              target-role="Started"
> clone cloneFSDRBD0 resFSDRBD0 \
>          meta interleave="true" globally-unique="false" \
>              target-role="Started"
> clone cloneO2CB resO2CB \
>          meta globally-unique="false" interleave="true" \
>              target-role="Started"
> colocation colFSDRBD0_DRBD0 inf: cloneFSDRBD0 msDRBD0:Master

^^^ This colocation should put cloneDLM on msDRBD0:Master.

> colocation colFSDRBD0_O2CB inf: cloneFSDRBD0 cloneO2CB
> colocation colO2CB_DLM inf: cloneO2CB cloneDLM
> order ordDLM_FSDRBD0 inf: cloneDLM cloneFSDRBD0

^^^ This order statement is not needed; ordDLM_O2CB and ordO2CB_FSDRBD0 already start cloneDLM before cloneFSDRBD0.

> order ordDLM_O2CB inf: cloneDLM cloneO2CB
> order ordDRBD0_FSDRBD0 inf: msDRBD0:promote cloneFSDRBD0

^^^ This order should be msDRBD0:promote then cloneDLM:start.

If you explicitly specify an action for the first resource in an order statement, the same action is implied for the resources that follow it. So your statement is going to try to promote cloneFSDRBD0, which is a plain clone and not a master/slave resource. You should define both actions explicitly, like this:

order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start

> order ordO2CB_FSDRBD0 inf: cloneO2CB cloneFSDRBD0
> 


> If I take down both nodes and fire them up again, everything goes back
> to normal and msDRBD0 is promoted to master on both nodes.
> 
> I suspect this has something to do with ordering or colocation
> constraints, but I'm not sure. I've gone over this problem dozens of
> times now, and a vast amount of googling did not turn up my specific
> problem either.

I'm pretty sure you are correct.  I haven't used/tested OCFS on Pacemaker in a while, but I believe this is the ordering/colocation you're looking for (same as my notes above):

Order - DRBD:promote then DLM:start then O2CB:start then FS:start
Colocation - FS on O2CB on DLM on DRBD:Master
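Put together as crm configuration, that chain might look roughly like this. It is an untested sketch: it assumes colFSDRBD0_DRBD0, ordDLM_FSDRBD0 and ordDRBD0_FSDRBD0 are removed and replaced by these lines, and colDLM_DRBD0 is a placeholder ID, not one from your config. Each colocation reads "first resource runs where the second one is"; each order reads "first resource before the second".

# colDLM_DRBD0 is a hypothetical ID; the other IDs come from the existing config
colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
colocation colO2CB_DLM inf: cloneO2CB cloneDLM
colocation colFSDRBD0_O2CB inf: cloneFSDRBD0 cloneO2CB
# both actions given explicitly so cloneDLM is started, not promoted
order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
order ordDLM_O2CB inf: cloneDLM cloneO2CB
order ordO2CB_FSDRBD0 inf: cloneO2CB cloneFSDRBD0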

HTH

Jake

> 
> Does anybody have a clue? :) Any hint in the right direction as to where
> to look etc. would really be appreciated.
> 
> Thanks in advance for your help and best regards,
> Jürgen Herrmann
> --
> >> XLhost.de ® - Web hosting from supersmall to eXtra Large <<
> 
> XLhost.de GmbH
> Jürgen Herrmann, Managing Director
> Boelckestrasse 21, 93051 Regensburg, Germany
> 
> Managing Director: Jürgen Herrmann
> Registered under: HRB9918
> VAT identification number: DE245931218
> 
> Phone:  +49 (0)800 XLHOSTDE [0800 95467833]
> Fax:  +49 (0)800 95467830
> Web:  http://www.XLhost.de
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 



