[Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb

Elmar Marschke elmar.marschke at schenker.at
Fri Aug 16 22:30:34 EDT 2013


Am 16.08.2013 15:46, schrieb Jake Smith:
>> -----Original Message-----
>> From: Elmar Marschke [mailto:elmar.marschke at schenker.at]
>> Sent: Friday, August 16, 2013 9:05 AM
>> To: The Pacemaker cluster resource manager
>> Subject: [Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb
>>
>> Hi all,
>>
>> I'm working on a two-node Pacemaker cluster with dual-primary DRBD and
>> OCFS2.
>>
>> Dual-primary DRBD and OCFS2 WITHOUT Pacemaker work fine (mounting,
>> reading, writing, everything...).
>>
>> When I try to make this work in Pacemaker, the o2cb resource fails to
>> start.
>>
>> My (already simplified) configuration is:
>> -----------------------------------------
>> node poc1 \
>> 	attributes standby="off"
>> node poc2 \
>> 	attributes standby="off"
>> primitive res_dlm ocf:pacemaker:controld \
>> 	op monitor interval="120"
>> primitive res_drbd ocf:linbit:drbd \
>> 	params drbd_resource="r0" \
>> 	op stop interval="0" timeout="100" \
>> 	op start interval="0" timeout="240" \
>> 	op promote interval="0" timeout="90" \
>> 	op demote interval="0" timeout="90" \
>> 	op notify interval="0" timeout="90" \
>> 	op monitor interval="40" role="Slave" timeout="20" \
>> 	op monitor interval="20" role="Master" timeout="20"
>> primitive res_o2cb ocf:pacemaker:o2cb \
>> 	op monitor interval="60"
>> ms ms_drbd res_drbd \
>> 	meta notify="true" master-max="2" master-node-max="1" \
>> 	target-role="Started"
>> property $id="cib-bootstrap-options" \
>> 	no-quorum-policy="ignore" \
>> 	stonith-enabled="false" \
>> 	dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>> 	cluster-infrastructure="openais" \
>> 	expected-quorum-votes="2" \
>> 	last-lrm-refresh="1376574860"
>>
>
> It looks like you are missing the ordering, colocation, and clone
> statements (a group makes the config shorter, since a group is order and
> colocation in one statement).  The resources *must* start in a particular
> order, they must run on the same node, and there must be an instance of
> each resource on each node.
>
> More here for DRBD 8.4:
> http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html
> Or DRBD 8.3:
> http://www.drbd.org/users-guide-8.3/s-ocfs2-pacemaker.html
>
> Basically add:
> group grp_dlm_o2cb res_dlm res_o2cb
> clone cl_dlm_o2cb grp_dlm_o2cb meta interleave="true"
> order ord_drbd_then_dlm_o2cb inf: ms_drbd:promote cl_dlm_o2cb:start
> colocation col_dlm_o2cb_with_drbdmaster inf: cl_dlm_o2cb ms_drbd:Master
>
> HTH
>
> Jake
>

Hello Jake,

thanks for your reply. I already had res_dlm and res_o2cb grouped
together and cloned as you advise; that was in fact my initial
configuration. The problem showed up there as well, so I simplified the
configuration to reduce possible sources of error.

But now it seems I have found a solution, or at least a workaround: I
simply use the LSB resource agent lsb:o2cb. This one works! The resource
starts without a problem on both nodes, and as far as I can see right now
everything is fine (tried with and without the additional group and clone
resources).
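
A minimal sketch of how this workaround might look in crm syntax, reusing
res_o2cb and ms_drbd from the configuration quoted above. It is not the
running configuration: the clone name cl_o2cb and both constraint IDs are
hypothetical, and the ordering against DRBD simply follows the advice
quoted above.

```
# Sketch only: cl_o2cb and the constraint IDs are hypothetical names.
# lsb:o2cb wraps the distribution's o2cb init script.
primitive res_o2cb lsb:o2cb \
	op monitor interval="60"
# One instance per node, since both nodes are DRBD primaries.
clone cl_o2cb res_o2cb \
	meta interleave="true"
# Start o2cb only after DRBD is promoted, and only where DRBD is Master.
order ord_drbd_then_o2cb inf: ms_drbd:promote cl_o2cb:start
colocation col_o2cb_with_drbdmaster inf: cl_o2cb ms_drbd:Master
```

The constraints reference the master/slave set ms_drbd rather than the
primitive res_drbd, because promotion is an action of the set.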

I don't know whether this will have drawbacks in the future, but for the
moment my problem seems to be solved.

Currently it seems to me that there is a subtle problem with the
ocf:pacemaker:o2cb resource agent, at least on my system.

Anyway, thanks a lot for your answer!
Best regards
elmar


>
>> First error message in corosync.log as far as I can identify it:
>> ----------------------------------------------------------------
>> lrmd: [5547]: info: RA output: (res_dlm:probe:stderr) dlm_controld.pcmk: no process found
>> [ other stuff ]
>> lrmd: [5547]: info: RA output: (res_dlm:start:stderr) dlm_controld.pcmk: no process found
>> [ other stuff ]
>> lrmd: [5547]: info: RA output: (res_o2cb:start:stderr) 2013/08/16_13:25:20 ERROR: ocfs2_controld.pcmk did not come up
>>
>> (You can find the whole corosync logfile, from starting corosync on
>> node 1 until after the resources were started, at:
>> http://www.marschke.info/corosync_drei.log)
>>
>> syslog shows:
>> -------------
>> ocfs2_controld.pcmk[5774]: Unable to connect to CKPT: Object does not exist
>>
>>
>> Output of crm_mon:
>> ------------------
>> ============
>> Stack: openais
>> Current DC: poc1 - partition WITHOUT quorum
>> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> 2 Nodes configured, 2 expected votes
>> 4 Resources configured.
>> ============
>>
>> Online: [ poc1 ]
>> OFFLINE: [ poc2 ]
>>
>>    Master/Slave Set: ms_drbd [res_drbd]
>>        Masters: [ poc1 ]
>>        Stopped: [ res_drbd:1 ]
>>    res_dlm	(ocf::pacemaker:controld):	Started poc1
>>
>> Migration summary:
>> * Node poc1:
>>      res_o2cb: migration-threshold=1000000 fail-count=1000000
>>
>> Failed actions:
>>       res_o2cb_start_0 (node=poc1, call=6, rc=1, status=complete): unknown error
>>
>> ---------------------------------------------------------------------
>> This is the situation after a reboot of node poc1. To simplify things, I
>> left pacemaker / corosync unstarted on the second node, and had already
>> removed the group and clone resources that dlm and o2cb had been in (the
>> errors occurred with them as well).
>>
>> Is my configuration of the resource agents correct?
>> I checked using "ra meta ...", and as far as I can tell everything is ok.
>>
>> Is some piece of software missing?
>> dlm-pcmk is installed, ocfs2_controld.pcmk and dlm_controld.pcmk are
>> available, and I even created additional links in /usr/sbin:
>> root@poc1:~# which ocfs2_controld.pcmk
>> /usr/sbin/ocfs2_controld.pcmk
>> root@poc1:~# which dlm_controld.pcmk
>> /usr/sbin/dlm_controld.pcmk
>> root@poc1:~#
>>
>> I already googled but couldn't find anything useful. Thanks for any
>> hints... :)
>>
>> kind regards
>> elmar
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org