[Pacemaker] Validate strategy for RA on DRBD standby node

Serge Dubrouski sergeyfd at gmail.com
Thu Feb 24 11:26:22 EST 2011


Ahh! I see, you need to use the ocf_is_probe function in your RA to
isolate that case.
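
Since your RA is in Python, you can't source ocf-shellfuncs directly, so
here is a rough sketch of the same check. It assumes, as the shell helper
does, that a probe is a monitor action with
OCF_RESKEY_CRM_meta_interval set to 0. The names is_probe and validate and
the env parameter are made up for illustration, and os.access only tests
readability for the current user, which is a simplification of your
"readable by the slapd user" check.

```python
import os

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7


def is_probe(action, env=None):
    """Rough Python counterpart of ocf_is_probe (hypothetical helper).

    Assumption: Pacemaker marks a probe as a monitor action whose
    OCF_RESKEY_CRM_meta_interval is 0.
    """
    env = os.environ if env is None else env
    return action == "monitor" and env.get("OCF_RESKEY_CRM_meta_interval") == "0"


def validate(action, config_path, env=None):
    """Check the config file, but tolerate its absence during a probe.

    On the standby node the DRBD-backed file system is not mounted, so a
    missing config file during a probe just means "not running here".
    """
    if not os.access(config_path, os.R_OK):
        if is_probe(action, env):
            return OCF_NOT_RUNNING
        return OCF_ERR_INSTALLED
    return OCF_SUCCESS
```

With something like this, a probe on the standby node would report
"not running" (rc=7) instead of "not installed" (rc=5), while a real
start or recurring monitor on a node without the config file would
still fail.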

On Thu, Feb 24, 2011 at 9:17 AM, David McCurley <mac at fabric.com> wrote:
> I'm not trying to start it.  The problem is that my validate function was failing.  Here is the case:
>
> Deploy the RA on both nodes (DRBD master and slave).
> Edit the crm config to add the ldap resource, colocation, etc.
> Save the config. Pacemaker attempts to start LDAP, but it also runs a check on both the master and the slave, and my validate was failing on the slave since the file system resources for LDAP weren't available there.
>
> We are in the active/passive case, so the problem is in my code when Pacemaker runs the monitor/validate check on the slave.  The live LDAP instance is colocated with DRBD and the filesystem, e.g. from crm configure show:
>
> node vcoresrv1 \
>        attributes standby="off"
> node vcoresrv2 \
>        attributes standby="off"
> primitive clusterip ocf:heartbeat:IPaddr2 \
>        params ip="192.168.1.4" cidr_netmask="24" nic="eth0" iflabel="cip" \
>        op monitor interval="30s"
> primitive clusteripsourcing ocf:heartbeat:IPsrcaddr \
>        params ipaddress="192.168.1.4" \
>        op monitor interval="10" timeout="20s" depth="0"
> primitive ldap ocf:fabric:openldap \
>        op monitor interval="10"
> primitive drbd_vcoreshare ocf:linbit:drbd \
>        params drbd_resource="r0" \
>        op start interval="0" timeout="240s" \
>        op stop interval="0" timeout="100s" \
>        op promote interval="0" timeout="90s" \
>        op demote interval="0" timeout="90s" \
>        op monitor interval="15s"
> primitive fs_vcoreshare ocf:heartbeat:Filesystem \
>        params device="/dev/drbd/by-res/r0" directory="/vcoreshare" fstype="ext4" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="60s"
> ms ms_drbd_vcoreshare drbd_vcoreshare \
>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation clusterip_with_vcoreshare inf: clusterip fs_vcoreshare
> colocation ipsourcing_with_clusterip inf: clusteripsourcing clusterip
> colocation vcoreshare_on_drbd inf: fs_vcoreshare ms_drbd_vcoreshare:Master
> colocation ldap_with_vcoreshare inf: ldap fs_vcoreshare
> order clusterip_after_vcoreshare inf: fs_vcoreshare clusterip
> order ldap_after_clusterip inf: clusterip ldap
> order ipsourcing_after_clusterip inf: clusterip clusteripsourcing
> order vcoreshare_after_drbd inf: ms_drbd_vcoreshare:promote fs_vcoreshare:start
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="100"
>
>
> ----- Original Message -----
>> From: "Serge Dubrouski" <sergeyfd at gmail.com>
>> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
>> Sent: Thursday, February 24, 2011 11:05:56 AM
>> Subject: Re: [Pacemaker] Validate strategy for RA on DRBD standby node
>>
>> Why are you trying to start LDAP on a node where you don't have your
>> DRBD resource mounted? Having LDAP up on both nodes would make sense
>> if you were building an active/active LDAP cluster with syncrepl or
>> any other replication mechanism. In that case you'd set it up as M/S
>> and/or as a clone and would have to provide access to the config file
>> on both nodes. In the active/passive case you have to colocate your LDAP
>> resource with your DRBD and filesystem resources and Pacemaker won't
>> try to start LDAP on a node that doesn't have DRBD activated and
>> filesystem mounted.
>>
>> On Thu, Feb 24, 2011 at 6:06 AM, David McCurley <mac at fabric.com>
>> wrote:
>> > Pacemaker and list newbie here :)
>> >
>> > I'm writing a resource agent in Python for the newer release of
>> > OpenLDAP but I need some pointers on a strategy for the validate
>> > function in a certain case.  (In Python because the more advanced
>> > shell scripting hurts my head :).  Here is the situation:
>> >
>> > The config file for OpenLDAP is stored in
>> > /etc/ldap/slapd.d/cn=config.ldif.  This is on a DRBD
>> > active-passive system and the /etc/ldap directory is actually a
>> > symlink to the DRBD controlled share /vcoreshare/etc/ldap.  The
>> > real config file is at
>> > /vcoreshare/etc/ldap/slapd.d/cn=config.ldif.
>> >
>> > So I'm trying to be very judicious with every function and
>> > validation, checking file permissions, etc.  But the problem is
>> > that /etc/ldap/slapd.d/cn=config.ldif is only present on the
>> > active DRBD node.  My validate function checks that the file is
>> > readable by the user/group that slapd is to run as.  Now, as soon
>> > as I start ldap in the cluster, it starts fine, but validate fails
>> > on the standby node (because the DRBD volume isn't mounted) and
>> > crm_mon shows a failed action:
>> > ----------------------------------------------
>> > ============
>> > Last updated: Wed Feb 23 07:35:19 2011
>> > Stack: openais
>> > Current DC: vcoresrv1 - partition with quorum
>> > Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
>> > 2 Nodes configured, 2 expected votes
>> > 5 Resources configured.
>> > ============
>> >
>> > Online: [ vcoresrv1 vcoresrv2 ]
>> >
>> > fs_vcoreshare   (ocf::heartbeat:Filesystem):    Started vcoresrv1
>> >  Master/Slave Set: ms_drbd_vcoreshare
>> >     Masters: [ vcoresrv1 ]
>> >     Slaves: [ vcoresrv2 ]
>> > clusterip       (ocf::heartbeat:IPaddr2):       Started vcoresrv1
>> > clusteripsourcing       (ocf::heartbeat:IPsrcaddr):     Started vcoresrv1
>> >
>> > Failed actions:
>> >    ldap_monitor_0 (node=vcoresrv2, call=130, rc=5, status=complete): not installed
>> > ---------------------------------------------
>> >
>> > Is there a way for my RA to know that it is being called on the
>> > active node instead of the passive node?  Or more generally, what
>> > would anyone recommend here?  I really didn't want to write the
>> > resource agent so it would be specific to our setup (e.g.
>> > checking to make sure the DRBD mount is readable before looking
>> > for the config files).  Maybe Pacemaker passes in some extra env
>> > variable that can be used?
>> >
>> > I'm reluctant to post the code for the RA here in the list because
>> > it is 450 lines.  But, here is the logic for the validate
>> > function:
>> >
>> > if the appropriate slapd user and group do not exist:
>> >   return OCF_ERR_INSTALLED
>> > if the ldap config file doesn't exist or isn't readable by the
>> > slapd user:
>> >   return OCF_ERR_INSTALLED
>> > if the ldap binary doesn't exist or isn't executable:
>> >   return OCF_ERR_INSTALLED
>> > return OCF_SUCCESS
>> >
>> > Or maybe I'm overdoing it in my tests or have misinterpreted the
>> > "OCF Resource Agent Developer's Guide"?
>> >
>> > Any advice or guidance / clarification appreciated.
>> >
>> > Thanks,
>> >
>> > Mac
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started:
>> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs:
>> > http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>> >
>>
>>
>>
>> --
>> Serge Dubrouski.
>>
>



-- 
Serge Dubrouski.



