[Pacemaker] Validate strategy for RA on DRBD standby node

David McCurley mac at fabric.com
Thu Feb 24 11:17:57 EST 2011


I'm not trying to start it.  The problem is that my validate function was failing.  Here is the case:

Deploy the RA on both nodes (DRBD master and slave).
Edit the crm config to add the ldap resource, colocation, etc.
Save the config.  Pacemaker attempts to start LDAP, but it also runs a probe on both the master and the slave, and my validate was failing on the slave since the filesystem resources for LDAP weren't available there.

We are in the active/passive case, so the problem is in my code when Pacemaker runs the monitor/validate check on the slave.  The live LDAP instance is colocated with the DRBD and filesystem resources, e.g. from crm configure show:

node vcoresrv1 \
	attributes standby="off"
node vcoresrv2 \
	attributes standby="off"
primitive clusterip ocf:heartbeat:IPaddr2 \
	params ip="192.168.1.4" cidr_netmask="24" nic="eth0" iflabel="cip" \
	op monitor interval="30s"
primitive clusteripsourcing ocf:heartbeat:IPsrcaddr \
	params ipaddress="192.168.1.4" \
	op monitor interval="10" timeout="20s" depth="0"
primitive ldap ocf:fabric:openldap \
	op monitor interval="10"
primitive drbd_vcoreshare ocf:linbit:drbd \
	params drbd_resource="r0" \
	op start interval="0" timeout="240s" \
	op stop interval="0" timeout="100s" \
	op promote interval="0" timeout="90s" \
	op demote interval="0" timeout="90s" \
	op monitor interval="15s"
primitive fs_vcoreshare ocf:heartbeat:Filesystem \
	params device="/dev/drbd/by-res/r0" directory="/vcoreshare" fstype="ext4" \
	op start interval="0" timeout="60s" \
	op stop interval="0" timeout="60s"
ms ms_drbd_vcoreshare drbd_vcoreshare \
	meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation clusterip_with_vcoreshare inf: clusterip fs_vcoreshare
colocation ipsourcing_with_clusterip inf: clusteripsourcing clusterip
colocation vcoreshare_on_drbd inf: fs_vcoreshare ms_drbd_vcoreshare:Master
colocation ldap_with_vcoreshare inf: ldap fs_vcoreshare
order clusterip_after_vcoreshare inf: fs_vcoreshare clusterip
order ldap_after_clusterip inf: clusterip ldap
order ipsourcing_after_clusterip inf: clusterip clusteripsourcing
order vcoreshare_after_drbd inf: ms_drbd_vcoreshare:promote fs_vcoreshare:start
property $id="cib-bootstrap-options" \
	dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
	cluster-infrastructure="openais" \
	expected-quorum-votes="2" \
	stonith-enabled="false" \
	no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
	resource-stickiness="100"
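For what it's worth, the pattern the shell RAs use (ocf_is_probe in ocf-shellfuncs) can be reproduced in Python: a probe is a monitor call with interval 0, and on a probe a config file that is missing because the DRBD filesystem isn't mounted should map to OCF_NOT_RUNNING rather than OCF_ERR_INSTALLED.  A minimal sketch of that idea follows; the paths, user name, and function names are illustrative, not the real ones from my agent:

```python
import os
import pwd

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7

# Illustrative paths/names only; substitute your own.
CONFIG = "/etc/ldap/slapd.d/cn=config.ldif"
SLAPD_BINARY = "/usr/sbin/slapd"
SLAPD_USER = "openldap"


def is_probe(action):
    """True when Pacemaker is probing: a monitor call with interval 0.

    Pacemaker exports operation meta attributes to the RA as
    OCF_RESKEY_CRM_meta_* environment variables; a probe is the
    one-shot monitor with interval 0.
    """
    return (action == "monitor"
            and os.environ.get("OCF_RESKEY_CRM_meta_interval", "0") == "0")


def validate(action="validate-all"):
    """Validate the environment, tolerating probes on the standby node."""
    try:
        pwd.getpwnam(SLAPD_USER)
    except KeyError:
        return OCF_ERR_INSTALLED
    if not os.access(SLAPD_BINARY, os.X_OK):
        return OCF_ERR_INSTALLED
    if not os.access(CONFIG, os.R_OK):
        # On a probe the DRBD filesystem may simply not be mounted on
        # this node; report "not running" instead of a hard failure.
        if is_probe(action):
            return OCF_NOT_RUNNING
        return OCF_ERR_INSTALLED
    return OCF_SUCCESS
```

The monitor action would call validate(action="monitor") and pass the probe result straight through, so the standby node reports "not running" instead of "not installed".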


----- Original Message -----
> From: "Serge Dubrouski" <sergeyfd at gmail.com>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Thursday, February 24, 2011 11:05:56 AM
> Subject: Re: [Pacemaker] Validate strategy for RA on DRBD standby node
>
> Why are you trying to start LDAP on a node where you don't have your
> DRBD resource mounted?  Having LDAP up on both nodes would make sense
> if you were building an active/active LDAP cluster with syncrepl or
> some other replication mechanism.  In that case you'd set it up as M/S
> or as a clone and would have to provide access to the config file
> on both nodes.  In the active/passive case you have to colocate your
> LDAP resource with your DRBD and filesystem resources, and Pacemaker
> won't try to start LDAP on a node that doesn't have DRBD activated
> and the filesystem mounted.
>
> On Thu, Feb 24, 2011 at 6:06 AM, David McCurley <mac at fabric.com>
> wrote:
> > Pacemaker and list newbie here :)
> >
> > I'm writing a resource agent in python for the newer release of
> > OpenLDAP, but I need some pointers on a strategy for the validate
> > function in a certain case.  (In python because the more advanced
> > shell scripting hurts my head :).  Here is the situation:
> >
> > The config file for OpenLDAP is stored in
> > /etc/ldap/slapd.d/cn=config.ldif.  This is on a DRBD
> > active-passive system and the /etc/ldap directory is actually a
> > symlink to the DRBD controlled share /vcoreshare/etc/ldap.  The
> > real config file is at
> > /vcoreshare/etc/ldap/slapd.d/cn=config.ldif.
> >
> > So I'm trying to be very judicious with every function and
> > validation, checking file permissions, etc.  But the problem is
> > that /etc/ldap/slapd.d/cn=config.ldif is only present on the
> > active DRBD node.  My validate function checks that the file is
> > readable by the user/group that slapd is to run as.  Now, as soon
> > as I start ldap in the cluster, it starts fine, but validate fails
> > on the standby node (because the DRBD volume isn't mounted) and
> > crm_mon shows a failed action:
> > ----------------------------------------------
> > ============
> > Last updated: Wed Feb 23 07:35:19 2011
> > Stack: openais
> > Current DC: vcoresrv1 - partition with quorum
> > Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
> > 2 Nodes configured, 2 expected votes
> > 5 Resources configured.
> > ============
> >
> > Online: [ vcoresrv1 vcoresrv2 ]
> >
> > fs_vcoreshare   (ocf::heartbeat:Filesystem):    Started vcoresrv1
> >  Master/Slave Set: ms_drbd_vcoreshare
> >     Masters: [ vcoresrv1 ]
> >     Slaves: [ vcoresrv2 ]
> > clusterip       (ocf::heartbeat:IPaddr2):       Started vcoresrv1
> > clusteripsourcing       (ocf::heartbeat:IPsrcaddr):     Started
> > vcoresrv1
> >
> > Failed actions:
> >    ldap_monitor_0 (node=vcoresrv2, call=130, rc=5,
> >    status=complete): not installed
> > ---------------------------------------------
> >
> > Is there a way for my RA to know that it is being called on the
> > active node instead of the passive node?  Or, more generally, what
> > would anyone recommend here?  I really didn't want to write the
> > resource agent so it would be specific to our setup (e.g.
> > checking to make sure the DRBD mount is readable before looking
> > for the config files).  Maybe Pacemaker passes in some extra env
> > variable that can be used?
> >
> > I'm reluctant to post the code for the RA here in the list because
> > it is 450 lines.  But, here is the logic for the validate
> > function:
> >
> > if the appropriate slapd user and group do not exist:
> >   return OCF_ERR_INSTALLED
> > if the ldap config file doesn't exist or isn't readable by the
> > slapd user:
> >   return OCF_ERR_INSTALLED
> > if the ldap binary doesn't exist or isn't executable:
> >   return OCF_ERR_INSTALLED
> > return OCF_SUCCESS
> >
> > Or maybe I'm overdoing it in my tests or have misinterpreted the
> > "OCF Resource Agent Developer's Guide"?
> >
> > Any advice or guidance / clarification appreciated.
> >
> > Thanks,
> >
> > Mac
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started:
> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs:
> > http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >
>
>
>
> --
> Serge Dubrouski.
>



