[Pacemaker] None of the standard agents in ocf:heartbeat are working in CentOS 6

David Barchas dave at barchas.com
Mon Jul 23 10:38:21 EDT 2012


> Date: Mon, 23 Jul 2012 12:16:20 +0200
> From: Andreas Kurz 
> 
> On 07/23/2012 07:06 AM, David Barchas wrote:
> > Hello.
> > 
> > I have been working on this for 3 days now, and must be so stressed out
> > that I am being blinded to what is probably an obvious cause of this. In
> > a word, HELP.
> > 
> > I am trying specifically to utilize ocf:heartbeat:IPaddr2, but this
> > issue seems to occur with any of the ocf:heartbeat agents. I will just
> > focus on IPaddr2 for purposes of figuring this out, but it happens
> > exactly the same with any of the default agents. However, I can
> > successfully use ocf:linbit:drbd, for example. It seems to be limited to
> > the RAs that are installed along with corosync/pacemaker in the
> > resource-agents package.
> > 
> 
> 
> What are the exact package versions you have installed?
> 
> pacemaker*
> resource-agents
> cluster-glue*
> 
Bah, all the info I provided and I missed that.
clusterlib-3.0.12.1-32.el6.x86_64
cluster-glue-1.0.5-6.el6.x86_64
cluster-glue-libs-1.0.5-6.el6.x86_64

pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64

resource-agents-3.9.2-12.el6.x86_64

My full rpm -qa output, just in case it's helpful: http://pastebin.com/d2y7Sii4
> 
> 
> > 
> > I am using CentOS 6.3, fully updated (though this happens in 6.2 with no
> > updates as well). I installed pacemaker/corosync from the default repo. I
> > have stripped everything down to figure this out in VMware: I just install
> > CentOS, update it, install pacemaker/corosync (no DRBD for this
> > discussion), configure corosync, and then start it. Pacemaker starts up
> > fine (or at least I think it's fine). For example, I can set quorum ignore
> > from crm (crm configure property no-quorum-policy="ignore").
> > 
> > Here is the process list:
> > root 1447 0.3 0.6 556080 6636 ? Ssl 21:09 0:00 corosync
> > 499 1453 0.0 0.5 88720 5556 ? S 21:09 0:00 \_ /usr/libexec/pacemaker/cib
> > root 1454 0.0 0.3 86968 3488 ? S 21:09 0:00 \_ /usr/libexec/pacemaker/stonithd
> > root 1455 0.0 0.2 76188 2492 ? S 21:09 0:00 \_ /usr/lib64/heartbeat/lrmd
> > 499 1456 0.0 0.3 91160 3432 ? S 21:09 0:00 \_ /usr/libexec/pacemaker/attrd
> > 499 1457 0.0 0.3 87440 3824 ? S 21:09 0:00 \_ /usr/libexec/pacemaker/pengine
> > 499 1458 0.0 0.3 91312 3884 ? S 21:09 0:00 \_ /usr/libexec/pacemaker/crmd
> > 
> 
> 
> So you are using plugin version 0 to start Pacemaker. That would
> explain why /etc/init.d/pacemaker is unable to start it: it is already
> started by Corosync.

I mostly included that info "just in case", and because it's confusing to me that I can't start pacemaker even on a fresh install, before configuring or starting corosync.
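Some background on the "plugin version 0" remark: with corosync 1.x and the Pacemaker plugin (as shipped on CentOS 6), a service block with "name: pacemaker" and "ver: 0" makes corosync spawn the Pacemaker daemons itself, which matches the process list above where cib, crmd and the rest are children of corosync. With "ver: 1", corosync only loads the plugin and the daemons are started separately by the pacemaker init script. The helper below is a hypothetical sketch (pcmk_plugin_ver is not part of any package) that reads the ver value from a config file. It assumes one "key: value" directive per line and treats a missing ver as 0, which is our understanding of the plugin's default.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not shipped with corosync or pacemaker).
# Prints the "ver" value of the pacemaker service block in a corosync 1.x
# style config file, e.g. /etc/corosync/corosync.conf or a file under
# /etc/corosync/service.d/. Assumes one "key: value" directive per line.
pcmk_plugin_ver() {
    awk '
        # Entering a service { ... } block: reset what we have seen so far.
        /^[ \t]*service[ \t]*[{]/ { in_svc = 1; name = ""; ver = ""; next }
        in_svc && /^[ \t]*name:/ { name = $2 }
        in_svc && /^[ \t]*ver:/  { ver = $2 }
        # Closing brace: report only the pacemaker block. A missing ver
        # is printed as 0 (assumed default).
        in_svc && /[}]/ {
            if (name == "pacemaker") print (ver == "" ? "0" : ver)
            in_svc = 0
        }
    ' "$1"
}

# Example: pcmk_plugin_ver /etc/corosync/service.d/pcmk
```

If this prints 0, corosync owns the Pacemaker daemons and /etc/init.d/pacemaker is not the way to start them.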
> 
> > 
> > 499 is hacluster btw.
> > 
> > ***BUT***
> > 
> > When I run as root the following:
> > # crm ra meta ocf:heartbeat:IPaddr2
> > 
> > I get this response:
> > lrmadmin[1484]: 2012/07/22_13:28:23 ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
> > ERROR: ocf:heartbeat:IPaddr2: could not parse meta-data:
> > 
> > And this is in /var/log/messages:
> > Jul 22 16:35:14 MST lrmd: [48093]: ERROR: get_resource_meta: pclose failed: Resource temporarily unavailable
> > Jul 22 16:35:14 MST lrmd: [48093]: WARN: on_msg_get_metadata: empty metadata for ocf::heartbeat::IPaddr2.
> > Jul 22 16:35:14 MST lrmd: [48093]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 200 ms (> 100 ms) before being called (GSource: 0x187df10)
> > Jul 22 16:35:14 MST lrmd: [48093]: info: G_SIG_dispatch: started at 429616889 should have started at 429616869
> > Jul 22 16:35:14 MST lrmadmin: [48254]: ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
> > 
> > I am using crm ra meta as a way to test, but crm also refuses to let me
> > add the resource as a primitive.
> > 
> > In my research, I found that the cause is often permissions. So, just to
> > rule that out, I set my entire system to 777 permissions. No joy.
> > 
> > Another suggestion I often find is to set OCF_ROOT (export
> > OCF_ROOT=/usr/lib/ocf) and then run
> > /usr/lib/ocf/resource.d/heartbeat/IPaddr2 meta-data.
> > That produces the desired output, but it does not work before I export.
> > And crm still does not accept my meta request.
> > 
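The manual test described above can be wrapped in a small function so it is easy to repeat with and without OCF_ROOT. This is a sketch under assumptions: check_meta_data is a hypothetical name, the example paths are the CentOS 6 defaults quoted in the thread, and "looks like XML" only means the output begins with an XML declaration, a rough check rather than a schema validation.

```shell
#!/bin/sh
# Hypothetical helper (not part of resource-agents): run an OCF agent's
# meta-data action by hand and report whether the output looks like XML.
#   $1 = path to the agent script
#   $2 = value to export as OCF_ROOT, or "" to run with OCF_ROOT unset
check_meta_data() {
    if [ -n "$2" ]; then
        out=$(OCF_ROOT=$2 "$1" meta-data 2>/dev/null)
    else
        # Command substitution runs in a subshell, so unsetting OCF_ROOT
        # here does not affect the caller.
        out=$(unset OCF_ROOT; "$1" meta-data 2>/dev/null)
    fi
    # Rough check only: agent meta-data is assumed to start with "<?xml".
    case $out in
        '<?xml'*) echo ok ;;
        *) echo "no meta-data" ;;
    esac
}

# Example (paths from the thread):
#   check_meta_data /usr/lib/ocf/resource.d/heartbeat/IPaddr2 /usr/lib/ocf
#   check_meta_data /usr/lib/ocf/resource.d/heartbeat/IPaddr2 ""
```

On the system described, the first call would be expected to print "ok" and the second "no meta-data", matching what was observed by hand.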
> > Another suggestion I found is to make sure that ocf-shellfuncs exists in
> > the agents folder. The soft links exist:
> > lrwxrwxrwx. 1 root root 32 Jul 22 04:08 .ocf-binaries -> ../../lib/heartbeat/ocf-binaries
> > lrwxrwxrwx. 1 root root 35 Jul 22 04:08 .ocf-directories -> ../../lib/heartbeat/ocf-directories
> > lrwxrwxrwx. 1 root root 35 Jul 22 04:08 .ocf-returncodes -> ../../lib/heartbeat/ocf-returncodes
> > lrwxrwxrwx. 1 root root 34 Jul 22 04:08 .ocf-shellfuncs -> ../../lib/heartbeat/ocf-shellfuncs
> > 
> > And just to make sure, I also created un-hidden soft links, with no joy.
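An ls -l listing only proves that the links exist, not that their targets do. Taken from /usr/lib/ocf/resource.d/heartbeat, the relative target ../../lib/heartbeat/... points at /usr/lib/ocf/lib/heartbeat/.... A quick way to confirm that each link actually resolves is test -e, which follows symlinks. The function name and its default directory below are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical check: report any of the hidden helper links in an OCF
# agent directory that are missing or dangling. "test -e" follows the
# symlink, so it fails when the link exists but its target does not.
check_ocf_links() {
    dir=${1:-/usr/lib/ocf/resource.d/heartbeat}
    status=0
    for f in .ocf-binaries .ocf-directories .ocf-returncodes .ocf-shellfuncs; do
        if [ -e "$dir/$f" ]; then
            echo "ok: $f"
        elif [ -L "$dir/$f" ]; then
            echo "dangling: $f"
            status=1
        else
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}
```

A non-zero return status means at least one helper file would fail to load from that directory.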
> 
> Strange; those errors are typically related to wrong paths for the
> initialization of the environment and helper functions:
> 
> # Initialization:
> 
> : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
> . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
> 
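Why exporting OCF_ROOT matters follows from this initialization line: ${OCF_FUNCTIONS_DIR=...} assigns the default only when OCF_FUNCTIONS_DIR is unset (unlike :=, it leaves a set-but-empty value alone), and the default is built from OCF_ROOT. With OCF_ROOT unset, as in a manual run without the export, the helper path collapses to /lib/heartbeat/ocf-shellfuncs, which is not where the helpers live on the layout shown in this thread. The function below is a hypothetical sketch that evaluates the same expansion in a subshell and prints the resulting path.

```shell
#!/bin/sh
# Hypothetical sketch: evaluate the agent initialization expansion
#   : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
# in a subshell and print where ocf-shellfuncs would be sourced from.
#   $1 = OCF_ROOT value; optional $2 = preset OCF_FUNCTIONS_DIR value
resolve_functions_dir() {
    (
        unset OCF_FUNCTIONS_DIR
        if [ $# -ge 2 ]; then
            OCF_FUNCTIONS_DIR=$2
        fi
        OCF_ROOT=$1
        # "=" (not ":=") only fills in the default when the variable is unset.
        : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
        printf '%s/ocf-shellfuncs\n' "$OCF_FUNCTIONS_DIR"
    )
}

# resolve_functions_dir /usr/lib/ocf   -> /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
# resolve_functions_dir ""             -> /lib/heartbeat/ocf-shellfuncs
```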
> The DRBD agent has an extra fallback check; that may be the reason it
> still works ...
> 
> # Resource-agents have moved their ocf-shellfuncs file around.
> # There are supposed to be symlinks or wrapper files in the old location,
> # pointing to the new one, but people seem to get it wrong all the time.
> # Try several locations.
> 
> if test -n "${OCF_FUNCTIONS_DIR}" ; then
>     if test -e "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs" ; then
>         . "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs"
>     elif test -e "${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs" ; then
>         . "${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs"
>     fi
> else
>     if test -e "${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs" ; then
>         . "${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs"
>     elif test -e "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"; then
>         . "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"
>     fi
> fi
> 
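To see which file the DRBD-style fallback would actually pick on a given node, the same search order can be run without sourcing anything. This hypothetical sketch (find_shellfuncs is an illustrative name) mirrors the four candidates in the quoted code, in the same order, and prints the first one that exists; it prints nothing and returns 1 if none does.

```shell
#!/bin/sh
# Hypothetical diagnostic: mirror the search order of the DRBD agent's
# fallback and print the ocf-shellfuncs file it would source.
# Reads OCF_FUNCTIONS_DIR and OCF_ROOT from the environment.
find_shellfuncs() {
    if [ -n "${OCF_FUNCTIONS_DIR}" ]; then
        set -- "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs" \
               "${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs"
    else
        set -- "${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs" \
               "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"
    fi
    # First existing candidate wins, as in the if/elif chain above.
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: OCF_ROOT=/usr/lib/ocf find_shellfuncs
```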
I noticed this as well, and I tried updating the IPaddr2 agent to use the directory code from DRBD (what you have above), also with no success. Pacemaker was already running at the time, though. I assume it doesn't load all the agents into RAM once and never read them again, but instead executes them when needed, so there should be no caching issue.
I am going to try that again, though, because it really does sound like it could fix it, even if it would not explain why it's broken in the first place. Right now I'll take a hack if it works. I'm pretty sure it won't, though.
> 
> 
> Regards,
> Andreas

Thanks for the help, greatly appreciated.