[ClusterLabs] ocf-tester always claims failure, even with built-in resource agents?

Andrei Borzenkov arvidjaar at gmail.com
Fri Mar 26 12:59:07 EDT 2021


On 26.03.2021 17:28, Antony Stone wrote:
> Hi.
> 
> I've just signed up to the list.  I've been using corosync and pacemaker for 
> several years, mostly under Debian 9, which means:
> 
> 	corosync 2.4.2
> 	pacemaker 1.1.16
> 
> I've recently upgraded a test cluster to Debian 10, which gives me:
> 
> 	corosync 3.0.1
> 	pacemaker 2.0.1
> 
> I've made a few adjustments to my /etc/corosync/corosync.conf configuration so 
> that corosync seems happy, and also some minor changes (mostly to the cluster 
> defaults) in /etc/corosync/cluster.cib so that pacemaker is happy.
> 
> So far all is well and good, my cluster synchronises, starts the resources, 
> and everything's working as expected.  It'll move the resources from one 
> cluster member to another (either if I ask it to, or if there's a problem), 
> and it seems to work just as the older version did.
> 
> Then, several times a day, I get resource failures such as:
> 
> 	* Asterisk_start_0 on castor 'insufficient privileges' (4):
> 	 call=58,
> 	 status=complete,
> 	 exitreason='',
> 	 last-rc-change='Fri Mar 26 13:37:08 2021',
> 	 queued=0ms,
> 	 exec=55ms
> 
> I have no idea why the machine might tell me it cannot start Asterisk due to 
> insufficient privilege when it's already been able to run it before the cluster 
> resources moved back to this machine.  Asterisk *can* and *does* run on this 
> machine.
> 
> Another error I get is:
> 
> 	* Kann-Bear_monitor_5000 on helen 'unknown error' (1):
> 	 call=62,
> 	 status=complete,
> 	 exitreason='',
> 	 last-rc-change='Fri Mar 26 14:23:05 2021',
> 	 queued=0ms,
> 	 exec=0ms
> 
> Now, that second resource is one which doesn't have a standard resource agent 
> available for it under /usr/lib/ocf/resource.d, so I'm using the general-
> purpose agent /usr/lib/ocf/resource.d/heartbeat/anything to manage it.
> 
> I thought, "perhaps there's something dodgy about using this 'anything' agent, 
> because it can't really know about the resource it's managing", so I tested it 
> with ocf-tester:
> 
> # ocf-tester -n Kann-Bear -o binfile="/usr/sbin/bearerbox" -o 
> cmdline_options="/etc/kannel/kannel.conf" -o 
> pidfile="/var/run/kannel/kannel_bearerbox.pid" 
> /usr/lib/ocf/resource.d/heartbeat/anything
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/anything...
> /usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
> * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> * Your agent does not support the reload action (optional)
> Tests failed: /usr/lib/ocf/resource.d/heartbeat/anything failed 1 tests
> 
> Okay, something's not right.
> 
> BUT, it doesn't matter *which* resource agent I test, it tells me the same 
> thing every time, including for the built-in standard agents:
> 
> * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> 
> For example:
> 
> # ocf-tester -n Asterisk /usr/lib/ocf/resource.d/heartbeat/asterisk
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/asterisk...
> /usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
> * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> * Your agent does not support the reload action (optional)
> Tests failed: /usr/lib/ocf/resource.d/heartbeat/asterisk failed 1 tests
> 
> 
> # ocf-tester -n IP-Float4 -o ip=10.1.0.42 -o cidr_netmask=28 
> /usr/lib/ocf/resource.d/heartbeat/IPaddr2
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/IPaddr2...
> /usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
> * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> * Your agent does not support the reload action (optional)
> Tests failed: /usr/lib/ocf/resource.d/heartbeat/IPaddr2 failed 1 tests
> 
> 
> So, it seems to be telling me that even the standard built-in resource agents 
> "produce meta-data which does not conform to ra-api-1.dtd"
> 
> 
> My first question is: what's going wrong here?  Am I using ocf-tester 
> incorrectly, or is it a bug?
> 

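As the error messages make clear, ocf-tester calls xmllint, which is not
installed. The rc=127 is the shell's "command not found" status, not a
real DTD validation failure. On Debian, xmllint ships in the
libxml2-utils package.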
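
A quick way to confirm this before re-running ocf-tester, sketched for a
POSIX shell; check_tool is a made-up helper name, not part of ocf-tester:

```shell
#!/bin/sh
# Hypothetical helper (not part of ocf-tester): report whether a command
# is available on PATH. Returns 1 if it is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# ocf-tester shells out to xmllint to validate the agent's meta-data.
check_tool xmllint || echo "install xmllint, then re-run ocf-tester"
```

If this prints "xmllint: missing", the meta-data test cannot pass for any
agent, whatever the agent actually emits.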

> My second question is: how can I debug what caused pacemaker to decide that it 
> couldn't run Asterisk due to "insufficient privileges" on a machine which is 
> perfectly well capable of running Asterisk, and including when it gets 
> started by pacemaker (in fact, that's the only way Asterisk gets started on 
> these machines; it's a floating resource which pacemaker is in charge of).
> 

The agent returns this error if it fails to chown a directory specified
in its configuration file:

        # Regardless of whether we just created the directory or it
        # already existed, check whether it is writable by the configured
        # user
        if ! su -s /bin/sh - $OCF_RESKEY_user -c "test -w $dir"; then
            ocf_log warn "Directory $dir is not writable by $OCF_RESKEY_user, attempting chown"
            ocf_run chown $OCF_RESKEY_user:$OCF_RESKEY_group $dir \
                || exit $OCF_ERR_PERM
        fi

Either the agent is not running as root or something is blocking the
chown. The usual suspects are AppArmor or SELinux.
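
The "(4)" in the status output is the numeric value of OCF_ERR_PERM, and
the "(1)" on the other resource is OCF_ERR_GENERIC. A small lookup for
reading such output, as a sketch only: ocf_rc_name is a hypothetical
function name, while the numeric values are the standard OCF return
codes:

```shell
#!/bin/sh
# Hypothetical helper: translate a numeric OCF return code into its
# symbolic name, as used by resource agents (e.g. exit $OCF_ERR_PERM).
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown ($1)" ;;
    esac
}

ocf_rc_name 4   # the Asterisk start failure
ocf_rc_name 1   # the Kann-Bear monitor failure
```

Searching the agent script for the symbolic name then shows which code
paths can produce a given failure.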


> 
> Please let me know if I can provide any further information to help work out 
> what's going on here.
> 
> 
> Thanks,
> 
> 
> Antony.
> 


