[ClusterLabs] ocf-tester always claims failure, even with built-in resource agents?
Antony Stone
Antony.Stone at ha.open.source.it
Fri Mar 26 10:28:24 EDT 2021
Hi.
I've just signed up to the list. I've been using corosync and pacemaker for
several years, mostly under Debian 9, which means:
corosync 2.4.2
pacemaker 1.1.16
I've recently upgraded a test cluster to Debian 10, which gives me:
corosync 3.0.1
pacemaker 2.0.1
I've made a few adjustments to my /etc/corosync/corosync.conf configuration so
that corosync seems happy, and also some minor changes (mostly to the cluster
defaults) in /etc/corosync/cluster.cib so that pacemaker is happy.
So far all is well and good, my cluster synchronises, starts the resources,
and everything's working as expected. It'll move the resources from one
cluster member to another (either if I ask it to, or if there's a problem),
and it seems to work just as the older version did.
However, several times a day I get resource failures such as:
* Asterisk_start_0 on castor 'insufficient privileges' (4):
call=58,
status=complete,
exitreason='',
last-rc-change='Fri Mar 26 13:37:08 2021',
queued=0ms,
exec=55ms
I have no idea why the machine would tell me it cannot start Asterisk due to
insufficient privileges when it was able to run it before the cluster
resources moved back to this machine. Asterisk *can* and *does* run on this
machine.
Another error I get is:
* Kann-Bear_monitor_5000 on helen 'unknown error' (1):
call=62,
status=complete,
exitreason='',
last-rc-change='Fri Mar 26 14:23:05 2021',
queued=0ms,
exec=0ms
Now, that second resource is one which doesn't have a standard resource agent
available for it under /usr/lib/ocf/resource.d, so I'm using the general-
purpose agent /usr/lib/ocf/resource.d/heartbeat/anything to manage it.
I thought, "perhaps there's something dodgy about using this 'anything' agent,
because it can't really know about the resource it's managing", so I tested it
with ocf-tester:
# ocf-tester -n Kann-Bear -o binfile="/usr/sbin/bearerbox" \
    -o cmdline_options="/etc/kannel/kannel.conf" \
    -o pidfile="/var/run/kannel/kannel_bearerbox.pid" \
    /usr/lib/ocf/resource.d/heartbeat/anything
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/anything...
/usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* Your agent does not support the reload action (optional)
Tests failed: /usr/lib/ocf/resource.d/heartbeat/anything failed 1 tests
Okay, something's not right.
BUT, it doesn't matter *which* resource agent I test, it tells me the same
thing every time, including for the built-in standard agents:
* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
For example:
# ocf-tester -n Asterisk /usr/lib/ocf/resource.d/heartbeat/asterisk
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/asterisk...
/usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* Your agent does not support the reload action (optional)
Tests failed: /usr/lib/ocf/resource.d/heartbeat/asterisk failed 1 tests
# ocf-tester -n IP-Float4 -o ip=10.1.0.42 -o cidr_netmask=28 \
    /usr/lib/ocf/resource.d/heartbeat/IPaddr2
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/IPaddr2...
/usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* Your agent does not support the reload action (optional)
Tests failed: /usr/lib/ocf/resource.d/heartbeat/IPaddr2 failed 1 tests
So, it seems to be telling me that even the standard built-in resource agents
"produce meta-data which does not conform to ra-api-1.dtd".
My first question is: what's going wrong here? Am I using ocf-tester
incorrectly, or is it a bug?
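
One detail in the output above may be relevant: every run prints
"xmllint: not found" immediately before the rc=127 line, and 127 is the
shell's usual exit status for a command that could not be found. So the
meta-data check may be failing because the validator is missing, rather than
because of anything the agents produce. A minimal sketch for checking this
follows; have_tool and report_tool are hypothetical helper names, and the
package name in the comment is an assumption about Debian packaging.

```shell
#!/bin/sh
# Sketch: check whether the helper used by ocf-tester's meta-data test exists.
# Exit status 127 is the shell's conventional "command not found" status.

have_tool() {
    # command -v is the POSIX way to ask whether a command can be found
    command -v "$1" >/dev/null 2>&1
}

report_tool() {
    if have_tool "$1"; then
        echo "$1: found"
    else
        # Assumption: on Debian, xmllint is shipped in the libxml2-utils package
        echo "$1: missing"
    fi
}

report_tool xmllint
```

If this reports xmllint as missing, installing it and re-running ocf-tester
would show whether the DTD complaint goes away.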
My second question is: how can I debug what caused pacemaker to decide that it
couldn't run Asterisk due to "insufficient privileges" on a machine which is
perfectly capable of running Asterisk, including when it gets started by
pacemaker? (In fact, that's the only way Asterisk gets started on these
machines; it's a floating resource which pacemaker is in charge of.)
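
One possible way to narrow this down is to run a single action of the agent
by hand and look at its exit status. In the OCF resource agent API, status 4
is OCF_ERR_PERM (reported as 'insufficient privileges') and status 1 is
OCF_ERR_GENERIC (reported as 'unknown error'). The sketch below is only an
illustration: ocf_rc_name and run_agent are hypothetical helper names, the
OCF_ROOT path is taken from the agent paths used earlier, and a real agent
will probably also need its parameters passed as OCF_RESKEY_* environment
variables.

```shell
#!/bin/sh
# Map a numeric OCF exit status to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unrecognised ($1)" ;;
    esac
}

# Run one action of an agent by hand and report its exit status.
# run_agent is a hypothetical helper, not part of any package.
run_agent() {
    agent=$1
    action=$2
    rc=0
    # Assumption: OCF_ROOT is /usr/lib/ocf, matching the agent paths above
    OCF_ROOT=/usr/lib/ocf "$agent" "$action" || rc=$?
    echo "$action -> rc=$rc ($(ocf_rc_name "$rc"))"
}

# Example, as root on the node that reported the failure. Left commented
# out so that nothing is started or stopped by accident:
# run_agent /usr/lib/ocf/resource.d/heartbeat/asterisk monitor
```

Running the failing action this way, with the same parameters the cluster
uses, should show whether the agent itself returns 4 and may print a message
explaining why.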
Please let me know if I can provide any further information to help work out
what's going on here.
Thanks,
Antony.
--
"Hi, I've found a fault with the English language and I need an entomologist."
"I think you mean an etymologist."
"No. It's a bug, not a feature."
Please reply to the list;
please *don't* CC me.