[ClusterLabs] Finding attributes of a past resource agent invocation
Ondrej
ondrej-clusterlabs at famera.cz
Tue Mar 3 19:10:23 EST 2020
On 3/3/20 11:22 PM, wferi at niif.hu wrote:
> Hi,
>
> I suffered unexpected fencing under Pacemaker 2.0.1. I set a resource
> to unmanaged (crm_resource -r vm-invtest -m -p is-managed -v false),
> then played with ocf-tester, which left the resource stopped. Finally I
> deleted the resource (crm_resource -r vm-invtest --delete -t primitive),
> which led to:
>
> pacemaker-controld[11670]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> pacemaker-schedulerd[11669]: notice: Clearing failure of vm-invtest on inv1 because resource parameters have changed
> pacemaker-schedulerd[11669]: warning: Processing failed monitor of vm-invtest on inv1: not running
> pacemaker-schedulerd[11669]: warning: Detected active orphan vm-invtest running on inv1
> pacemaker-schedulerd[11669]: notice: Clearing failure of vm-invtest on inv1 because it is orphaned
> pacemaker-schedulerd[11669]: notice: * Stop vm-invtest ( inv1 ) due to node availability
> pacemaker-schedulerd[11669]: notice: Calculated transition 959, saving inputs in /var/lib/pacemaker/pengine/pe-input-87.bz2
> pacemaker-controld[11670]: notice: Initiating stop operation vm-invtest_stop_0 on inv1
> pacemaker-controld[11670]: notice: Transition 959 aborted by deletion of lrm_rsc_op[@id='vm-invtest_last_failure_0']: Resource operation removal
> pacemaker-controld[11670]: warning: Action 6 (vm-invtest_stop_0) on inv1 failed (target: 0 vs. rc: 6): Error
> pacemaker-controld[11670]: notice: Transition 959 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-87.bz2): Complete
> pacemaker-schedulerd[11669]: warning: Processing failed stop of vm-invtest on inv1: not configured
> pacemaker-schedulerd[11669]: error: Preventing vm-invtest from re-starting anywhere: operation stop failed 'not configured' (6)
> pacemaker-schedulerd[11669]: warning: Processing failed stop of vm-invtest on inv1: not configured
> pacemaker-schedulerd[11669]: error: Preventing vm-invtest from re-starting anywhere: operation stop failed 'not configured' (6)
> pacemaker-schedulerd[11669]: warning: Cluster node inv1 will be fenced: vm-invtest failed there
> pacemaker-schedulerd[11669]: warning: Detected active orphan vm-invtest running on inv1
> pacemaker-schedulerd[11669]: warning: Scheduling Node inv1 for STONITH
> pacemaker-schedulerd[11669]: notice: Stop of failed resource vm-invtest is implicit after inv1 is fenced
> pacemaker-schedulerd[11669]: notice: * Fence (reboot) inv1 'vm-invtest failed there'
> pacemaker-schedulerd[11669]: notice: * Move fencing-inv3 ( inv1 -> inv2 )
> pacemaker-schedulerd[11669]: notice: * Stop vm-invtest ( inv1 ) due to node availability
>
> The OCF resource agent (on inv1) reported that it failed to validate one
> of the attributes passed to it for the stop operation, hence the "not
> configured" error, which caused the fencing. Is there a way to find out
> what attributes were passed to the OCF agent in that fateful invocation?
> I've got pe-input files, Pacemaker detail logs and a hard time wading
> through them. I failed to reproduce the issue till now (but I haven't
> rewound the CIB yet).
>
Hi Feri,
> Is there a way to find out what attributes were passed to the OCF
> agent in that fateful invocation?
Basically the same as with any other operation run while the resource
was configured (with the exception of the action, which was 'stop' when
stopping the resource).
As you have the pe-input files, which contain the attributes of the
resource, you can get the attributes and their values from there.
==
For example, when I tried to delete a test resource of mine with the
same name, the following could be found in the pe-input file:
...
<primitive class="ocf" id="vm-invtest" provider="pacemaker" type="Dummy">
  <meta_attributes id="vm-invtest-meta_attributes">
    <nvpair id="vm-invtest-meta_attributes-target-role" name="target-role" value="Stopped"/>
  </meta_attributes>
  <instance_attributes id="vm-invtest-instance_attributes">
    <nvpair id="vm-invtest-instance_attributes-fake" name="fake" value="some_value"/>
  </instance_attributes>
  <operations>
    <op id="vm-invtest-migrate_from-interval-0s" interval="0s" name="migrate_from" timeout="20s"/>
    <op id="vm-invtest-migrate_to-interval-0s" interval="0s" name="migrate_to" timeout="20s"/>
    <op id="vm-invtest-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
    <op id="vm-invtest-reload-interval-0s" interval="0s" name="reload" timeout="20s"/>
    <op id="vm-invtest-start-interval-0s" interval="0s" name="start" timeout="20s"/>
    <op id="vm-invtest-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
  </operations>
</primitive>
...
From the above you can see that the cluster will be stopping it because
of 'name="target-role" value="Stopped"'. You can also see that this
resource has one instance attribute (nvpair): 'name="fake"
value="some_value"'. Taking inspiration from
/usr/lib/ocf/resource.d/pacemaker/Dummy, I can see that the resource
agent will be called like
"/usr/lib/ocf/resource.d/pacemaker/Dummy stop" and that at minimum the
$OCF_RESKEY_fake variable will be passed to it. If you can reproduce the
same issue, you can try to dump all variables to a file when validation
fails (take inspiration from the 'dump_env()' function of the Dummy
resource agent).
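To make the naming concrete, here is a minimal Python sketch of how the
instance attributes map to the variables the agent sees. The helper name
ocf_reskey_env is made up for illustration, and it only covers instance
attributes; the cluster sets further variables of its own which this
does not try to reconstruct.

```python
def ocf_reskey_env(instance_attributes):
    """Map instance attribute names to OCF_RESKEY_* variable names.

    Hypothetical helper for illustration only: it covers just the
    instance attributes, not the extra variables the cluster adds.
    """
    return {"OCF_RESKEY_" + name: value
            for name, value in instance_attributes.items()}


if __name__ == "__main__":
    # Attribute taken from the example primitive in this mail.
    for var, value in sorted(ocf_reskey_env({"fake": "some_value"}).items()):
        print(f"{var}={value}")
```

For the example resource this gives a single variable, OCF_RESKEY_fake,
holding some_value.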
So if you want to check which attributes were set around the time of
the deletion, have a look at /var/lib/pacemaker/pengine/pe-input-87.bz2
or maybe /var/lib/pacemaker/pengine/pe-input-86.bz2.
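If wading through the files by hand is tedious, something like the
following Python sketch could pull the instance attributes out of a
pe-input file. It assumes the file is a bzip2-compressed CIB XML like
the fragment above; the function name resource_attributes is my own
invention, and it simply merges all instance_attributes sets of the
primitive without evaluating any rules.

```python
import bz2
import sys
import xml.etree.ElementTree as ET


def resource_attributes(pe_input_path, resource_id):
    """Return {name: value} of the instance attributes of one primitive.

    Returns None if no primitive with that id is found in the file
    (for example because it had already been deleted at that point).
    """
    with bz2.open(pe_input_path, "rb") as handle:
        root = ET.parse(handle).getroot()
    for primitive in root.iter("primitive"):
        if primitive.get("id") == resource_id:
            return {
                nvpair.get("name"): nvpair.get("value")
                for nvpair in primitive.findall("instance_attributes/nvpair")
            }
    return None


if __name__ == "__main__":
    # Usage: script.py RESOURCE_ID PE_INPUT_FILE [PE_INPUT_FILE ...]
    if len(sys.argv) > 2:
        for path in sys.argv[2:]:
            print(path, resource_attributes(path, sys.argv[1]))
```

Running it with vm-invtest and both files named above would show
whether the primitive and its attributes are still present in each.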
--
Ondrej Famera