[ClusterLabs] CIB: op-status=4 ?

Thu May 18 17:37:27 EDT 2017

On 05/17/2017 06:10 PM, Radoslaw Garbacz wrote:
> Hi,
> 
> I have a question regarding the '<lrm_resource><lrm_rsc_op>' op-status
> attribute getting value 4.
> 
> In my case I see strange behavior: resources get "monitor" operation
> entries in the CIB with op-status=4, and the operations do not seem to
> have been called (exec-time=0).
> 
> What does 'op-status' = 4 mean?

It means the action finished with an error status.

> 
> I would appreciate some elaboration regarding this, since this is
> interpreted by pacemaker as an error, which causes logs:
> crm_mon:    error: unpack_rsc_op:    Preventing dbx_head_head from
> re-starting anywhere: operation monitor failed 'not configured' (6)

The rc-code="6" is the more interesting number; it's the result returned
by the resource agent. As you can see above, it means "not configured".
What that means exactly is up to the resource agent's interpretation.
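
For reference, here is a small sketch of the standard OCF exit codes as a
Python lookup table. The helper name describe_rc is purely illustrative
and not part of pacemaker; the table covers only the codes defined for
resource agents, not pacemaker's internal ones.

```python
# Standard OCF resource agent exit codes (0-9).
OCF_RC_NAMES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}


def describe_rc(rc):
    """Return the symbolic OCF name for rc, or a placeholder if unknown."""
    return OCF_RC_NAMES.get(rc, f"unknown ({rc})")


if __name__ == "__main__":
    # rc-code="6" from the log message in this thread
    print(describe_rc(6))
```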

> and I am pretty sure the resource agent was not called (no logs,
> exec-time=0)

Normally this could only come from the resource agent.

However, there are two cases where pacemaker generates this error itself:
if the resource definition in the CIB is invalid, or if your version of
pacemaker was compiled with support for reading sensitive parameter
values from a file, but that file could not be read.

It doesn't sound like your case is either of those, though, since
they would prevent the resource from even starting. Most likely it's
coming from the resource agent. I'd look at the resource agent source
code and see where it can return OCF_ERR_CONFIGURED.
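
One way to do that search, as a rough sketch: the helper below lists the
non-comment lines of an agent script that mention OCF_ERR_CONFIGURED. The
function name and the sample agent text are made up for illustration; to
check a real agent, pass in the contents of its file instead (agents are
typically installed under /usr/lib/ocf/resource.d/, but check your system).

```python
import re


def find_rc_returns(agent_source, symbol="OCF_ERR_CONFIGURED"):
    """Return (line_number, text) pairs for non-comment lines using symbol."""
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    hits = []
    for lineno, line in enumerate(agent_source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip shell comments (and the shebang line)
        if pattern.search(line):
            hits.append((lineno, stripped))
    return hits


# Hypothetical agent fragment, used only to demonstrate the helper.
SAMPLE_AGENT = """\
#!/bin/sh
# may return OCF_ERR_CONFIGURED
validate() {
    if [ -z "$OCF_RESKEY_config" ]; then
        return $OCF_ERR_CONFIGURED
    fi
    return $OCF_SUCCESS
}
"""

if __name__ == "__main__":
    for lineno, text in find_rc_returns(SAMPLE_AGENT):
        print(f"{lineno}: {text}")
```

Each hit is a code path worth reading: the condition guarding it tells you
what the agent considers a configuration error.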

> There are two aspects of this:
> 
> 1) harmless (pacemaker seems to not bother about it), which I guess
> indicates cancelled monitoring operations:
> op-status=4, rc-code=189

This error means the connection between the crmd and lrmd daemons was
lost -- most commonly, that shows up for operations that were pending at
shutdown.
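
If you want to pull these fields out of the CIB programmatically, something
like the following sketch may help. It parses one lrm_rsc_op entry (copied
from the first example quoted below, with a few attributes omitted) and
splits transition-magic on the assumption that its layout is
"op-status:rc-code;transition-key", which is what both quoted examples
suggest. The function name summarize_op is illustrative only.

```python
import xml.etree.ElementTree as ET

# Entry taken from the example in this thread (some attributes omitted).
SAMPLE = '''<lrm_rsc_op id="dbx_first_datas_last_failure_0"
  operation_key="dbx_first_datas_monitor_0" operation="monitor"
  transition-key="38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
  transition-magic="4:189;38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
  on_node="olegdbx61-vm000001" call-id="10" rc-code="189" op-status="4"
  interval="0" exec-time="0" queue-time="0"/>'''


def summarize_op(xml_text):
    """Extract the status-related attributes of a single lrm_rsc_op."""
    op = ET.fromstring(xml_text)
    # Assumed layout: "<op-status>:<rc-code>;<transition-key>"
    magic_status, rest = op.get("transition-magic").split(":", 1)
    magic_rc, transition_key = rest.split(";", 1)
    return {
        "operation_key": op.get("operation_key"),
        "op_status": int(op.get("op-status")),
        "rc_code": int(op.get("rc-code")),
        "magic": (int(magic_status), int(magic_rc)),
        "transition_key": transition_key,
        "exec_time": int(op.get("exec-time")),
    }


if __name__ == "__main__":
    for key, value in summarize_op(SAMPLE).items():
        print(f"{key}: {value}")
```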

> 
> * Example:
> <lrm_rsc_op id="dbx_first_datas_last_failure_0"
> operation_key="dbx_first_datas_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> transition-key="38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> transition-magic="4:189;38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> on_node="olegdbx61-vm000001" call-id="10" rc-code="189" op-status="4"
> interval="0" last-run="1495057378" last-rc-change="1495057378"
> exec-time="0" queue-time="0" op-digest="f6bd1386a336e8e6ee25ecb651a9efb6"/>
> 
> 
> 2) error level one (op-status=4, rc-code=6), which generates logs:
> crm_mon:    error: unpack_rsc_op:    Preventing dbx_head_head from
> re-starting anywhere: operation monitor failed 'not configured' (6)
> 
> * Example:
> <lrm_rsc_op id="dbx_head_head_last_failure_0"
> operation_key="dbx_head_head_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> transition-key="39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> transition-magic="4:6;39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> on_node="olegdbx61-vm000001" call-id="999999999" rc-code="6"
> op-status="4" interval="0" last-run="1495057389"
> last-rc-change="1495057389" exec-time="0" queue-time="0"
> op-digest="60cdc9db1c5b77e8dba698d3d0c8cda8"/>
> 
> 
> Could it be some hardware (VM hypervisor) issue?
> 
> 
> Thanks in advance,
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporated