[ClusterLabs] CIB: op-status=4 ?

Ken Gaillot kgaillot at redhat.com
Tue May 23 02:34:39 UTC 2017


On 05/19/2017 02:03 PM, Radoslaw Garbacz wrote:
> Hi,
> 
> I have some more information regarding this issue (pacemaker debug logs).
> 
> Firstly, I have not mentioned some probably important facts:
> 1) this happens rarely
> 2) this happens only on first boot
> 3) turning on debug in corosync/pacemaker significantly reduced the
> frequency of this happening, i.e. without debug about every 7th cluster
> creation, with debug about every 66th cluster creation.
> 
> This is a 3-node cluster on Azure Cloud, and it does not seem like the
> resource agent is reporting an error, because all nodes log proper "not
> running" results:
> 
> The resource in question is named "dbx_head_head".
> 
> node1)
> May 19 13:15:41 [6872] olegdbx39-vm-0 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.32 to 2.5.31
> head.ocf.sh(dbx_head_head)[7717]:
> 2017/05/19_13:15:42 DEBUG: head_monitor: return 7
> May 19 13:15:42 [6873] olegdbx39-vm-0       lrmd:    debug:
> operation_finished:    dbx_head_head_monitor_0:7717 - exited with rc=7
> May 19 13:15:42 [6873] olegdbx39-vm-0       lrmd:    debug:
> operation_finished:    dbx_head_head_monitor_0:7717:stderr [ -- empty -- ]
> May 19 13:15:42 [6873] olegdbx39-vm-0       lrmd:    debug:
> operation_finished:    dbx_head_head_monitor_0:7717:stdout [ -- empty -- ]
> May 19 13:15:42 [6873] olegdbx39-vm-0       lrmd:    debug:
> log_finished:    finished - rsc:dbx_head_head action:monitor call_id:14
> pid:7717 exit-code:7 exec-time:932ms queue-time:0ms
> 
> 
> node2)
> May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.31 to 2.5.30
> head.ocf.sh(dbx_head_head)[6485]:
> 2017/05/19_13:15:41 DEBUG: head_monitor: return 7
> May 19 13:15:41 [6267] olegdbx39-vm000002       lrmd:    debug:
> operation_finished:    dbx_head_head_monitor_0:6485 - exited with rc=7
> May 19 13:15:41 [6267] olegdbx39-vm000002       lrmd:    debug:
> operation_finished:    dbx_head_head_monitor_0:6485:stderr [ -- empty -- ]
> May 19 13:15:41 [6267] olegdbx39-vm000002       lrmd:    debug:
> operation_finished:    dbx_head_head_monitor_0:6485:stdout [ -- empty -- ]
> May 19 13:15:41 [6267] olegdbx39-vm000002       lrmd:    debug:
> log_finished:    finished - rsc:dbx_head_head action:monitor call_id:14
> pid:6485 exit-code:7 exec-time:790ms queue-time:0ms
> May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.32 to 2.5.31
> May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.33 to 2.5.32
> 
> 
> node3)
> == the logs here are different - there is no probing, just a stop attempt
> (with a proper exit code) ==
> 
> == reporting a non-existent resource ==
> 
> May 19 13:15:29 [6293] olegdbx39-vm000003       lrmd:    debug:
> process_lrmd_message:    Processed lrmd_rsc_info operation from
> d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> May 19 13:15:29 [6293] olegdbx39-vm000003       lrmd:    debug:
> process_lrmd_message:    Processed lrmd_rsc_exec operation from
> d2c8a871-410a-4006-be52-ee684c0a5f38: rc=10, reply=1, notify=0
> May 19 13:15:29 [6293] olegdbx39-vm000003       lrmd:    debug:
> log_execute:    executing - rsc:dbx_first_datas action:monitor call_id:10
> May 19 13:15:29 [6293] olegdbx39-vm000003       lrmd:     info:
> process_lrmd_get_rsc_info:    Resource 'dbx_head_head' not found (2
> active resources)

FYI, this is normal. It just means the lrmd hasn't been asked to do
anything with this resource before, so it's not found in the lrmd's memory.
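
If you want to trace that sequence yourself, grepping the lrmd messages for
the resource on each node shows the rsc_info lookup, the register, and the
eventual probe in order. A rough sketch (the log path varies by distribution,
so adjust it to wherever pacemaker writes its detail log):

    # /var/log/pacemaker.log or /var/log/cluster/corosync.log, depending on setup
    grep -E "lrmd.*(dbx_head_head|lrmd_rsc_register|lrmd_rsc_info)" \
        /var/log/cluster/corosync.log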

> May 19 13:15:29 [6293] olegdbx39-vm000003       lrmd:    debug:
> process_lrmd_message:    Processed lrmd_rsc_info operation from
> d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> May 19 13:15:29 [6293] olegdbx39-vm000003       lrmd:     info:
> process_lrmd_rsc_register:    Added 'dbx_head_head' to the rsc list (3
> active resources)
> May 19 13:15:40 [6293] olegdbx39-vm000003       lrmd:    debug:
> process_lrmd_message:    Processed lrmd_rsc_register operation from
> d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=1, notify=1
> May 19 13:15:29 [6292] olegdbx39-vm000003 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.9 to 2.5.8
> May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.10 to 2.5.9
> May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.11 to 2.5.10
> May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.12 to 2.5.11
> May 19 13:15:40 [6293] olegdbx39-vm000003       lrmd:    debug:
> process_lrmd_message:    Processed lrmd_rsc_info operation from
> d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> 
> 
> == and much later it tries to stop the resource on node3, even though it
> has never been started nor probed here ==

At any given time, one node in the cluster has been elected DC. The DC
schedules all actions. So, the logs on the DC (not this node) will have
the explanation for why there was no probe, and why there was a stop.
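
Something along these lines will tell you which node is the DC and let you
pull its scheduling decisions for the resource out of the log (again, the
log path depends on how your packages are set up):

    # identify the current DC
    crm_mon -1 | grep -i "current dc"

    # then, on that node, see what was decided for dbx_head_head
    grep -E "pengine|crmd" /var/log/cluster/corosync.log | grep dbx_head_head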

> 
> May 19 13:15:45 [6292] olegdbx39-vm000003 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.59 to 2.5.58
> May 19 13:15:45 [6293] olegdbx39-vm000003       lrmd:    debug:
> process_lrmd_message:    Processed lrmd_rsc_info operation from
> 069e399c-6c10-47cf-b55f-de46985562b3: rc=0, reply=0, notify=0
> May 19 13:15:45 [6293] olegdbx39-vm000003       lrmd:    debug:
> process_lrmd_message:    Processed lrmd_rsc_exec operation from
> 069e399c-6c10-47cf-b55f-de46985562b3: rc=77, reply=1, notify=0
> May 19 13:15:45 [6293] olegdbx39-vm000003       lrmd:     info:
> log_execute:    executing - rsc:dbx_head_head action:stop call_id:77
> head.ocf.sh(dbx_head_head)[7202]:
> 2017/05/19_13:15:45 INFO: head_stop: Stopping resource
> head.ocf.sh(dbx_head_head)[7202]:
> 2017/05/19_13:15:47 DEBUG: head_monitor: return 7
> head.ocf.sh(dbx_head_head)[7202]:
> 2017/05/19_13:15:47 INFO: Resource is already stopped
> May 19 13:15:47 [6293] olegdbx39-vm000003       lrmd:    debug:
> operation_finished:    dbx_head_head_stop_0:7202 - exited with rc=0
> May 19 13:15:47 [6293] olegdbx39-vm000003       lrmd:    debug:
> operation_finished:    dbx_head_head_stop_0:7202:stderr [ -- empty -- ]
> May 19 13:15:47 [6293] olegdbx39-vm000003       lrmd:    debug:
> operation_finished:    dbx_head_head_stop_0:7202:stdout [ -- empty -- ]
> May 19 13:15:47 [6293] olegdbx39-vm000003       lrmd:     info:
> log_finished:    finished - rsc:dbx_head_head action:stop call_id:77
> pid:7202 exit-code:0 exec-time:1359ms queue-time:0ms
> May 19 13:15:47 [6292] olegdbx39-vm000003 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.60 to 2.5.59
> 
> 
> 
> 
> node1)
> == reports an error on node3, even though the resource has never been
> configured, probed, nor started here ==
> 
> Node1 reports the error as having occurred on node3 (taken from
> 'crm_mon'):
> 
> Failed Actions:
> * dbx_head_head_monitor_0 on olegdbx39-vm000003 'not configured' (6):
> call=999999999, status=Error, exitreason='none',
>     last-rc-change='Fri May 19 13:15:40 2017', queued=0ms, exec=0ms

The combination of 'not configured' and call=999999999 has only one
possible source: the crmd could not register the resource with the lrmd.

Usually this means the resource definition in the CIB is inherently
invalid. However, it can also be reported when there is any kind of
error in the communication between the two daemons: the connection
dropped, the machine was out of memory, the communication timed out
(e.g. due to high load), etc.
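
To rule out the first case, it's worth validating the live configuration and
eyeballing the resource's definition, for example:

    # verbose check of the running CIB
    crm_verify --live-check -V

    # dump the resources section and inspect dbx_head_head
    cibadmin --query --scope resources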

The "Added 'dbx_head_head' to the rsc list" message means that the lrmd
did receive and complete the register request, so the problem occurred
sometime between there and the crmd getting the result back.

This explains why you don't see a probe being executed. The crmd
couldn't register the resource with the lrmd, so it immediately failed
the probe on its side.

The root cause is a mystery at this point, but hopefully that gives you
some hints as to what to look for. Check the logs on the DC, and check
the host's system logs for any signs of trouble at that time.
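
For example, around the last-rc-change timestamp you could look for memory
pressure or IPC trouble on the node (journalctl if it runs systemd, otherwise
the syslog files), and pull the failed operation records straight out of the
CIB status section (the --xpath query assumes a cibadmin new enough to
support it):

    # anything unusual on the host around the failure (OOM killer, etc.)
    journalctl --since "2017-05-19 13:15:30" --until "2017-05-19 13:16:00"
    grep -i "out of memory" /var/log/messages

    # failed operations recorded in the CIB
    cibadmin --query --xpath "//lrm_rsc_op[@op-status='4']"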

> == Node1 logs "not configured" even though all return codes logged
> earlier are OK; below are the logs for the thread that reports this error: ==
> 
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:     info:
> crm_xml_cleanup:    Cleaning up memory from libxml2
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:     info: main:   
> Starting crm_mon
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> qb_rb_open_2:    shm size:524301; real_size:528384; rb->word_size:132096
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> qb_rb_open_2:    shm size:524301; real_size:528384; rb->word_size:132096
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> qb_rb_open_2:    shm size:524301; real_size:528384; rb->word_size:132096
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> cib_native_signon_raw:    Connection to CIB successful
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> unpack_config:    STONITH timeout: 60000
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> unpack_config:    STONITH of failed nodes is disabled
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> unpack_config:    Concurrent fencing is disabled
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> unpack_config:    Stop all active resources: false
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> unpack_config:    Cluster is symmetric - resources can run anywhere by
> default
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> unpack_config:    Default stickiness: 0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:   notice:
> unpack_config:    On loss of CCM Quorum: Ignore
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> unpack_config:    Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:     info:
> determine_online_status:    Node olegdbx39-vm000003 is online
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:     info:
> determine_online_status:    Node olegdbx39-vm-0 is online
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:     info:
> determine_online_status:    Node olegdbx39-vm000002 is online
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_head_head_monitor_0 on olegdbx39-vm000003
> returned 'not configured' (6) instead of the expected value: 'not
> running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for dbx_head_head
> on olegdbx39-vm000003: not configured (6)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    error:
> unpack_rsc_op:    Preventing dbx_head_head from re-starting anywhere:
> operation monitor failed 'not configured' (6)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_head_head_monitor_0 on olegdbx39-vm000003
> returned 'not configured' (6) instead of the expected value: 'not
> running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for dbx_head_head
> on olegdbx39-vm000003: not configured (6)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    error:
> unpack_rsc_op:    Preventing dbx_head_head from re-starting anywhere:
> operation monitor failed 'not configured' (6)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_bind_mounts_nodes on
> olegdbx39-vm000003 to dbx_bind_mounts_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_bind_mounts_nodes_monitor_0 on
> olegdbx39-vm000003 returned 'unknown' (189) instead of the expected
> value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_bind_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_bind_mounts_nodes_monitor_0 on
> olegdbx39-vm000003 returned 'unknown' (189) instead of the expected
> value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_bind_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_swap_nodes on
> olegdbx39-vm000003 to dbx_swap_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_swap_nodes_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_swap_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_nfs_nodes on
> olegdbx39-vm000003 to dbx_nfs_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_nfs_nodes_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_nfs_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_first_datas on
> olegdbx39-vm000003 to dbx_first_datas:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_first_datas_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_first_datas:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_first_datas_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_first_datas:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_first_head_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_first_head on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [6872] olegdbx39-vm-0 stonith-ng:    debug:
> xml_patch_version_check:    Can apply patch 2.5.47 to 2.5.46
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_first_head_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_first_head on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_mounts_nodes on
> olegdbx39-vm000003 to dbx_mounts_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_mounts_nodes_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_mounts_nodes_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_nfs_mounts_datas on
> olegdbx39-vm000003 to dbx_nfs_mounts_datas:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> determine_op_status:    dbx_nfs_mounts_datas_monitor_0 on
> olegdbx39-vm000003 returned 'unknown' (189) instead of the expected
> value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:  warning:
> unpack_rsc_op_failure:    Processing failed op monitor for
> dbx_nfs_mounts_datas:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_ready_primary on
> olegdbx39-vm000003 to dbx_ready_primary:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_first_datas on
> olegdbx39-vm-0 to dbx_first_datas:1
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_swap_nodes on
> olegdbx39-vm-0 to dbx_swap_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_mounts_nodes on
> olegdbx39-vm-0 to dbx_mounts_nodes:1
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_bind_mounts_nodes on
> olegdbx39-vm-0 to dbx_bind_mounts_nodes:1
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_nfs_mounts_datas on
> olegdbx39-vm-0 to dbx_nfs_mounts_datas:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_nfs_nodes on
> olegdbx39-vm-0 to dbx_nfs_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_ready_primary on
> olegdbx39-vm-0 to dbx_ready_primary:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_first_datas on
> olegdbx39-vm000002 to dbx_first_datas:1
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_swap_nodes on
> olegdbx39-vm000002 to dbx_swap_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_nfs_mounts_datas on
> olegdbx39-vm000002 to dbx_nfs_mounts_datas:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_mounts_nodes on
> olegdbx39-vm000002 to dbx_mounts_nodes:1
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_bind_mounts_nodes on
> olegdbx39-vm000002 to dbx_bind_mounts_nodes:1
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_nfs_nodes on
> olegdbx39-vm000002 to dbx_nfs_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0    crm_mon:    debug:
> find_anonymous_clone:    Internally renamed dbx_ready_primary on
> olegdbx39-vm000002 to dbx_ready_primary:0
> [...]
> 
> 
> Thanks in advance,
> 
> 
> On Thu, May 18, 2017 at 4:37 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>     On 05/17/2017 06:10 PM, Radoslaw Garbacz wrote:
>     > Hi,
>     >
>     > I have a question regarding the '<lrm_resource><lrm_rsc_op>' op-status
>     > attribute getting the value 4.
>     >
>     > In my case I see strange behavior, where resources get those "monitor"
>     > operation entries in the CIB with op-status=4, and they do not seem to
>     > have been called (exec-time=0).
>     >
>     > What does 'op-status' = 4 mean?
> 
>     The action had an error status.
> 
>     >
>     > I would appreciate some elaboration regarding this, since this is
>     > interpreted by pacemaker as an error, which causes logs:
>     > crm_mon:    error: unpack_rsc_op:    Preventing dbx_head_head from
>     > re-starting anywhere: operation monitor failed 'not configured' (6)
> 
>     The rc-code="6" is the more interesting number; it's the result returned
>     by the resource agent. As you can see above, it means "not configured".
>     What that means exactly is up to the resource agent's interpretation.
> 
>     > and I am pretty sure the resource agent was not called (no logs,
>     > exec-time=0)
> 
>     Normally this could only come from the resource agent.
> 
>     However, there are two cases where pacemaker generates this error itself:
>     if the resource definition in the CIB is invalid; and if your version of
>     pacemaker was compiled with support for reading sensitive parameter
>     values from a file, but that file could not be read.
> 
>     It doesn't sound like your case is either one of those though, since
>     they would prevent the resource from even starting. Most likely it's
>     coming from the resource agent. I'd look at the resource agent source
>     code and see where it can return OCF_ERR_CONFIGURED.
> 
>     > There are two aspects of this:
>     >
>     > 1) a harmless one (pacemaker does not seem to care about it), which I guess
>     > indicates cancelled monitoring operations:
>     > op-status=4, rc-code=189
> 
>     This error means the connection between the crmd and lrmd daemons was
>     lost -- most commonly, that shows up for operations that were pending at
>     shutdown.
> 
>     >
>     > * Example:
>     > <lrm_rsc_op id="dbx_first_datas_last_failure_0"
>     > operation_key="dbx_first_datas_monitor_0" operation="monitor"
>     > crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
>     > transition-key="38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
>     > transition-magic="4:189;38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
>     > on_node="olegdbx61-vm000001" call-id="10" rc-code="189" op-status="4"
>     > interval="0" last-run="1495057378" last-rc-change="1495057378"
>     > exec-time="0" queue-time="0"
>     op-digest="f6bd1386a336e8e6ee25ecb651a9efb6"/>
>     >
>     >
>     > 2) an error-level one (op-status=4, rc-code=6), which generates logs:
>     > crm_mon:    error: unpack_rsc_op:    Preventing dbx_head_head from
>     > re-starting anywhere: operation monitor failed 'not configured' (6)
>     >
>     > * Example:
>     > <lrm_rsc_op id="dbx_head_head_last_failure_0"
>     > operation_key="dbx_head_head_monitor_0" operation="monitor"
>     > crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
>     > transition-key="39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
>     > transition-magic="4:6;39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
>     > on_node="olegdbx61-vm000001" call-id="999999999" rc-code="6"
>     > op-status="4" interval="0" last-run="1495057389"
>     > last-rc-change="1495057389" exec-time="0" queue-time="0"
>     > op-digest="60cdc9db1c5b77e8dba698d3d0c8cda8"/>
>     >
>     >
>     > Could it be some hardware (VM hypervisor) issue?
>     >
>     >
>     > Thanks in advance,
>     >
>     > --
>     > Best Regards,
>     >
>     > Radoslaw Garbacz
>     > XtremeData Incorporated



