[ClusterLabs] CIB: op-status=4 ?
Radoslaw Garbacz
radoslaw.garbacz at xtremedatainc.com
Tue May 23 14:15:46 EDT 2017
Thanks, your explanation is very helpful, considering that this happens rarely
and only on the first boot after the VMs are created.
On Mon, May 22, 2017 at 9:34 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> On 05/19/2017 02:03 PM, Radoslaw Garbacz wrote:
> > Hi,
> >
> > I have some more information regarding this issue (pacemaker debug logs).
> >
> > Firstly, I have not mentioned some probably important facts:
> > 1) this happens rarely
> > 2) this happens only on the first boot
> > 3) turning on debug in corosync/pacemaker significantly reduced the
> > frequency of this happening, i.e. without debug roughly every 7th
> > cluster creation, with debug roughly every 66th.
> >
> > This is a 3-node cluster on Azure Cloud, and it does not seem like the
> > resource agent is reporting an error, because all nodes log proper "not
> > running" results:
> >
> > The resource in question is named "dbx_head_head".
> >
> > node1)
> > May 19 13:15:41 [6872] olegdbx39-vm-0 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.32 to 2.5.31
> > head.ocf.sh(dbx_head_head)[7717]:
> > 2017/05/19_13:15:42 DEBUG: head_monitor: return 7
> > May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug:
> > operation_finished: dbx_head_head_monitor_0:7717 - exited with rc=7
> > May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug:
> > operation_finished: dbx_head_head_monitor_0:7717:stderr [ -- empty -- ]
> > May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug:
> > operation_finished: dbx_head_head_monitor_0:7717:stdout [ -- empty -- ]
> > May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug:
> > log_finished: finished - rsc:dbx_head_head action:monitor call_id:14
> > pid:7717 exit-code:7 exec-time:932ms queue-time:0ms
> >
> >
> > node2)
> > May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.31 to 2.5.30
> > head.ocf.sh(dbx_head_head)[6485]:
> > 2017/05/19_13:15:41 DEBUG: head_monitor: return 7
> > May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug:
> > operation_finished: dbx_head_head_monitor_0:6485 - exited with rc=7
> > May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug:
> > operation_finished: dbx_head_head_monitor_0:6485:stderr [ -- empty -- ]
> > May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug:
> > operation_finished: dbx_head_head_monitor_0:6485:stdout [ -- empty -- ]
> > May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug:
> > log_finished: finished - rsc:dbx_head_head action:monitor call_id:14
> > pid:6485 exit-code:7 exec-time:790ms queue-time:0ms
> > May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.32 to 2.5.31
> > May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.33 to 2.5.32
> >
> >
> > node3)
> > == the logs here are different - there is no probing, just a stop
> > attempt (with the proper exit code) ==
> >
> > == reporting a non-existent resource ==
> >
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug:
> > process_lrmd_message: Processed lrmd_rsc_info operation from
> > d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug:
> > process_lrmd_message: Processed lrmd_rsc_exec operation from
> > d2c8a871-410a-4006-be52-ee684c0a5f38: rc=10, reply=1, notify=0
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug:
> > log_execute: executing - rsc:dbx_first_datas action:monitor call_id:10
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: info:
> > process_lrmd_get_rsc_info: Resource 'dbx_head_head' not found (2
> > active resources)
>
> FYI, this is normal. It just means the lrmd hasn't been asked to do
> anything with this resource before, so it's not found in the lrmd's memory.
>
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug:
> > process_lrmd_message: Processed lrmd_rsc_info operation from
> > d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> > May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: info:
> > process_lrmd_rsc_register: Added 'dbx_head_head' to the rsc list (3
> > active resources)
> > May 19 13:15:40 [6293] olegdbx39-vm000003 lrmd: debug:
> > process_lrmd_message: Processed lrmd_rsc_register operation from
> > d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=1, notify=1
> > May 19 13:15:29 [6292] olegdbx39-vm000003 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.9 to 2.5.8
> > May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.10 to 2.5.9
> > May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.11 to 2.5.10
> > May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.12 to 2.5.11
> > May 19 13:15:40 [6293] olegdbx39-vm000003 lrmd: debug:
> > process_lrmd_message: Processed lrmd_rsc_info operation from
> > d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> >
> >
> > == and much later it tries to stop the resource on node3, even though
> > it has never been started or probed here ==
>
> At any given time, one node in the cluster has been elected DC. The DC
> schedules all actions. So, the logs on the DC (not this node) will have
> the explanation for why there was no probe, and why there was a stop.
>
> >
> > May 19 13:15:45 [6292] olegdbx39-vm000003 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.59 to 2.5.58
> > May 19 13:15:45 [6293] olegdbx39-vm000003 lrmd: debug:
> > process_lrmd_message: Processed lrmd_rsc_info operation from
> > 069e399c-6c10-47cf-b55f-de46985562b3: rc=0, reply=0, notify=0
> > May 19 13:15:45 [6293] olegdbx39-vm000003 lrmd: debug:
> > process_lrmd_message: Processed lrmd_rsc_exec operation from
> > 069e399c-6c10-47cf-b55f-de46985562b3: rc=77, reply=1, notify=0
> > May 19 13:15:45 [6293] olegdbx39-vm000003 lrmd: info:
> > log_execute: executing - rsc:dbx_head_head action:stop call_id:77
> > head.ocf.sh(dbx_head_head)[7202]:
> > 2017/05/19_13:15:45 INFO: head_stop: Stopping resource
> > head.ocf.sh(dbx_head_head)[7202]:
> > 2017/05/19_13:15:47 DEBUG: head_monitor: return 7
> > head.ocf.sh(dbx_head_head)[7202]:
> > 2017/05/19_13:15:47 INFO: Resource is already stopped
> > May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: debug:
> > operation_finished: dbx_head_head_stop_0:7202 - exited with rc=0
> > May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: debug:
> > operation_finished: dbx_head_head_stop_0:7202:stderr [ -- empty -- ]
> > May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: debug:
> > operation_finished: dbx_head_head_stop_0:7202:stdout [ -- empty -- ]
> > May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: info:
> > log_finished: finished - rsc:dbx_head_head action:stop call_id:77
> > pid:7202 exit-code:0 exec-time:1359ms queue-time:0ms
> > May 19 13:15:47 [6292] olegdbx39-vm000003 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.60 to 2.5.59
> >
> >
> >
> >
> > node1)
> > == an error is reported for node3, even though the resource has never
> > even been configured, probed, or started there ==
> >
> > The error is reported on node1 as having happened on node3 (obtained
> > from 'crm_mon'):
> >
> > Failed Actions:
> > * dbx_head_head_monitor_0 on olegdbx39-vm000003 'not configured' (6):
> > call=999999999, status=Error, exitreason='none',
> > last-rc-change='Fri May 19 13:15:40 2017', queued=0ms, exec=0ms
>
> The combination of 'not configured' and call=999999999 has only one
> possible source: the crmd could not register the resource with the lrmd.
>
> Usually this means the resource definition in the CIB is inherently
> invalid. However, it can also be reported when there is any kind of
> error in the communication between the two daemons: the connection
> dropped, the machine was out of memory, the communication timed out
> (e.g. due to high load), etc.
>
> The "Added 'dbx_head_head' to the rsc list" message means that the lrmd
> did receive and complete the register request, so the problem occurred
> sometime between there and crmd getting the result back.
>
> This explains why you don't see a probe being executed. The crmd
> couldn't register the resource with the lrmd, so it immediately failed
> the probe on its side.
>
> The root cause is a mystery at this point, but hopefully that gives you
> some hints as to what to look for. Check the logs on the DC, and check
> the host's system logs for any signs of trouble at that time.
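
[Editorial aside: given the explanation above, probes the crmd failed on its
own side are easy to distinguish from real agent failures in the CIB status
section by their sentinel call-id. A minimal sketch, assuming a CIB saved
locally (e.g. via `cibadmin --query > cib.xml`); the element and attribute
names match the lrm_rsc_op entries quoted later in this thread:]

```python
import xml.etree.ElementTree as ET

# Sentinel call-id seen when the crmd fails an operation internally,
# without the lrmd ever running the resource agent (assumption based on
# the explanation above; verify against your Pacemaker version).
CRMD_SENTINEL_CALL_ID = "999999999"

def find_crmd_injected_failures(cib_xml):
    """Return (op id, rc-code) pairs for operations the crmd failed
    internally rather than executing through the lrmd."""
    root = ET.fromstring(cib_xml)
    return [(op.get("id"), op.get("rc-code"))
            for op in root.iter("lrm_rsc_op")
            if op.get("call-id") == CRMD_SENTINEL_CALL_ID]

# Example using the entry quoted in this thread (attributes abridged):
cib = """<status><lrm_rsc_op id="dbx_head_head_last_failure_0"
  call-id="999999999" rc-code="6" op-status="4"/></status>"""
print(find_crmd_injected_failures(cib))  # [('dbx_head_head_last_failure_0', '6')]
```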
>
> > == Node1 logs "not configured" even though all return codes logged
> > earlier are OK; below are the logs for the thread which reports this
> > error: ==
> >
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info:
> > crm_xml_cleanup: Cleaning up memory from libxml2
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info: main:
> > Starting crm_mon
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > cib_native_signon_raw: Connection to CIB successful
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > unpack_config: STONITH timeout: 60000
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > unpack_config: STONITH of failed nodes is disabled
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > unpack_config: Concurrent fencing is disabled
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > unpack_config: Stop all active resources: false
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > unpack_config: Cluster is symmetric - resources can run anywhere by
> > default
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > unpack_config: Default stickiness: 0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: notice:
> > unpack_config: On loss of CCM Quorum: Ignore
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info:
> > determine_online_status: Node olegdbx39-vm000003 is online
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info:
> > determine_online_status: Node olegdbx39-vm-0 is online
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info:
> > determine_online_status: Node olegdbx39-vm000002 is online
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_head_head_monitor_0 on olegdbx39-vm000003
> > returned 'not configured' (6) instead of the expected value: 'not
> > running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for dbx_head_head
> > on olegdbx39-vm000003: not configured (6)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: error:
> > unpack_rsc_op: Preventing dbx_head_head from re-starting anywhere:
> > operation monitor failed 'not configured' (6)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_head_head_monitor_0 on olegdbx39-vm000003
> > returned 'not configured' (6) instead of the expected value: 'not
> > running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for dbx_head_head
> > on olegdbx39-vm000003: not configured (6)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: error:
> > unpack_rsc_op: Preventing dbx_head_head from re-starting anywhere:
> > operation monitor failed 'not configured' (6)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_bind_mounts_nodes on
> > olegdbx39-vm000003 to dbx_bind_mounts_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_bind_mounts_nodes_monitor_0 on
> > olegdbx39-vm000003 returned 'unknown' (189) instead of the expected
> > value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_bind_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_bind_mounts_nodes_monitor_0 on
> > olegdbx39-vm000003 returned 'unknown' (189) instead of the expected
> > value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_bind_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_swap_nodes on
> > olegdbx39-vm000003 to dbx_swap_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_swap_nodes_monitor_0 on olegdbx39-vm000003
> > returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_swap_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_nfs_nodes on
> > olegdbx39-vm000003 to dbx_nfs_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_nfs_nodes_monitor_0 on olegdbx39-vm000003
> > returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_nfs_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_first_datas on
> > olegdbx39-vm000003 to dbx_first_datas:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_first_datas_monitor_0 on olegdbx39-vm000003
> > returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_first_datas:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_first_datas_monitor_0 on olegdbx39-vm000003
> > returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_first_datas:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_first_head_monitor_0 on olegdbx39-vm000003
> > returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_first_head on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [6872] olegdbx39-vm-0 stonith-ng: debug:
> > xml_patch_version_check: Can apply patch 2.5.47 to 2.5.46
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_first_head_monitor_0 on olegdbx39-vm000003
> > returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_first_head on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_mounts_nodes on
> > olegdbx39-vm000003 to dbx_mounts_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_mounts_nodes_monitor_0 on olegdbx39-vm000003
> > returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_mounts_nodes_monitor_0 on olegdbx39-vm000003
> > returned 'unknown' (189) instead of the expected value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_nfs_mounts_datas on
> > olegdbx39-vm000003 to dbx_nfs_mounts_datas:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > determine_op_status: dbx_nfs_mounts_datas_monitor_0 on
> > olegdbx39-vm000003 returned 'unknown' (189) instead of the expected
> > value: 'not running' (7)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for
> > dbx_nfs_mounts_datas:0 on olegdbx39-vm000003: unknown (189)
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_ready_primary on
> > olegdbx39-vm000003 to dbx_ready_primary:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_first_datas on
> > olegdbx39-vm-0 to dbx_first_datas:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_swap_nodes on
> > olegdbx39-vm-0 to dbx_swap_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_mounts_nodes on
> > olegdbx39-vm-0 to dbx_mounts_nodes:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_bind_mounts_nodes on
> > olegdbx39-vm-0 to dbx_bind_mounts_nodes:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_nfs_mounts_datas on
> > olegdbx39-vm-0 to dbx_nfs_mounts_datas:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_nfs_nodes on
> > olegdbx39-vm-0 to dbx_nfs_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_ready_primary on
> > olegdbx39-vm-0 to dbx_ready_primary:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_first_datas on
> > olegdbx39-vm000002 to dbx_first_datas:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_swap_nodes on
> > olegdbx39-vm000002 to dbx_swap_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_nfs_mounts_datas on
> > olegdbx39-vm000002 to dbx_nfs_mounts_datas:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_mounts_nodes on
> > olegdbx39-vm000002 to dbx_mounts_nodes:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_bind_mounts_nodes on
> > olegdbx39-vm000002 to dbx_bind_mounts_nodes:1
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_nfs_nodes on
> > olegdbx39-vm000002 to dbx_nfs_nodes:0
> > May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> > find_anonymous_clone: Internally renamed dbx_ready_primary on
> > olegdbx39-vm000002 to dbx_ready_primary:0
> > [...]
> >
> >
> > Thanks in advance,
> >
> >
> > On Thu, May 18, 2017 at 4:37 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> >
> > On 05/17/2017 06:10 PM, Radoslaw Garbacz wrote:
> > > Hi,
> > >
> > > I have a question regarding the '<lrm_resource><lrm_rsc_op>'
> > > 'op-status' attribute getting the value 4.
> > >
> > > In my case I have a strange behavior, where resources get those
> > > "monitor" operation entries in the CIB with op-status=4, and they do
> > > not seem to be called (exec-time=0).
> > >
> > > What does 'op-status' = 4 mean?
> >
> > The action had an error status.
> >
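
[Editorial aside: for reference, the op-status values come from Pacemaker's
operation-status enum (enum op_status in crm/services.h). A sketch of the
mapping, listed from memory for the 1.1.x series, so double-check against
your source tree:]

```python
# Pacemaker operation statuses (enum op_status in crm/services.h).
# Values sketched from memory -- verify against your Pacemaker version.
OP_STATUS = {
    -1: "pending",
    0: "done",
    1: "cancelled",
    2: "timed out",
    3: "not supported",
    4: "error",  # the status asked about in this thread
}

print(OP_STATUS[4])  # error
```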
> > >
> > > I would appreciate some elaboration regarding this, since this is
> > > interpreted by pacemaker as an error, which causes logs:
> > > crm_mon: error: unpack_rsc_op: Preventing dbx_head_head from
> > > re-starting anywhere: operation monitor failed 'not configured' (6)
> >
> > The rc-code="6" is the more interesting number; it's the result
> > returned by the resource agent. As you can see above, it means "not
> > configured". What that means exactly is up to the resource agent's
> > interpretation.
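
[Editorial aside: the rc-code values discussed here are the standard exit
codes from the OCF resource agent API; a reference sketch of the mapping:]

```python
# Standard OCF resource agent exit codes (OCF Resource Agent API).
OCF_RC = {
    0: "success",
    1: "generic error",
    2: "invalid arguments",
    3: "unimplemented action",
    4: "insufficient privileges",
    5: "not installed",
    6: "not configured",  # OCF_ERR_CONFIGURED, the code seen in this thread
    7: "not running",     # the expected result of a probe on a clean node
    8: "running (master)",
    9: "failed (master)",
}

print(OCF_RC[6])  # not configured
```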
> >
> > > and I am pretty sure the resource agent was not called (no logs,
> > > exec-time=0)
> >
> > Normally this could only come from the resource agent.
> >
> > However, there are two cases where pacemaker generates this error
> > itself: if the resource definition in the CIB is invalid; and if your
> > version of pacemaker was compiled with support for reading sensitive
> > parameter values from a file, but that file could not be read.
> >
> > It doesn't sound like your case is either one of those though, since
> > they would prevent the resource from even starting. Most likely it's
> > coming from the resource agent. I'd look at the resource agent source
> > code and see where it can return OCF_ERR_CONFIGURED.
> >
> > > There are two aspects of this:
> > >
> > > 1) a harmless one (pacemaker does not seem to bother about it), which
> > > I guess indicates cancelled monitoring operations:
> > > op-status=4, rc-code=189
> >
> > This error means the connection between the crmd and lrmd daemons was
> > lost -- most commonly, that shows up for operations that were pending
> > at shutdown.
> >
> > >
> > > * Example:
> > > <lrm_rsc_op id="dbx_first_datas_last_failure_0"
> > > operation_key="dbx_first_datas_monitor_0" operation="monitor"
> > > crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> > > transition-key="38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > > transition-magic="4:189;38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > > on_node="olegdbx61-vm000001" call-id="10" rc-code="189" op-status="4"
> > > interval="0" last-run="1495057378" last-rc-change="1495057378"
> > > exec-time="0" queue-time="0" op-digest="f6bd1386a336e8e6ee25ecb651a9efb6"/>
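
[Editorial aside: the transition-magic attribute above packs the op-status
and rc-code in front of the transition key, so a failure can be decoded
straight from the CIB entry. A hypothetical helper, with the format inferred
from the examples in this thread: "<op-status>:<rc-code>;<transition-key>":]

```python
def parse_transition_magic(magic):
    """Split a transition-magic string into (op_status, rc_code, transition_key).

    Format inferred from the lrm_rsc_op entries quoted in this thread:
    "<op-status>:<rc-code>;<transition-key>".
    """
    status_rc, _, key = magic.partition(";")
    status, _, rc = status_rc.partition(":")
    return int(status), int(rc), key

# The entry above: op-status 4 (error), rc 189 (crmd-lrmd connection lost,
# per the reply earlier in this thread)
print(parse_transition_magic(
    "4:189;38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"))
```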
> > >
> > >
> > > 2) an error-level one (op-status=4, rc-code=6), which generates logs:
> > > crm_mon: error: unpack_rsc_op: Preventing dbx_head_head from
> > > re-starting anywhere: operation monitor failed 'not configured' (6)
> > >
> > > * Example:
> > > <lrm_rsc_op id="dbx_head_head_last_failure_0"
> > > operation_key="dbx_head_head_monitor_0" operation="monitor"
> > > crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> > > transition-key="39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > > transition-magic="4:6;39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > > on_node="olegdbx61-vm000001" call-id="999999999" rc-code="6"
> > > op-status="4" interval="0" last-run="1495057389"
> > > last-rc-change="1495057389" exec-time="0" queue-time="0"
> > > op-digest="60cdc9db1c5b77e8dba698d3d0c8cda8"/>
> > >
> > >
> > > Could it be some hardware (VM hypervisor) issue?
> > >
> > >
> > > Thanks in advance,
> > >
> > > --
> > > Best Regards,
> > >
> > > Radoslaw Garbacz
> > > XtremeData Incorporated
>
--
Best Regards,
Radoslaw Garbacz
XtremeData Incorporated