<div dir="ltr">Thanks, your explanation is very helpful considering that it happens rarely and only on the first boot after VMs are created.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 22, 2017 at 9:34 PM, Ken Gaillot <span dir="ltr"><<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 05/19/2017 02:03 PM, Radoslaw Garbacz wrote:<br>
> Hi,
>
> I have some more information regarding this issue (pacemaker debug logs).
>
> First, some probably important facts I did not mention:
> 1) this happens rarely
> 2) it happens only on the first boot
> 3) turning on debug in corosync/pacemaker significantly reduces how
> often it happens, i.e. without debug on every ~7th cluster creation,
> with debug on every ~66th cluster creation.
>
> This is a 3-node cluster on Azure Cloud, and it does not seem like the
> resource agent is reporting an error, because all nodes log proper "not
> running" results:
>
> The resource in question is named "dbx_head_head".
>
> node1)
> May 19 13:15:41 [6872] olegdbx39-vm-0 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.32 to 2.5.31
> head.ocf.sh <http://head.ocf.sh>(dbx_head_head)[7717]:
> 2017/05/19_13:15:42 DEBUG: head_monitor: return 7
> May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug:
> operation_finished: dbx_head_head_monitor_0:7717 - exited with rc=7
> May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug:
> operation_finished: dbx_head_head_monitor_0:7717:stderr [ -- empty -- ]
> May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug:
> operation_finished: dbx_head_head_monitor_0:7717:stdout [ -- empty -- ]
> May 19 13:15:42 [6873] olegdbx39-vm-0 lrmd: debug:
> log_finished: finished - rsc:dbx_head_head action:monitor call_id:14
> pid:7717 exit-code:7 exec-time:932ms queue-time:0ms
>
>
> node2)
> May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.31 to 2.5.30
> head.ocf.sh <http://head.ocf.sh>(dbx_head_head)[6485]:
> 2017/05/19_13:15:41 DEBUG: head_monitor: return 7
> May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug:
> operation_finished: dbx_head_head_monitor_0:6485 - exited with rc=7
> May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug:
> operation_finished: dbx_head_head_monitor_0:6485:stderr [ -- empty -- ]
> May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug:
> operation_finished: dbx_head_head_monitor_0:6485:stdout [ -- empty -- ]
> May 19 13:15:41 [6267] olegdbx39-vm000002 lrmd: debug:
> log_finished: finished - rsc:dbx_head_head action:monitor call_id:14
> pid:6485 exit-code:7 exec-time:790ms queue-time:0ms
> May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.32 to 2.5.31
> May 19 13:15:41 [6266] olegdbx39-vm000002 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.33 to 2.5.32
>
>
> node3)
> == the logs here are different - there is no probing, just a stop
> attempt (with the proper exit code) ==
>
> == reporting a non-existing resource ==
>
> May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug:
> process_lrmd_message: Processed lrmd_rsc_info operation from
> d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug:
> process_lrmd_message: Processed lrmd_rsc_exec operation from
> d2c8a871-410a-4006-be52-ee684c0a5f38: rc=10, reply=1, notify=0
> May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug:
> log_execute: executing - rsc:dbx_first_datas action:monitor call_id:10
> May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: info:
> process_lrmd_get_rsc_info: Resource 'dbx_head_head' not found (2
> active resources)

FYI, this is normal. It just means the lrmd hasn't been asked to do
anything with this resource before, so it's not found in the lrmd's memory.
> May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: debug:
> process_lrmd_message: Processed lrmd_rsc_info operation from
> d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
> May 19 13:15:29 [6293] olegdbx39-vm000003 lrmd: info:
> process_lrmd_rsc_register: Added 'dbx_head_head' to the rsc list (3
> active resources)
> May 19 13:15:40 [6293] olegdbx39-vm000003 lrmd: debug:
> process_lrmd_message: Processed lrmd_rsc_register operation from
> d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=1, notify=1
> May 19 13:15:29 [6292] olegdbx39-vm000003 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.9 to 2.5.8
> May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.10 to 2.5.9
> May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.11 to 2.5.10
> May 19 13:15:40 [6292] olegdbx39-vm000003 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.12 to 2.5.11
> May 19 13:15:40 [6293] olegdbx39-vm000003 lrmd: debug:
> process_lrmd_message: Processed lrmd_rsc_info operation from
> d2c8a871-410a-4006-be52-ee684c0a5f38: rc=0, reply=0, notify=0
>
>
> == and much later it tries to stop the resource on node3, even though it
> has never been started, nor probed here ==

At any given time, one node in the cluster has been elected DC. The DC
schedules all actions. So, the logs on the DC (not this node) will have
the explanation for why there was no probe, and why there was a stop.
>
> May 19 13:15:45 [6292] olegdbx39-vm000003 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.59 to 2.5.58
> May 19 13:15:45 [6293] olegdbx39-vm000003 lrmd: debug:
> process_lrmd_message: Processed lrmd_rsc_info operation from
> 069e399c-6c10-47cf-b55f-de46985562b3: rc=0, reply=0, notify=0
> May 19 13:15:45 [6293] olegdbx39-vm000003 lrmd: debug:
> process_lrmd_message: Processed lrmd_rsc_exec operation from
> 069e399c-6c10-47cf-b55f-de46985562b3: rc=77, reply=1, notify=0
> May 19 13:15:45 [6293] olegdbx39-vm000003 lrmd: info:
> log_execute: executing - rsc:dbx_head_head action:stop call_id:77
> head.ocf.sh <http://head.ocf.sh>(dbx_head_head)[7202]:
> 2017/05/19_13:15:45 INFO: head_stop: Stopping resource
> head.ocf.sh <http://head.ocf.sh>(dbx_head_head)[7202]:
> 2017/05/19_13:15:47 DEBUG: head_monitor: return 7
> head.ocf.sh <http://head.ocf.sh>(dbx_head_head)[7202]:
> 2017/05/19_13:15:47 INFO: Resource is already stopped
> May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: debug:
> operation_finished: dbx_head_head_stop_0:7202 - exited with rc=0
> May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: debug:
> operation_finished: dbx_head_head_stop_0:7202:stderr [ -- empty -- ]
> May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: debug:
> operation_finished: dbx_head_head_stop_0:7202:stdout [ -- empty -- ]
> May 19 13:15:47 [6293] olegdbx39-vm000003 lrmd: info:
> log_finished: finished - rsc:dbx_head_head action:stop call_id:77
> pid:7202 exit-code:0 exec-time:1359ms queue-time:0ms
> May 19 13:15:47 [6292] olegdbx39-vm000003 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.60 to 2.5.59
>
>
>
>
> node1)
> == node1 reports an error on node3, even though the resource has never
> even been configured, probed, nor started there ==
>
> The error is reported on node1 as having happened on node3 (taken from
> 'crm_mon'):
>
> Failed Actions:
> * dbx_head_head_monitor_0 on olegdbx39-vm000003 'not configured' (6):
> call=999999999, status=Error, exitreason='none',
> last-rc-change='Fri May 19 13:15:40 2017', queued=0ms, exec=0ms

The combination of 'not configured' and call=999999999 has only one
possible source: the crmd could not register the resource with the lrmd.

Usually this means the resource definition in the CIB is inherently
invalid. However, it can also be reported when there is any kind of
error in the communication between the two daemons: the connection
dropped, the machine was out of memory, the communication timed out
(e.g. due to high load), etc.
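
A hedged sketch for ruling out the first cause (both tools ship with
pacemaker; the resource id is taken from your logs):

    # Validate the live CIB; complains loudly about invalid definitions
    crm_verify --live-check -V

    # Inspect the definition the cluster actually has for the resource
    cibadmin --query --scope resources | grep -A 5 'id="dbx_head_head"'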

The "Added 'dbx_head_head' to the rsc list" message means that the lrmd
did receive and complete the register request, so the problem occurred
sometime between there and crmd getting the result back.

This explains why you don't see a probe being executed. The crmd
couldn't register the resource with the lrmd, so it immediately failed
the probe on its side.

The root cause is a mystery at this point, but hopefully that gives you
some hints as to what to look for. Check the logs on the DC, and check
the host's system logs for any signs of trouble at that time.
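
For example (hedged: log locations differ across distributions, and these
patterns are only starting points):

    # On the DC, around 13:15:40: why the probe and stop were scheduled
    grep -E "pengine|dbx_head_head" /var/log/cluster/corosync.log

    # On the affected node: memory pressure or IPC trouble at that time
    grep -iE "out of memory|oom-killer|timed out" /var/log/messages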
<div class="HOEnZb"><div class="h5"><br>
> == Node1 logs "not configured" even though all return codes logged<br>
> earlier are OK, below are logs for thread, which reports this error: ==<br>
><br>
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info:
> crm_xml_cleanup: Cleaning up memory from libxml2
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info: main:
> Starting crm_mon
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> cib_native_signon_raw: Connection to CIB successful
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> unpack_config: STONITH timeout: 60000
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> unpack_config: STONITH of failed nodes is disabled
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> unpack_config: Concurrent fencing is disabled
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> unpack_config: Stop all active resources: false
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> unpack_config: Cluster is symmetric - resources can run anywhere by
> default
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> unpack_config: Default stickiness: 0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: notice:
> unpack_config: On loss of CCM Quorum: Ignore
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info:
> determine_online_status: Node olegdbx39-vm000003 is online
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info:
> determine_online_status: Node olegdbx39-vm-0 is online
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: info:
> determine_online_status: Node olegdbx39-vm000002 is online
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_head_head_monitor_0 on olegdbx39-vm000003
> returned 'not configured' (6) instead of the expected value: 'not
> running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for dbx_head_head
> on olegdbx39-vm000003: not configured (6)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: error:
> unpack_rsc_op: Preventing dbx_head_head from re-starting anywhere:
> operation monitor failed 'not configured' (6)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_head_head_monitor_0 on olegdbx39-vm000003
> returned 'not configured' (6) instead of the expected value: 'not
> running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for dbx_head_head
> on olegdbx39-vm000003: not configured (6)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: error:
> unpack_rsc_op: Preventing dbx_head_head from re-starting anywhere:
> operation monitor failed 'not configured' (6)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_bind_mounts_nodes on
> olegdbx39-vm000003 to dbx_bind_mounts_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_bind_mounts_nodes_monitor_0 on
> olegdbx39-vm000003 returned 'unknown' (189) instead of the expected
> value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_bind_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_bind_mounts_nodes_monitor_0 on
> olegdbx39-vm000003 returned 'unknown' (189) instead of the expected
> value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_bind_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_swap_nodes on
> olegdbx39-vm000003 to dbx_swap_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_swap_nodes_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_swap_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_nfs_nodes on
> olegdbx39-vm000003 to dbx_nfs_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_nfs_nodes_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_nfs_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_first_datas on
> olegdbx39-vm000003 to dbx_first_datas:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_first_datas_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_first_datas:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_first_datas_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_first_datas:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_first_head_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_first_head on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [6872] olegdbx39-vm-0 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 2.5.47 to 2.5.46
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_first_head_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_first_head on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_mounts_nodes on
> olegdbx39-vm000003 to dbx_mounts_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_mounts_nodes_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_mounts_nodes_monitor_0 on olegdbx39-vm000003
> returned 'unknown' (189) instead of the expected value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_mounts_nodes:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_nfs_mounts_datas on
> olegdbx39-vm000003 to dbx_nfs_mounts_datas:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> determine_op_status: dbx_nfs_mounts_datas_monitor_0 on
> olegdbx39-vm000003 returned 'unknown' (189) instead of the expected
> value: 'not running' (7)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> dbx_nfs_mounts_datas:0 on olegdbx39-vm000003: unknown (189)
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_ready_primary on
> olegdbx39-vm000003 to dbx_ready_primary:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_first_datas on
> olegdbx39-vm-0 to dbx_first_datas:1
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_swap_nodes on
> olegdbx39-vm-0 to dbx_swap_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_mounts_nodes on
> olegdbx39-vm-0 to dbx_mounts_nodes:1
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_bind_mounts_nodes on
> olegdbx39-vm-0 to dbx_bind_mounts_nodes:1
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_nfs_mounts_datas on
> olegdbx39-vm-0 to dbx_nfs_mounts_datas:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_nfs_nodes on
> olegdbx39-vm-0 to dbx_nfs_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_ready_primary on
> olegdbx39-vm-0 to dbx_ready_primary:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_first_datas on
> olegdbx39-vm000002 to dbx_first_datas:1
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_swap_nodes on
> olegdbx39-vm000002 to dbx_swap_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_nfs_mounts_datas on
> olegdbx39-vm000002 to dbx_nfs_mounts_datas:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_mounts_nodes on
> olegdbx39-vm000002 to dbx_mounts_nodes:1
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_bind_mounts_nodes on
> olegdbx39-vm000002 to dbx_bind_mounts_nodes:1
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_nfs_nodes on
> olegdbx39-vm000002 to dbx_nfs_nodes:0
> May 19 13:15:42 [8114] olegdbx39-vm-0 crm_mon: debug:
> find_anonymous_clone: Internally renamed dbx_ready_primary on
> olegdbx39-vm000002 to dbx_ready_primary:0
> [...]
>
>
> Thanks in advance,
>
>
> On Thu, May 18, 2017 at 4:37 PM, Ken Gaillot <kgaillot@redhat.com> wrote:
>
><br>
> On 05/17/2017 06:10 PM, Radoslaw Garbacz wrote:<br>
> > Hi,<br>
> ><br>
> > I have a question regarding '<lrm_resource><lrm_rsc_op> 'op-status<br>
> > attribute getting value 4.<br>
> ><br>
> > In my case I have a strange behavior, when resources get those "monitor"<br>
> > operation entries in the CIB with op-status=4, and they do not seem to<br>
> > be called (exec-time=0).<br>
> ><br>
> > What does 'op-status' = 4 mean?<br>
><br>
> The action had an error status<br>
><br>
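> For reference, a hedged decoding of the op-status values as I recall
> them from the pacemaker sources (double-check against lrmd.h in your
> version):
>
>     # decode_op_status.sh -- illustrative helper, not a stock tool
>     decode_op_status() {
>         case "$1" in
>             0) echo "done" ;;        # completed normally
>             1) echo "cancelled" ;;   # e.g. a recurring monitor cancelled
>             2) echo "timed out" ;;
>             3) echo "not supported" ;;
>             4) echo "error" ;;       # generic execution error
>             *) echo "other/unknown ($1)" ;;
>         esac
>     }
>     decode_op_status 4   # -> error
>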
> >
> > I would appreciate some elaboration regarding this, since it is
> > interpreted by pacemaker as an error, which causes logs like:
> > crm_mon: error: unpack_rsc_op: Preventing dbx_head_head from
> > re-starting anywhere: operation monitor failed 'not configured' (6)
>
> The rc-code="6" is the more interesting number; it's the result returned
> by the resource agent. As you can see above, it means "not configured".
> What that means exactly is up to the resource agent's interpretation.
>
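> For context, the OCF return codes an agent can hand back (these are
> standard OCF, not specific to this cluster):
>
>     # 0 OCF_SUCCESS             5 OCF_ERR_INSTALLED
>     # 1 OCF_ERR_GENERIC         6 OCF_ERR_CONFIGURED  <- 'not configured'
>     # 2 OCF_ERR_ARGS            7 OCF_NOT_RUNNING     <- expected from probes
>     # 3 OCF_ERR_UNIMPLEMENTED   8 OCF_RUNNING_MASTER
>     # 4 OCF_ERR_PERM            9 OCF_FAILED_MASTER
>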
> > and I am pretty sure the resource agent was not called (no logs,
> > exec-time=0)
>
> Normally this could only come from the resource agent.
>
> However, there are two cases where pacemaker generates this error itself:
> if the resource definition in the CIB is invalid, and if your version of
> pacemaker was compiled with support for reading sensitive parameter
> values from a file, but that file could not be read.
>
> It doesn't sound like your case is either one of those, though, since
> they would prevent the resource from even starting. Most likely it's
> coming from the resource agent. I'd look at the resource agent source
> code and see where it can return OCF_ERR_CONFIGURED.
>
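> A hypothetical illustration of the pattern to look for (head.ocf.sh is
> your own agent, so the parameter name and daemon check here are made up):
>
>     # Assumes the agent sources the usual OCF shell functions:
>     # . ${OCF_FUNCTIONS_DIR:-/usr/lib/ocf/lib/heartbeat}/ocf-shellfuncs
>     head_monitor() {
>         # Agents typically return OCF_ERR_CONFIGURED (6) when a
>         # required parameter is missing, before probing anything
>         if [ -z "$OCF_RESKEY_head_config" ]; then
>             return "$OCF_ERR_CONFIGURED"   # shows up as 'not configured'
>         fi
>         if pgrep -f head_daemon >/dev/null 2>&1; then
>             return "$OCF_SUCCESS"          # rc 0: running
>         fi
>         return "$OCF_NOT_RUNNING"          # rc 7: expected from a probe
>     }
>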
> > There are two aspects of this:
> >
> > 1) a harmless one (pacemaker does not seem to bother about it), which I
> > guess indicates cancelled monitoring operations:
> > op-status=4, rc-code=189
>
> This error means the connection between the crmd and lrmd daemons was
> lost -- most commonly, that shows up for operations that were pending at
> shutdown.
>
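> As a hedged practical note: if such stale entries linger in the CIB
> status section, they can usually be cleared per resource, e.g.:
>
>     crm_resource --cleanup --resource dbx_first_datas
>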
> >
> > * Example:
> > <lrm_rsc_op id="dbx_first_datas_last_failure_0"
> > operation_key="dbx_first_datas_monitor_0" operation="monitor"
> > crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> > transition-key="38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > transition-magic="4:189;38:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > on_node="olegdbx61-vm000001" call-id="10" rc-code="189" op-status="4"
> > interval="0" last-run="1495057378" last-rc-change="1495057378"
> > exec-time="0" queue-time="0"
> > op-digest="f6bd1386a336e8e6ee25ecb651a9efb6"/>
> >
> >
> > 2) an error-level one (op-status=4, rc-code=6), which generates logs:
> > crm_mon: error: unpack_rsc_op: Preventing dbx_head_head from
> > re-starting anywhere: operation monitor failed 'not configured' (6)
> >
> > * Example:
> > <lrm_rsc_op id="dbx_head_head_last_failure_0"
> > operation_key="dbx_head_head_monitor_0" operation="monitor"
> > crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> > transition-key="39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > transition-magic="4:6;39:0:7:c8b63d9d-9c70-4f99-aa1b-e993de6e4739"
> > on_node="olegdbx61-vm000001" call-id="999999999" rc-code="6"
> > op-status="4" interval="0" last-run="1495057389"
> > last-rc-change="1495057389" exec-time="0" queue-time="0"
> > op-digest="60cdc9db1c5b77e8dba698d3d0c8cda8"/>
> >
> > Could it be some hardware (VM hypervisor) issue?
> >
> >
> > Thanks in advance,
> >
> > --
> > Best Regards,
> >
> > Radoslaw Garbacz
> > XtremeData Incorporated


--
Best Regards,

Radoslaw Garbacz
XtremeData Incorporated