Hi

I set up a two-node cluster based on KVM (it is a test setup).

Now I'm trying to set up fence_scsi to prevent split brain. One of my test scenarios is to suspend the KVM instance and resume it later (if I just kill corosync, something restarts it). I see that the other node performs a failover, but after I resume the KVM instance the node simply rejoins the cluster (the service group switches back) and the node does not die (no reboot, no cluster framework shutdown, etc.).
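
For reference, this is roughly how I pause and resume the guest from the KVM host (the domain name "server1" is just a placeholder for my actual guest name):

virsh suspend server1    # freeze the guest so it stops answering corosync traffic
# ... wait until the surviving node notices the failure and fails the resources over ...
virsh resume server1     # un-freeze the guest again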

I also generated an extremely high system load (while true; do yes > /dev/null & done), and the resources switched to the other node. After I killed the thousands of yes processes, the cluster stack on that node was down (pcs reports the cluster is not running), but I don't think that was caused by fence_scsi. In the log I found: "notice: too_many_st_failures: No devices found in cluster to fence server1in1, giving up".
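
This is more or less what I ran to generate the load, and how I cleaned it up afterwards (using pkill here is just one way of getting rid of the processes):

while true; do yes > /dev/null & done    # fork yes processes until the node is hopelessly overloaded (stop the loop with Ctrl-C)
pkill yes                                # afterwards, kill all the yes processes again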

What is wrong with my configuration? Do I need two stonith devices, or is something else wrong?

See below for the logs and other output. Many thanks in advance!

Regards

Frank

I created the stonith resource with this command:

pcs stonith create scsi_server fence_scsi pcmk_host_list="server1in1 server2in1" pcmk_monitor_action="metadata" devices="/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc" meta provides="unfencing"
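
In case it matters, the agent parameters and the resulting resource can be inspected with something like this (the exact pcs subcommands may differ between versions, so treat this as a sketch):

pcs stonith describe fence_scsi           # list the parameters the fence_scsi agent supports
pcs stonith show scsi_server              # show the stonith resource as configured
pcs property list --all | grep stonith    # check stonith-enabled and related cluster properties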

I see two keys on my iSCSI device:

[root@server1 ~]# sg_persist --in --no-inquiry --read-key --device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc
PR generation=0x2, 2 registered reservation keys follow:
0xfe290001
0xfe290000
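
For completeness, the reservation itself (not only the registered keys) can be read the same way:

sg_persist --in --no-inquiry --read-reservation --device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc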

The current cluster state looks good to me.

Cluster name: nonvcs_cl
Last updated: Tue Jun 16 10:11:41 2015
Last change: Tue Jun 16 10:11:37 2015
Stack: corosync
Current DC: server2in1 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
3 Resources configured


Online: [ server1in1 server2in1 ]

Full list of resources:

Resource Group: servicea_sg
ClusterIP2 (ocf::heartbeat:IPaddr2): Started server1in1
www2_mnt (ocf::heartbeat:Filesystem): Started server1in1
scsi_server (stonith:fence_scsi): Started server2in1

PCSD Status:
server1in1: Online
server2in1: Online

Daemon Status:
corosync: active/enabled
pacemaker: active/disabled
pcsd: active/enabled

##################
messages | grep stonith

Jun 16 10:11:37 server2 stonith-ng[1083]: notice: stonith_device_register: Added 'scsi_server' to the device list (1 active devices)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: handle_request: Client crmd.1087.b3e11b2e wants to fence (on) 'server2in1' with device '(any)'
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: initiate_remote_stonith_op: Initiating remote operation on for server2in1: fd8b714f-6ac3-4227-9937-0d4e7b98e454 (0)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: handle_request: Client crmd.1087.b3e11b2e wants to fence (on) 'server1in1' with device '(any)'
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: initiate_remote_stonith_op: Initiating remote operation on for server1in1: e54b60cf-87a3-403f-9061-a4cac2fa7d0d (0)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: can_fence_host_with_device: scsi_server can fence (on) server2in1: static-list
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: can_fence_host_with_device: scsi_server can fence (on) server2in1: static-list
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: log_operation: Operation 'on' [13198] (call 22 from crmd.1087) for host 'server2in1' with device 'scsi_server' returned: 0 (OK)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: remote_op_done: Operation on of server2in1 by <no-one> for crmd.1087@server2in1.fd8b714f: OK
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_callback: Stonith operation 22/4:91:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: OK (0)
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_notify: server2in1 was successfully unfenced by <anyone> (at the request of server2in1)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: remote_op_done: Operation on of server1in1 by <no-one> for crmd.1087@server2in1.e54b60cf: OK
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_callback: Stonith operation 23/3:91:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: OK (0)
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_notify: server1in1 was successfully unfenced by <anyone> (at the request of server2in1)

##################
crm_verify is OK:

[root@server1 ~]# crm_verify -L
[root@server1 ~]#
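
(It can also be run with verbosity enabled to show details when something is actually wrong:

crm_verify -L -V    # check the live CIB and print warnings/errors instead of staying silent
)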

##################
After the resume, one key has been deleted:
[root@server1 ~]# sg_persist --in --no-inquiry --read-key --device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc
PR generation=0x3, 1 registered reservation key follows:
0xfe290001
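
As far as I understand, the key should only come back once the node is unfenced again, which Pacemaker attempts automatically when the node rejoins (see the "on" operations in the log above). If stonith_admin in this version supports it, it can apparently also be triggered by hand:

stonith_admin --unfence server1in1    # ask stonith-ng to run the "on" action for this node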

##################
Log during the "yes > /dev/null" extreme system load test:

Jun 16 10:45:39 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:39 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:45:41 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1668) was formed. Members joined: 1
Jun 16 10:45:41 server2 corosync[959]: [QUORUM] Members[2]: 2 1
Jun 16 10:45:41 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:45:41 server2 crmd[1087]: notice: crm_update_peer_state: pcmk_quorum_notification: Node server1in1[1] - state is now member (was lost)
Jun 16 10:45:41 server2 pacemakerd[1081]: notice: crm_update_peer_state: pcmk_quorum_notification: Node server1in1[1] - state is now member (was lost)

########################
pcs status during the "while true; do yes > /dev/null & done" extreme system load test:

[root@server1 ~]# pcs status
Error: cluster is not currently running on this node

[root@server2 ~]# pcs status | grep server1in1
Node server1in1 (1): pending
server1in1: Online

########################
Full log:

Jun 16 10:31:04 server2 corosync[959]: [TOTEM ] A processor failed, forming new configuration.
Jun 16 10:31:05 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1184) was formed. Members left: 1
Jun 16 10:31:05 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:31:05 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:31:05 server2 attrd[1085]: notice: crm_update_peer_state: attrd_peer_change_cb: Node server1in1[1] - state is now lost (was member)
Jun 16 10:31:05 server2 attrd[1085]: notice: attrd_peer_remove: Removing all server1in1 attributes for attrd_peer_change_cb
Jun 16 10:31:05 server2 pacemakerd[1081]: notice: crm_update_peer_state: pcmk_quorum_notification: Node server1in1[1] - state is now lost (was member)
Jun 16 10:31:05 server2 crmd[1087]: notice: crm_update_peer_state: pcmk_quorum_notification: Node server1in1[1] - state is now lost (was member)
Jun 16 10:31:05 server2 crmd[1087]: warning: match_down_event: No match for shutdown action on 1
Jun 16 10:31:05 server2 crmd[1087]: notice: peer_update_callback: Stonith/shutdown of server1in1 not matched
Jun 16 10:31:05 server2 crmd[1087]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jun 16 10:31:05 server2 crmd[1087]: warning: match_down_event: No match for shutdown action on 1
Jun 16 10:31:05 server2 crmd[1087]: notice: peer_update_callback: Stonith/shutdown of server1in1 not matched
Jun 16 10:31:06 server2 pengine[1086]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 16 10:31:06 server2 pengine[1086]: warning: pe_fence_node: Node server1in1 will be fenced because the node is no longer part of the cluster
Jun 16 10:31:06 server2 pengine[1086]: warning: determine_online_status: Node server1in1 is unclean
Jun 16 10:31:06 server2 pengine[1086]: warning: custom_action: Action ClusterIP2_stop_0 on server1in1 is unrunnable (offline)
Jun 16 10:31:06 server2 pengine[1086]: warning: custom_action: Action www2_mnt_stop_0 on server1in1 is unrunnable (offline)
Jun 16 10:31:06 server2 pengine[1086]: warning: stage6: Scheduling Node server1in1 for STONITH
Jun 16 10:31:06 server2 pengine[1086]: notice: LogActions: Move ClusterIP2 (Started server1in1 -> server2in1)
Jun 16 10:31:06 server2 pengine[1086]: notice: LogActions: Move www2_mnt (Started server1in1 -> server2in1)
Jun 16 10:31:06 server2 pengine[1086]: warning: process_pe_message: Calculated Transition 93: /var/lib/pacemaker/pengine/pe-warn-53.bz2
Jun 16 10:31:06 server2 crmd[1087]: notice: te_fence_node: Executing reboot fencing operation (20) on server1in1 (timeout=60000)
Jun 16 10:31:06 server2 stonith-ng[1083]: notice: handle_request: Client crmd.1087.b3e11b2e wants to fence (reboot) 'server1in1' with device '(any)'
Jun 16 10:31:06 server2 stonith-ng[1083]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for server1in1: 910d86d6-a53c-4d14-8b66-3e8ef2043bbf (0)
Jun 16 10:31:06 server2 stonith-ng[1083]: notice: can_fence_host_with_device: scsi_server can fence (reboot) server1in1: static-list
Jun 16 10:31:06 server2 stonith-ng[1083]: notice: can_fence_host_with_device: scsi_server can fence (reboot) server1in1: static-list
Jun 16 10:31:06 server2 stonith-ng[1083]: warning: stonith_device_execute: Agent 'fence_scsi' does not advertise support for 'reboot', performing 'off' action instead
Jun 16 10:31:07 server2 stonith-ng[1083]: notice: log_operation: Operation 'reboot' [13384] (call 24 from crmd.1087) for host 'server1in1' with device 'scsi_server' returned: 0 (OK)
Jun 16 10:31:07 server2 stonith-ng[1083]: notice: remote_op_done: Operation reboot of server1in1 by <no-one> for crmd.1087@server2in1.910d86d6: OK
Jun 16 10:31:07 server2 crmd[1087]: notice: tengine_stonith_callback: Stonith operation 24/20:93:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: OK (0)
Jun 16 10:31:07 server2 crmd[1087]: notice: tengine_stonith_notify: Peer server1in1 was terminated (reboot) by <anyone> for server2in1: OK (ref=910d86d6-a53c-4d14-8b66-3e8ef2043bbf) by client crmd.1087
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating action 8: start ClusterIP2_start_0 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: abort_transition_graph: Transition aborted by deletion of lrm[@id='1']: Resource state removal (cib=0.320.17, source=te_update_diff:429, path=/cib/status/node_state[@id='1']/lrm[@id='1'], 0)
Jun 16 10:31:07 server2 IPaddr2(ClusterIP2)[13411]: INFO: Adding inet address 192.168.122.112/32 with broadcast address 192.168.122.255 to device 122er
Jun 16 10:31:07 server2 IPaddr2(ClusterIP2)[13411]: INFO: Bringing device 122er up
Jun 16 10:31:07 server2 IPaddr2(ClusterIP2)[13411]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.122.112 122er 192.168.122.112 auto not_used not_used
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation ClusterIP2_start_0: ok (node=server2in1, call=118, rc=0, cib-update=601, confirmed=true)
Jun 16 10:31:07 server2 crmd[1087]: notice: run_graph: Transition 93 (Complete=9, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-53.bz2): Stopped
Jun 16 10:31:07 server2 pengine[1086]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 16 10:31:07 server2 pengine[1086]: notice: LogActions: Start www2_mnt (server2in1)
Jun 16 10:31:07 server2 pengine[1086]: notice: process_pe_message: Calculated Transition 94: /var/lib/pacemaker/pengine/pe-input-280.bz2
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating action 9: monitor ClusterIP2_monitor_30000 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating action 10: start www2_mnt_start_0 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation ClusterIP2_monitor_30000: ok (node=server2in1, call=119, rc=0, cib-update=603, confirmed=false)
Jun 16 10:31:07 server2 Filesystem(www2_mnt)[13490]: INFO: Running start for /dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc-part1 on /var/www2
Jun 16 10:31:07 server2 kernel: EXT4-fs (sda1): recovery complete
Jun 16 10:31:07 server2 kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation www2_mnt_start_0: ok (node=server2in1, call=120, rc=0, cib-update=604, confirmed=true)
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating action 11: monitor www2_mnt_monitor_20000 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation www2_mnt_monitor_20000: ok (node=server2in1, call=121, rc=0, cib-update=605, confirmed=false)
Jun 16 10:31:07 server2 crmd[1087]: notice: run_graph: Transition 94 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-280.bz2): Complete
Jun 16 10:31:07 server2 crmd[1087]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 16 10:31:55 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1188) was formed. Members joined: 1
Jun 16 10:31:55 server2 attrd[1085]: notice: crm_update_peer_state: attrd_peer_change_cb: Node server1in1[1] - state is now member (was lost)
Jun 16 10:31:55 server2 crmd[1087]: error: pcmk_cpg_membership: Node server1in1[1] appears to be online even though we think it is dead
Jun 16 10:31:55 server2 crmd[1087]: notice: crm_update_peer_state: pcmk_cpg_membership: Node server1in1[1] - state is now member (was lost)
Jun 16 10:31:55 server2 corosync[959]: [QUORUM] Members[2]: 2 1
Jun 16 10:31:55 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:31:55 server2 pacemakerd[1081]: notice: crm_update_peer_state: pcmk_quorum_notification: Node server1in1[1] - state is now member (was lost)
Jun 16 10:31:55 server2 crmd[1087]: notice: do_state_transition: State transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jun 16 10:31:56 server2 crmd[1087]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
Jun 16 10:31:56 server2 crmd[1087]: warning: do_log: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
Jun 16 10:31:58 server2 pengine[1086]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 16 10:31:58 server2 pengine[1086]: error: native_create_actions: Resource ClusterIP2 (ocf::IPaddr2) is active on 2 nodes attempting recovery
Jun 16 10:31:58 server2 pengine[1086]: warning: native_create_actions: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Jun 16 10:31:58 server2 pengine[1086]: error: native_create_actions: Resource www2_mnt (ocf::Filesystem) is active on 2 nodes attempting recovery
Jun 16 10:31:58 server2 pengine[1086]: warning: native_create_actions: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Jun 16 10:31:58 server2 pengine[1086]: notice: LogActions: Restart ClusterIP2 (Started server1in1)
Jun 16 10:31:58 server2 pengine[1086]: notice: LogActions: Restart www2_mnt (Started server1in1)
Jun 16 10:31:58 server2 pengine[1086]: error: process_pe_message: Calculated Transition 95: /var/lib/pacemaker/pengine/pe-error-3.bz2
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating action 16: stop www2_mnt_stop_0 on server2in1 (local)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating action 15: stop www2_mnt_stop_0 on server1in1
Jun 16 10:31:58 server2 Filesystem(www2_mnt)[13787]: INFO: Running stop for /dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc-part1 on /var/www2
Jun 16 10:31:58 server2 Filesystem(www2_mnt)[13787]: INFO: Trying to unmount /var/www2
Jun 16 10:31:58 server2 Filesystem(www2_mnt)[13787]: INFO: unmounted /var/www2 successfully
Jun 16 10:31:58 server2 crmd[1087]: notice: process_lrm_event: Operation www2_mnt_stop_0: ok (node=server2in1, call=123, rc=0, cib-update=637, confirmed=true)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating action 13: stop ClusterIP2_stop_0 on server2in1 (local)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating action 12: stop ClusterIP2_stop_0 on server1in1
Jun 16 10:31:58 server2 IPaddr2(ClusterIP2)[13866]: INFO: IP status = ok, IP_CIP=
Jun 16 10:31:58 server2 crmd[1087]: notice: process_lrm_event: Operation ClusterIP2_stop_0: ok (node=server2in1, call=125, rc=0, cib-update=638, confirmed=true)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating action 14: start ClusterIP2_start_0 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating action 2: monitor ClusterIP2_monitor_30000 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating action 17: start www2_mnt_start_0 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating action 1: monitor www2_mnt_monitor_20000 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: run_graph: Transition 95 (Complete=13, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-error-3.bz2): Complete
Jun 16 10:31:58 server2 crmd[1087]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

#####################################################
messages during the "yes > /dev/null &" load test:

Jun 16 10:44:37 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:44:49 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1612) was formed. Members
Jun 16 10:44:49 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:44:49 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:45:00 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1620) was formed. Members
Jun 16 10:45:00 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:00 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:45:12 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1628) was formed. Members
Jun 16 10:45:12 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:12 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:45:25 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1636) was formed. Members
Jun 16 10:45:25 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:25 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:45:30 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1644) was formed. Members
Jun 16 10:45:30 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:30 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:45:35 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1652) was formed. Members
Jun 16 10:45:35 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:35 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:45:39 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1660) was formed. Members
Jun 16 10:45:39 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:39 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:45:41 server2 corosync[959]: [TOTEM ] A new membership (192.168.200.131:1668) was formed. Members joined: 1
Jun 16 10:45:41 server2 corosync[959]: [QUORUM] Members[2]: 2 1
Jun 16 10:45:41 server2 corosync[959]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 16 10:45:41 server2 crmd[1087]: notice: crm_update_peer_state: pcmk_quorum_notification: Node server1in1[1] - state is now member (was lost)
Jun 16 10:45:41 server2 pacemakerd[1081]: notice: crm_update_peer_state: pcmk_quorum_notification: Node server1in1[1] - state is now member (was lost)
Jun 16 10:55:00 server2 crmd[1087]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jun 16 10:55:00 server2 pengine[1086]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 16 10:55:00 server2 pengine[1086]: warning: custom_action: Action ClusterIP2_monitor_0 on server1in1 is unrunnable (pending)
Jun 16 10:55:00 server2 pengine[1086]: warning: custom_action: Action www2_mnt_monitor_0 on server1in1 is unrunnable (pending)
Jun 16 10:55:00 server2 pengine[1086]: warning: custom_action: Action scsi_server_monitor_0 on server1in1 is unrunnable (pending)
Jun 16 10:55:00 server2 pengine[1086]: notice: trigger_unfencing: Unfencing server1in1: node discovery
Jun 16 10:55:00 server2 pengine[1086]: notice: process_pe_message: Calculated Transition 101: /var/lib/pacemaker/pengine/pe-input-284.bz2
Jun 16 10:55:00 server2 crmd[1087]: notice: te_fence_node: Executing on fencing operation (4) on server1in1 (timeout=60000)
Jun 16 10:55:00 server2 stonith-ng[1083]: notice: handle_request: Client crmd.1087.b3e11b2e wants to fence (on) 'server1in1' with device '(any)'
Jun 16 10:55:00 server2 stonith-ng[1083]: notice: initiate_remote_stonith_op: Initiating remote operation on for server1in1: 3b0b3967-6f33-4b68-9f4d-246b69e0370a (0)
Jun 16 10:55:00 server2 stonith-ng[1083]: notice: stonith_choose_peer: Couldn't find anyone to fence server1in1 with <any>
Jun 16 10:55:00 server2 stonith-ng[1083]: error: remote_op_done: Operation on of server1in1 by <no-one> for crmd.1087@server2in1.3b0b3967: No such device
Jun 16 10:55:00 server2 crmd[1087]: notice: tengine_stonith_callback: Stonith operation 27/4:101:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: No such device (-19)
Jun 16 10:55:00 server2 crmd[1087]: notice: tengine_stonith_callback: Stonith operation 27 for server1in1 failed (No such device): aborting transition.
Jun 16 10:55:00 server2 crmd[1087]: notice: abort_transition_graph: Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
Jun 16 10:55:00 server2 crmd[1087]: error: tengine_stonith_notify: Unfencing of server1in1 by <anyone> failed: No such device (-19)
Jun 16 10:55:00 server2 crmd[1087]: notice: run_graph: Transition 101 (Complete=1, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-284.bz2): Stopped
Jun 16 10:55:00 server2 crmd[1087]: notice: too_many_st_failures: No devices found in cluster to fence server1in1, giving up
Jun 16 10:55:00 server2 crmd[1087]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
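
If it helps, I guess the failed fencing attempts can also be inspected on the DC with something like this (again, I am not sure every stonith_admin version has this option):

stonith_admin --history server1in1    # show fencing operations that targeted server1in1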