[ClusterLabs] fence_scsi No devices found in cluster to fence

Frank Beckmann Frank.Beckmann at gmx.com
Tue Jun 16 11:09:25 CEST 2015


Hi

I set up a two-node cluster based on KVM (it is a test setup).

Now I'm trying to set up fence_scsi to prevent split brain. One of my 
test scenarios is to suspend the KVM instance and resume it later (if I 
kill corosync, something simply restarts it). I see that the other node 
does a failover, but after I resume the KVM instance the node joins the 
cluster again (and the service group switches back :-( ) instead of 
dying (reboot, cluster framework down, etc.) :-)
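
For reference, the suspend/resume part of the test is nothing more than 
this (a minimal sketch, with "server1" standing in for the actual 
libvirt domain name):

virsh suspend server1   # freeze the VM so it stops answering corosync
# ... wait until the surviving node has fenced it and taken over ...
virsh resume server1    # unfreeze; the node comes back with its old state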

I also created an extremely high system load (while true; do yes 
>/dev/null & done) and the resources switched to the other node. After I 
killed the thousands of yes processes, the pcs service was down, but I 
don't think that was caused by fence_scsi. In the log I found: "notice: 
too_many_st_failures: No devices found in cluster to fence server1in1, 
giving up"
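
Spelled out, the load generator and its cleanup look roughly like this 
(a sketch of what I ran, not a verbatim transcript):

# fork yes processes in an endless loop to push the load average up
while true; do yes > /dev/null & done
# later: kill all of them again
pkill yes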

What is wrong with my configuration? Do I need two stonith devices, or 
is something else wrong?

See below for the logs and other output... Many thanks in advance!

Regards

Frank

I created the stonith resource with this command:

pcs stonith create scsi_server fence_scsi pcmk_host_list="server1in1 
server2in1" pcmk_monitor_action="metadata" 
devices="/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc" meta 
provides="unfencing"
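
One way to double-check that the device really got registered (I think 
this should work with this pcs version; newer pcs uses "pcs stonith 
config" instead of "show"):

pcs stonith show scsi_server       # dump the configured instance attributes
stonith_admin --list-registered    # ask stonith-ng which devices it knows about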

I see two keys on my iSCSI device

[root at server1 ~]# sg_persist --in --no-inquiry --read-key 
--device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc
   PR generation=0x2, 2 registered reservation keys follow:
     0xfe290001
     0xfe290000
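
For completeness, the reservation itself can be read with the same tool, 
just a different service action (same device path as above):

sg_persist --in --no-inquiry --read-reservation \
    --device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc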

The current cluster state looks good to me.

Cluster name: nonvcs_cl
Last updated: Tue Jun 16 10:11:41 2015
Last change: Tue Jun 16 10:11:37 2015
Stack: corosync
Current DC: server2in1 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
3 Resources configured


Online: [ server1in1 server2in1 ]

Full list of resources:

  Resource Group: servicea_sg
      ClusterIP2    (ocf::heartbeat:IPaddr2):    Started server1in1
      www2_mnt    (ocf::heartbeat:Filesystem):    Started server1in1
  scsi_server    (stonith:fence_scsi):    Started server2in1

PCSD Status:
   server1in1: Online
   server2in1: Online

Daemon Status:
   corosync: active/enabled
   pacemaker: active/disabled
   pcsd: active/enabled

##################
messages | grep stonith

Jun 16 10:11:37 server2 stonith-ng[1083]: notice: 
stonith_device_register: Added 'scsi_server' to the device list (1 
active devices)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: handle_request: Client 
crmd.1087.b3e11b2e wants to fence (on) 'server2in1' with device '(any)'
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: 
initiate_remote_stonith_op: Initiating remote operation on for 
server2in1: fd8b714f-6ac3-4227-9937-0d4e7b98e454 (0)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: handle_request: Client 
crmd.1087.b3e11b2e wants to fence (on) 'server1in1' with device '(any)'
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: 
initiate_remote_stonith_op: Initiating remote operation on for 
server1in1: e54b60cf-87a3-403f-9061-a4cac2fa7d0d (0)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: 
can_fence_host_with_device: scsi_server can fence (on) server2in1: 
static-list
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: 
can_fence_host_with_device: scsi_server can fence (on) server2in1: 
static-list
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: log_operation: 
Operation 'on' [13198] (call 22 from crmd.1087) for host 'server2in1' 
with device 'scsi_server' returned: 0 (OK)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: remote_op_done: 
Operation on of server2in1 by <no-one> for crmd.1087 at server2in1.fd8b714f: OK
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_callback: 
Stonith operation 22/4:91:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: OK (0)
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_notify: 
server2in1 was successfully unfenced by <anyone> (at the request of 
server2in1)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: remote_op_done: 
Operation on of server1in1 by <no-one> for crmd.1087 at server2in1.e54b60cf: OK
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_callback: 
Stonith operation 23/3:91:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: OK (0)
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_notify: 
server1in1 was successfully unfenced by <anyone> (at the request of 
server2in1)

##################
crm_verify is OK

[root at server1 ~]# crm_verify -L
[root at server1 ~]#

##################
After the resume, one key has been deleted:
[root at server1 ~]# sg_persist --in --no-inquiry --read-key 
--device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc
   PR generation=0x3, 1 registered reservation key follows:
     0xfe290001
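
If I understand fence_scsi correctly, its 'off' action removed server1's 
key with a preempt-and-abort. Done by hand that would look roughly like 
the following sketch (not the agent's exact invocation; I'm assuming 
0xfe290001 is the surviving node's key and 0xfe290000 was server1's):

sg_persist --out --no-inquiry --preempt-abort \
    --param-rk=0xfe290001 --param-sark=0xfe290000 \
    --device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc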

##################
Log from the "yes > /dev/null" extreme system load test

Jun 16 10:45:39 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:39 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:45:41 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1668) was formed. Members joined: 1
Jun 16 10:45:41 server2 corosync[959]: [QUORUM] Members[2]: 2 1
Jun 16 10:45:41 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:45:41 server2 crmd[1087]: notice: crm_update_peer_state: 
pcmk_quorum_notification: Node server1in1[1] - state is now member (was 
lost)
Jun 16 10:45:41 server2 pacemakerd[1081]: notice: crm_update_peer_state: 
pcmk_quorum_notification: Node server1in1[1] - state is now member (was 
lost)

########################
pcs status "whilte true.. yes > /dev/null&"  extreme system load

[root at server1 ~]# pcs status
Error: cluster is not currently running on this node

[root at server2 ~]# pcs status | grep server1in1
Node server1in1 (1): pending
   server1in1: Online

########################
Full log

Jun 16 10:31:04 server2 corosync[959]: [TOTEM ] A processor failed, 
forming new configuration.
Jun 16 10:31:05 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1184) was formed. Members left: 1
Jun 16 10:31:05 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:31:05 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:31:05 server2 attrd[1085]: notice: crm_update_peer_state: 
attrd_peer_change_cb: Node server1in1[1] - state is now lost (was member)
Jun 16 10:31:05 server2 attrd[1085]: notice: attrd_peer_remove: Removing 
all server1in1 attributes for attrd_peer_change_cb
Jun 16 10:31:05 server2 pacemakerd[1081]: notice: crm_update_peer_state: 
pcmk_quorum_notification: Node server1in1[1] - state is now lost (was 
member)
Jun 16 10:31:05 server2 crmd[1087]: notice: crm_update_peer_state: 
pcmk_quorum_notification: Node server1in1[1] - state is now lost (was 
member)
Jun 16 10:31:05 server2 crmd[1087]: warning: match_down_event: No match 
for shutdown action on 1
Jun 16 10:31:05 server2 crmd[1087]: notice: peer_update_callback: 
Stonith/shutdown of server1in1 not matched
Jun 16 10:31:05 server2 crmd[1087]: notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jun 16 10:31:05 server2 crmd[1087]: warning: match_down_event: No match 
for shutdown action on 1
Jun 16 10:31:05 server2 crmd[1087]: notice: peer_update_callback: 
Stonith/shutdown of server1in1 not matched
Jun 16 10:31:06 server2 pengine[1086]: notice: unpack_config: On loss of 
CCM Quorum: Ignore
Jun 16 10:31:06 server2 pengine[1086]: warning: pe_fence_node: Node 
server1in1 will be fenced because the node is no longer part of the cluster
Jun 16 10:31:06 server2 pengine[1086]: warning: determine_online_status: 
Node server1in1 is unclean
Jun 16 10:31:06 server2 pengine[1086]: warning: custom_action: Action 
ClusterIP2_stop_0 on server1in1 is unrunnable (offline)
Jun 16 10:31:06 server2 pengine[1086]: warning: custom_action: Action 
www2_mnt_stop_0 on server1in1 is unrunnable (offline)
Jun 16 10:31:06 server2 pengine[1086]: warning: stage6: Scheduling Node 
server1in1 for STONITH
Jun 16 10:31:06 server2 pengine[1086]: notice: LogActions: Move 
ClusterIP2    (Started server1in1 -> server2in1)
Jun 16 10:31:06 server2 pengine[1086]: notice: LogActions: Move 
www2_mnt    (Started server1in1 -> server2in1)
Jun 16 10:31:06 server2 pengine[1086]: warning: process_pe_message: 
Calculated Transition 93: /var/lib/pacemaker/pengine/pe-warn-53.bz2
Jun 16 10:31:06 server2 crmd[1087]: notice: te_fence_node: Executing 
reboot fencing operation (20) on server1in1 (timeout=60000)
Jun 16 10:31:06 server2 stonith-ng[1083]: notice: handle_request: Client 
crmd.1087.b3e11b2e wants to fence (reboot) 'server1in1' with device '(any)'
Jun 16 10:31:06 server2 stonith-ng[1083]: notice: 
initiate_remote_stonith_op: Initiating remote operation reboot for 
server1in1: 910d86d6-a53c-4d14-8b66-3e8ef2043bbf (0)
Jun 16 10:31:06 server2 stonith-ng[1083]: notice: 
can_fence_host_with_device: scsi_server can fence (reboot) server1in1: 
static-list
Jun 16 10:31:06 server2 stonith-ng[1083]: notice: 
can_fence_host_with_device: scsi_server can fence (reboot) server1in1: 
static-list
Jun 16 10:31:06 server2 stonith-ng[1083]: warning: 
stonith_device_execute: Agent 'fence_scsi' does not advertise support 
for 'reboot', performing 'off' action instead
Jun 16 10:31:07 server2 stonith-ng[1083]: notice: log_operation: 
Operation 'reboot' [13384] (call 24 from crmd.1087) for host 
'server1in1' with device 'scsi_server' returned: 0 (OK)
Jun 16 10:31:07 server2 stonith-ng[1083]: notice: remote_op_done: 
Operation reboot of server1in1 by <no-one> for 
crmd.1087 at server2in1.910d86d6: OK
Jun 16 10:31:07 server2 crmd[1087]: notice: tengine_stonith_callback: 
Stonith operation 24/20:93:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: OK (0)
Jun 16 10:31:07 server2 crmd[1087]: notice: tengine_stonith_notify: Peer 
server1in1 was terminated (reboot) by <anyone> for server2in1: OK 
(ref=910d86d6-a53c-4d14-8b66-3e8ef2043bbf) by client crmd.1087
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 8: start ClusterIP2_start_0 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: abort_transition_graph: 
Transition aborted by deletion of lrm[@id='1']: Resource state removal 
(cib=0.320.17, source=te_update_diff:429, 
path=/cib/status/node_state[@id='1']/lrm[@id='1'], 0)
Jun 16 10:31:07 server2 IPaddr2(ClusterIP2)[13411]: INFO: Adding inet 
address 192.168.122.112/32 with broadcast address 192.168.122.255 to 
device 122er
Jun 16 10:31:07 server2 IPaddr2(ClusterIP2)[13411]: INFO: Bringing 
device 122er up
Jun 16 10:31:07 server2 IPaddr2(ClusterIP2)[13411]: INFO: 
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/resource-agents/send_arp-192.168.122.112 122er 192.168.122.112 
auto not_used not_used
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation 
ClusterIP2_start_0: ok (node=server2in1, call=118, rc=0, cib-update=601, 
confirmed=true)
Jun 16 10:31:07 server2 crmd[1087]: notice: run_graph: Transition 93 
(Complete=9, Pending=0, Fired=0, Skipped=4, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-warn-53.bz2): Stopped
Jun 16 10:31:07 server2 pengine[1086]: notice: unpack_config: On loss of 
CCM Quorum: Ignore
Jun 16 10:31:07 server2 pengine[1086]: notice: LogActions: Start 
www2_mnt    (server2in1)
Jun 16 10:31:07 server2 pengine[1086]: notice: process_pe_message: 
Calculated Transition 94: /var/lib/pacemaker/pengine/pe-input-280.bz2
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 9: monitor ClusterIP2_monitor_30000 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 10: start www2_mnt_start_0 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation 
ClusterIP2_monitor_30000: ok (node=server2in1, call=119, rc=0, 
cib-update=603, confirmed=false)
Jun 16 10:31:07 server2 Filesystem(www2_mnt)[13490]: INFO: Running start 
for /dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc-part1 on 
/var/www2
Jun 16 10:31:07 server2 kernel: EXT4-fs (sda1): recovery complete
Jun 16 10:31:07 server2 kernel: EXT4-fs (sda1): mounted filesystem with 
ordered data mode. Opts: (null)
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation 
www2_mnt_start_0: ok (node=server2in1, call=120, rc=0, cib-update=604, 
confirmed=true)
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 11: monitor www2_mnt_monitor_20000 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation 
www2_mnt_monitor_20000: ok (node=server2in1, call=121, rc=0, 
cib-update=605, confirmed=false)
Jun 16 10:31:07 server2 crmd[1087]: notice: run_graph: Transition 94 
(Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-280.bz2): Complete
Jun 16 10:31:07 server2 crmd[1087]: notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 16 10:31:55 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1188) was formed. Members joined: 1
Jun 16 10:31:55 server2 attrd[1085]: notice: crm_update_peer_state: 
attrd_peer_change_cb: Node server1in1[1] - state is now member (was lost)
Jun 16 10:31:55 server2 crmd[1087]: error: pcmk_cpg_membership: Node 
server1in1[1] appears to be online even though we think it is dead
Jun 16 10:31:55 server2 crmd[1087]: notice: crm_update_peer_state: 
pcmk_cpg_membership: Node server1in1[1] - state is now member (was lost)
Jun 16 10:31:55 server2 corosync[959]: [QUORUM] Members[2]: 2 1
Jun 16 10:31:55 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:31:55 server2 pacemakerd[1081]: notice: crm_update_peer_state: 
pcmk_quorum_notification: Node server1in1[1] - state is now member (was 
lost)
Jun 16 10:31:55 server2 crmd[1087]: notice: do_state_transition: State 
transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL 
origin=do_election_count_vote ]
Jun 16 10:31:56 server2 crmd[1087]: notice: do_state_transition: State 
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC 
cause=C_TIMER_POPPED origin=election_timeout_popped ]
Jun 16 10:31:56 server2 crmd[1087]: warning: do_log: FSA: Input 
I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
Jun 16 10:31:58 server2 pengine[1086]: notice: unpack_config: On loss of 
CCM Quorum: Ignore
Jun 16 10:31:58 server2 pengine[1086]: error: native_create_actions: 
Resource ClusterIP2 (ocf::IPaddr2) is active on 2 nodes attempting recovery
Jun 16 10:31:58 server2 pengine[1086]: warning: native_create_actions: 
See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more 
information.
Jun 16 10:31:58 server2 pengine[1086]: error: native_create_actions: 
Resource www2_mnt (ocf::Filesystem) is active on 2 nodes attempting recovery
Jun 16 10:31:58 server2 pengine[1086]: warning: native_create_actions: 
See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more 
information.
Jun 16 10:31:58 server2 pengine[1086]: notice: LogActions: Restart 
ClusterIP2    (Started server1in1)
Jun 16 10:31:58 server2 pengine[1086]: notice: LogActions: Restart 
www2_mnt    (Started server1in1)
Jun 16 10:31:58 server2 pengine[1086]: error: process_pe_message: 
Calculated Transition 95: /var/lib/pacemaker/pengine/pe-error-3.bz2
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 16: stop www2_mnt_stop_0 on server2in1 (local)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 15: stop www2_mnt_stop_0 on server1in1
Jun 16 10:31:58 server2 Filesystem(www2_mnt)[13787]: INFO: Running stop 
for /dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc-part1 on 
/var/www2
Jun 16 10:31:58 server2 Filesystem(www2_mnt)[13787]: INFO: Trying to 
unmount /var/www2
Jun 16 10:31:58 server2 Filesystem(www2_mnt)[13787]: INFO: unmounted 
/var/www2 successfully
Jun 16 10:31:58 server2 crmd[1087]: notice: process_lrm_event: Operation 
www2_mnt_stop_0: ok (node=server2in1, call=123, rc=0, cib-update=637, 
confirmed=true)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 13: stop ClusterIP2_stop_0 on server2in1 (local)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 12: stop ClusterIP2_stop_0 on server1in1
Jun 16 10:31:58 server2 IPaddr2(ClusterIP2)[13866]: INFO: IP status = 
ok, IP_CIP=
Jun 16 10:31:58 server2 crmd[1087]: notice: process_lrm_event: Operation 
ClusterIP2_stop_0: ok (node=server2in1, call=125, rc=0, cib-update=638, 
confirmed=true)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 14: start ClusterIP2_start_0 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 2: monitor ClusterIP2_monitor_30000 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 17: start www2_mnt_start_0 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating 
action 1: monitor www2_mnt_monitor_20000 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: run_graph: Transition 95 
(Complete=13, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-error-3.bz2): Complete
Jun 16 10:31:58 server2 crmd[1087]: notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]


#####################################################
messages from the "yes > /dev/null &" load test

Jun 16 10:44:37 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:44:49 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1612) was formed. Members
Jun 16 10:44:49 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:44:49 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:45:00 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1620) was formed. Members
Jun 16 10:45:00 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:00 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:45:12 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1628) was formed. Members
Jun 16 10:45:12 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:12 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:45:25 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1636) was formed. Members
Jun 16 10:45:25 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:25 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:45:30 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1644) was formed. Members
Jun 16 10:45:30 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:30 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:45:35 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1652) was formed. Members
Jun 16 10:45:35 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:35 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:45:39 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1660) was formed. Members
Jun 16 10:45:39 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:39 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:45:41 server2 corosync[959]: [TOTEM ] A new membership 
(192.168.200.131:1668) was formed. Members joined: 1
Jun 16 10:45:41 server2 corosync[959]: [QUORUM] Members[2]: 2 1
Jun 16 10:45:41 server2 corosync[959]: [MAIN  ] Completed service 
synchronization, ready to provide service.
Jun 16 10:45:41 server2 crmd[1087]: notice: crm_update_peer_state: 
pcmk_quorum_notification: Node server1in1[1] - state is now member (was 
lost)
Jun 16 10:45:41 server2 pacemakerd[1081]: notice: crm_update_peer_state: 
pcmk_quorum_notification: Node server1in1[1] - state is now member (was 
lost)
Jun 16 10:55:00 server2 crmd[1087]: notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jun 16 10:55:00 server2 pengine[1086]: notice: unpack_config: On loss of 
CCM Quorum: Ignore
Jun 16 10:55:00 server2 pengine[1086]: warning: custom_action: Action 
ClusterIP2_monitor_0 on server1in1 is unrunnable (pending)
Jun 16 10:55:00 server2 pengine[1086]: warning: custom_action: Action 
www2_mnt_monitor_0 on server1in1 is unrunnable (pending)
Jun 16 10:55:00 server2 pengine[1086]: warning: custom_action: Action 
scsi_server_monitor_0 on server1in1 is unrunnable (pending)
Jun 16 10:55:00 server2 pengine[1086]: notice: trigger_unfencing: 
Unfencing server1in1: node discovery
Jun 16 10:55:00 server2 pengine[1086]: notice: process_pe_message: 
Calculated Transition 101: /var/lib/pacemaker/pengine/pe-input-284.bz2
Jun 16 10:55:00 server2 crmd[1087]: notice: te_fence_node: Executing on 
fencing operation (4) on server1in1 (timeout=60000)
Jun 16 10:55:00 server2 stonith-ng[1083]: notice: handle_request: Client 
crmd.1087.b3e11b2e wants to fence (on) 'server1in1' with device '(any)'
Jun 16 10:55:00 server2 stonith-ng[1083]: notice: 
initiate_remote_stonith_op: Initiating remote operation on for 
server1in1: 3b0b3967-6f33-4b68-9f4d-246b69e0370a (0)
Jun 16 10:55:00 server2 stonith-ng[1083]: notice: stonith_choose_peer: 
Couldn't find anyone to fence server1in1 with <any>
Jun 16 10:55:00 server2 stonith-ng[1083]: error: remote_op_done: 
Operation on of server1in1 by <no-one> for 
crmd.1087 at server2in1.3b0b3967: No such device
Jun 16 10:55:00 server2 crmd[1087]: notice: tengine_stonith_callback: 
Stonith operation 27/4:101:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: No 
such device (-19)
Jun 16 10:55:00 server2 crmd[1087]: notice: tengine_stonith_callback: 
Stonith operation 27 for server1in1 failed (No such device): aborting 
transition.
Jun 16 10:55:00 server2 crmd[1087]: notice: abort_transition_graph: 
Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
Jun 16 10:55:00 server2 crmd[1087]: error: tengine_stonith_notify: 
Unfencing of server1in1 by <anyone> failed: No such device (-19)
Jun 16 10:55:00 server2 crmd[1087]: notice: run_graph: Transition 101 
(Complete=1, Pending=0, Fired=0, Skipped=1, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-284.bz2): Stopped
Jun 16 10:55:00 server2 crmd[1087]: notice: too_many_st_failures: No 
devices found in cluster to fence server1in1, giving up
Jun 16 10:55:00 server2 crmd[1087]: notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]






