[ClusterLabs] fence_scsi No devices found in cluster to fence
Frank Beckmann
Frank.Beckmann@gmx.com
Tue Jun 16 09:09:25 UTC 2015
Hi
I set up a two-node cluster based on KVM (it is a test setup).
Now I'm trying to set up fence_scsi to prevent split brain. One of my
test scenarios is to suspend the KVM instance and resume it later (if I
just kill corosync instead, something restarts it right away). I can see
that the other node does a failover, but after I resume the KVM instance
the suspended node simply rejoins the cluster (and the service group
switches back :-( ), and the node does not die (no reboot, no cluster
framework shutdown, etc.) :-)
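For reference, the suspend/resume part of the test is roughly the
following, run on the KVM host (the guest name "server1" is just how I
refer to it here):

# freeze the guest so it stops responding to the cluster
virsh suspend server1
# ... wait until the surviving node has taken over the resources ...
virsh resume server1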
I also generated an extremely high system load (while true; do yes
>/dev/null & done); the resources switched to the other node. After I
killed the thousands of yes processes, the pcs service was down, but I
don't think that was caused by fence_scsi. In the log I found: "notice:
too_many_st_failures: No devices found in cluster to fence server1in1,
giving up"
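For completeness, the load test and its cleanup look roughly like this
(the while loop itself also has to be interrupted, of course):

# fork CPU hogs in the background until the scheduler starves everything else
while true; do yes > /dev/null & done
# afterwards, from another shell, get rid of them again
pkill yes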
What is wrong in my configuration? Do I need two stonith devices? Or
what else is wrong?
See below for the logs and other output... Many thanks in advance!
Regards
Frank
I created the stonith resource with this command:
pcs stonith create scsi_server fence_scsi pcmk_host_list="server1in1
server2in1" pcmk_monitor_action="metadata"
devices="/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc" meta
provides="unfencing"
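To double-check what actually ended up in the configuration I dump the
stonith resource again; I believe this is the matching pcs syntax for my
version:

pcs stonith show scsi_server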
I see two keys on my iSCSI device
[root@server1 ~]# sg_persist --in --no-inquiry --read-key
--device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc
PR generation=0x2, 2 registered reservation keys follow:
0xfe290001
0xfe290000
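The reservation itself can be read the same way (same device path), e.g.:

sg_persist --in --no-inquiry --read-reservation \
    --device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc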
The current cluster state looks good to me.
Cluster name: nonvcs_cl
Last updated: Tue Jun 16 10:11:41 2015
Last change: Tue Jun 16 10:11:37 2015
Stack: corosync
Current DC: server2in1 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
3 Resources configured
Online: [ server1in1 server2in1 ]
Full list of resources:
 Resource Group: servicea_sg
     ClusterIP2 (ocf::heartbeat:IPaddr2): Started server1in1
     www2_mnt (ocf::heartbeat:Filesystem): Started server1in1
 scsi_server (stonith:fence_scsi): Started server2in1
PCSD Status:
server1in1: Online
server2in1: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/disabled
pcsd: active/enabled
##################
/var/log/messages | grep stonith
Jun 16 10:11:37 server2 stonith-ng[1083]: notice:
stonith_device_register: Added 'scsi_server' to the device list (1
active devices)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: handle_request: Client
crmd.1087.b3e11b2e wants to fence (on) 'server2in1' with device '(any)'
Jun 16 10:11:37 server2 stonith-ng[1083]: notice:
initiate_remote_stonith_op: Initiating remote operation on for
server2in1: fd8b714f-6ac3-4227-9937-0d4e7b98e454 (0)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: handle_request: Client
crmd.1087.b3e11b2e wants to fence (on) 'server1in1' with device '(any)'
Jun 16 10:11:37 server2 stonith-ng[1083]: notice:
initiate_remote_stonith_op: Initiating remote operation on for
server1in1: e54b60cf-87a3-403f-9061-a4cac2fa7d0d (0)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice:
can_fence_host_with_device: scsi_server can fence (on) server2in1:
static-list
Jun 16 10:11:37 server2 stonith-ng[1083]: notice:
can_fence_host_with_device: scsi_server can fence (on) server2in1:
static-list
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: log_operation:
Operation 'on' [13198] (call 22 from crmd.1087) for host 'server2in1'
with device 'scsi_server' returned: 0 (OK)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: remote_op_done:
Operation on of server2in1 by <no-one> for crmd.1087@server2in1.fd8b714f: OK
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_callback:
Stonith operation 22/4:91:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: OK (0)
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_notify:
server2in1 was successfully unfenced by <anyone> (at the request of
server2in1)
Jun 16 10:11:37 server2 stonith-ng[1083]: notice: remote_op_done:
Operation on of server1in1 by <no-one> for crmd.1087@server2in1.e54b60cf: OK
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_callback:
Stonith operation 23/3:91:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: OK (0)
Jun 16 10:11:37 server2 crmd[1087]: notice: tengine_stonith_notify:
server1in1 was successfully unfenced by <anyone> (at the request of
server2in1)
##################
crm_verify is OK:
[root@server1 ~]# crm_verify -L
[root@server1 ~]#
##################
After resuming the KVM instance, one key has been removed:
[root@server1 ~]# sg_persist --in --no-inquiry --read-key
--device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc
PR generation=0x3, 1 registered reservation key follows:
0xfe290001
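If I understand fence_scsi correctly, its 'off' action is essentially a
preempt-and-abort that removes the fenced node's key, so what happened
here should be roughly equivalent to running the following from server2
(reading the output above as: 0xfe290000 was server1in1's key, 0xfe290001
is server2in1's):

sg_persist --out --no-inquiry --preempt-abort \
    --param-rk=fe290001 --param-sark=fe290000 \
    --device=/dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc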
##################
Log from the "yes > /dev/null" extreme system load test:
Jun 16 10:45:39 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:39 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:45:41 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1668) was formed. Members joined: 1
Jun 16 10:45:41 server2 corosync[959]: [QUORUM] Members[2]: 2 1
Jun 16 10:45:41 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:45:41 server2 crmd[1087]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node server1in1[1] - state is now member (was
lost)
Jun 16 10:45:41 server2 pacemakerd[1081]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node server1in1[1] - state is now member (was
lost)
########################
pcs status "whilte true.. yes > /dev/null&" extreme system load
[root at server1 ~]# pcs status
Error: cluster is not currently running on this node
[root@server2 ~]# pcs status | grep server1in1
Node server1in1 (1): pending
server1in1: Online
########################
Full log
Jun 16 10:31:04 server2 corosync[959]: [TOTEM ] A processor failed,
forming new configuration.
Jun 16 10:31:05 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1184) was formed. Members left: 1
Jun 16 10:31:05 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:31:05 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:31:05 server2 attrd[1085]: notice: crm_update_peer_state:
attrd_peer_change_cb: Node server1in1[1] - state is now lost (was member)
Jun 16 10:31:05 server2 attrd[1085]: notice: attrd_peer_remove: Removing
all server1in1 attributes for attrd_peer_change_cb
Jun 16 10:31:05 server2 pacemakerd[1081]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node server1in1[1] - state is now lost (was
member)
Jun 16 10:31:05 server2 crmd[1087]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node server1in1[1] - state is now lost (was
member)
Jun 16 10:31:05 server2 crmd[1087]: warning: match_down_event: No match
for shutdown action on 1
Jun 16 10:31:05 server2 crmd[1087]: notice: peer_update_callback:
Stonith/shutdown of server1in1 not matched
Jun 16 10:31:05 server2 crmd[1087]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jun 16 10:31:05 server2 crmd[1087]: warning: match_down_event: No match
for shutdown action on 1
Jun 16 10:31:05 server2 crmd[1087]: notice: peer_update_callback:
Stonith/shutdown of server1in1 not matched
Jun 16 10:31:06 server2 pengine[1086]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Jun 16 10:31:06 server2 pengine[1086]: warning: pe_fence_node: Node
server1in1 will be fenced because the node is no longer part of the cluster
Jun 16 10:31:06 server2 pengine[1086]: warning: determine_online_status:
Node server1in1 is unclean
Jun 16 10:31:06 server2 pengine[1086]: warning: custom_action: Action
ClusterIP2_stop_0 on server1in1 is unrunnable (offline)
Jun 16 10:31:06 server2 pengine[1086]: warning: custom_action: Action
www2_mnt_stop_0 on server1in1 is unrunnable (offline)
Jun 16 10:31:06 server2 pengine[1086]: warning: stage6: Scheduling Node
server1in1 for STONITH
Jun 16 10:31:06 server2 pengine[1086]: notice: LogActions: Move
ClusterIP2 (Started server1in1 -> server2in1)
Jun 16 10:31:06 server2 pengine[1086]: notice: LogActions: Move
www2_mnt (Started server1in1 -> server2in1)
Jun 16 10:31:06 server2 pengine[1086]: warning: process_pe_message:
Calculated Transition 93: /var/lib/pacemaker/pengine/pe-warn-53.bz2
Jun 16 10:31:06 server2 crmd[1087]: notice: te_fence_node: Executing
reboot fencing operation (20) on server1in1 (timeout=60000)
Jun 16 10:31:06 server2 stonith-ng[1083]: notice: handle_request: Client
crmd.1087.b3e11b2e wants to fence (reboot) 'server1in1' with device '(any)'
Jun 16 10:31:06 server2 stonith-ng[1083]: notice:
initiate_remote_stonith_op: Initiating remote operation reboot for
server1in1: 910d86d6-a53c-4d14-8b66-3e8ef2043bbf (0)
Jun 16 10:31:06 server2 stonith-ng[1083]: notice:
can_fence_host_with_device: scsi_server can fence (reboot) server1in1:
static-list
Jun 16 10:31:06 server2 stonith-ng[1083]: notice:
can_fence_host_with_device: scsi_server can fence (reboot) server1in1:
static-list
Jun 16 10:31:06 server2 stonith-ng[1083]: warning:
stonith_device_execute: Agent 'fence_scsi' does not advertise support
for 'reboot', performing 'off' action instead
Jun 16 10:31:07 server2 stonith-ng[1083]: notice: log_operation:
Operation 'reboot' [13384] (call 24 from crmd.1087) for host
'server1in1' with device 'scsi_server' returned: 0 (OK)
Jun 16 10:31:07 server2 stonith-ng[1083]: notice: remote_op_done:
Operation reboot of server1in1 by <no-one> for
crmd.1087@server2in1.910d86d6: OK
Jun 16 10:31:07 server2 crmd[1087]: notice: tengine_stonith_callback:
Stonith operation 24/20:93:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: OK (0)
Jun 16 10:31:07 server2 crmd[1087]: notice: tengine_stonith_notify: Peer
server1in1 was terminated (reboot) by <anyone> for server2in1: OK
(ref=910d86d6-a53c-4d14-8b66-3e8ef2043bbf) by client crmd.1087
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 8: start ClusterIP2_start_0 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: abort_transition_graph:
Transition aborted by deletion of lrm[@id='1']: Resource state removal
(cib=0.320.17, source=te_update_diff:429,
path=/cib/status/node_state[@id='1']/lrm[@id='1'], 0)
Jun 16 10:31:07 server2 IPaddr2(ClusterIP2)[13411]: INFO: Adding inet
address 192.168.122.112/32 with broadcast address 192.168.122.255 to
device 122er
Jun 16 10:31:07 server2 IPaddr2(ClusterIP2)[13411]: INFO: Bringing
device 122er up
Jun 16 10:31:07 server2 IPaddr2(ClusterIP2)[13411]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.122.112 122er 192.168.122.112
auto not_used not_used
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation
ClusterIP2_start_0: ok (node=server2in1, call=118, rc=0, cib-update=601,
confirmed=true)
Jun 16 10:31:07 server2 crmd[1087]: notice: run_graph: Transition 93
(Complete=9, Pending=0, Fired=0, Skipped=4, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-53.bz2): Stopped
Jun 16 10:31:07 server2 pengine[1086]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Jun 16 10:31:07 server2 pengine[1086]: notice: LogActions: Start
www2_mnt (server2in1)
Jun 16 10:31:07 server2 pengine[1086]: notice: process_pe_message:
Calculated Transition 94: /var/lib/pacemaker/pengine/pe-input-280.bz2
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 9: monitor ClusterIP2_monitor_30000 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 10: start www2_mnt_start_0 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation
ClusterIP2_monitor_30000: ok (node=server2in1, call=119, rc=0,
cib-update=603, confirmed=false)
Jun 16 10:31:07 server2 Filesystem(www2_mnt)[13490]: INFO: Running start
for /dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc-part1 on
/var/www2
Jun 16 10:31:07 server2 kernel: EXT4-fs (sda1): recovery complete
Jun 16 10:31:07 server2 kernel: EXT4-fs (sda1): mounted filesystem with
ordered data mode. Opts: (null)
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation
www2_mnt_start_0: ok (node=server2in1, call=120, rc=0, cib-update=604,
confirmed=true)
Jun 16 10:31:07 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 11: monitor www2_mnt_monitor_20000 on server2in1 (local)
Jun 16 10:31:07 server2 crmd[1087]: notice: process_lrm_event: Operation
www2_mnt_monitor_20000: ok (node=server2in1, call=121, rc=0,
cib-update=605, confirmed=false)
Jun 16 10:31:07 server2 crmd[1087]: notice: run_graph: Transition 94
(Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-280.bz2): Complete
Jun 16 10:31:07 server2 crmd[1087]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 16 10:31:55 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1188) was formed. Members joined: 1
Jun 16 10:31:55 server2 attrd[1085]: notice: crm_update_peer_state:
attrd_peer_change_cb: Node server1in1[1] - state is now member (was lost)
Jun 16 10:31:55 server2 crmd[1087]: error: pcmk_cpg_membership: Node
server1in1[1] appears to be online even though we think it is dead
Jun 16 10:31:55 server2 crmd[1087]: notice: crm_update_peer_state:
pcmk_cpg_membership: Node server1in1[1] - state is now member (was lost)
Jun 16 10:31:55 server2 corosync[959]: [QUORUM] Members[2]: 2 1
Jun 16 10:31:55 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:31:55 server2 pacemakerd[1081]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node server1in1[1] - state is now member (was
lost)
Jun 16 10:31:55 server2 crmd[1087]: notice: do_state_transition: State
transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
origin=do_election_count_vote ]
Jun 16 10:31:56 server2 crmd[1087]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_TIMER_POPPED origin=election_timeout_popped ]
Jun 16 10:31:56 server2 crmd[1087]: warning: do_log: FSA: Input
I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
Jun 16 10:31:58 server2 pengine[1086]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Jun 16 10:31:58 server2 pengine[1086]: error: native_create_actions:
Resource ClusterIP2 (ocf::IPaddr2) is active on 2 nodes attempting recovery
Jun 16 10:31:58 server2 pengine[1086]: warning: native_create_actions:
See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more
information.
Jun 16 10:31:58 server2 pengine[1086]: error: native_create_actions:
Resource www2_mnt (ocf::Filesystem) is active on 2 nodes attempting recovery
Jun 16 10:31:58 server2 pengine[1086]: warning: native_create_actions:
See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more
information.
Jun 16 10:31:58 server2 pengine[1086]: notice: LogActions: Restart
ClusterIP2 (Started server1in1)
Jun 16 10:31:58 server2 pengine[1086]: notice: LogActions: Restart
www2_mnt (Started server1in1)
Jun 16 10:31:58 server2 pengine[1086]: error: process_pe_message:
Calculated Transition 95: /var/lib/pacemaker/pengine/pe-error-3.bz2
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 16: stop www2_mnt_stop_0 on server2in1 (local)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 15: stop www2_mnt_stop_0 on server1in1
Jun 16 10:31:58 server2 Filesystem(www2_mnt)[13787]: INFO: Running stop
for /dev/disk/by-id/wwn-0x6001405516563e3d75b5d3cceda0a1dc-part1 on
/var/www2
Jun 16 10:31:58 server2 Filesystem(www2_mnt)[13787]: INFO: Trying to
unmount /var/www2
Jun 16 10:31:58 server2 Filesystem(www2_mnt)[13787]: INFO: unmounted
/var/www2 successfully
Jun 16 10:31:58 server2 crmd[1087]: notice: process_lrm_event: Operation
www2_mnt_stop_0: ok (node=server2in1, call=123, rc=0, cib-update=637,
confirmed=true)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 13: stop ClusterIP2_stop_0 on server2in1 (local)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 12: stop ClusterIP2_stop_0 on server1in1
Jun 16 10:31:58 server2 IPaddr2(ClusterIP2)[13866]: INFO: IP status =
ok, IP_CIP=
Jun 16 10:31:58 server2 crmd[1087]: notice: process_lrm_event: Operation
ClusterIP2_stop_0: ok (node=server2in1, call=125, rc=0, cib-update=638,
confirmed=true)
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 14: start ClusterIP2_start_0 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 2: monitor ClusterIP2_monitor_30000 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 17: start www2_mnt_start_0 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: te_rsc_command: Initiating
action 1: monitor www2_mnt_monitor_20000 on server1in1
Jun 16 10:31:58 server2 crmd[1087]: notice: run_graph: Transition 95
(Complete=13, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-error-3.bz2): Complete
Jun 16 10:31:58 server2 crmd[1087]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
#####################################################
/var/log/messages during the "yes > /dev/null &" load test:
Jun 16 10:44:37 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:44:49 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1612) was formed. Members
Jun 16 10:44:49 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:44:49 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:45:00 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1620) was formed. Members
Jun 16 10:45:00 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:00 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:45:12 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1628) was formed. Members
Jun 16 10:45:12 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:12 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:45:25 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1636) was formed. Members
Jun 16 10:45:25 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:25 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:45:30 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1644) was formed. Members
Jun 16 10:45:30 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:30 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:45:35 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1652) was formed. Members
Jun 16 10:45:35 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:35 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:45:39 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1660) was formed. Members
Jun 16 10:45:39 server2 corosync[959]: [QUORUM] Members[1]: 2
Jun 16 10:45:39 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:45:41 server2 corosync[959]: [TOTEM ] A new membership
(192.168.200.131:1668) was formed. Members joined: 1
Jun 16 10:45:41 server2 corosync[959]: [QUORUM] Members[2]: 2 1
Jun 16 10:45:41 server2 corosync[959]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 16 10:45:41 server2 crmd[1087]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node server1in1[1] - state is now member (was
lost)
Jun 16 10:45:41 server2 pacemakerd[1081]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node server1in1[1] - state is now member (was
lost)
Jun 16 10:55:00 server2 crmd[1087]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jun 16 10:55:00 server2 pengine[1086]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Jun 16 10:55:00 server2 pengine[1086]: warning: custom_action: Action
ClusterIP2_monitor_0 on server1in1 is unrunnable (pending)
Jun 16 10:55:00 server2 pengine[1086]: warning: custom_action: Action
www2_mnt_monitor_0 on server1in1 is unrunnable (pending)
Jun 16 10:55:00 server2 pengine[1086]: warning: custom_action: Action
scsi_server_monitor_0 on server1in1 is unrunnable (pending)
Jun 16 10:55:00 server2 pengine[1086]: notice: trigger_unfencing:
Unfencing server1in1: node discovery
Jun 16 10:55:00 server2 pengine[1086]: notice: process_pe_message:
Calculated Transition 101: /var/lib/pacemaker/pengine/pe-input-284.bz2
Jun 16 10:55:00 server2 crmd[1087]: notice: te_fence_node: Executing on
fencing operation (4) on server1in1 (timeout=60000)
Jun 16 10:55:00 server2 stonith-ng[1083]: notice: handle_request: Client
crmd.1087.b3e11b2e wants to fence (on) 'server1in1' with device '(any)'
Jun 16 10:55:00 server2 stonith-ng[1083]: notice:
initiate_remote_stonith_op: Initiating remote operation on for
server1in1: 3b0b3967-6f33-4b68-9f4d-246b69e0370a (0)
Jun 16 10:55:00 server2 stonith-ng[1083]: notice: stonith_choose_peer:
Couldn't find anyone to fence server1in1 with <any>
Jun 16 10:55:00 server2 stonith-ng[1083]: error: remote_op_done:
Operation on of server1in1 by <no-one> for
crmd.1087@server2in1.3b0b3967: No such device
Jun 16 10:55:00 server2 crmd[1087]: notice: tengine_stonith_callback:
Stonith operation 27/4:101:0:ee8fb283-e55b-40f1-ae89-83b84c76efac: No
such device (-19)
Jun 16 10:55:00 server2 crmd[1087]: notice: tengine_stonith_callback:
Stonith operation 27 for server1in1 failed (No such device): aborting
transition.
Jun 16 10:55:00 server2 crmd[1087]: notice: abort_transition_graph:
Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
Jun 16 10:55:00 server2 crmd[1087]: error: tengine_stonith_notify:
Unfencing of server1in1 by <anyone> failed: No such device (-19)
Jun 16 10:55:00 server2 crmd[1087]: notice: run_graph: Transition 101
(Complete=1, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-284.bz2): Stopped
Jun 16 10:55:00 server2 crmd[1087]: notice: too_many_st_failures: No
devices found in cluster to fence server1in1, giving up
Jun 16 10:55:00 server2 crmd[1087]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]