[ClusterLabs Developers] When I pull out all heartbeat cables, the active node and the passive node are both fenced (rebooted) by each other at the same time
zhongbin
zhongbin314 at 163.com
Tue Sep 25 12:38:26 UTC 2018
Hi,
I created an active/passive cluster on Debian 6.0.
nodes: linx60147 linx60149
corosync 2.3.4 + pacemaker 1.1.17
crm configure show:
node 3232244115: linx60147 \
        attributes standby=off
node 3232244117: linx60149 \
        attributes standby=off
primitive rsc-cpu ocf:pacemaker:HealthCPU \
        params yellow_limit=60 red_limit=20 \
        op monitor interval=30s timeout=3m \
        op start interval=0 timeout=3m \
        op stop interval=0 timeout=3m \
        meta target-role=Started
primitive rsc-vip-public IPaddr \
        op monitor interval=30s timeout=3m start-delay=15 \
        op start interval=0 timeout=3m \
        op stop interval=0 timeout=3m \
        params ip=192.168.22.224 cidr_netmask=255.255.255.0 \
        meta migration-threshold=10
primitive st-lxha stonith:external/ssh \
        params hostlist="linx60147 linx60149" \
        meta target-role=Started is-managed=true
group rsc-group rsc-vip-public rsc-cpu \
        meta target-role=Started
location rsc-loc1 rsc-group 200: linx60147
location rsc-loc2 rsc-group 100: linx60149
location rsc-loc3 st-lxha 100: linx60147
location rsc-loc4 st-lxha 200: linx60149
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.17-b36b869ca8 \
        cluster-infrastructure=corosync \
        expected-quorum-votes=2 \
        start-failure-is-fatal=false \
        stonith-enabled=true \
        stonith-action=reboot \
        no-quorum-policy=ignore \
        last-lrm-refresh=1536225282
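
For anyone trying to reproduce this, the fencing device registration can be double-checked from either node with stonith_admin, for example (output not included here):

        # list all devices registered with the fencer
        stonith_admin --list-registered
        # list the device(s) that can fence a given target, e.g. linx60149
        stonith_admin --list linx60149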
When I pull out all heartbeat cables, the active node and the passive node are both fenced (rebooted) by each other at the same time.
linx60147 corosync.log:
Sep 25 19:34:08 [2198] linx60147 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 25 19:34:08 [2198] linx60147 pengine: warning: pe_fence_node: Cluster node linx60149 will be fenced: peer is no longer part of the cluster
Sep 25 19:34:08 [2198] linx60147 pengine: warning: determine_online_status: Node linx60149 is unclean
Sep 25 19:34:08 [2198] linx60147 pengine: info: determine_online_status_fencing: Node linx60147 is active
Sep 25 19:34:08 [2198] linx60147 pengine: info: determine_online_status: Node linx60147 is online
Sep 25 19:34:08 [2198] linx60147 pengine: info: unpack_node_loop: Node 3232244117 is already processed
Sep 25 19:34:08 [2198] linx60147 pengine: info: unpack_node_loop: Node 3232244115 is already processed
Sep 25 19:34:08 [2198] linx60147 pengine: info: unpack_node_loop: Node 3232244117 is already processed
Sep 25 19:34:08 [2198] linx60147 pengine: info: unpack_node_loop: Node 3232244115 is already processed
Sep 25 19:34:08 [2198] linx60147 pengine: info: group_print: Resource Group: rsc-group
Sep 25 19:34:08 [2198] linx60147 pengine: info: common_print: rsc-vip-public (ocf::heartbeat:IPaddr): Started linx60147
Sep 25 19:34:08 [2198] linx60147 pengine: info: common_print: rsc-cpu (ocf::pacemaker:HealthCPU): Started linx60147
Sep 25 19:34:08 [2198] linx60147 pengine: info: common_print: st-lxha (stonith:external/ssh): Started linx60149 (UNCLEAN)
Sep 25 19:34:08 [2198] linx60147 pengine: warning: custom_action: Action st-lxha_stop_0 on linx60149 is unrunnable (offline)
Sep 25 19:34:08 [2198] linx60147 pengine: warning: stage6: Scheduling Node linx60149 for STONITH
Sep 25 19:34:08 [2198] linx60147 pengine: info: native_stop_constraints: st-lxha_stop_0 is implicit after linx60149 is fenced
Sep 25 19:34:08 [2198] linx60147 pengine: notice: LogNodeActions: * Fence linx60149
Sep 25 19:34:08 [2198] linx60147 pengine: info: LogActions: Leave rsc-vip-public (Started linx60147)
Sep 25 19:34:08 [2198] linx60147 pengine: info: LogActions: Leave rsc-cpu (Started linx60147)
Sep 25 19:34:08 [2198] linx60147 pengine: notice: LogActions: Move st-lxha (Started linx60149 -> linx60147)
Sep 25 19:34:08 [2198] linx60147 pengine: warning: process_pe_message: Calculated transition 2 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-64.bz2
Sep 25 19:34:08 [2199] linx60147 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Sep 25 19:34:08 [2199] linx60147 crmd: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1537875248-29) derived from /var/lib/pacemaker/pengine/pe-warn-64.bz2
Sep 25 19:34:08 [2199] linx60147 crmd: notice: te_fence_node: Requesting fencing (reboot) of node linx60149 | action=15 timeout=60000
Sep 25 19:34:08 [2194] linx60147 stonith-ng: notice: handle_request: Client crmd.2199.76b55dfe wants to fence (reboot) 'linx60149' with device '(any)'
Sep 25 19:34:08 [2194] linx60147 stonith-ng: notice: initiate_remote_stonith_op: Requesting peer fencing (reboot) of linx60149 | id=07b318da-0c28-476a-a9f3-d73d7a5142dc state=0
Sep 25 19:34:08 [2199] linx60147 crmd: notice: te_rsc_command: Initiating start operation st-lxha_start_0 locally on linx60147 | action 13
Sep 25 19:34:08 [2199] linx60147 crmd: info: do_lrm_rsc_op: Performing key=13:2:0:05c1e621-d48e-4854-a666-4c664da9e32d op=st-lxha_start_0
Sep 25 19:34:08 [2195] linx60147 lrmd: info: log_execute: executing - rsc:st-lxha action:start call_id:18
Sep 25 19:34:08 [2194] linx60147 stonith-ng: info: dynamic_list_search_cb: Refreshing port list for st-lxha
Sep 25 19:34:08 [2194] linx60147 stonith-ng: info: process_remote_stonith_query: Query result 1 of 1 from linx60147 for linx60149/reboot (1 devices) 07b318da-0c28-476a-a9f3-d73d7a5142dc
Sep 25 19:34:08 [2194] linx60147 stonith-ng: info: process_remote_stonith_query: All query replies have arrived, continuing (1 expected/1 received)
Sep 25 19:34:08 [2194] linx60147 stonith-ng: info: call_remote_stonith: Total timeout set to 60 for peer's fencing of linx60149 for crmd.2199|id=07b318da-0c28-476a-a9f3-d73d7a5142dc
Sep 25 19:34:08 [2194] linx60147 stonith-ng: info: call_remote_stonith: Requesting that 'linx60147' perform op 'linx60149 reboot' for crmd.2199 (72s, 0s)
Sep 25 19:34:08 [2194] linx60147 stonith-ng: notice: can_fence_host_with_device: st-lxha can fence (reboot) linx60149: dynamic-list
Sep 25 19:34:08 [2194] linx60147 stonith-ng: info: stonith_fence_get_devices_cb: Found 1 matching devices for 'linx60149'
Sep 25 19:34:09 [2195] linx60147 lrmd: info: log_finished: finished - rsc:st-lxha action:start call_id:18 exit-code:0 exec-time:1024ms queue-time:0ms
Sep 25 19:34:09 [2199] linx60147 crmd: notice: process_lrm_event: Result of start operation for st-lxha on linx60147: 0 (ok) | call=18 key=st-lxha_start_0 confirmed=true cib-update=51
Sep 25 19:34:09 [2193] linx60147 cib: info: cib_process_request: Forwarding cib_modify operation for section status to all (origin=local/crmd/51)
Sep 25 19:34:09 [2193] linx60147 cib: info: cib_perform_op: Diff: --- 0.102.21 2
Sep 25 19:34:09 [2193] linx60147 cib: info: cib_perform_op: Diff: +++ 0.102.22 (null)
Sep 25 19:34:09 [2193] linx60147 cib: info: cib_perform_op: + /cib: @num_updates=22
Sep 25 19:34:09 [2193] linx60147 cib: info: cib_perform_op: + /cib/status/node_state[@id='3232244115']: @crm-debug-origin=do_update_resource
Sep 25 19:34:09 [2193] linx60147 cib: info: cib_perform_op: + /cib/status/node_state[@id='3232244115']/lrm[@id='3232244115']/lrm_resources/lrm_resource[@id='st-lxha']/lrm_rsc_op[@id='st-lxha_last_0']: @operation_key=st-lxha_start_0, @operation=start, @transition-key=13:2:0:05c1e621-d48e-4854-a666-4c664da9e32d, @transition-magic=0:0;13:2:0:05c1e621-d48e-4854-a666-4c664da9e32d, @call-id=18, @rc-code=0, @last-run=1537875248, @last-rc-change=1537875248, @exec-time=1024
Sep 25 19:34:09 [2193] linx60147 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=linx60147/crmd/51, version=0.102.22)
Sep 25 19:34:09 [2199] linx60147 crmd: info: match_graph_event: Action st-lxha_start_0 (13) confirmed on linx60147 (rc=0)
linx60149 corosync.log:
Sep 25 19:34:07 [2144] linx60149 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 25 19:34:07 [2144] linx60149 pengine: info: determine_online_status_fencing: Node linx60149 is active
Sep 25 19:34:07 [2144] linx60149 pengine: info: determine_online_status: Node linx60149 is online
Sep 25 19:34:07 [2144] linx60149 pengine: warning: pe_fence_node: Cluster node linx60147 will be fenced: peer is no longer part of the cluster
Sep 25 19:34:07 [2144] linx60149 pengine: warning: determine_online_status: Node linx60147 is unclean
Sep 25 19:34:07 [2144] linx60149 pengine: info: unpack_node_loop: Node 3232244117 is already processed
Sep 25 19:34:07 [2144] linx60149 pengine: info: unpack_node_loop: Node 3232244115 is already processed
Sep 25 19:34:07 [2144] linx60149 pengine: info: unpack_node_loop: Node 3232244117 is already processed
Sep 25 19:34:07 [2144] linx60149 pengine: info: unpack_node_loop: Node 3232244115 is already processed
Sep 25 19:34:07 [2144] linx60149 pengine: info: group_print: Resource Group: rsc-group
Sep 25 19:34:07 [2144] linx60149 pengine: info: common_print: rsc-vip-public (ocf::heartbeat:IPaddr): Started linx60147 (UNCLEAN)
Sep 25 19:34:07 [2144] linx60149 pengine: info: common_print: rsc-cpu (ocf::pacemaker:HealthCPU): Started linx60147 (UNCLEAN)
Sep 25 19:34:07 [2144] linx60149 pengine: info: common_print: st-lxha (stonith:external/ssh): Started linx60149
Sep 25 19:34:07 [2144] linx60149 pengine: warning: custom_action: Action rsc-vip-public_stop_0 on linx60147 is unrunnable (offline)
Sep 25 19:34:07 [2144] linx60149 pengine: info: RecurringOp: Start recurring monitor (30s) for rsc-vip-public on linx60149
Sep 25 19:34:07 [2144] linx60149 pengine: warning: custom_action: Action rsc-cpu_stop_0 on linx60147 is unrunnable (offline)
Sep 25 19:34:07 [2144] linx60149 pengine: info: RecurringOp: Start recurring monitor (30s) for rsc-cpu on linx60149
Sep 25 19:34:07 [2144] linx60149 pengine: warning: stage6: Scheduling Node linx60147 for STONITH
Sep 25 19:34:07 [2144] linx60149 pengine: info: native_stop_constraints: rsc-vip-public_stop_0 is implicit after linx60147 is fenced
Sep 25 19:34:07 [2144] linx60149 pengine: info: native_stop_constraints: rsc-cpu_stop_0 is implicit after linx60147 is fenced
Sep 25 19:34:07 [2144] linx60149 pengine: notice: LogNodeActions: * Fence linx60147
Sep 25 19:34:07 [2144] linx60149 pengine: notice: LogActions: Move rsc-vip-public (Started linx60147 -> linx60149)
Sep 25 19:34:07 [2144] linx60149 pengine: notice: LogActions: Move rsc-cpu (Started linx60147 -> linx60149)
Sep 25 19:34:07 [2144] linx60149 pengine: info: LogActions: Leave st-lxha (Started linx60149)
Sep 25 19:34:07 [2144] linx60149 pengine: warning: process_pe_message: Calculated transition 0 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-52.bz2
Sep 25 19:34:07 [2145] linx60149 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Sep 25 19:34:07 [2145] linx60149 crmd: info: do_te_invoke: Processing graph 0 (ref=pe_calc-dc-1537875247-15) derived from /var/lib/pacemaker/pengine/pe-warn-52.bz2
Sep 25 19:34:07 [2145] linx60149 crmd: notice: te_fence_node: Requesting fencing (reboot) of node linx60147 | action=15 timeout=60000
Sep 25 19:34:07 [2141] linx60149 stonith-ng: notice: handle_request: Client crmd.2145.321125df wants to fence (reboot) 'linx60147' with device '(any)'
Sep 25 19:34:07 [2141] linx60149 stonith-ng: notice: initiate_remote_stonith_op: Requesting peer fencing (reboot) of linx60147 | id=05d67c3b-8ff2-4e8d-b56f-abb305d3133c state=0
Sep 25 19:34:07 [2141] linx60149 stonith-ng: info: dynamic_list_search_cb: Refreshing port list for st-lxha
Sep 25 19:34:07 [2141] linx60149 stonith-ng: info: process_remote_stonith_query: Query result 1 of 1 from linx60149 for linx60147/reboot (1 devices) 05d67c3b-8ff2-4e8d-b56f-abb305d3133c
Sep 25 19:34:07 [2141] linx60149 stonith-ng: info: call_remote_stonith: Total timeout set to 60 for peer's fencing of linx60147 for crmd.2145|id=05d67c3b-8ff2-4e8d-b56f-abb305d3133c
Sep 25 19:34:07 [2141] linx60149 stonith-ng: info: call_remote_stonith: Requesting that 'linx60149' perform op 'linx60147 reboot' for crmd.2145 (72s, 0s)
Sep 25 19:34:07 [2141] linx60149 stonith-ng: notice: can_fence_host_with_device: st-lxha can fence (reboot) linx60147: dynamic-list
Sep 25 19:34:07 [2141] linx60149 stonith-ng: info: stonith_fence_get_devices_cb: Found 1 matching devices for 'linx60147'
Is this cluster behavior normal, or is there an error in my configuration? How can I avoid it?
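
Would adding a random fencing delay to the stonith device help avoid the race? Something like the following untested sketch, using the pcmk_delay_max instance attribute (the 15s value is arbitrary):

        # hypothetical change: delay each fencing action by a random 0-15s
        # so that only one node wins the shoot-out
        primitive st-lxha stonith:external/ssh \
                params hostlist="linx60147 linx60149" pcmk_delay_max=15 \
                meta target-role=Started is-managed=true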
Thanks,
zhongbin