[ClusterLabs] VM failure during shutdown
Vaggelis Papastavros
psvaggelis at gmail.com
Mon Jun 25 04:33:14 EDT 2018
Dear friends,
We have the following configuration:
CentOS 7, Pacemaker 0.9.152 and Corosync 2.4.0, storage with DRBD, and
STONITH enabled with APC PDU devices.
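(For completeness, the PDU fencing resources were created with something like the
following; the values here are placeholders for illustration only, not our real
addresses or outlet mappings:

  pcs stonith create apc_pdu fence_apc ipaddr=pdu.example.local \
      login=apc passwd=apc \
      pcmk_host_map="sgw-01:1;sgw-02:2" power_wait=5
)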
I have a Windows VM configured as a cluster resource with the following
attributes:
 Resource: WindowSentinelOne_res (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: hypervisor=qemu:///system config=/opt/customer_vms/conf/WindowSentinelOne/WindowSentinelOne.xml migration_transport=ssh
  Utilization: cpu=8 hv_memory=8192
  Operations: start interval=0s timeout=120s (WindowSentinelOne_res-start-interval-0s)
              stop interval=0s timeout=120s (WindowSentinelOne_res-stop-interval-0s)
              monitor interval=10s timeout=30s (WindowSentinelOne_res-monitor-interval-10s)
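For reference, the resource was created with something like the following pcs
commands (reconstructed here from the attributes above; the exact commands we
used may have differed slightly):

  pcs resource create WindowSentinelOne_res ocf:heartbeat:VirtualDomain \
      hypervisor="qemu:///system" \
      config="/opt/customer_vms/conf/WindowSentinelOne/WindowSentinelOne.xml" \
      migration_transport=ssh \
      op start timeout=120s op stop timeout=120s op monitor interval=10s timeout=30s
  pcs resource utilization WindowSentinelOne_res cpu=8 hv_memory=8192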
Under some circumstances (which I am trying to identify) the VM fails: it
disappears from the output of virsh list --all, and Pacemaker reports the VM
as stopped.
If I run pcs resource cleanup WindowSentinelOne_res, everything is OK again,
but I can't identify the reason for the failure.
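(When this happens I check and clear the failure state with the usual
commands, roughly:

  crm_mon -1 --failcounts
  pcs resource failcount show WindowSentinelOne_res
  pcs resource cleanup WindowSentinelOne_res
)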
For example, when I shut down the VM (via a normal Windows shutdown), the
cluster reports the following:
WindowSentinelOne_res (ocf::heartbeat:VirtualDomain): Started sgw-02
(failure ignored)
Failed Actions:
* WindowSentinelOne_res_monitor_10000 on sgw-02 'not running' (7):
call=67, status=complete, exitreason='none',
last-rc-change='Mon Jun 25 07:41:37 2018', queued=0ms, exec=0ms.
My questions are:
1) Why is the VM shutdown reported as a failed action by the cluster? It is
a legitimate operation during the VM life cycle.
2) Why is the resource sometimes marked as stopped (while the VM is healthy)
and needs a cleanup?
3) I can't understand the corosync logs ... during the VM shutdown the
corosync log shows the following:
Jun 25 07:41:37 [5140] sgw-02 crmd: info:
process_lrm_event: Result of monitor operation for
WindowSentinelOne_res on sgw-02: 7 (not running) | call=67
key=WindowSentinelOne_res_monitor_10000 confirmed=false cib-update=36
Jun 25 07:41:37 [5130] sgw-02 cib: info:
cib_process_request: Forwarding cib_modify operation for section
status to all (origin=local/crmd/36)
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
Diff: --- 0.4704.67 2
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
Diff: +++ 0.4704.68 (null)
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
+ /cib: @num_updates=68
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
+ /cib/status/node_state[@id='2']: @crm-debug-origin=do_update_resource
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
++
/cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='WindowSentinelOne_res']:
<lrm_rsc_op id="WindowSentinelOne_res_last_failure_0"
operation_key="WindowSentinelOne_res_monitor_10000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="84:3:0:f910c793-a714-4e24-80d1-b0ec66275491"
transition-magic="0:7;84:3:0:f910c793-a714-4e24-80d1-b0ec66275491"
on_node="sgw-02" cal
Jun 25 07:41:37 [5130] sgw-02 cib: info:
cib_process_request: Completed cib_modify operation for section
status: OK (rc=0, origin=sgw-02/crmd/36, version=0.4704.68)
Jun 25 07:41:37 [5137] sgw-02 attrd: info:
attrd_peer_update: Setting fail-count-WindowSentinelOne_res[sgw-02]:
(null) -> 1 from sgw-01
Jun 25 07:41:37 [5137] sgw-02 attrd: info: write_attribute:
Sent update 10 with 1 changes for fail-count-WindowSentinelOne_res,
id=<n/a>, set=(null)
Jun 25 07:41:37 [5130] sgw-02 cib: info:
cib_process_request: Forwarding cib_modify operation for section
status to all (origin=local/attrd/10)
Jun 25 07:41:37 [5137] sgw-02 attrd: info:
attrd_peer_update: Setting
last-failure-WindowSentinelOne_res[sgw-02]: (null) -> 1529912497 from sgw-01
Jun 25 07:41:37 [5137] sgw-02 attrd: info: write_attribute:
Sent update 11 with 1 changes for last-failure-WindowSentinelOne_res,
id=<n/a>, set=(null)
Jun 25 07:41:37 [5130] sgw-02 cib: info:
cib_process_request: Forwarding cib_modify operation for section
status to all (origin=local/attrd/11)
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
Diff: --- 0.4704.68 2
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
Diff: +++ 0.4704.69 (null)
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
+ /cib: @num_updates=69
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
++
/cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']:
<nvpair id="status-2-fail-count-WindowSentinelOne_res"
name="fail-count-WindowSentinelOne_res" value="1"/>
Jun 25 07:41:37 [5130] sgw-02 cib: info:
cib_process_request: Completed cib_modify operation for section
status: OK (rc=0, origin=sgw-02/attrd/10, version=0.4704.69)
Jun 25 07:41:37 [5137] sgw-02 attrd: info:
attrd_cib_callback: Update 10 for fail-count-WindowSentinelOne_res:
OK (0)
Jun 25 07:41:37 [5137] sgw-02 attrd: info:
attrd_cib_callback: Update 10 for
fail-count-WindowSentinelOne_res[sgw-02]=1: OK (0)
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
Diff: --- 0.4704.69 2
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
Diff: +++ 0.4704.70 (null)
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
+ /cib: @num_updates=70
Jun 25 07:41:37 [5130] sgw-02 cib: info: cib_perform_op:
++
/cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']:
<nvpair id="status-2-last-failure-WindowSentinelOne_res"
name="last-failure-WindowSentinelOne_res" value="1529912497"/>
Jun 25 07:41:37 [5130] sgw-02 cib: info:
cib_process_request: Completed cib_modify operation for section
status: OK (rc=0, origin=sgw-02/attrd/11, version=0.4704.70)
Jun 25 07:41:37 [5137] sgw-02 attrd: info:
attrd_cib_callback: Update 11 for last-failure-WindowSentinelOne_res:
OK (0)
Jun 25 07:41:37 [5137] sgw-02 attrd: info:
attrd_cib_callback: Update 11 for
last-failure-WindowSentinelOne_res[sgw-02]=1529912497: OK (0)
Jun 25 07:41:42 [5130] sgw-02 cib: info: cib_process_ping:
Reporting our current digest to sgw-01: 3e27415fcb003ef3373b47ffa6c5f358
for 0.4704.70 (0x7faac1729720 0)
Sincerely,
Vaggelis Papastavros