[ClusterLabs] VM failure during shutdown

Vaggelis Papastavros psvaggelis at gmail.com
Mon Jun 25 04:33:14 EDT 2018


Dear friends,

We have the following configuration:

CentOS 7, pacemaker 0.9.152 and Corosync 2.4.0, storage on DRBD, and 
stonith enabled with APC PDU devices.

I have a Windows VM configured as a cluster resource with the following 
attributes:

Resource: WindowSentinelOne_res (class=ocf provider=heartbeat type=VirtualDomain)
Attributes: hypervisor=qemu:///system config=/opt/customer_vms/conf/WindowSentinelOne/WindowSentinelOne.xml migration_transport=ssh
Utilization: cpu=8 hv_memory=8192
Operations: start interval=0s timeout=120s (WindowSentinelOne_res-start-interval-0s)
            stop interval=0s timeout=120s (WindowSentinelOne_res-stop-interval-0s)
            monitor interval=10s timeout=30s (WindowSentinelOne_res-monitor-interval-10s)

Under some circumstances (which I am still trying to identify) the VM 
fails and disappears from the output of virsh list --all, and pacemaker 
also reports the VM as stopped.

If I run pcs resource cleanup windows_wm everything is OK again, but I 
can't identify the reason for the failure.

For example, when I shut down the VM (using the Windows shutdown), the 
cluster reports the following:

WindowSentinelOne_res    (ocf::heartbeat:VirtualDomain): Started sgw-02 (failure ignored)

Failed Actions:
* WindowSentinelOne_res_monitor_10000 on sgw-02 'not running' (7): call=67, status=complete, exitreason='none',
     last-rc-change='Mon Jun 25 07:41:37 2018', queued=0ms, exec=0ms.
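
To make the failed action entry easier to read, here is a minimal Python 
sketch that splits it into its fields. The regular expression is my own 
and only an assumption about this particular line layout; other pacemaker 
versions may print failed actions differently. The return code 7 is the 
standard OCF "not running" code, which is what the recurring monitor 
reports when it finds the domain gone.

```python
import re

# Failed action entry copied from the cluster status above (wrapped lines rejoined).
FAILED_ACTION = (
    "* WindowSentinelOne_res_monitor_10000 on sgw-02 'not running' (7): "
    "call=67, status=complete, exitreason='none', "
    "last-rc-change='Mon Jun 25 07:41:37 2018', queued=0ms, exec=0ms."
)

# Hypothetical pattern for this one layout: <resource>_<operation>_<interval in ms>,
# then the node, the quoted reason and the numeric return code.
PATTERN = re.compile(
    r"\* (?P<resource>\S+)_(?P<operation>[a-z]+)_(?P<interval_ms>\d+)"
    r" on (?P<node>\S+) '(?P<reason>[^']*)' \((?P<rc>\d+)\)"
)


def parse_failed_action(line):
    """Return the fields of one failed action entry as a dict."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognised failed action line")
    fields = match.groupdict()
    fields["interval_ms"] = int(fields["interval_ms"])
    fields["rc"] = int(fields["rc"])
    return fields


action = parse_failed_action(FAILED_ACTION)
for name, value in action.items():
    print(f"{name}: {value}")
```

Read this way, the entry says that the 10-second monitor (interval 10000 ms) 
of WindowSentinelOne_res on sgw-02 returned 7, i.e. the agent found the VM 
not running at a moment when the cluster expected it to be started.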


My questions are:

1) Why is the VM shutdown reported as a failed action by the cluster? It 
is a legitimate operation during the VM life cycle.

2) Why is the resource sometimes marked as stopped (while the VM is 
healthy) and in need of a cleanup?

3) I can't understand the corosync logs. During the VM shutdown, the 
corosync log contains the following:


Jun 25 07:41:37 [5140] sgw-02       crmd:     info: process_lrm_event:    Result of monitor operation for WindowSentinelOne_res on sgw-02: 7 (not running) | call=67 key=WindowSentinelOne_res_monitor_10000 confirmed=false cib-update=36
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/crmd/36)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.67 2
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.68 (null)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=68
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib/status/node_state[@id='2']: @crm-debug-origin=do_update_resource
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='WindowSentinelOne_res']: <lrm_rsc_op id="WindowSentinelOne_res_last_failure_0" operation_key="WindowSentinelOne_res_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="84:3:0:f910c793-a714-4e24-80d1-b0ec66275491" transition-magic="0:7;84:3:0:f910c793-a714-4e24-80d1-b0ec66275491" on_node="sgw-02" cal
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/crmd/36, version=0.4704.68)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_peer_update:    Setting fail-count-WindowSentinelOne_res[sgw-02]: (null) -> 1 from sgw-01
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: write_attribute:    Sent update 10 with 1 changes for fail-count-WindowSentinelOne_res, id=<n/a>, set=(null)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/attrd/10)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_peer_update:    Setting last-failure-WindowSentinelOne_res[sgw-02]: (null) -> 1529912497 from sgw-01
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: write_attribute:    Sent update 11 with 1 changes for last-failure-WindowSentinelOne_res, id=<n/a>, set=(null)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/attrd/11)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.68 2
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.69 (null)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=69
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair id="status-2-fail-count-WindowSentinelOne_res" name="fail-count-WindowSentinelOne_res" value="1"/>
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/attrd/10, version=0.4704.69)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 10 for fail-count-WindowSentinelOne_res: OK (0)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 10 for fail-count-WindowSentinelOne_res[sgw-02]=1: OK (0)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.69 2
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.70 (null)
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=70
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair id="status-2-last-failure-WindowSentinelOne_res" name="last-failure-WindowSentinelOne_res" value="1529912497"/>
Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/attrd/11, version=0.4704.70)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 11 for last-failure-WindowSentinelOne_res: OK (0)
Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 11 for last-failure-WindowSentinelOne_res[sgw-02]=1529912497: OK (0)
Jun 25 07:41:42 [5130] sgw-02        cib:     info: cib_process_ping:    Reporting our current digest to sgw-01: 3e27415fcb003ef3373b47ffa6c5f358 for 0.4704.70 (0x7faac1729720 0)
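
Two values in this log are encoded rather than spelled out: the 
transition-magic attribute of the lrm_rsc_op entry and the last-failure 
timestamp. Below is a minimal Python sketch that decodes both. It assumes 
the transition-magic layout described in the Pacemaker documentation, 
namely op-status and return code, then a semicolon, then the transition 
key (action number, transition number, expected return code, UUID). The 
function and field names are my own and not part of any pacemaker tool.

```python
from datetime import datetime, timezone

# Values copied from the lrm_rsc_op and nvpair entries in the log above.
TRANSITION_MAGIC = "0:7;84:3:0:f910c793-a714-4e24-80d1-b0ec66275491"
LAST_FAILURE = 1529912497  # seconds since the Unix epoch


def decode_transition_magic(magic):
    """Split 'op-status:rc;action:transition:target-rc:uuid' into named fields."""
    result, key = magic.split(";", 1)
    op_status, rc = (int(part) for part in result.split(":"))
    action_id, transition_id, target_rc, uuid = key.split(":", 3)
    return {
        "op_status": op_status,          # 0 means the operation completed
        "rc": rc,                        # what the resource agent returned
        "action_id": int(action_id),
        "transition_id": int(transition_id),
        "target_rc": int(target_rc),     # what the cluster expected
        "uuid": uuid,
    }


decoded = decode_transition_magic(TRANSITION_MAGIC)
# The monitor counts as failed when the returned code differs from the expected one.
unexpected = decoded["rc"] != decoded["target_rc"]
failure_time = datetime.fromtimestamp(LAST_FAILURE, tz=timezone.utc)

print(decoded)
print("unexpected result:", unexpected)
print("last failure (UTC):", failure_time.isoformat())
```

Under that assumption the monitor completed (op-status 0) and returned 7 
where 0 was expected, which is why the fail-count is raised to 1. The 
last-failure value converts to 2018-06-25T07:41:37+00:00, the same time 
of day as the log entries, which suggests the log timestamps are in UTC.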

Sincerely,

Vaggelis Papastavros


