[ClusterLabs] VM failure during shutdown

Tue Jun 26 12:36:35 EDT 2018

On Tue, 2018-06-26 at 18:24 +0300, Vaggelis Papastavros wrote:
> Many thanks for the excellent answer, Ken.
> After investigating the log files:
> In our environment we have two DRBD partitions, one for customer_vms
> and one for sigma_vms.
> For customer_vms the active node is node2, and for sigma_vms
> the active node is node1.
> [root@sgw-01 drbd.d]# drbdadm status
> customer_vms role:Secondary
>   disk:UpToDate
>   sgw-02 role:Primary
>     peer-disk:UpToDate
> 
> sigma_vms role:Primary
>   disk:UpToDate
>   sgw-02 role:Secondary
>     peer-disk:UpToDate
> 
> When I create a new VM I can't force the resource creation to take
> place on a specific node; the cluster places the resource on one of
> the two nodes on its own (if that node happens to be the DRBD Primary
> it is OK; otherwise Pacemaker raises a failure for the node).
> My solution is the following:
> pcs resource create windows_VM_res VirtualDomain
> hypervisor="qemu:///system"
> config="/opt/sigma_vms/xml_definitions/windows_VM.xml" 
> (the cluster arbitrarily tries to place the above resource on node 2,
> which is currently the Secondary for the corresponding partition.
> Personally, I assumed that the VirtualDomain agent would be able to
> read the correct disk location from the XML definition and then try
> to find the correct DRBD node)
> pcs constraint colocation add windows_VM_res with
> StorageDRBD_SigmaVMs INFINITY
> 
> pcs constraint order start StorageDRBD_SigmaVMs_rers then start
> windows_VM

Two things will help:

One problem is that you are creating the VM, and then later adding
constraints about what the cluster can do with it. Therefore there is a
time in between where the cluster can start it without any constraint.
The solution is to make your changes all at once. Both pcs and crm have
a way to do this; with pcs, it's:

  pcs cluster cib <filename>
  pcs -f <filename> ...whatever command you want...
  ...repeat...
  pcs cluster cib-push --config <filename>

The second problem is that you have an ordering constraint but no
colocation constraint. With your current setup, windows_VM has to start
after the storage, but it doesn't have to start on the same node. You
need a colocation constraint as well, to ensure they start on the same
node.
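
As a rough sketch of both points together (untested against your cluster): the script below stages the VM resource and both constraints in one file and pushes them in a single step, so the cluster never sees the VM without its constraints. The file name vm_changes.xml and the run helper are made up for illustration, and it assumes StorageDRBD_SigmaVMs is a resource that is only active on the DRBD Primary; if it is a master/slave resource, the constraints would have to refer to its master role instead. By default it only prints the commands; set APPLY=1 to actually run them.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 in the environment to really run them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

CIB=${CIB:-vm_changes.xml}      # hypothetical scratch file name
VM=windows_VM_res               # VM resource name used in this thread
STORAGE=StorageDRBD_SigmaVMs    # storage resource name used in this thread

stage_vm() {
    # 1. Save the current CIB to a file.
    run pcs cluster cib "$CIB"
    # 2. Make every change against the file, not the live cluster.
    run pcs -f "$CIB" resource create "$VM" VirtualDomain \
        hypervisor="qemu:///system" \
        config="/opt/sigma_vms/xml_definitions/windows_VM.xml"
    run pcs -f "$CIB" constraint colocation add "$VM" with "$STORAGE" INFINITY
    run pcs -f "$CIB" constraint order start "$STORAGE" then start "$VM"
    # 3. Push the resource and its constraints in one step.
    run pcs cluster cib-push --config "$CIB"
}

stage_vm
```

Reviewing the printed commands before setting APPLY=1 is a cheap way to catch a mistyped resource name before it reaches the live cluster.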

> 
> pcs resource cleanup windows_VM_res
> After the above steps the VM is located on the correct node and
> everything is OK.
> 
> Is my approach correct?
> 
> Your opinion would be valuable.
> Sincerely,
> 
> 
> On 06/25/2018 07:15 PM, Ken Gaillot wrote:
> > On Mon, 2018-06-25 at 09:47 -0500, Ken Gaillot wrote:
> > > On Mon, 2018-06-25 at 11:33 +0300, Vaggelis Papastavros wrote:
> > > > Dear friends,
> > > > 
> > > > We have the following configuration:
> > > > 
> > > > CentOS 7, pacemaker 0.9.152 and Corosync 2.4.0, storage with DRBD,
> > > > and stonith enabled with APC PDU devices.
> > > > 
> > > > I have a Windows VM configured as a cluster resource with the
> > > > following attributes:
> > > > 
> > > > Resource: WindowSentinelOne_res (class=ocf provider=heartbeat type=VirtualDomain)
> > > >  Attributes: hypervisor=qemu:///system config=/opt/customer_vms/conf/WindowSentinelOne/WindowSentinelOne.xml migration_transport=ssh
> > > >  Utilization: cpu=8 hv_memory=8192
> > > >  Operations: start interval=0s timeout=120s (WindowSentinelOne_res-start-interval-0s)
> > > >              stop interval=0s timeout=120s (WindowSentinelOne_res-stop-interval-0s)
> > > >              monitor interval=10s timeout=30s (WindowSentinelOne_res-monitor-interval-10s)
> > > > 
> > > > Under some circumstances (which I am trying to identify) the VM
> > > > fails and disappears from virsh list --all, and Pacemaker also
> > > > reports the VM as stopped.
> > > > 
> > > > If I run pcs resource cleanup windows_wm everything is OK, but I
> > > > can't identify the reason for the failure.
> > > > 
> > > > For example, when I shut down the VM (with a Windows shutdown) the
> > > > cluster reports the following:
> > > > 
> > > > WindowSentinelOne_res    (ocf::heartbeat:VirtualDomain): Started sgw-02 (failure ignored)
> > > > 
> > > > Failed Actions:
> > > > * WindowSentinelOne_res_monitor_10000 on sgw-02 'not running' (7): call=67, status=complete, exitreason='none',
> > > >     last-rc-change='Mon Jun 25 07:41:37 2018', queued=0ms, exec=0ms.
> > > > 
> > > > 
> > > > My questions are:
> > > > 
> > > > 1) Why is the VM shutdown reported as a failed action by the
> > > > cluster? It is a legitimate operation during the VM life cycle.
> > > 
> > > Pacemaker has no way of knowing that the VM was intentionally shut
> > > down, vs crashed.
> > > 
> > > When some resource is managed by the cluster, all starts and stops of
> > > the resource have to go through the cluster. You can either set
> > > target-role=Stopped in the resource configuration, or if it's a
> > > temporary issue (e.g. rebooting for some OS updates), you could set
> > > is-managed=false to take it out of cluster control, do the work, then
> > > set is-managed=true again.
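
A minimal sketch of those two options with pcs, using the resource name from this thread. The run helper and the function names are made up for illustration; by default the commands are only printed, and APPLY=1 runs them for real. pcs resource disable sets target-role=Stopped, and pcs resource unmanage / manage toggle is-managed.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 in the environment to really run them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

RES=WindowSentinelOne_res   # resource name from this thread

# Option 1: have the cluster stop the VM and keep it stopped
# (sets target-role=Stopped); "pcs resource enable" reverses it.
stop_via_cluster() { run pcs resource disable "$RES"; }

# Option 2: temporarily take the VM out of cluster control
# (is-managed=false), do the work by hand, then hand it back.
maintenance_begin() { run pcs resource unmanage "$RES"; }
maintenance_end()   { run pcs resource manage "$RES"; }

maintenance_begin
# ... shut down, update or reboot the VM by hand here ...
maintenance_end
```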
> > 
> > Also, a nice feature is that you can use rules to set a maintenance
> > window ahead of time (especially helpful if the person who maintains
> > the cluster isn't the same person who needs to do the VM updates). For
> > example, you could set a rule that the resource's is-managed option
> > will be false from 9pm to midnight on Fridays. See:
> > 
> > http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#idm140583511697312
> > 
> > particularly the parts about time/date expressions and using rules to
> > control resource options.
> > 
> > > > 2) Why is the resource sometimes marked as stopped (while the VM
> > > > is healthy) and in need of a cleanup?
> > > 
> > > That's a problem. If the VM is truly healthy, it sounds like there's an
> > > issue with the resource agent. You'd have to look at the logs to see if
> > > it gave any more information (e.g. if it's a timeout, raising the
> > > timeout might be sufficient).
> > > 
> > > > 3) I can't understand the corosync logs. During the VM shutdown,
> > > > the corosync log shows the following:
> > > 
> > > FYI, the system log will have the most important messages.
> > > corosync.log will additionally have info-level messages -- potentially
> > > helpful but definitely difficult to follow.
> > > 
> > > > Jun 25 07:41:37 [5140] sgw-02       crmd:     info: process_lrm_event:    Result of monitor operation for WindowSentinelOne_res on sgw-02: 7 (not running) | call=67 key=WindowSentinelOne_res_monitor_10000 confirmed=false cib-update=36
> > > 
> > > This is really the only important message. It says that a recurring
> > > monitor on the WindowSentinelOne_res resource on node sgw-02 exited
> > > with status code 7 (which means the resource agent thinks the resource
> > > is not running).
> > > 
> > > 'key=WindowSentinelOne_res_monitor_10000' is how pacemaker identifies
> > > resource agent actions. The format is
> > > <resource-name>_<action-name>_<action-interval-in-milliseconds>
> > > 
> > > This is the only information Pacemaker will get from the resource
> > > agent. To investigate more deeply, you'll have to check for log
> > > messages from the agent itself.
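
To make the key format above concrete, here is a small sketch that splits a key into its three parts. The function name parse_op_key is made up for illustration. It splits from the right because resource names may themselves contain underscores; it assumes the action name has none, so it would mis-split actions such as migrate_to.

```shell
#!/bin/sh
# Split <resource-name>_<action-name>_<interval-in-ms> from the right.
# Assumption: the action name contains no underscore (monitor, start, stop).
parse_op_key() {
    key=$1
    interval_ms=${key##*_}   # text after the last underscore
    rest=${key%_*}           # everything before the last underscore
    action=${rest##*_}       # last remaining field
    resource=${rest%_*}      # whatever is left is the resource name
    echo "resource=$resource action=$action interval=${interval_ms}ms"
}

parse_op_key WindowSentinelOne_res_monitor_10000
# prints: resource=WindowSentinelOne_res action=monitor interval=10000ms
```

The 10000 here is the 10s monitor interval from the resource configuration, expressed in milliseconds.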
> > > 
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/crmd/36)
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.67 2
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.68 (null)
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=68
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib/status/node_state[@id='2']: @crm-debug-origin=do_update_resource
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='WindowSentinelOne_res']: <lrm_rsc_op id="WindowSentinelOne_res_last_failure_0" operation_key="WindowSentinelOne_res_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="84:3:0:f910c793-a714-4e24-80d1-b0ec66275491" transition-magic="0:7;84:3:0:f910c793-a714-4e24-80d1-b0ec66275491" on_node="sgw-02" cal
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/crmd/36, version=0.4704.68)
> > > 
> > > You can usually ignore the 'cib' messages. This just means Pacemaker
> > > recorded the result on disk.
> > > 
> > > > Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_peer_update:    Setting fail-count-WindowSentinelOne_res[sgw-02]: (null) -> 1 from sgw-01
> > > 
> > > Since the cluster expected the resource to be running, this result is a
> > > failure. Failures are counted using special node attributes that start
> > > with "fail-count-". This is what Pacemaker uses to determine if a
> > > resource has reached its migration-threshold.
> > > 
> > > > Jun 25 07:41:37 [5137] sgw-02      attrd:     info: write_attribute:    Sent update 10 with 1 changes for fail-count-WindowSentinelOne_res, id=<n/a>, set=(null)
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/attrd/10)
> > > > Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_peer_update:    Setting last-failure-WindowSentinelOne_res[sgw-02]: (null) -> 1529912497 from sgw-01
> > > 
> > > Similarly, the time the failure occurred is stored in a
> > > 'last-failure-' node attribute, which Pacemaker uses to determine if a
> > > resource has reached its failure-timeout.
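
The value stored there is a plain epoch timestamp in seconds. A quick way to read it, assuming GNU date as shipped with CentOS 7 (the function name epoch_to_utc is made up for illustration):

```shell
#!/bin/sh
# Convert an epoch-seconds value to a readable UTC time.
# Assumes GNU date; BSD date would use "date -u -r <seconds>" instead.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 1529912497
# prints: 2018-06-25 07:41:37 UTC
```

That lines up with the 07:41:37 timestamps on the log lines, which suggests the node's clock is set to UTC.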
> > > 
> > > > Jun 25 07:41:37 [5137] sgw-02      attrd:     info: write_attribute:    Sent update 11 with 1 changes for last-failure-WindowSentinelOne_res, id=<n/a>, set=(null)
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/attrd/11)
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.68 2
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.69 (null)
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=69
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair id="status-2-fail-count-WindowSentinelOne_res" name="fail-count-WindowSentinelOne_res" value="1"/>
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/attrd/10, version=0.4704.69)
> > > > Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 10 for fail-count-WindowSentinelOne_res: OK (0)
> > > > Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 10 for fail-count-WindowSentinelOne_res[sgw-02]=1: OK (0)
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.69 2
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.70 (null)
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=70
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair id="status-2-last-failure-WindowSentinelOne_res" name="last-failure-WindowSentinelOne_res" value="1529912497"/>
> > > > Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/attrd/11, version=0.4704.70)
> > > > Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 11 for last-failure-WindowSentinelOne_res: OK (0)
> > > > Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 11 for last-failure-WindowSentinelOne_res[sgw-02]=1529912497: OK (0)
> > > > Jun 25 07:41:42 [5130] sgw-02        cib:     info: cib_process_ping:    Reporting our current digest to sgw-01: 3e27415fcb003ef3373b47ffa6c5f358 for 0.4704.70 (0x7faac1729720 0)
> > > > 
> > > > Sincerely ,
> > > > 
> > > > Vaggelis Papastavros
-- 
Ken Gaillot <kgaillot at redhat.com>