[ClusterLabs] VM failure during shutdown

Vaggelis Papastavros psvaggelis at gmail.com
Wed Jun 27 04:09:35 EDT 2018


Many thanks for your brilliant answers.

Ken, your suggestion:

"The second problem is that you have an ordering constraint but no 
colocation constraint. With your current setup, windows_VM has to start 
after the storage, but it doesn't have to start on the same node. You 
need a colocation constraint as well, to ensure they start on the same 
node."

*For the storage I have the following complete steps:*

pcs resource create ProcDRBD_SigmaVMs ocf:linbit:drbd drbd_resource=sigma_vms drbdconf=/etc/drbd.conf op monitor interval=10s

pcs resource master clone_ProcDRBD_SigmaVMs ProcDRBD_SigmaVMs master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

pcs resource create StorageDRBD_SigmaVMs Filesystem device="/dev/drbd1" directory="/opt/sigma_vms/" fstype="ext4"

pcs constraint location clone_ProcDRBD_SigmaVMs prefers sgw-01

pcs constraint colocation add StorageDRBD_SigmaVMs with clone_ProcDRBD_SigmaVMs INFINITY with-rsc-role=Master

pcs constraint order promote clone_ProcDRBD_SigmaVMs then start StorageDRBD_SigmaVMs
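
*As a quick sanity check after these steps (an optional verification sketch, assuming the same resource names as above), I confirm that the DRBD role and the resulting cluster placement agree:*

drbdadm status sigma_vms

pcs status resources

pcs constraint show --full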
*And when I create the VM:*

pcs resource create windows_VM_res VirtualDomain hypervisor="qemu:///system" config="/opt/sigma_vms/xml_definitions/windows_VM.xml"

pcs constraint colocation add windows_VM_res with StorageDRBD_SigmaVMs INFINITY

pcs constraint order start StorageDRBD_SigmaVMs then start windows_VM_res


*My question, Ken, is: are the steps below enough to ensure that the new VM will be placed on node 1?*

(The storage stack prefers node 1, the DRBD Primary, with weight INFINITY, and windows_VM_res must always be colocated with StorageDRBD_SigmaVMs.

So by transitivity windows_VM_res should land on node 1. Assume that ---> means "prefers": storage ---> node1, windows ---> storage, therefore windows_VM ---> node1.)

pcs constraint location clone_ProcDRBD_SigmaVMs prefers sgw-01

pcs constraint colocation add windows_VM_res with StorageDRBD_SigmaVMs INFINITY

pcs constraint order start StorageDRBD_SigmaVMs then start windows_VM_res
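
*And to follow Ken's advice below about applying everything in one change, I plan to batch the same VM commands through a scratch CIB file before pushing. A sketch (the file name vm_add.cib is only an example):*

pcs cluster cib vm_add.cib

pcs -f vm_add.cib resource create windows_VM_res VirtualDomain hypervisor="qemu:///system" config="/opt/sigma_vms/xml_definitions/windows_VM.xml"

pcs -f vm_add.cib constraint colocation add windows_VM_res with StorageDRBD_SigmaVMs INFINITY

pcs -f vm_add.cib constraint order start StorageDRBD_SigmaVMs then start windows_VM_res

pcs cluster cib-push --config vm_add.cib

*This way the cluster never sees windows_VM_res without its colocation constraint.*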



On 06/26/2018 07:36 PM, Ken Gaillot wrote:
> On Tue, 2018-06-26 at 18:24 +0300, Vaggelis Papastavros wrote:
>> Many thanks for the excellent answer.
>> Ken, after investigation of the log files:
>> In our environment we have two DRBD partitions, one for customer_vms
>> and one for sigma_vms.
>> For customer_vms the active node is node2, and for sigma_vms the
>> active node is node1.
>> [root@sgw-01 drbd.d]# drbdadm status
>> customer_vms role:Secondary
>>    disk:UpToDate
>>    sgw-02 role:Primary
>>      peer-disk:UpToDate
>>
>> sigma_vms role:Primary
>>    disk:UpToDate
>>    sgw-02 role:Secondary
>>      peer-disk:UpToDate
>>
>> when I create a new VM I can't force the resource creation to take
>> place on a specific node; the cluster places the resource
>> spontaneously on one of the two nodes (if the node happens to be the
>> DRBD Primary then it is OK, otherwise Pacemaker raises a failure for
>> the node).
>> My solution is the following:
>> pcs resource create windows_VM_res VirtualDomain
>> hypervisor="qemu:///system"
>> config="/opt/sigma_vms/xml_definitions/windows_VM.xml"
>> (the cluster arbitrarily tries to place the above resource on node 2,
>> which is currently the Secondary for the corresponding partition.
>> Personally I assume that the VirtualDomain agent should be able to
>> read the correct disk location from the XML definition and then try
>> to find the correct DRBD node)
>>
>> pcs constraint colocation add windows_VM_res with StorageDRBD_SigmaVMs INFINITY
>>
>> pcs constraint order start StorageDRBD_SigmaVMs then start windows_VM_res
> Two things will help:
>
> One problem is that you are creating the VM, and then later adding
> constraints about what the cluster can do with it. Therefore there is a
> time in between where the cluster can start it without any constraint.
> The solution is to make your changes all at once. Both pcs and crm have
> a way to do this; with pcs, it's:
>
>    pcs cluster cib <filename>
>    pcs -f <filename> ...whatever command you want...
>    ...repeat...
>    pcs cluster cib-push --config <filename>
>
> The second problem is that you have an ordering constraint but no
> colocation constraint. With your current setup, windows_VM has to start
> after the storage, but it doesn't have to start on the same node. You
> need a colocation constraint as well, to ensure they start on the same
> node.
>
>> pcs resource cleanup windows_VM_res
>> After the above steps the VM is located on the correct node and
>> everything is OK.
>>
>> Is my approach correct?
>>
>> Your opinion would be valuable,
>> Sincerely
>>
>>
>> On 06/25/2018 07:15 PM, Ken Gaillot wrote:
>>> On Mon, 2018-06-25 at 09:47 -0500, Ken Gaillot wrote:
>>>> On Mon, 2018-06-25 at 11:33 +0300, Vaggelis Papastavros wrote:
>>>>> Dear friends,
>>>>>
>>>>> We have the following configuration:
>>>>>
>>>>> CentOS 7, pcs 0.9.152 and Corosync 2.4.0, storage with DRBD, and
>>>>> STONITH enabled with APC PDU devices.
>>>>>
>>>>> I have a Windows VM configured as a cluster resource with the
>>>>> following attributes:
>>>>>
>>>>> Resource: WindowSentinelOne_res (class=ocf provider=heartbeat type=VirtualDomain)
>>>>> Attributes: hypervisor=qemu:///system config=/opt/customer_vms/conf/WindowSentinelOne/WindowSentinelOne.xml migration_transport=ssh
>>>>> Utilization: cpu=8 hv_memory=8192
>>>>> Operations: start interval=0s timeout=120s (WindowSentinelOne_res-start-interval-0s)
>>>>>             stop interval=0s timeout=120s (WindowSentinelOne_res-stop-interval-0s)
>>>>>             monitor interval=10s timeout=30s (WindowSentinelOne_res-monitor-interval-10s)
>>>>>
>>>>> Under some circumstances (which I am trying to identify) the VM
>>>>> fails and disappears from "virsh list --all", and Pacemaker also
>>>>> reports the VM as stopped.
>>>>>
>>>>> If I run "pcs resource cleanup" on the resource everything is OK,
>>>>> but I can't identify the reason for the failure.
>>>>>
>>>>> For example, when I shut down the VM (with a Windows shutdown),
>>>>> the cluster reports the following:
>>>>>
>>>>> WindowSentinelOne_res    (ocf::heartbeat:VirtualDomain):    Started sgw-02
>>>>> (failure ignored)
>>>>>
>>>>> Failed Actions:
>>>>> * WindowSentinelOne_res_monitor_10000 on sgw-02 'not running' (7):
>>>>> call=67, status=complete, exitreason='none',
>>>>>     last-rc-change='Mon Jun 25 07:41:37 2018', queued=0ms, exec=0ms.
>>>>>
>>>>>
>>>>> My questions are:
>>>>>
>>>>> 1) Why is the VM shutdown reported as a failed action by the
>>>>> cluster? It is a legitimate operation during the VM life cycle.
>>>> Pacemaker has no way of knowing that the VM was intentionally shut
>>>> down, vs. crashed.
>>>>
>>>> When some resource is managed by the cluster, all starts and stops
>>>> of the resource have to go through the cluster. You can either set
>>>> target-role=Stopped in the resource configuration, or if it's a
>>>> temporary issue (e.g. rebooting for some OS updates), you could set
>>>> is-managed=false to take it out of cluster control, do the work,
>>>> then set is-managed=true again.
>>> Also, a nice feature is that you can use rules to set a maintenance
>>> window ahead of time (especially helpful if the person who maintains
>>> the cluster isn't the same person who needs to do the VM updates).
>>> For example, you could set a rule that the resource's is-managed
>>> option will be false from 9pm to midnight on Fridays. See:
>>>
>>> http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#idm140583511697312
>>>
>>> particularly the parts about time/date expressions and using rules
>>> to control resource options.
>>>
>>>>> 2) Why is the resource sometimes marked as stopped (while the VM
>>>>> is healthy) and needs a cleanup?
>>>> That's a problem. If the VM is truly healthy, it sounds like
>>>> there's an issue with the resource agent. You'd have to look at the
>>>> logs to see if it gave any more information (e.g. if it's a
>>>> timeout, raising the timeout might be sufficient).
>>>>
>>>>> 3) I can't understand the corosync logs... during the VM shutdown
>>>>> the corosync log shows the following:
>>>> FYI, the system log will have the most important messages.
>>>> corosync.log will additionally have info-level messages --
>>>> potentially helpful but definitely difficult to follow.
>>>>
>>>>> Jun 25 07:41:37 [5140] sgw-02       crmd:     info: process_lrm_event:    Result of monitor operation for WindowSentinelOne_res on sgw-02: 7 (not running) | call=67 key=WindowSentinelOne_res_monitor_10000 confirmed=false cib-update=36
>>>> This is really the only important message. It says that a recurring
>>>> monitor on the WindowSentinelOne_res resource on node sgw-02 exited
>>>> with status code 7 (which means the resource agent thinks the
>>>> resource is not running).
>>>>
>>>> 'key=WindowSentinelOne_res_monitor_10000' is how pacemaker
>>>> identifies resource agent actions. The format is
>>>> <resource-name>_<action-name>_<action-interval-in-milliseconds>
>>>>
>>>> This is the only information Pacemaker will get from the resource
>>>> agent. To investigate more deeply, you'll have to check for log
>>>> messages from the agent itself.
>>>>
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/crmd/36)
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.67 2
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.68 (null)
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=68
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib/status/node_state[@id='2']: @crm-debug-origin=do_update_resource
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='WindowSentinelOne_res']: <lrm_rsc_op id="WindowSentinelOne_res_last_failure_0" operation_key="WindowSentinelOne_res_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="84:3:0:f910c793-a714-4e24-80d1-b0ec66275491" transition-magic="0:7;84:3:0:f910c793-a714-4e24-80d1-b0ec66275491" on_node="sgw-02" cal
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/crmd/36, version=0.4704.68)
>>>> You can usually ignore the 'cib' messages. This just means
>>>> Pacemaker recorded the result on disk.
>>>>
>>>>> Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_peer_update:    Setting fail-count-WindowSentinelOne_res[sgw-02]: (null) -> 1 from sgw-01
>>>> Since the cluster expected the resource to be running, this result
>>>> is a failure. Failures are counted using special node attributes
>>>> that start with "fail-count-". This is what Pacemaker uses to
>>>> determine if a resource has reached its migration-threshold.
>>>>
>>>>> Jun 25 07:41:37 [5137] sgw-02      attrd:     info: write_attribute:    Sent update 10 with 1 changes for fail-count-WindowSentinelOne_res, id=<n/a>, set=(null)
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/attrd/10)
>>>>> Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_peer_update:    Setting last-failure-WindowSentinelOne_res[sgw-02]: (null) -> 1529912497 from sgw-01
>>>> Similarly, the time the failure occurred is stored in a
>>>> 'last-failure-' node attribute, which Pacemaker uses to determine
>>>> if a resource has reached its failure-timeout.
>>>>
>>>>> Jun 25 07:41:37 [5137] sgw-02      attrd:     info: write_attribute:    Sent update 11 with 1 changes for last-failure-WindowSentinelOne_res, id=<n/a>, set=(null)
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Forwarding cib_modify operation for section status to all (origin=local/attrd/11)
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.68 2
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.69 (null)
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=69
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair id="status-2-fail-count-WindowSentinelOne_res" name="fail-count-WindowSentinelOne_res" value="1"/>
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/attrd/10, version=0.4704.69)
>>>>> Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 10 for fail-count-WindowSentinelOne_res: OK (0)
>>>>> Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 10 for fail-count-WindowSentinelOne_res[sgw-02]=1: OK (0)
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: --- 0.4704.69 2
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    Diff: +++ 0.4704.70 (null)
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    +  /cib:  @num_updates=70
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair id="status-2-last-failure-WindowSentinelOne_res" name="last-failure-WindowSentinelOne_res" value="1529912497"/>
>>>>> Jun 25 07:41:37 [5130] sgw-02        cib:     info: cib_process_request:    Completed cib_modify operation for section status: OK (rc=0, origin=sgw-02/attrd/11, version=0.4704.70)
>>>>> Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 11 for last-failure-WindowSentinelOne_res: OK (0)
>>>>> Jun 25 07:41:37 [5137] sgw-02      attrd:     info: attrd_cib_callback:    Update 11 for last-failure-WindowSentinelOne_res[sgw-02]=1529912497: OK (0)
>>>>> Jun 25 07:41:42 [5130] sgw-02        cib:     info: cib_process_ping:    Reporting our current digest to sgw-01: 3e27415fcb003ef3373b47ffa6c5f358 for 0.4704.70 (0x7faac1729720 0)
>>>>>
>>>>> Sincerely ,
>>>>>
>>>>> Vaggelis Papastavros
>>   
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.
>> pdf
>> Bugs: http://bugs.clusterlabs.org
