[ClusterLabs] Antw: Re: Live Guest Migration timeouts for VirtualDomain resources

Fri Jan 27 01:47:41 UTC 2017

Hi guys..

Well, today I confirmed that what Ulrich said is correct.  If I update the
VirtualDomain resource with the operation name "migrate_to" instead of
"migrate-to", the new timeout value effectively overrides the 1200ms
default value and is enforced.

I am wondering how I would have known that I was using the wrong operation
name, given that the initial operation name is already incorrect at the time
the resource is created?

This is what the meta data for my resource looked like after making the
update:

[root@zs95kj VD]# date;pcs resource update zs95kjg110065_res op migrate_to timeout="360s"
Thu Jan 26 16:43:11 EST 2017
You have new mail in /var/spool/mail/root

[root@zs95kj VD]# date;pcs resource show zs95kjg110065_res
Thu Jan 26 16:43:46 EST 2017
 Resource: zs95kjg110065_res (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/guestxml/nfs1/zs95kjg110065.xml hypervisor=qemu:///system migration_transport=ssh
  Meta Attrs: allow-migrate=true
  Operations: start interval=0s timeout=120 (zs95kjg110065_res-start-interval-0s)
              stop interval=0s timeout=120 (zs95kjg110065_res-stop-interval-0s)
              monitor interval=30s (zs95kjg110065_res-monitor-interval-30s)
              migrate-from interval=0s timeout=1200 (zs95kjg110065_res-migrate-from-interval-0s)
              migrate-to interval=0s timeout=1200 (zs95kjg110065_res-migrate-to-interval-0s)   <<< Original op name / value
              migrate_to interval=0s timeout=360s (zs95kjg110065_res-migrate_to-interval-0s)  <<< New op name / value

Where does that original op name come from in the VirtualDomain resource
definition?  How can we get the initial meta value changed and shipped with
a valid operation name (i.e. migrate_to), and maybe a more reasonable
migrate_to timeout value, something significantly higher than 1200ms
(i.e. 1.2 seconds)?  Can I report this request as a bugzilla on the RHEL
side, or should this go to my internal IBM bugzilla for KVM on System Z
development?

Anyway, thanks so much for identifying my issue.  I can reconfigure my
resources to make them tolerate longer migration execution times.

Scott Greenlese ... IBM KVM on System Z Solution Test
  INTERNET:  swgreenl at us.ibm.com

From:	Ken Gaillot <kgaillot at redhat.com>
To:	Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>,
            users at clusterlabs.org
Date:	01/19/2017 10:26 AM
Subject:	Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts for
            VirtualDomain resources

On 01/19/2017 01:36 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 18.01.2017 at 16:32 in message
> <4b02d3fa-4693-473b-8bed-dc98f9e3f3f3 at redhat.com>:
>> On 01/17/2017 04:45 PM, Scott Greenlese wrote:
>>> Ken and Co,
>>>
>>> Thanks for the useful information.
>>>
>
> [...]
>>>
>>> Is this internally coded within the class=ocf provider=heartbeat
>>> type=VirtualDomain resource agent?
>>
>> Aha, I just realized what the issue is: the operation name is
>> migrate_to, not migrate-to.
>>
>> For technical reasons, pacemaker can't validate operation names (at the
>> time that the configuration is edited, it does not necessarily have
>> access to the agent metadata).
>
> BUT the set of operations is finite, right? So if those were in some XML
> schema, the names could be verified at least (not meaning that the
> operation is actually supported).
> BTW: Would a "crm configure verify" detect this kind of problem?
>
> [...]
>
> Ulrich

Yes, it's in the resource agent meta-data. While pacemaker itself uses a
small set of well-defined actions, the agent may define any arbitrarily
named actions it desires, and the user could configure one of these as a
recurring action in pacemaker.
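
To make the point about agent meta-data concrete, here is a minimal Python
sketch that pulls the advertised action names out of an OCF meta-data
document. The XML below is a trimmed, hypothetical sample in the usual OCF
layout (a resource-agent element containing an actions list); it is not the
real VirtualDomain meta-data, its timeout values are placeholders, and
agent_actions is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical meta-data sample. A real agent prints far more,
# and the timeout values here are placeholders, not the shipped defaults.
SAMPLE_METADATA = """<resource-agent name="VirtualDomain">
  <actions>
    <action name="start" timeout="90"/>
    <action name="stop" timeout="90"/>
    <action name="monitor" timeout="30" interval="10"/>
    <action name="migrate_from" timeout="60"/>
    <action name="migrate_to" timeout="120"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
"""


def agent_actions(metadata_xml):
    """Return the set of action names listed in an OCF meta-data document."""
    root = ET.fromstring(metadata_xml)
    return {
        action.get("name")
        for action in root.findall("./actions/action")
        if action.get("name")
    }


if __name__ == "__main__":
    # Print the names the (sample) agent advertises, one per line.
    for name in sorted(agent_actions(SAMPLE_METADATA)):
        print(name)
```

Because the names are read from the agent rather than from a fixed list,
this approach also copes with agents that define their own extra actions.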

Pacemaker itself has to be liberal about where its configuration comes
from -- the configuration can be edited on a separate machine, which
doesn't have resource agents, and then uploaded to the cluster. So
Pacemaker can't do that validation at configuration time. (It could
theoretically do some checking after the fact when the configuration is
loaded, but this could be a lot of overhead, and there are
implementation issues at the moment.)

Higher-level tools like crmsh and pcs, on the other hand, can make
simplifying assumptions. They can require access to the resource agents
so that they can do extra validation.
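
As an illustration of the kind of extra validation a higher-level tool could
do, here is a minimal Python sketch. It is not how pcs or crmsh actually
work; check_op_names, ADVERTISED and CONFIGURED are hypothetical names, and
the ADVERTISED set is an assumption standing in for whatever the agent's
meta-data really lists.

```python
import difflib


def check_op_names(configured, advertised):
    """Map each configured operation name that the agent does not
    advertise to the closest advertised name, or None if nothing is close."""
    problems = {}
    candidates = sorted(advertised)
    for name in configured:
        if name in advertised:
            continue
        close = difflib.get_close_matches(name, candidates, n=1)
        problems[name] = close[0] if close else None
    return problems


# Assumed stand-in for the actions the agent advertises in its meta-data.
ADVERTISED = {"start", "stop", "monitor", "migrate_to", "migrate_from"}

# Operation names as they appeared in the resource before the fix.
CONFIGURED = ["start", "stop", "monitor", "migrate-from", "migrate-to"]

if __name__ == "__main__":
    for bad, hint in check_op_names(CONFIGURED, ADVERTISED).items():
        print(f"unknown operation {bad!r}; did you mean {hint!r}?")
```

A check like this only warns; since an agent may define arbitrary actions,
the advertised set has to come from the agent itself rather than from a
hard-coded list.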
