[ClusterLabs] Antw: Re: Antw: Re: Live Guest Migration timeouts for VirtualDomain resources
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri Jan 27 02:32:34 EST 2017
>>> "Scott Greenlese" <swgreenl at us.ibm.com> wrote on 27.01.2017 at 02:47 in
message
<OF63CD0E10.D58C4C3D-ON002580B5.0005C410-852580B5.0009DBDE at notes.na.collabserv.c
m>:
> Hi guys..
>
> Well, today I confirmed that what Ulrich said is correct. If I update the
> VirtualDomain resource with the operation name "migrate_to" instead of
> "migrate-to", the new value takes effect and overrides the 1200ms default
> value.
>
> I am wondering how I would have known that I was using the wrong operation
> name, given that the initial operation name is already incorrect
> when the resource is created?
For SLES 11, I made a quick (non-portable, unstable) attempt to print the operations known to an RA:
# crm ra info VirtualDomain |sed -n -e "/Operations' defaults/,\$p"
Operations' defaults (advisory minimum):
start timeout=90
stop timeout=90
status timeout=30 interval=10
monitor timeout=30 interval=10
migrate_from timeout=60
migrate_to timeout=120
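
The listing above can also be checked mechanically. What follows is only a minimal sketch, assuming the output format shown here; the function name op_is_known is hypothetical and not part of crmsh. It reads the "crm ra info" output on stdin and succeeds only if the given operation name is listed:

```shell
# Hypothetical helper (not part of crmsh): succeed only if the operation
# name passed as $1 appears in the "Operations' defaults" section read from
# stdin. Assumes the output format shown in the listing above.
op_is_known() {
    awk -v op="$1" '
        /Operations. defaults/ { in_ops = 1; next }  # start of the operations section
        in_ops && $1 == op     { found = 1 }         # first field is the operation name
        END                    { exit (found ? 0 : 1) }
    '
}
```

Possible usage: crm ra info VirtualDomain | op_is_known migrate-to || echo "unknown operation name"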
Regards,
Ulrich
>
> This is what the meta data for my resource looked like after making the
> update:
>
> [root@zs95kj VD]# date;pcs resource update zs95kjg110065_res op migrate_to timeout="360s"
> Thu Jan 26 16:43:11 EST 2017
> You have new mail in /var/spool/mail/root
>
> [root@zs95kj VD]# date;pcs resource show zs95kjg110065_res
> Thu Jan 26 16:43:46 EST 2017
> Resource: zs95kjg110065_res (class=ocf provider=heartbeat
> type=VirtualDomain)
> Attributes: config=/guestxml/nfs1/zs95kjg110065.xml
> hypervisor=qemu:///system migration_transport=ssh
> Meta Attrs: allow-migrate=true
> Operations: start interval=0s timeout=120
> (zs95kjg110065_res-start-interval-0s)
> stop interval=0s timeout=120
> (zs95kjg110065_res-stop-interval-0s)
> monitor interval=30s (zs95kjg110065_res-monitor-interval-30s)
> migrate-from interval=0s timeout=1200
> (zs95kjg110065_res-migrate-from-interval-0s)
> migrate-to interval=0s timeout=1200
> (zs95kjg110065_res-migrate-to-interval-0s) <<< Original op name / value
> migrate_to interval=0s timeout=360s
> (zs95kjg110065_res-migrate_to-interval-0s) <<< New op name / value
>
>
> Where does that original op name come from in the VirtualDomain resource
> definition? How can we get the initial meta value changed and shipped with
> a valid operation name (i.e. migrate_to), and
> maybe a more reasonable migrate_to timeout value... something significantly
> higher than 1200ms, i.e. 1.2 seconds? Can I report this request as a
> bugzilla on the RHEL side, or should this go to my internal IBM bugzilla
> for KVM on System Z development?
>
> Anyway, thanks so much for identifying my issue. I can reconfigure my
> resources to make them tolerate longer migration execution times.
>
>
> Scott Greenlese ... IBM KVM on System Z Solution Test
> INTERNET: swgreenl at us.ibm.com
>
>
>
>
> From: Ken Gaillot <kgaillot at redhat.com>
> To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>,
> users at clusterlabs.org
> Date: 01/19/2017 10:26 AM
> Subject: Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts for
> VirtualDomain resources
>
>
>
> On 01/19/2017 01:36 AM, Ulrich Windl wrote:
>>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 18.01.2017 at 16:32 in
> message
>> <4b02d3fa-4693-473b-8bed-dc98f9e3f3f3 at redhat.com>:
>>> On 01/17/2017 04:45 PM, Scott Greenlese wrote:
>>>> Ken and Co,
>>>>
>>>> Thanks for the useful information.
>>>>
>>
>> [...]
>>>>
>>>> Is this internally coded within the class=ocf provider=heartbeat
>>>> type=VirtualDomain resource agent?
>>>
>>> Aha, I just realized what the issue is: the operation name is
>>> migrate_to, not migrate-to.
>>>
>>> For technical reasons, pacemaker can't validate operation names (at the
>>> time that the configuration is edited, it does not necessarily have
>>> access to the agent metadata).
>>
>> BUT the set of operations is finite, right? So if those were in some XML
>> schema, the names could at least be verified (which would not mean that the
>> operation is actually supported).
>> BTW: Would a "crm configure verify" detect this kind of problem?
>>
>> [...]
>>
>> Ulrich
>
> Yes, it's in the resource agent meta-data. While pacemaker itself uses a
> small set of well-defined actions, the agent may define any arbitrarily
> named actions it desires, and the user could configure one of these as a
> recurring action in pacemaker.
>
> Pacemaker itself has to be liberal about where its configuration comes
> from -- the configuration can be edited on a separate machine, which
> doesn't have resource agents, and then uploaded to the cluster. So
> Pacemaker can't do that validation at configuration time. (It could
> theoretically do some checking after the fact when the configuration is
> loaded, but this could be a lot of overhead, and there are
> implementation issues at the moment.)
>
> Higher-level tools like crmsh and pcs, on the other hand, can make
> simplifying assumptions. They can require access to the resource agents
> so that they can do extra validation.
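
As an illustration of the extra validation described above, here is a rough sketch of comparing configured operation names with the actions an agent declares in its meta-data. It is not how crmsh or pcs actually implement it. The function name undeclared_ops is hypothetical, and the sketch assumes that "name" is the first attribute of each <action> element in the meta-data XML:

```shell
# Hypothetical helper (not taken from crmsh or pcs): print every operation
# name given as an argument that is NOT declared as <action name="..."> in
# the agent meta-data read from stdin.
# Assumption: "name" is the first attribute of each <action> element.
undeclared_ops() {
    # Collect the declared action names, one per line.
    declared=$(sed -n 's/.*<action[[:space:]][[:space:]]*name="\([^"]*\)".*/\1/p')
    for op in "$@"; do
        # -x: whole-line match, -F: literal string, --: names may contain dashes
        printf '%s\n' "$declared" | grep -qxF -- "$op" || printf '%s\n' "$op"
    done
}
```

Assuming the agent is installed under the usual /usr/lib/ocf location, it could be fed like this:
OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/VirtualDomain meta-data | undeclared_ops migrate-to migrate-from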
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org