[ClusterLabs] Live Guest Migration timeouts for VirtualDomain resources
kgaillot at redhat.com
Wed Feb 1 10:31:56 EST 2017
On 02/01/2017 09:15 AM, Scott Greenlese wrote:
> Hi all...
> Just a quick follow-up.
> Thought I should come clean and share with you that the incorrect
> "migrate-to" operation name defined in my VirtualDomain
> resource was my mistake. It was mis-coded in the virtual guest
> provisioning script. I have since changed it to "migrate_to"
> and of course, the specified live migration timeout value is working
> effectively now. (For some reason, I assumed we were letting that
> operation meta value default).
> I was wondering if someone could refer me to the definitive online link
> for pacemaker resource man pages? I don't see any resource man pages
> on my system anywhere. I found this one online:
> https://www.mankier.com/7/ocf_heartbeat_VirtualDomain but is there a
> more 'official' page I should refer our
> Linux KVM on System z customers to?
All distributions that I know of include the man pages with the packages
they distribute. Are you building from source? They are named like "man
FYI after following this thread, the pcs developers are making a change
so that pcs refuses to add an unrecognized operation unless the user
uses --force. Thanks for being involved in the community; this is how we
learn to improve!
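
For illustration only, here is a rough sketch of that kind of check written as a plain shell function. It is not taken from pcs. The function name `add_op_checked`, the `force` argument, and the hard-coded operation list (copied from the `crm ra info VirtualDomain` output quoted further down in this thread) are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch: refuse an operation name the agent does not advertise,
# unless the caller explicitly forces it. This is NOT the pcs implementation.

# Operation names as listed for VirtualDomain later in this thread
# (assumed to be complete for the purpose of this example).
KNOWN_OPS="start stop status monitor migrate_from migrate_to"

# add_op_checked NAME [force]
# Prints the decision; returns 0 if the name is accepted, 1 if refused.
add_op_checked() {
    op_name=$1
    force=${2:-no}
    for known in $KNOWN_OPS; do
        if [ "$op_name" = "$known" ]; then
            echo "accepted: $op_name"
            return 0
        fi
    done
    if [ "$force" = "force" ]; then
        echo "accepted (forced): $op_name"
        return 0
    fi
    echo "refused: $op_name is not a known operation"
    return 1
}

add_op_checked migrate_to
add_op_checked migrate-to || true
add_op_checked migrate-to force
```

With a check like this, the hyphenated name would have been rejected when the resource was created instead of being silently stored and ignored.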
> Thanks again for your assistance.
> Scott Greenlese ...IBM KVM on System Z Solution Test Poughkeepsie, N.Y.
> INTERNET: swgreenl at us.ibm.com
> From: "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de>
> To: <users at clusterlabs.org>, Scott Greenlese/Poughkeepsie/IBM at IBMUS
> Cc: "Si Bo Niu" <niusibo at cn.ibm.com>, Michael Tebolt/Poughkeepsie/IBM at IBMUS
> Date: 01/27/2017 02:32 AM
> Subject: Antw: Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts
> for VirtualDomain resources
>>>> "Scott Greenlese" <swgreenl at us.ibm.com> wrote on 27.01.2017 at
> 02:47 in
> <OF63CD0E10.D58C4C3D-ON002580B5.0005C410-852580B5.0009DBDE at notes.na.collabserv.c
>> Hi guys..
>> Well, today I confirmed that what Ulrich said is correct. If I update the
>> VirtualDomain resource with the operation name "migrate_to" instead of
>> "migrate-to", the new value takes effect and overrides the 1200ms default
>> value.
>> I am wondering how I would have known that I was using the wrong operation
>> name, given that the operation name was already incorrect
>> when the resource was created?
> For SLES 11, I made a quick (portable non-portable unstable) try (print
> the operations known to an RA):
> # crm ra info VirtualDomain |sed -n -e "/Operations' defaults/,\$p"
> Operations' defaults (advisory minimum):
> start timeout=90
> stop timeout=90
> status timeout=30 interval=10
> monitor timeout=30 interval=10
> migrate_from timeout=60
> migrate_to timeout=120
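
The listing above can also be used mechanically. The following is a minimal sketch, assuming the listing has been captured as text with one operation per line; the variable and function names (`listing`, `valid_ops`, `is_valid_op`) are made up for this example, and on a live node the text would come from the `crm ra info` command shown above rather than from a pasted string.

```shell
#!/bin/sh
# Sketch: turn an "Operations' defaults" listing into a list of valid
# operation names and test candidate names against it.
# The listing is pasted from the output quoted in this thread.

listing="start timeout=90
stop timeout=90
status timeout=30 interval=10
monitor timeout=30 interval=10
migrate_from timeout=60
migrate_to timeout=120"

# The first whitespace-separated field of each line is the operation name.
valid_ops=$(printf '%s\n' "$listing" | awk '{print $1}')

# is_valid_op NAME: succeeds only on an exact, whole-line match,
# so "migrate-to" does not match "migrate_to".
is_valid_op() {
    printf '%s\n' "$valid_ops" | grep -qxF -- "$1"
}

for op in migrate_to migrate-to migrate_from migrate-from; do
    if is_valid_op "$op"; then
        echo "$op: known to the agent"
    else
        echo "$op: NOT known to the agent"
    fi
done
```

The exact match is the important design choice here: any fuzzy or prefix comparison would let the hyphenated names through.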
>> This is what the meta data for my resource looked like after making the
>> [root at zs95kj VD]# date;pcs resource update zs95kjg110065_res op migrate_to
>> Thu Jan 26 16:43:11 EST 2017
>> You have new mail in /var/spool/mail/root
>> [root at zs95kj VD]# date;pcs resource show zs95kjg110065_res
>> Thu Jan 26 16:43:46 EST 2017
>> Resource: zs95kjg110065_res (class=ocf provider=heartbeat
>> Attributes: config=/guestxml/nfs1/zs95kjg110065.xml
>> hypervisor=qemu:///system migration_transport=ssh
>> Meta Attrs: allow-migrate=true
>> Operations: start interval=0s timeout=120
>> stop interval=0s timeout=120
>> monitor interval=30s
>> migrate-from interval=0s timeout=1200
>> migrate-to interval=0s timeout=1200
>> (zs95kjg110065_res-migrate-to-interval-0s) <<< Original op name / value
>> migrate_to interval=0s timeout=360s
>> (zs95kjg110065_res-migrate_to-interval-0s) <<< New op name / value
>> Where does that original op name come from in the VirtualDomain resource
>> definition? How can we get the initial meta value changed and shipped
>> with a valid operation name (i.e. migrate_to), and
>> maybe a more reasonable migrate_to timeout value... something
>> higher than 1200ms, i.e. 1.2 seconds? Can I report this request as a
>> bugzilla on the RHEL side, or should this go to my internal IBM bugzilla
>> for KVM on System Z development?
>> Anyway, thanks so much for identifying my issue. I can reconfigure my
>> resources to make them tolerate longer migration execution times.
>> Scott Greenlese ... IBM KVM on System Z Solution Test
>> INTERNET: swgreenl at us.ibm.com
>> From: Ken Gaillot <kgaillot at redhat.com>
>> To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>,
>> users at clusterlabs.org
>> Date: 01/19/2017 10:26 AM
>> Subject: Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts for
>> VirtualDomain resources
>> On 01/19/2017 01:36 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 18.01.2017 at 16:32 in
>>> <4b02d3fa-4693-473b-8bed-dc98f9e3f3f3 at redhat.com>:
>>>> On 01/17/2017 04:45 PM, Scott Greenlese wrote:
>>>>> Ken and Co,
>>>>> Thanks for the useful information.
>>>>> Is this internally coded within the class=ocf provider=heartbeat
>>>>> type=VirtualDomain resource agent?
>>>> Aha, I just realized what the issue is: the operation name is
>>>> migrate_to, not migrate-to.
>>>> For technical reasons, pacemaker can't validate operation names (at the
>>>> time that the configuration is edited, it does not necessarily have
>>>> access to the agent metadata).
>>> BUT the set of operations is finite, right? So if those were in some XML
>> schema, the names could be verified at least (not meaning that the
>> operation is actually supported).
>>> BTW: Would a "crm configure verify" detect this kind of problem?
>> Yes, it's in the resource agent meta-data. While pacemaker itself uses a
>> small set of well-defined actions, the agent may define any arbitrarily
>> named actions it desires, and the user could configure one of these as a
>> recurring action in pacemaker.
>> Pacemaker itself has to be liberal about where its configuration comes
>> from -- the configuration can be edited on a separate machine, which
>> doesn't have resource agents, and then uploaded to the cluster. So
>> Pacemaker can't do that validation at configuration time. (It could
>> theoretically do some checking after the fact when the configuration is
>> loaded, but this could be a lot of overhead, and there are
>> implementation issues at the moment.)
>> Higher-level tools like crmsh and pcs, on the other hand, can make
>> simplifying assumptions. They can require access to the resource agents
>> so that they can do extra validation.
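
As a sketch of what such extra validation might look like, the example below reads action names out of OCF-style agent metadata and uses them to flag a misspelled operation. The XML is a hand-written fragment for illustration, not the real VirtualDomain metadata (on a cluster node the real text can usually be obtained with `crm_resource --show-metadata ocf:heartbeat:VirtualDomain`), and the names `agent_ops` and `suggest_op` are hypothetical. It assumes one `<action>` element per line; a real tool should use a proper XML parser.

```shell
#!/bin/sh
# Sketch: extract action names from OCF-style metadata and suggest a fix
# for a hyphen/underscore slip. Illustrative fragment, not real agent output.

metadata='<actions>
<action name="start" timeout="90s" />
<action name="stop" timeout="90s" />
<action name="monitor" timeout="30s" interval="10s" />
<action name="migrate_from" timeout="60s" />
<action name="migrate_to" timeout="120s" />
</actions>'

# Pull the value of the name attribute from each <action> line.
agent_ops=$(printf '%s\n' "$metadata" |
    sed -n 's/.*<action name="\([^"]*\)".*/\1/p')

# suggest_op NAME: prints "ok", a suggested spelling, or "unknown".
# Returns 0 only when NAME is exactly one of the agent's actions.
suggest_op() {
    if printf '%s\n' "$agent_ops" | grep -qxF -- "$1"; then
        echo "ok"
        return 0
    fi
    # The slip seen in this thread: a hyphen typed instead of an underscore.
    candidate=$(printf '%s\n' "$1" | sed 's/-/_/g')
    if printf '%s\n' "$agent_ops" | grep -qxF -- "$candidate"; then
        echo "did you mean $candidate?"
    else
        echo "unknown"
    fi
    return 1
}

suggest_op migrate_to
suggest_op migrate-to || true
suggest_op migrate || true
```

Because the agent may define arbitrarily named actions, the list is read from the metadata rather than hard-coded, which is only possible for a tool that has the agents installed.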