[ClusterLabs] Live Guest Migration timeouts for VirtualDomain resources

Wed Feb 1 16:54:39 UTC 2017

Ken (and Ulrich),

Found it!  You're right, we do deliver a man page...

man]# find . -name *Virtual* -print
./man7/ocf_heartbeat_VirtualDomain.7.gz

# rpm -q --whatprovides /usr/share/man/man7/ocf_heartbeat_VirtualDomain.7.gz
resource-agents-3.9.7-4.el7_2.kvmibm1_1_3.1.s390x

Much obliged, sir(s).

Scott Greenlese ... IBM z/BX Solutions Test,  Poughkeepsie, N.Y.
  INTERNET:  swgreenl at us.ibm.com
  PHONE:  8/293-7301 (845-433-7301)    M/S:  POK 42HA/P966

From:	Ken Gaillot <kgaillot at redhat.com>
To:	users at clusterlabs.org
Date:	02/01/2017 10:33 AM
Subject:	Re: [ClusterLabs] Live Guest Migration timeouts for
            VirtualDomain resources

On 02/01/2017 09:15 AM, Scott Greenlese wrote:
> Hi all...
>
> Just a quick follow-up.
>
> Thought I should come clean and share with you that the incorrect
> "migrate-to" operation name defined in my VirtualDomain
> resource was my mistake. It was mis-coded in the virtual guest
> provisioning script. I have since changed it to "migrate_to"
> and of course, the specified live migration timeout value is working
> effectively now. (For some reason, I assumed we were letting that
> operation meta value default).
>
> I was wondering if someone could refer me to the definitive online link
> for pacemaker resource man pages? I don't see any resource man pages
> installed on my system anywhere. I found this one online:
> https://www.mankier.com/7/ocf_heartbeat_VirtualDomain but is there a
> more 'official' page I should refer our Linux KVM on System z customers to?

All distributions that I know of include the man pages with the packages
they distribute. Are you building from source? The pages are named like
ocf_heartbeat_IPaddr2, so you can view one with "man ocf_heartbeat_IPaddr2".

FYI after following this thread, the pcs developers are making a change
so that pcs refuses to add an unrecognized operation unless the user
uses --force. Thanks for being involved in the community; this is how we
learn to improve!
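
To illustrate the idea, here is a minimal sketch of a "refuse unless forced"
check. It is not the pcs implementation: the function name, the exception
class and the hard-coded KNOWN_OPS set are assumptions made for illustration.
The operation names are the ones Ulrich's "crm ra info VirtualDomain" output
lists further down in this thread.

```python
# Illustrative only: a hard-coded list of operations for VirtualDomain.
# A real tool would read these from the resource agent's meta-data.
KNOWN_OPS = {"start", "stop", "status", "monitor", "migrate_from", "migrate_to"}


class UnknownOperationError(ValueError):
    """Raised when an operation name is not advertised by the agent."""


def add_operation(ops, name, timeout, known_ops=KNOWN_OPS, force=False):
    """Record an operation timeout, rejecting unknown names unless forced."""
    if name not in known_ops and not force:
        raise UnknownOperationError(
            "operation %r is not one of %s (use force=True to override)"
            % (name, sorted(known_ops))
        )
    ops[name] = timeout
    return ops


if __name__ == "__main__":
    ops = {}
    try:
        add_operation(ops, "migrate-to", "1200")
    except UnknownOperationError as err:
        print("rejected:", err)
    add_operation(ops, "migrate_to", "360s")
    print(ops)
```

With a check like this, the mistyped "migrate-to" would be rejected when the
resource is created, instead of being stored and silently ignored.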

> Thanks again for your assistance.
>
> Scott Greenlese ...IBM KVM on System Z Solution Test Poughkeepsie, N.Y.
> INTERNET: swgreenl at us.ibm.com
>
>
> From: "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de>
> To: <users at clusterlabs.org>, Scott Greenlese/Poughkeepsie/IBM at IBMUS
> Cc: "Si Bo Niu" <niusibo at cn.ibm.com>, Michael Tebolt/Poughkeepsie/IBM at IBMUS
> Date: 01/27/2017 02:32 AM
> Subject: Antw: Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts
> for VirtualDomain resources
>
> ------------------------------------------------------------------------
>
>
>
>>>> "Scott Greenlese" <swgreenl at us.ibm.com> wrote on 27.01.2017 at 02:47 in message
> <OF63CD0E10.D58C4C3D-ON002580B5.0005C410-852580B5.0009DBDE at notes.na.collabserv.com>:
>
>> Hi guys..
>>
>> Well, today I confirmed that what Ulrich said is correct.  If I update the
>> VirtualDomain resource with the operation name "migrate_to" instead of
>> "migrate-to", the new value effectively overrides the 1200ms default
>> value.
>>
>> I am wondering how I would have known that I was using the wrong operation
>> name, when the initial operation name is already incorrect
>> when the resource is created?
>
> For SLES 11, I made a quick (portable non-portable unstable) try (print
> the operations known to an RA):
> # crm ra info VirtualDomain |sed -n -e "/Operations' defaults/,\$p"
> Operations' defaults (advisory minimum):
>
>    start         timeout=90
>    stop          timeout=90
>    status        timeout=30 interval=10
>    monitor       timeout=30 interval=10
>    migrate_from  timeout=60
>    migrate_to    timeout=120
>
> Regards,
> Ulrich
>
>>
>> This is what the meta data for my resource looked like after making the
>> update:
>>
>> [root at zs95kj VD]# date;pcs resource update zs95kjg110065_res op migrate_to timeout="360s"
>> Thu Jan 26 16:43:11 EST 2017
>> You have new mail in /var/spool/mail/root
>>
>> [root at zs95kj VD]# date;pcs resource show zs95kjg110065_res
>> Thu Jan 26 16:43:46 EST 2017
>>  Resource: zs95kjg110065_res (class=ocf provider=heartbeat type=VirtualDomain)
>>   Attributes: config=/guestxml/nfs1/zs95kjg110065.xml hypervisor=qemu:///system migration_transport=ssh
>>   Meta Attrs: allow-migrate=true
>>   Operations: start interval=0s timeout=120 (zs95kjg110065_res-start-interval-0s)
>>               stop interval=0s timeout=120 (zs95kjg110065_res-stop-interval-0s)
>>               monitor interval=30s (zs95kjg110065_res-monitor-interval-30s)
>>               migrate-from interval=0s timeout=1200 (zs95kjg110065_res-migrate-from-interval-0s)
>>               migrate-to interval=0s timeout=1200 (zs95kjg110065_res-migrate-to-interval-0s)   <<< Original op name / value
>>               migrate_to interval=0s timeout=360s (zs95kjg110065_res-migrate_to-interval-0s)  <<< New op name / value
>>
>>
>> Where does that original op name come from in the VirtualDomain resource
>> definition?  How can we get the initial meta value changed and shipped with
>> a valid operation name (i.e. migrate_to), and maybe a more reasonable
>> migrate_to timeout value... something significantly higher than 1200ms,
>> i.e. 1.2 seconds?  Can I report this request as a bugzilla on the RHEL
>> side, or should this go to my internal IBM bugzilla for KVM on System Z
>> development?
>>
>> Anyway, thanks so much for identifying my issue.  I can reconfigure my
>> resources to make them tolerate longer migration execution times.
>>
>>
>> Scott Greenlese ... IBM KVM on System Z Solution Test
>>   INTERNET:  swgreenl at us.ibm.com
>>
>>
>>
>>
>> From: Ken Gaillot <kgaillot at redhat.com>
>> To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>,
>>             users at clusterlabs.org
>> Date: 01/19/2017 10:26 AM
>> Subject: Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts for
>>             VirtualDomain resources
>>
>>
>>
>> On 01/19/2017 01:36 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 18.01.2017 at 16:32 in message
>>> <4b02d3fa-4693-473b-8bed-dc98f9e3f3f3 at redhat.com>:
>>>> On 01/17/2017 04:45 PM, Scott Greenlese wrote:
>>>>> Ken and Co,
>>>>>
>>>>> Thanks for the useful information.
>>>>>
>>>
>>> [...]
>>>>>
>>>>> Is this internally coded within the class=ocf provider=heartbeat
>>>>> type=VirtualDomain resource agent?
>>>>
>>>> Aha, I just realized what the issue is: the operation name is
>>>> migrate_to, not migrate-to.
>>>>
>>>> For technical reasons, pacemaker can't validate operation names (at the
>>>> time that the configuration is edited, it does not necessarily have
>>>> access to the agent metadata).
>>>
>>> BUT the set of operations is finite, right? So if those were in some XML
>>> schema, the names could at least be verified (which would not mean that the
>>> operation is actually supported).
>>> BTW: Would a "crm configure verify" detect this kind of problem?
>>>
>>> [...]
>>>
>>> Ulrich
>>
>> Yes, it's in the resource agent meta-data. While pacemaker itself uses a
>> small set of well-defined actions, the agent may define any arbitrarily
>> named actions it desires, and the user could configure one of these as a
>> recurring action in pacemaker.
>>
>> Pacemaker itself has to be liberal about where its configuration comes
>> from -- the configuration can be edited on a separate machine, which
>> doesn't have resource agents, and then uploaded to the cluster. So
>> Pacemaker can't do that validation at configuration time. (It could
>> theoretically do some checking after the fact when the configuration is
>> loaded, but this could be a lot of overhead, and there are
>> implementation issues at the moment.)
>>
>> Higher-level tools like crmsh and pcs, on the other hand, can make
>> simplifying assumptions. They can require access to the resource agents
>> so that they can do extra validation.
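
As a rough illustration of the extra validation a higher-level tool could do
once it can read the agent meta-data, the sketch below parses an OCF-style
meta-data document and suggests the nearest valid name for a mistyped
operation. It is not how crmsh or pcs do it. SAMPLE_METADATA is a
hand-written, heavily abbreviated stand-in (a real agent's meta-data action
emits a much longer document), and the function names are hypothetical. The
timeout values are the advisory minimums from Ulrich's "crm ra info" output
above.

```python
import difflib
import xml.etree.ElementTree as ET

# Abbreviated, hand-written stand-in for an agent's meta-data (assumption:
# only the <actions> section matters for this check).
SAMPLE_METADATA = """
<resource-agent name="VirtualDomain">
  <actions>
    <action name="start" timeout="90"/>
    <action name="stop" timeout="90"/>
    <action name="status" timeout="30" interval="10"/>
    <action name="monitor" timeout="30" interval="10"/>
    <action name="migrate_from" timeout="60"/>
    <action name="migrate_to" timeout="120"/>
  </actions>
</resource-agent>
"""


def agent_actions(metadata_xml):
    """Map each action name in the meta-data to its advisory timeout."""
    root = ET.fromstring(metadata_xml.strip())
    return {a.get("name"): a.get("timeout") for a in root.iter("action")}


def validate_op(name, actions):
    """Return (is_known, suggestion); suggestion is None if nothing is close."""
    if name in actions:
        return True, None
    close = difflib.get_close_matches(name, list(actions), n=1)
    return False, (close[0] if close else None)


if __name__ == "__main__":
    actions = agent_actions(SAMPLE_METADATA)
    for op in ("migrate_to", "migrate-to"):
        known, hint = validate_op(op, actions)
        print(op, "known" if known else "unknown, did you mean %s?" % hint)
```

Because the action list comes from the meta-data rather than a fixed schema,
the same check also works for agents that define their own arbitrarily named
actions.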

