[ClusterLabs] Live Guest Migration timeouts for VirtualDomain resources

Scott Greenlese swgreenl at us.ibm.com
Wed Feb 1 10:15:49 EST 2017


Hi all...

Just a quick follow-up.

Thought I should come clean and share with you that the incorrect
"migrate-to" operation name defined in my VirtualDomain resource was my
mistake.  It was mis-coded in the virtual guest provisioning script.  I have
since changed it to "migrate_to", and the specified live migration timeout
value is now being honored as expected.  (For some reason, I had assumed we
were letting that operation value default.)
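
In case it helps anyone else, this is roughly what the corrected operation
definition looks like at resource creation time.  The resource name, config
path and the 360s/180s timeouts below are just what we use in our own
provisioning script, so treat this as a sketch rather than a recommendation:

  # "migrate_to"/"migrate_from" must use underscores; a mis-spelled
  # "migrate-to" op is accepted but its timeout is never applied to
  # live migration.
  pcs resource create zs95kjg110065_res ocf:heartbeat:VirtualDomain \
      config=/guestxml/nfs1/zs95kjg110065.xml \
      hypervisor=qemu:///system migration_transport=ssh \
      meta allow-migrate=true \
      op migrate_to timeout=360s \
      op migrate_from timeout=180s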

Could someone refer me to the definitive online link for the Pacemaker
resource agent man pages?  I don't see any resource agent man pages installed
on my system anywhere.  I found this one online:
https://www.mankier.com/7/ocf_heartbeat_VirtualDomain, but is there a more
'official' page I should refer our Linux KVM on System z customers to?
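
For reference, this is how I would expect to get at the same information
locally, assuming the resource-agents packaging ships the section 7 man
pages (apparently not the case on my build, hence the question):

  # the man page is generated from the agent's own metadata
  man ocf_heartbeat_VirtualDomain
  # pcs can print the agent description and parameters directly
  pcs resource describe ocf:heartbeat:VirtualDomain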

Thanks again for your assistance.

Scott Greenlese ...IBM KVM on System Z Solution Test Poughkeepsie, N.Y.
  INTERNET:  swgreenl at us.ibm.com




From:	"Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de>
To:	<users at clusterlabs.org>, Scott Greenlese/Poughkeepsie/IBM at IBMUS
Cc:	"Si Bo Niu" <niusibo at cn.ibm.com>, Michael
            Tebolt/Poughkeepsie/IBM at IBMUS
Date:	01/27/2017 02:32 AM
Subject:	Antw: Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts
            for VirtualDomain resources



>>> "Scott Greenlese" <swgreenl at us.ibm.com> schrieb am 27.01.2017 um 02:47
in
Nachricht
<OF63CD0E10.D58C4C3D-ON002580B5.0005C410-852580B5.0009DBDE at notes.na.collabserv.c

m>:

> Hi guys..
>
> Well, today I confirmed that what Ulrich said is correct.  If I update the
> VirtualDomain resource with the operation name "migrate_to" instead of
> "migrate-to", it effectively overrides the 1200ms default value with the
> new value.
>
> I am wondering how I would have known that I was using the wrong operation
> name, given that the operation name is already incorrect at the time the
> resource is created?

For SLES 11, I made a quick (non-portable, possibly unstable) attempt to
print the operations known to an RA:
 # crm ra info VirtualDomain |sed -n -e "/Operations' defaults/,\$p"
Operations' defaults (advisory minimum):

    start         timeout=90
    stop          timeout=90
    status        timeout=30 interval=10
    monitor       timeout=30 interval=10
    migrate_from  timeout=60
    migrate_to    timeout=120
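
On a pcs-only system without crmsh, a rough equivalent is to read the action
names straight out of the agent metadata (just a sketch; the grep is crude):

 # crm_resource --show-metadata ocf:heartbeat:VirtualDomain | grep '<action name='

Anything configured under "op" should use one of the names printed there
(migrate_to, not migrate-to).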

Regards,
Ulrich

>
> This is what the meta data for my resource looked like after making the
> update:
>
> [root at zs95kj VD]# date;pcs resource update zs95kjg110065_res op migrate_to timeout="360s"
> Thu Jan 26 16:43:11 EST 2017
> You have new mail in /var/spool/mail/root
>
> [root at zs95kj VD]# date;pcs resource show zs95kjg110065_res
> Thu Jan 26 16:43:46 EST 2017
>  Resource: zs95kjg110065_res (class=ocf provider=heartbeat type=VirtualDomain)
>   Attributes: config=/guestxml/nfs1/zs95kjg110065.xml hypervisor=qemu:///system migration_transport=ssh
>   Meta Attrs: allow-migrate=true
>   Operations: start interval=0s timeout=120 (zs95kjg110065_res-start-interval-0s)
>               stop interval=0s timeout=120 (zs95kjg110065_res-stop-interval-0s)
>               monitor interval=30s (zs95kjg110065_res-monitor-interval-30s)
>               migrate-from interval=0s timeout=1200 (zs95kjg110065_res-migrate-from-interval-0s)
>               migrate-to interval=0s timeout=1200 (zs95kjg110065_res-migrate-to-interval-0s)   <<< Original op name / value
>               migrate_to interval=0s timeout=360s (zs95kjg110065_res-migrate_to-interval-0s)  <<< New op name / value
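>
> (For anyone repeating this: the now-useless "migrate-to" entry should be
> removable by name afterwards; a sketch, assuming this pcs level supports
> op removal:
>
> [root at zs95kj VD]# pcs resource op remove zs95kjg110065_res migrate-to
>
> I haven't done that cleanup yet, which is why the listing above still
> shows both.)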
>
>
> Where does that original op name come from in the VirtualDomain resource
> definition?  How can we get the initial value changed and shipped with a
> valid operation name (i.e. migrate_to), and maybe a more reasonable
> migrate_to timeout value... something significantly higher than 1200ms,
> i.e. 1.2 seconds?  Can I report this request as a bugzilla on the RHEL
> side, or should this go to my internal IBM bugzilla for KVM on System Z
> development?
>
> Anyway, thanks so much for identifying my issue.  I can reconfigure my
> resources to make them tolerate longer migration execution times.
>
>
> Scott Greenlese ... IBM KVM on System Z Solution Test
>   INTERNET:  swgreenl at us.ibm.com
>
>
>
>
> From:		 Ken Gaillot <kgaillot at redhat.com>
> To:		 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>,
>             users at clusterlabs.org
> Date:		 01/19/2017 10:26 AM
> Subject:	 Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts
>             for VirtualDomain resources
>
>
>
> On 01/19/2017 01:36 AM, Ulrich Windl wrote:
>>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 18.01.2017 at 16:32 in
>> message <4b02d3fa-4693-473b-8bed-dc98f9e3f3f3 at redhat.com>:
>>> On 01/17/2017 04:45 PM, Scott Greenlese wrote:
>>>> Ken and Co,
>>>>
>>>> Thanks for the useful information.
>>>>
>>
>> [...]
>>>>
>>>> Is this internally coded within the class=ocf provider=heartbeat
>>>> type=VirtualDomain resource agent?
>>>
>>> Aha, I just realized what the issue is: the operation name is
>>> migrate_to, not migrate-to.
>>>
>>> For technical reasons, pacemaker can't validate operation names (at the
>>> time that the configuration is edited, it does not necessarily have
>>> access to the agent metadata).
>>
>> BUT the set of operations is finite, right? So if those were in some XML
>> schema, the names could at least be verified (not meaning that the
>> operation is actually supported).
>> BTW: Would a "crm configure verify" detect this kind of problem?
>>
>> [...]
>>
>> Ulrich
>
> Yes, it's in the resource agent meta-data. While pacemaker itself uses a
> small set of well-defined actions, the agent may define any arbitrarily
> named actions it desires, and the user could configure one of these as a
> recurring action in pacemaker.
>
> Pacemaker itself has to be liberal about where its configuration comes
> from -- the configuration can be edited on a separate machine, which
> doesn't have resource agents, and then uploaded to the cluster. So
> Pacemaker can't do that validation at configuration time. (It could
> theoretically do some checking after the fact when the configuration is
> loaded, but this could be a lot of overhead, and there are
> implementation issues at the moment.)
>
> Higher-level tools like crmsh and pcs, on the other hand, can make
> simplifying assumptions. They can require access to the resource agents
> so that they can do extra validation.
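>
> (As a rough illustration of the kind of extra validation a higher-level
> tool can do -- only a sketch using stock command-line tools, not anything
> we ship -- you can compare the operation names configured in the CIB with
> the actions the agent's metadata advertises:
>
>   # operation names configured for the resource in the CIB
>   cibadmin --query --xpath "//primitive[@id='zs95kjg110065_res']" | grep '<op ' | grep -o 'name="[^"]*"'
>   # action names the agent's metadata advertises
>   crm_resource --show-metadata ocf:heartbeat:VirtualDomain | grep -o '<action name="[^"]*"'
>
> Any configured name that does not appear in the second list, such as
> "migrate-to", is suspect.)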
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org






