[ClusterLabs] manner in which cluster migrates VirtualDomain - ?

Wed Apr 19 13:32:35 EDT 2023

On 19/04/2023 16:16, Ken Gaillot wrote:
> On Wed, 2023-04-19 at 08:00 +0200, lejeczek via Users wrote:
>> On 18/04/2023 21:02, Ken Gaillot wrote:
>>> On Tue, 2023-04-18 at 19:36 +0200, lejeczek via Users wrote:
>>>> On 18/04/2023 18:22, Ken Gaillot wrote:
>>>>> On Tue, 2023-04-18 at 14:58 +0200, lejeczek via Users wrote:
>>>>>> Hi guys.
>>>>>>
>>>>>> When it's done by the cluster itself, e.g. a node goes 'standby' -
>>>>>> how do clusters migrate VirtualDomain resources?
>>>>> 1. Call resource agent migrate_to action on original node
>>>>> 2. Call resource agent migrate_from action on new node
>>>>> 3. Call resource agent stop action on original node
>>>>>
>>>>>> Do users have any control over it and if so then how?
>>>>> The allow-migrate resource meta-attribute (true/false)
>>>>>
>>>>>> I'd imagine there must be some docs - I failed to find
>>>>> It's sort of scattered throughout Pacemaker Explained -- the main one
>>>>> is:
>>>>>
>>>>> https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/advanced-options.html#migrating-resources
>>>>>
>>>>>> Especially in large deployments one obvious question would be -
>>>>>> I'm guessing, as my setup is rather SOHO - can VMs migrate in
>>>>>> sequence, or is it (always?) a kind of 'swarm' migration?
>>>>> The migration-limit cluster property specifies how many live migrations
>>>>> may be initiated at once (the default of -1 means unlimited).
>>>> But if this is a cluster property - unless I got it wrong,
>>>> hopefully - then it governs any/all resources.
>>>> If so, can such a limit be narrowed down to an RA type or
>>>> perhaps a group of resources?
>>>>
>>>> many thanks, L.
>>> No, it's global
>> To me it feels so intuitive, so natural & obvious that I
>> will ask - has nobody yet suggested that such a feature be
>> available to smaller divisions of the cluster, independently of
>> the global rule?
>> Among the vastness of resource types many are polar opposites,
>> so why treat them all the same?
>> It would be great to have some way to tell the cluster to apply
>> different migration/relocation limits to, e.g.,
>> compute-heavy resources vs. light-weight ones - where to
>> "file" such an enhancement suggestion, Bugzilla?
>>
>> many thanks, L.
> Looking at the code, I see it's a little different than I originally
> thought.
>
> First, I overlooked that it's correctly documented as a per-node limit
> rather than a cluster-wide limit.
>
> That highlights the complexity of allowing different values for
> different resources; if rscA has a migration limit of 2, and rscB has a
> migration limit of 5, do we allow up to 2 rscA migrations and 5 rscB
> migrations simultaneously, or do we weight them relative to each other
> so the total capacity is still constrained (for example limiting it to
> 1 rscA migration and 2 rscB migrations together)?
I cannot comment on the code, only on what an admin would
care about. My first thought was to introduce "migration
groups" (not as another resource type), provided that would
not require a total overhaul of the business logic; a
resource could then be made a member of such a group.
Or perhaps marry 'migration-limit' to 'resource group', where
it would take priority over the global/per-node rule.
One way or another, it should be simple for end users: the
user/admin sets a limit N on how many resources in such a
group can be live-migrated at one time, say...

in this given group the cluster may attempt to live-migrate
only 2 resources simultaneously, then it waits for the
result, success or failure, and only then proceeds to the
next ones & ...
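
To make the group idea above concrete, here is a minimal Python sketch of
how such a per-group limit could split pending live migrations into
sequential waves. This is not Pacemaker code and no such option exists
today; plan_waves, the group names and the limits are all hypothetical.
It also simplifies by waiting for a whole wave to finish, whereas a real
scheduler would more likely start the next migration as soon as a slot
frees up.

```python
from collections import defaultdict


def plan_waves(pending, group_limits):
    """Split pending live migrations into sequential waves.

    pending      -- list of (resource_name, group_name) tuples, in order
    group_limits -- dict mapping group_name to the maximum number of
                    simultaneous migrations; unlisted groups are unlimited
    Both arguments are hypothetical inputs, not Pacemaker data structures.
    """
    for group, limit in group_limits.items():
        if limit < 1:
            raise ValueError(f"limit for group {group!r} must be >= 1")

    waves = []
    remaining = list(pending)
    while remaining:
        in_flight = defaultdict(int)  # migrations started per group this wave
        wave, deferred = [], []
        for name, group in remaining:
            limit = group_limits.get(group)  # None means no limit
            if limit is None or in_flight[group] < limit:
                in_flight[group] += 1
                wave.append(name)
            else:
                deferred.append((name, group))  # waits for the next wave
        waves.append(wave)
        remaining = deferred
    return waves


if __name__ == "__main__":
    pending = [
        ("vm1", "heavy"), ("vm2", "heavy"), ("vm3", "heavy"),
        ("db1", "light"), ("vm4", "heavy"), ("vm5", "heavy"),
    ]
    for i, wave in enumerate(plan_waves(pending, {"heavy": 2}), 1):
        print(f"wave {i}: {', '.join(wave)}")
```

With the "heavy" group limited to 2, the five VMs move in three waves
while the unlimited "light" resource goes out with the first one.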

>
> We would almost need something like the node utilization feature, being
> able to define a node's total migration capacity and then how much of
> that capacity is taken up by the migration of a specific resource. That
> seems overcomplicated to me, especially since there aren't that many
> resource types that support live migration.
As for those types which do support live migration and are
compute-heavy, I really wonder how large consumers handle
VirtualDomain migration, as one good example.
Take a virtualization/cloud provider: there a chunky host
node might host hundreds of VMs, and there, as anywhere
else, timeouts, any and all of them, must be some real,
fixed number.
As of right now, how intuitive is what the cluster does when
it swarms those hundreds of VMs, say equally, onto the
remaining available nodes...
... even with fast inter-node connectivity, many live
migrations will time out if no migration-limit is set.
Is the cluster capable of some very clever heuristics, so
humans could leave it to the machine to ensure that such a
mass migration does not fail simply due to an overall
bottleneck in the underlying infrastructure?
... and could the cluster alone do that? Would the
VirtualDomain agent not have to gather comprehensive metric
data on each VM in the first place, to feed it to the
cluster's internal logic..?
I would see something similar to what I mentioned above as
a relatively effective and surely down-to-earth, practical
aid to alleviate cases such as VM "mass-migration".
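
For comparison, the utilization-like model Ken describes (a total
migration capacity per node, with each resource's migration taking up
part of it) might be sketched in Python as below. Again this is only an
illustration, not how Pacemaker works: admit_migrations, the capacity of
10 and the costs of 5 and 2 are made-up values, picked so that 2 rscA or
5 rscB migrations alone would fill the node, matching the limits in
Ken's example.

```python
def admit_migrations(capacity, pending):
    """Decide which pending migrations may start right now on one node.

    capacity -- total migration capacity of the node (hypothetical unit)
    pending  -- list of (resource_name, cost) tuples in priority order,
                where cost is the share of capacity one migration uses
    Returns (started, waiting) as two lists of resource names.
    """
    free = capacity
    started, waiting = [], []
    for name, cost in pending:
        if cost <= free:
            free -= cost          # this migration fits, reserve its share
            started.append(name)
        else:
            waiting.append(name)  # retried once running migrations finish
    return started, waiting


if __name__ == "__main__":
    pending = [
        ("rscA-1", 5), ("rscB-1", 2), ("rscB-2", 2),
        ("rscA-2", 5), ("rscB-3", 2),
    ]
    started, waiting = admit_migrations(10, pending)
    print("started:", started)
    print("waiting:", waiting)
```

With these numbers one rscA and two rscB migrations start together and
the rest wait, which is the weighted outcome from Ken's example.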

>
> Second, any actions on a Pacemaker Remote node count toward the
> throttling limit of its connection host, and aren't checked for
> migration-limit at all. That's an interesting design choice, and it's
> not clear what the ideal would be. For a VM or container, it kind of
> makes sense to count against the host's throttling. For a remote node,
> not so much. And I'm guessing not checking migration-limit in this case
> is an oversight.