[ClusterLabs] Colocation and ordering with live migration

Ken Gaillot kgaillot at redhat.com
Mon Oct 10 16:54:10 CEST 2016


On 10/10/2016 07:36 AM, Pavel Levshin wrote:
> 10.10.2016 15:11, Klaus Wenninger:
>> On 10/10/2016 02:00 PM, Pavel Levshin wrote:
>>> 10.10.2016 14:32, Klaus Wenninger:
>>>> Why are the order-constraints between libvirt & vms optional?
>>> If they were mandatory, then all the virtual machines would be
>>> restarted when libvirtd restarts. That is neither desired nor needed.
>>> When this happens, the node is fenced, because it cannot restart the
>>> VMs in the absence of a working libvirtd.
>> Was guessing something like that ...
>> So let me reformulate my question:
>>    Why does libvirtd have to be restarted?
>> If it is because of config-changes making it reloadable might be a
>> solution ...
>>
> 
> Right, config changes come to mind first of all. But sometimes a
> service, including libvirtd, may fail unexpectedly. In that case I
> would prefer to restart it without disturbing the VirtualDomains, which
> would otherwise keep failing.

I think the mandatory colocation of VMs with libvirtd negates your goal.
If libvirtd stops, the VMs will have to stop anyway because they can't
be colocated with libvirtd. Making the colocation optional should fix that.
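
For example, with pcs -- and assuming resource names like "vm1" and a
cloned "libvirtd" resource, so adjust to whatever your config actually
uses -- giving the colocation a finite score instead of INFINITY makes
it optional (advisory):

    # Prefer to run vm1 where libvirtd-clone runs, but don't force vm1
    # to stop just because libvirtd-clone stops
    pcs constraint colocation add vm1 with libvirtd-clone 500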

> The question is, why does the cluster not obey the optional constraint
> when both libvirtd and the VM stop in a single transition?

If it truly is in the same transition, then it should be honored.

You have *mandatory* constraints for DLM -> CLVMd -> cluster-config ->
libvirtd, but only an *optional* constraint for libvirtd -> VMs.
Therefore, libvirtd will generally have to wait longer than the VMs to
be started.

It might help to add mandatory constraints for cluster-config -> VMs.
That way, they have the same requirements as libvirtd, and are more
likely to start in the same transition.
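
For example (again with pcs, "cluster-config" as named in your
constraints, and a hypothetical VM resource name):

    # Give the VM the same mandatory prerequisite as libvirtd
    pcs constraint order start cluster-config then start vm1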

However, I'm sure there are still problematic situations. What you want
is a simple idea, but a rather complex specification: "If rsc1 fails,
block any instances of this other RA on the same node."

It might be possible to come up with some node attribute magic to
enforce this. You'd need some custom RAs. I imagine something like one
RA that sets a node attribute, and another RA that checks it.

The setter would be grouped with libvirtd. Anytime libvirtd started, the
setter would set a node attribute on the local node. Anytime libvirtd
stopped or failed, the setter would unset the attribute.
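
Roughly, the interesting parts of the setter agent might look like this
(just a sketch, not a complete OCF agent, and "libvirtd-up" is only an
attribute name I made up):

    setter_start() {
        # Mark libvirtd as up on the local node. attrd_updater sets a
        # transient attribute, so it also disappears on node reboot.
        attrd_updater -n libvirtd-up -U 1
    }

    setter_stop() {
        # libvirtd is stopping (or being recovered): clear the mark
        attrd_updater -n libvirtd-up -D
    }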

The checker would simply monitor the attribute, and fail if the
attribute is unset. The group would have on-fail=block. So anytime the
attribute was unset, the VM would not be started or stopped. (There
would be no constraints between the two groups -- the checker RA would
take the place of constraints.)
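
The checker's monitor could be about as simple as this (again only a
sketch; the attrd_updater output parsing may need adjusting for your
Pacemaker version):

    checker_monitor() {
        # If the attribute the setter maintains is present, all is well;
        # otherwise fail, so that on-fail=block freezes the dependent VM.
        if attrd_updater -Q -n libvirtd-up 2>/dev/null \
                | grep -q 'value="1"'; then
            return 0   # OCF_SUCCESS
        else
            return 1   # OCF_ERR_GENERIC
        fi
    }

and the wiring would be something like (hypothetical agent and resource
names):

    pcs resource create vm1-libvirtd-check ocf:local:libvirtd-check \
        op monitor interval=10s on-fail=block
    pcs resource group add vm1-group vm1-libvirtd-check vm1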

I haven't thought through all possible scenarios, but it seems feasible
to me.

> In my eyes, these services are bound by an obvious, HARD colocation
> constraint: a VirtualDomain should never ever be touched in the absence
> of a working libvirtd. Unfortunately, I cannot figure out a way to
> express this constraint in the cluster.
> 
> 
> -- 
> Pavel Levshin


