[ClusterLabs] Colocation and ordering with live migration

Mon Oct 10 17:21:50 CEST 2016

On 10/10/2016 04:54 PM, Ken Gaillot wrote:
> On 10/10/2016 07:36 AM, Pavel Levshin wrote:
>> 10.10.2016 15:11, Klaus Wenninger:
>>> On 10/10/2016 02:00 PM, Pavel Levshin wrote:
>>>> 10.10.2016 14:32, Klaus Wenninger:
>>>>> Why are the order-constraints between libvirt & vms optional?
>>>> If they were mandatory, then all the virtual machines would be
>>>> restarted when libvirtd restarts. This is neither desired nor needed.
>>>> When this happens, the node is fenced because it is unable to restart
>>>> the VMs in the absence of a working libvirtd.
>>> Was guessing something like that ...
>>> So let me reformulate my question:
>>>    Why does libvirtd have to be restarted?
>>> If it is because of config-changes making it reloadable might be a
>>> solution ...
>>>
>> Right, config changes come to my mind first of all. But sometimes a
>> service, including libvirtd, may fail unexpectedly. In this case I would
>> prefer to restart it without disturbing VirtualDomains, which will fail
>> eternally.
> I think the mandatory colocation of VMs with libvirtd negates your goal.
> If libvirtd stops, the VMs will have to stop anyway because they can't
> be colocated with libvirtd. Making the colocation optional should fix that.
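
To make the difference concrete: in the CIB, a colocation with
score=INFINITY is mandatory, while a finite positive score is only a
preference. A minimal sketch that builds both variants, assuming
hypothetical resource IDs "vm1" and "libvirtd-clone" (substitute the
real ones from the configuration):

```python
import xml.etree.ElementTree as ET


def colocation(rsc, with_rsc, score):
    """Build an rsc_colocation element.

    score="INFINITY" makes the colocation mandatory; a finite positive
    score makes it advisory (a preference the cluster may override).
    """
    return ET.Element("rsc_colocation", {
        "id": f"colocate-{rsc}-with-{with_rsc}",
        "rsc": rsc,
        "with-rsc": with_rsc,
        "score": str(score),
    })


# "vm1" and "libvirtd-clone" are hypothetical resource IDs.
mandatory = colocation("vm1", "libvirtd-clone", "INFINITY")
optional = colocation("vm1", "libvirtd-clone", 1000)

print(ET.tostring(mandatory, encoding="unicode"))
print(ET.tostring(optional, encoding="unicode"))
```

The score of 1000 is an arbitrary illustration; any finite positive
value keeps the constraint advisory.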
>
>> The question is why the cluster does not obey the optional constraint
>> when both libvirtd and the VM stop in a single transition.
> If it truly is in the same transition, then it should be honored.
>
> You have *mandatory* constraints for DLM -> CLVMd -> cluster-config ->
> libvirtd, but only an *optional* constraint for libvirtd -> VMs.
> Therefore, libvirtd will generally have to wait longer than the VMs to
> be started.
>
> It might help to add mandatory constraints for cluster-config -> VMs.
> That way, they have the same requirements as libvirtd, and are more
> likely to start in the same transition.
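
A rough sketch of what that suggestion would look like as CIB ordering
constraints, keeping libvirtd -> VM optional and adding a mandatory
cluster-config -> VM order. The resource IDs ("cluster-config-clone",
"libvirtd-clone", "vm1") are placeholders I made up, not taken from the
actual configuration:

```python
import xml.etree.ElementTree as ET

# Values Pacemaker accepts for the "kind" attribute of rsc_order.
VALID_KINDS = ("Mandatory", "Optional", "Serialize")


def order(first, then, kind):
    """Build an rsc_order element: start `first` before `then`."""
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown order kind: {kind}")
    return ET.Element("rsc_order", {
        "id": f"order-{first}-then-{then}",
        "first": first,
        "then": then,
        "kind": kind,
    })


constraints = ET.Element("constraints")
# Hypothetical IDs: the VM gets the same hard prerequisite as libvirtd ...
constraints.append(order("cluster-config-clone", "vm1", "Mandatory"))
# ... while the libvirtd -> VM order stays optional, so a libvirtd
# restart does not force a VM restart.
constraints.append(order("libvirtd-clone", "vm1", "Optional"))

print(ET.tostring(constraints, encoding="unicode"))
```

One such pair would be needed per VM resource.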
>
> However I'm sure there are still problematic situations. What you want
> is a simple idea, but a rather complex specification: "If rsc1 fails,
> block any instances of this other RA on the same node."
>
> It might be possible to come up with some node attribute magic to
> enforce this. You'd need some custom RAs. I imagine something like one
> RA that sets a node attribute, and another RA that checks it.
>
> The setter would be grouped with libvirtd. Anytime that libvirtd starts,
> the setter would set a node attribute on the local node. Anytime that
> libvirtd stopped or failed, the setter would unset the attribute value.
>
> The checker would simply monitor the attribute, and fail if the
> attribute is unset. The group would have on-fail=block. So anytime the
> attribute was unset, the VM would not be started or stopped. (There
> would be no constraints between the two groups -- the checker RA would
> take the place of constraints.)

How would that behave differently from just putting libvirtd into this
on-fail=block group (apart, of course, from the possibility of grouping
the VMs into more than one group)?
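
For reference, this is how I read the setter/checker idea, as a minimal
sketch rather than a working resource agent. It assumes a hypothetical
attribute name "libvirtd-ok", uses attrd_updater to set, delete and
query the node attribute, and assumes the query output contains a
value="..." field. The OCF metadata and action dispatch that a real RA
needs are left out, and on-fail=block would be configured on the
checker's operations in the cluster, not in the agent:

```python
import re
import subprocess

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1

ATTR = "libvirtd-ok"  # hypothetical node attribute name


def run(cmd):
    """Run a command and return (exit code, stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def setter_start(runner=run):
    """Setter start: mark libvirtd as available on the local node."""
    rc, _ = runner(["attrd_updater", "-n", ATTR, "-U", "1"])
    return OCF_SUCCESS if rc == 0 else OCF_ERR_GENERIC


def setter_stop(runner=run):
    """Setter stop: unset the attribute when libvirtd stops or fails."""
    rc, _ = runner(["attrd_updater", "-n", ATTR, "-D"])
    return OCF_SUCCESS if rc == 0 else OCF_ERR_GENERIC


def checker_monitor(runner=run):
    """Checker monitor: fail unless the attribute is set to 1."""
    rc, out = runner(["attrd_updater", "-n", ATTR, "-Q"])
    # Assumption: the query prints a line containing value="...".
    match = re.search(r'value="([^"]*)"', out)
    if rc == 0 and match and match.group(1) == "1":
        return OCF_SUCCESS
    return OCF_ERR_GENERIC
```

The runner argument is only there so the logic can be exercised without
a running cluster.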

>
> I haven't thought through all possible scenarios, but it seems feasible
> to me.
>
>> In my eyes, these services are bound by a HARD obvious colocation
>> constraint: VirtualDomain should never ever be touched in absence of
>> working libvirtd. Unfortunately, I cannot figure out a way to reflect
>> this constraint in the cluster.
>>
>>
>> -- 
>> Pavel Levshin