[ClusterLabs] Colocation and ordering with live migration

Pavel Levshin lpk at 581.spb.su
Mon Oct 10 21:15:22 CEST 2016


Thanks for all the suggestions. It is really odd to me that this use
case, which is very basic for a simple virtualization cluster, is not
described in every FAQ out there...

It appears that my setup is working correctly with non-symmetrical 
ordering constraints:

Ordering Constraints:

   start dlm-clone then start clvmd-clone (kind:Mandatory)

   start clvmd-clone then start cluster-config-clone (kind:Mandatory)

   start cluster-config-clone then start libvirtd-clone (kind:Mandatory)

   stop vm_smartbv2 then stop libvirtd-clone (kind:Mandatory) (non-symmetrical)

   stop vm_smartbv1 then stop libvirtd-clone (kind:Mandatory) (non-symmetrical)

   start libvirtd-clone then start vm_smartbv2 (kind:Optional) (non-symmetrical)

   start libvirtd-clone then start vm_smartbv1 (kind:Optional) (non-symmetrical)

Colocation Constraints:

   clvmd-clone with dlm-clone (score:INFINITY)

   cluster-config-clone with clvmd-clone (score:INFINITY)

   libvirtd-clone with cluster-config-clone (score:INFINITY)

   vm_smartbv1 with libvirtd-clone (score:INFINITY)

   vm_smartbv2 with libvirtd-clone (score:INFINITY)
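
For reference, constraints like these can be created with pcs roughly as
follows. This is a sketch from memory rather than a copy of my shell
history, so the exact option syntax may differ between pcs versions, and
the resource names are of course specific to my cluster:

   pcs constraint order start dlm-clone then start clvmd-clone kind=Mandatory
   pcs constraint order start clvmd-clone then start cluster-config-clone kind=Mandatory
   pcs constraint order start cluster-config-clone then start libvirtd-clone kind=Mandatory
   # non-symmetrical: only the stop side is ordered
   pcs constraint order stop vm_smartbv1 then stop libvirtd-clone kind=Mandatory symmetrical=false
   pcs constraint order stop vm_smartbv2 then stop libvirtd-clone kind=Mandatory symmetrical=false
   # optional: only takes effect when both actions land in the same transition
   pcs constraint order start libvirtd-clone then start vm_smartbv1 kind=Optional symmetrical=false
   pcs constraint order start libvirtd-clone then start vm_smartbv2 kind=Optional symmetrical=false
   pcs constraint colocation add vm_smartbv1 with libvirtd-clone INFINITY
   pcs constraint colocation add vm_smartbv2 with libvirtd-clone INFINITY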


This is strange; I could swear I've tried this before without success...


It might be possible to modify the VirtualDomain RA to include an
additional monitor check, which would block the agent when libvirtd is
not working. On the other hand, if VirtualDomain is able to monitor the
VM state without libvirtd, by looking at the emulator process, then an
obvious extension would be to issue a forced stop simply by killing
that process. At least this could save us a fencing.
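
To make that concrete, here is a very rough sketch of what such a
forced-stop fallback might look like inside an OCF-style agent. This is
not the actual VirtualDomain code: the function name, the DOMAIN_NAME
variable and the qemu command-line pattern used to find the emulator
process are all assumptions on my part.

   # Hypothetical fallback: when libvirtd is unreachable, stop the guest by
   # killing its emulator process directly instead of escalating to fencing.
   force_stop_without_libvirtd() {
      local pid
      # Assumes the emulator command line contains "-name guest=<domain>,..."
      pid=$(pgrep -f "name guest=${DOMAIN_NAME}" | head -n 1)
      if [ -n "$pid" ]; then
         kill -TERM "$pid"
         sleep 5
         kill -0 "$pid" 2>/dev/null && kill -KILL "$pid"
      fi
      # Either way the guest is no longer running, so report success.
      return $OCF_SUCCESS
   }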

Still, I do not understand why the optional constraint has no effect
when both the VM and libvirtd are scheduled to stop and a live
migration is in place. It looks like a bug.


--
Pavel Levshin

10.10.2016 20:58, Klaus Wenninger:

> On 10/10/2016 06:56 PM, Ken Gaillot wrote:
>> On 10/10/2016 10:21 AM, Klaus Wenninger wrote:
>>> On 10/10/2016 04:54 PM, Ken Gaillot wrote:
>>>> On 10/10/2016 07:36 AM, Pavel Levshin wrote:
>>>>> 10.10.2016 15:11, Klaus Wenninger:
>>>>>> On 10/10/2016 02:00 PM, Pavel Levshin wrote:
>>>>>>> 10.10.2016 14:32, Klaus Wenninger:
>>>>>>>> Why are the order-constraints between libvirt & vms optional?
>>>>>>> If they were mandatory, then all the virtual machines would be
>>>>>>> restarted when libvirtd restarts. That is neither desired nor needed. When
>>>>>>> this happens, the node is fenced because it is unable to restart the VMs in
>>>>>>> the absence of a working libvirtd.
>>>>>> Was guessing something like that ...
>>>>>> So let me reformulate my question:
>>>>>>     Why does libvirtd have to be restarted?
>>>>>> If it is because of config-changes making it reloadable might be a
>>>>>> solution ...
>>>>>>
>>>>> Right, config changes come to mind first of all. But sometimes a
>>>>> service, including libvirtd, may fail unexpectedly. In that case I would
>>>>> prefer to restart it without disturbing the VirtualDomains, which would
>>>>> otherwise keep failing forever.
>>>> I think the mandatory colocation of VMs with libvirtd negates your goal.
>>>> If libvirtd stops, the VMs will have to stop anyway because they can't
>>>> be colocated with libvirtd. Making the colocation optional should fix that.
>>>>
>>>>> The question is, why does the cluster not obey the optional constraint when
>>>>> both libvirtd and the VM stop in a single transition?
>>>> If it truly is in the same transition, then it should be honored.
>>>>
>>>> You have *mandatory* constraints for DLM -> CLVMd -> cluster-config ->
>>>> libvirtd, but only an *optional* constraint for libvirtd -> VMs.
>>>> Therefore, libvirtd will generally have to wait longer than the VMs to
>>>> be started.
>>>>
>>>> It might help to add mandatory constraints for cluster-config -> VMs.
>>>> That way, they have the same requirements as libvirtd, and are more
>>>> likely to start in the same transition.
>>>>
>>>> However I'm sure there are still problematic situations. What you want
>>>> is a simple idea, but a rather complex specification: "If rsc1 fails,
>>>> block any instances of this other RA on the same node."
>>>>
>>>> It might be possible to come up with some node attribute magic to
>>>> enforce this. You'd need some custom RAs. I imagine something like one
>>>> RA that sets a node attribute, and another RA that checks it.
>>>>
>>>> The setter would be grouped with libvirtd. Anytime that libvirtd starts,
>>>> the setter would set a node attribute on the local node. Anytime that
>>>> libvirtd stopped or failed, the setter would unset the attribute value.
>>>>
>>>> The checker would simply monitor the attribute, and fail if the
>>>> attribute is unset. The group would have on-fail=block. So anytime the
>>>> attribute was unset, the VM would not be started or stopped. (There
>>>> would be no constraints between the two groups -- the checker RA would
>>>> take the place of constraints.)
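
(Inline note, just to make this idea concrete for the archive: a
minimal, untested sketch of such a setter/checker pair using
attrd_updater. The attribute name, the function names and the parsing
of the query output are my own assumptions.)

   # Setter agent, grouped with libvirtd: maintain a transient node
   # attribute meaning "libvirtd is up on this node".
   setter_start() { attrd_updater -n libvirtd-up -U 1; return $OCF_SUCCESS; }
   setter_stop()  { attrd_updater -n libvirtd-up -D; return $OCF_SUCCESS; }

   # Checker agent, grouped with the VM (group has on-fail=block): its
   # monitor fails while the attribute is absent, so the VM is neither
   # started nor stopped in the meantime.
   checker_monitor() {
      value=$(attrd_updater -n libvirtd-up -Q 2>/dev/null \
              | sed -n 's/.*value="\([^"]*\)".*/\1/p')
      [ "$value" = "1" ] && return $OCF_SUCCESS
      return $OCF_ERR_GENERIC
   }
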
>>> How would that behave differently from just putting libvirtd
>>> into this on-fail=block group? (apart from, of course, the
>>> possibility to group the VMs into more than one group ...)
>> You could stop or restart libvirtd without stopping the VMs. It would
>> cause a "failure" of the checker that would need to be cleaned later,
>> but the VMs wouldn't stop.
> Ah, yes, I forgot about the manual restart case. I had already
> turned that into a reload in my mind ;-)
> As long as libvirtd is a systemd unit ... would a restart
> via systemd produce similar behavior?
> But forget about it ... with pacemaker gaining the ability to
> receive systemd events, we should probably not foster this
> use case ;-)
>
>>>> I haven't thought through all possible scenarios, but it seems feasible
>>>> to me.
>>>>
>>>>> In my eyes, these services are bound by a HARD, obvious colocation
>>>>> constraint: VirtualDomain should never ever be touched in the absence of a
>>>>> working libvirtd. Unfortunately, I cannot figure out a way to express
>>>>> this constraint in the cluster.
>>>>>
>>>>>
>>>>> -- 
>>>>> Pavel Levshin
