[ClusterLabs] Antw: Re: VirtualDomain started in two hosts

Ken Gaillot kgaillot at redhat.com
Tue Jan 17 10:38:35 EST 2017


On 01/17/2017 08:52 AM, Ulrich Windl wrote:
>>>> Oscar Segarra <oscar.segarra at gmail.com> wrote on 17.01.2017 at 10:15 in
> message
> <CAJq8taG8VhX5J1xQpqMRQ-9omFNXKHQs54mBzz491_6df9akzA at mail.gmail.com>:
>> Hi,
>>
>> Yes, I will try to explain myself better.
>>
>> *Initially*
>> On node1 (vdicnode01-priv)
>>> virsh list
>> ==============
>> vdicdb01     started
>>
>> On node2 (vdicnode02-priv)
>>> virsh list
>> ==============
>> vdicdb02     started
>>
>> --> Now, I execute the migrate command (outside the cluster <-- not using
>> pcs resource move)
>> virsh migrate --live vdicdb01 qemu:/// qemu+ssh://vdicnode02-priv
>> tcp://vdicnode02-priv
> 
> One of the rules of successful clustering is: If resources are managed by the cluster, they are managed by the cluster only! ;-)
> 
> I guess one node is trying to restart the VM once it vanished, and the other node might try to shut down the VM while it's being migrated.
> Or any other undesired combination...


As Ulrich says here, you can't use virsh to manage VMs once they are
managed by the cluster. Instead, configure your cluster to support live
migration:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-migrating-resources

and then use pcs resource move (which is just location constraints under
the hood) to move VMs.
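
For example, a rough sketch using the resource and node names from your
output (exact pcs syntax can vary a bit between versions):

# let the VirtualDomain resource live-migrate instead of stop/start
pcs resource meta vm-vdicdb01 allow-migrate=true

# move it via the cluster (this creates a location constraint)
pcs resource move vm-vdicdb01 vdicnode02-priv

# when you no longer need it pinned there, drop that constraint
pcs resource clear vm-vdicdb01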

What's happening in your example is:

* Your VM cluster resource has a monitor operation ensuring that it is
running properly on the desired node.

* It is also possible to configure a monitor to ensure that the resource
is not running on nodes where it's not supposed to be (a monitor with
role="Stopped"). You don't have one of these (which is fine, and common);
a sketch of what one would look like follows this list.

* When you move the VM, the cluster detects that it is not running on
the node you told it to keep it running on. Because there is no
"Stopped" monitor, the cluster doesn't immediately realize that a new
rogue instance is running on another node. So, the cluster thinks the VM
crashed on the original node, and recovers it by starting it again.
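
As mentioned above, a "Stopped" monitor would look something like this
(a sketch using your resource name; it must use a different interval
than your regular monitor):

# probe nodes where the VM is supposed to be stopped
pcs resource op add vm-vdicdb01 monitor interval=30s role=Stopped

With that in place, the cluster would have noticed the rogue copy of
vdicdb01 on vdicnode02-priv and recovered it (normally by stopping it).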

If your goal is to take a VM out of cluster management without stopping
it, you can "unmanage" the resource.
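
For example:

# tell the cluster to leave the VM alone (it keeps running)
pcs resource unmanage vm-vdicdb01

# ... do whatever manual virsh work you need ...

# hand control back to the cluster
pcs resource manage vm-vdicdb01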


>> *Finally*
>> On node1 (vdicnode01-priv)
>>> virsh list
>> ==============
>> *vdicdb01     started*
>>
>> On node2 (vdicnode02-priv)
>>> virsh list
>> ==============
>> vdicdb02     started
>> vdicdb01     started
>>
>> If I query cluster pcs status, cluster thinks resource vm-vdicdb01 is only
>> started on node vdicnode01-priv.
>>
>> Thanks a lot.
>>
>>
>>
>> 2017-01-17 10:03 GMT+01:00 emmanuel segura <emi2fast at gmail.com>:
>>
>>> Sorry,
>>>
>>> but what do you mean when you say you migrated the VM outside of the
>>> cluster? To a server outside of your cluster?
>>>
>>> 2017-01-17 9:27 GMT+01:00 Oscar Segarra <oscar.segarra at gmail.com>:
>>>> Hi,
>>>>
>>>> I have configured a two-node cluster where we run 4 KVM guests.
>>>>
>>>> The hosts are:
>>>> vdicnode01
>>>> vdicnode02
>>>>
>>>> And I have created a dedicated network card for cluster management. I
>>>> have created the required entries in /etc/hosts:
>>>> vdicnode01-priv
>>>> vdicnode02-priv
>>>>
>>>> The four guests have colocation rules in order to distribute them
>>>> proportionally between my two nodes.
>>>>
>>>> The problem I have is that if I migrate a guest outside the cluster, I
>>>> mean using virsh migrate --live..., the cluster, instead of moving the
>>>> guest back to its original node (following the colocation sets), starts
>>>> the guest again, and suddenly I have the same guest running on both
>>>> nodes, causing XFS corruption in the guest.
>>>>
>>>> Is there any configuration applicable to avoid this unwanted behavior?
>>>>
>>>> Thanks a lot



