[ClusterLabs] Antw: Re: VirtualDomain started in two hosts

Ken Gaillot kgaillot at redhat.com
Tue Jan 17 11:27:24 EST 2017


On 01/17/2017 10:05 AM, Oscar Segarra wrote:
> Hi, 
> 
> * It is also possible to configure a monitor to ensure that the resource
> is not running on nodes where it's not supposed to be (a monitor with
> role="Stopped"). You don't have one of these (which is fine, and common).
> 
> Can you provide more information/documentation about role="Stopped"

Since you're using pcs, you can either configure monitors when you
create the resource with pcs resource create, or you can add/remove
monitors later with pcs resource op add/remove.

For example:

pcs resource op add my-resource-name monitor interval=10s role="Stopped"

With a normal monitor op (role="Started" or omitted), the cluster will
run the resource agent's monitor command on any node that's supposed to
be running the resource. With the above example, it will additionally
run a monitor on all other nodes, so that if it finds the resource
running somewhere it's not supposed to be, it can stop it.

Note that each monitor op must have a unique interval. So if your
existing monitor runs every 10s, you need to pick a different interval
for the new monitor.

> And, please, can you explain how VirtualDomain resource agents manages
> the scenario I've presented?
> 
> /What happens If I stop pacemaker and corosync services in all nodes and
> I start them again... ¿will I have all guests running twice?/
> 
> Thanks a lot

If you stop cluster services, by default the cluster will first stop all
resources. You can set maintenance mode, or unmanage one or more
resources, to prevent the stops.
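
For example (a sketch; vm-vdicdb01 is the resource name used elsewhere in
this thread):

# leave everything running but stop managing all resources
pcs property set maintenance-mode=true

# or stop managing just one resource
pcs resource unmanage vm-vdicdb01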

When cluster services first start on a node, the cluster "probes" the
status of all resources on that node, by running a one-time monitor. So
it will detect anything running at that time, and start or stop resources
as needed to meet the configured requirements.
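
A rough illustration of that cycle with pcs:

pcs cluster stop --all    # resources are stopped first, then cluster services
pcs cluster start --all   # each node probes all resources as it rejoins
pcs status                # shows what the probes found and what was started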

> 2017-01-17 16:38 GMT+01:00 Ken Gaillot <kgaillot at redhat.com>:
> 
>     On 01/17/2017 08:52 AM, Ulrich Windl wrote:
>     >>>> Oscar Segarra <oscar.segarra at gmail.com> wrote on
>     17.01.2017 at 10:15 in
>     > message
>     > <CAJq8taG8VhX5J1xQpqMRQ-9omFNXKHQs54mBzz491_6df9akzA at mail.gmail.com>:
>     >> Hi,
>     >>
>     >> Yes, I will try to explain myself better.
>     >>
>     >> *Initially*
>     >> On node1 (vdicnode01-priv)
>     >>> virsh list
>     >> ==============
>     >> vdicdb01     started
>     >>
>     >> On node2 (vdicnode02-priv)
>     >>> virsh list
>     >> ==============
>     >> vdicdb02     started
>     >>
>     >> --> Now, I execute the migrate command (outside the cluster <-- not using
>     >> pcs resource move)
>     >> virsh migrate --live vdicdb01 qemu:/// qemu+ssh://vdicnode02-priv
>     >> tcp://vdicnode02-priv
>     >
>     > One of the rules of successful clustering is: If resurces are managed by the cluster, they are managed by the cluster only! ;-)
>     >
>     > I guess one node is trying to restart the VM once it vanished, and the other node might try to shut down the VM while it's being migrated.
>     > Or any other undesired combination...
> 
> 
>     As Ulrich says here, you can't use virsh to manage VMs once they are
>     managed by the cluster. Instead, configure your cluster to support live
>     migration:
> 
>     http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-migrating-resources
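> 
>     A minimal sketch (assuming the resource is named vm-vdicdb01, as it is
>     later in this thread): live migration is normally enabled with the
>     allow-migrate meta attribute on the resource, e.g.
> 
>     pcs resource update vm-vdicdb01 meta allow-migrate=true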
> 
>     and then use pcs resource move (which is just location constraints under
>     the hood) to move VMs.
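> 
>     For example (a sketch; the resource and node names are taken from this
>     thread and may differ in your configuration):
> 
>     pcs resource move vm-vdicdb01 vdicnode02-priv
>     pcs resource clear vm-vdicdb01   # later, drop the constraint the move created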
> 
>     What's happening in your example is:
> 
>     * Your VM cluster resource has a monitor operation ensuring that it is
>     running properly on the desired node.
> 
>     * It is also possible to configure a monitor to ensure that the resource
>     is not running on nodes where it's not supposed to be (a monitor with
>     role="Stopped"). You don't have one of these (which is fine, and
>     common).
> 
>     * When you move the VM, the cluster detects that it is not running on
>     the node you told it to keep it running on. Because there is no
>     "Stopped" monitor, the cluster doesn't immediately realize that a new
>     rogue instance is running on another node. So, the cluster thinks the VM
>     crashed on the original node, and recovers it by starting it again.
> 
>     If your goal is to take a VM out of cluster management without stopping
>     it, you can "unmanage" the resource.
> 
> 
>     >> *Finally*
>     >> On node1 (vdicnode01-priv)
>     >>> virsh list
>     >> ==============
>     >> *vdicdb01     started*
>     >>
>     >> On node2 (vdicnode02-priv)
>     >>> virsh list
>     >> ==============
>     >> vdicdb02     started
>     >> vdicdb01     started
>     >>
>     >> If I query the cluster with pcs status, the cluster thinks resource
>     >> vm-vdicdb01 is only started on node vdicnode01-priv.
>     >>
>     >> Thanks a lot.
>     >>
>     >>
>     >>
>     >> 2017-01-17 10:03 GMT+01:00 emmanuel segura <emi2fast at gmail.com>:
>     >>
>     >>> sorry,
>     >>>
>     >>> But when you say you migrated the VM outside of the cluster, do you
>     >>> mean one server outside of your cluster?
>     >>>
>     >>> 2017-01-17 9:27 GMT+01:00 Oscar Segarra <oscar.segarra at gmail.com>:
>     >>>> Hi,
>     >>>>
>     >>>> I have configured a two-node cluster where we run 4 KVM guests.
>     >>>>
>     >>>> The hosts are:
>     >>>> vdicnode01
>     >>>> vdicnode02
>     >>>>
>     >>>> And I have created a dedicated network card for cluster management.
>     >>>> I have created required entries in /etc/hosts:
>     >>>> vdicnode01-priv
>     >>>> vdicnode02-priv
>     >>>>
>     >>>> The four guests have collocation rules in order to make them
>     >>>> distribute proportionally between my two nodes.
>     >>>>
>     >>>> The problem I have is that if I migrate a guest outside the cluster,
>     >>>> I mean using virsh migrate --live..., the cluster, instead of moving
>     >>>> the guest back to its original node (following the collocation sets),
>     >>>> starts the guest again, and suddenly I have the same guest running on
>     >>>> both nodes, causing XFS corruption in the guest.
>     >>>>
>     >>>> Is there any configuration applicable to avoid this unwanted
>     >>>> behavior?
>     >>>>
>     >>>> Thanks a lot



