[ClusterLabs] Antw: Re: VirtualDomain started in two hosts

Wed Jan 18 10:51:04 EST 2017

On 01/18/2017 03:49 AM, Ferenc Wágner wrote:
> Ken Gaillot <kgaillot at redhat.com> writes:
> 
>> * When you move the VM, the cluster detects that it is not running on
>> the node you told it to keep it running on. Because there is no
>> "Stopped" monitor, the cluster doesn't immediately realize that a new
>> rogue instance is running on another node. So, the cluster thinks the VM
>> crashed on the original node, and recovers it by starting it again.
> 
> Ken, do you mean that if a periodic "stopped" monitor is configured, it
> is forced to run immediately (out of schedule) when the regular periodic
> monitor unexpectedly returns with stopped status?  That is, before the
> cluster takes the recovery action?  Conceptually, that would be similar
> to the probe run on node startup.  If not, then maybe it would be a
> useful resource option to have (I mean running cluster-wide probes on an
> unexpected monitor failure, before recovery).  An optional safety check.

No, there is nothing like that currently. The regular and "Stopped"
monitors run independently. Because they must have different intervals,
the two sides of the issue (the VM missing from its original node, and
the rogue instance running on another node) may be detected at
different times.
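
To make the timing concrete, here is a rough Python sketch. It is not
Pacemaker code. It assumes each monitor fires on a fixed period counted
from time 0, which is a simplification of how recurring operations are
actually scheduled. The function name, the 10s/11s intervals, and the
move time are all made up for illustration.

```python
import math


def next_detection(event_time, interval, first_run=0.0):
    """Return the time of the first monitor run at or after event_time.

    Simplifying assumption: the monitor fires at first_run,
    first_run + interval, first_run + 2 * interval, and so on.
    """
    if event_time <= first_run:
        return first_run
    periods = math.ceil((event_time - first_run) / interval)
    return first_run + periods * interval


# Hypothetical values, in seconds.
move_time = 25.0          # VM is moved outside of cluster control
regular_interval = 10.0   # regular monitor on the original node
stopped_interval = 11.0   # "Stopped" monitor on the other nodes

# The regular monitor notices the VM is gone from the original node.
missing_seen = next_detection(move_time, regular_interval)
# The "Stopped" monitor notices the rogue instance on another node.
rogue_seen = next_detection(move_time, stopped_interval)

print("VM seen missing on original node at t =", missing_seen)
print("rogue instance seen on other node at t =", rogue_seen)
print("gap between the two detections:", rogue_seen - missing_seen)
```

With these made-up numbers the regular monitor fires first, so recovery
can begin before the rogue instance has been noticed. Different numbers
can reverse the order.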

It is an interesting idea to have an option to reprobe on operation
failure. I think it may be overkill; the only failure situation it would
be good for is one like this, where a resource was moved out of cluster
control. The vast majority of failure scenarios wouldn't be helped. If
that sort of thing happens a lot in your cluster, you really need to
figure out how to stop doing that. :)