[ClusterLabs] Default Behavior

Wed Jun 29 05:00:39 EDT 2016

Thanks a lot.
We also considered using fencing (stonith).
But the production cluster runs in the cloud; node1 and node2 are virtual machines without any hardware fencing devices.
We looked at SBD, but as far as we understand, its use is not justified in a two-node cluster without shared storage:
http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit
Are there any other ways to do fencing?
For our specific situation, we have found another workaround: use DR (direct routing) instead of NAT in IPVS.
With DR, even if both servers are active at the same time, it does not matter which of them serves the client's connection, because the web servers respond to the client directly.
Is this workaround viable?
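
To make the difference in return paths concrete, here is a minimal Python sketch. It is a toy model, not IPVS code: the Director class, its connection table, and reply_reaches_client are hypothetical names introduced only for illustration. It assumes that in NAT mode a reply must pass back through a director that knows the connection, while in DR mode the real server answers the client directly.

```python
class Director:
    """Toy stand-in for an IPVS director (hypothetical, not a real API)."""

    def __init__(self, name):
        self.name = name
        self.connections = set()  # clients whose requests this director forwarded

    def forward(self, client):
        # Remember the connection, as a NAT director must in order to rewrite replies.
        self.connections.add(client)


def reply_reaches_client(mode, client, inbound, return_via=None):
    """Return True if the real server's reply gets back to the client.

    mode       -- "NAT" or "DR"
    inbound    -- director that forwarded the request
    return_via -- director the reply is routed through (NAT only)
    """
    inbound.forward(client)
    if mode == "DR":
        # Direct routing: the real server replies straight to the client.
        return True
    if mode == "NAT":
        # Only a director that holds the connection entry can rewrite the reply.
        return return_via is not None and client in return_via.connections
    raise ValueError(f"unknown mode: {mode}")


node1, node2 = Director("node1"), Director("node2")
print(reply_reaches_client("NAT", "client-1", inbound=node1, return_via=node1))
print(reply_reaches_client("NAT", "client-1", inbound=node1, return_via=node2))
print(reply_reaches_client("DR", "client-1", inbound=node2))
```

The sketch models only the return path. It does not model which director receives a given packet while both nodes hold the same address.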

Kind regards,

Vladimir Pavlov

Message: 2
Date: Tue, 28 Jun 2016 18:53:38 +0300
From: "Pavlov, Vladimir" <Vladimir.Pavlov at tns-global.ru>
To: "'Users at clusterlabs.org'" <Users at clusterlabs.org>
Subject: [ClusterLabs] Default Behavior
Message-ID:
	<B38B34EC5621E34DABCE13E8B18936E6033F0B17C556 at EXSERV.Gallup.tns>
Content-Type: text/plain; charset="koi8-r"

Hello!
We have a two-node Active/Backup Pacemaker cluster (OS: CentOS 6.7) with IPaddr2 and ldirectord resources.
Cluster Properties:
cluster-infrastructure: cman
dc-version: 1.1.11-97629de
no-quorum-policy: ignore
stonith-enabled: false
The cluster was configured following this documentation: http://clusterlabs.org/quickstart-redhat-6.html
Recently there was a communication failure between the cluster nodes, and the behavior was as follows:

-        During the network failure, each server became the Master.

-        After the network was restored, one node killed the Pacemaker services on the second node.

-        The second node was no longer available to the cluster, but all of its resources remained active (ldirectord, ipvs, IP address). That is, both nodes continued to be active.
We decided to build a test environment and reproduce the situation, but with the current version of Pacemaker in the CentOS repos, the cluster behaves differently:

-        During the network failure, each server became the Master.

-        After the network was restored, all resources were stopped.

-        Then the resources were started on only one node. This behavior seems more logical.
Current cluster properties on the test environment:
cluster-infrastructure: cman
dc-version: 1.1.14-8.el6-70404b0
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
Did the cluster's behavior change in the new version, or was the failure not fully reproduced?
Thank you.

Kind regards,

Vladimir Pavlov

------------------------------

Message: 3
Date: Tue, 28 Jun 2016 12:07:36 -0500
From: Ken Gaillot <kgaillot at redhat.com>
To: users at clusterlabs.org
Subject: Re: [ClusterLabs] Default Behavior
Message-ID: <5772AED8.6060308 at redhat.com>
Content-Type: text/plain; charset=UTF-8

On 06/28/2016 10:53 AM, Pavlov, Vladimir wrote:
> Hello!
> 
> We have Pacemaker cluster of two node Active/Backup (OS Centos 6.7),
> with resources IPaddr2 and ldirectord.
> 
> Cluster Properties:
> 
> cluster-infrastructure: cman
> 
> dc-version: 1.1.11-97629de
> 
> no-quorum-policy: ignore
> 
> stonith-enabled: false
> 
> The cluster has been configured for this documentation:
> http://clusterlabs.org/quickstart-redhat-6.html
> 
> Recently, there was a communication failure between cluster nodes and
> the behavior was like this:
> 
> -        During a network failure, each server has become the Master.
> 
> -        After the restoration of the network, one node killing services
> of Pacemaker on the second node.
> 
> -        The second node was not available for the cluster, but all
> resources remain active (Ldirectord,ipvs,ip address). That is, both
> nodes continue to be active.
> 
> We decided to create a test stand and play the situation, but with
> current version of Pacemaker in CentOS repos, cluster behaves differently:
> 
> -        During a network failure, each server has become the Master.
> 
> -        After the restoration of the network, all resources are stopped.
> 
> -        Then the resources are run only on one node. - This behavior
> seems to be more logical.
> 
> Current Cluster Properties on test stand:
> 
> cluster-infrastructure: cman
> 
> dc-version: 1.1.14-8.el6-70404b0
> 
> have-watchdog: false
> 
> no-quorum-policy: ignore
> 
> stonith-enabled: false
> 
> Changed the behavior of the cluster in the new version or accident is
> not fully emulated?

If I understand your description correctly, the situation was not
identical. The difference I see is that, in the original case, the
second node is not responding to the cluster even after the network is
restored. Thus, the cluster cannot communicate to carry out the behavior
observed in the test situation.

Fencing (stonith) is the cluster's only recovery mechanism in such a
case. When the network splits, or a node becomes unresponsive, the
cluster can only safely recover resources if it can ensure the other
node is powered off. Pacemaker supports both physical fencing devices,
such as an intelligent power switch, and hardware watchdog devices for
self-fencing using sbd.
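
As a rough illustration of the point above, the following Python sketch models a two-node split with and without fencing. It is a toy model, not Pacemaker code: active_after_partition and its parameters are hypothetical names, and it assumes no-quorum-policy=ignore as in the configuration quoted above, so each node keeps acting on its own once it loses sight of its peer.

```python
def active_after_partition(nodes, fencing_enabled, fence_winner=None):
    """Return the nodes that run the resources after a two-node split.

    Assumes no-quorum-policy=ignore, so each node proceeds on its own
    once it can no longer see its peer.
    """
    if not fencing_enabled:
        # Neither side can confirm the other is down, yet both proceed:
        # every node ends up running the resources (split brain).
        return sorted(nodes)
    if fence_winner not in nodes:
        raise ValueError("fence_winner must be one of the nodes")
    # The winner powers off its peer before recovering resources,
    # so exactly one node remains active.
    return [fence_winner]


print(active_after_partition({"node1", "node2"}, fencing_enabled=False))
print(active_after_partition({"node1", "node2"}, fencing_enabled=True,
                             fence_winner="node1"))
```

Without fencing the model returns both nodes, which matches the "each server has become the Master" observation. With fencing only the node that wins the fence race stays active.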

------------------------------

Message: 4
Date: Tue, 28 Jun 2016 16:51:50 -0400
From: Digimer <lists at alteeve.ca>
To: Cluster Labs - All topics related to open-source clustering
	welcomed	<users at clusterlabs.org>
Subject: Re: [ClusterLabs] Default Behavior
Message-ID: <0021409c-86ba-7ef6-875f-0defd3fc9009 at alteeve.ca>
Content-Type: text/plain; charset=UTF-8

On 28/06/16 11:53 AM, Pavlov, Vladimir wrote:
> Hello!
> 
> We have Pacemaker cluster of two node Active/Backup (OS Centos 6.7),
> with resources IPaddr2 and ldirectord.
> 
> Cluster Properties:
> 
> cluster-infrastructure: cman
> 
> dc-version: 1.1.11-97629de
> 
> no-quorum-policy: ignore
> 
> stonith-enabled: false

You need fencing to be enabled and configured. This is always true, but
particularly so on RHEL 6 because it uses the cman plugin. Please
configure and test stonith, and then repeat your tests to see if the
behavior is more predictable.
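
As a small aid for checking a configuration before re-testing, here is a Python sketch that reads a property listing in the "name: value" form shown in the quoted message and flags the two settings discussed in this thread. The function names and warning texts are hypothetical, and it only recognises the literal value "false" for stonith-enabled, which is a simplifying assumption.

```python
def parse_properties(text):
    """Parse 'name: value' lines into a dict, skipping lines without a colon."""
    props = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            props[name.strip()] = value.strip()
    return props


def fencing_warnings(props):
    """Return human-readable warnings for settings that allow split brain."""
    warnings = []
    # Simplification: only the literal string "false" is treated as disabled.
    if props.get("stonith-enabled", "true").lower() == "false":
        warnings.append("stonith-enabled is false: no fencing after a split")
    if props.get("no-quorum-policy") == "ignore":
        warnings.append("no-quorum-policy is ignore: needs working fencing to be safe")
    return warnings


# Property listing copied from the test environment described in this thread.
sample = """\
cluster-infrastructure: cman
dc-version: 1.1.14-8.el6-70404b0
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
"""

for warning in fencing_warnings(parse_properties(sample)):
    print(warning)
```

For the listing above, both warnings are printed.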

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?