[ClusterLabs] Default Behavior

Ken Gaillot kgaillot at redhat.com
Wed Jun 29 10:26:14 EDT 2016


On 06/29/2016 04:54 AM, Klaus Wenninger wrote:
> On 06/29/2016 11:00 AM, Pavlov, Vladimir wrote:
>> Thanks a lot.
>> We also thought about using fencing (stonith).
>> But the production cluster runs in the cloud; node1 and node2 are virtual machines without any hardware fencing devices.
> But there are fence-agents that do fencing via the hypervisor (e.g.
> fence_xvm).
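
If the hypervisor is libvirt-based and fence_virtd is set up on the host,
a fence_xvm device can be added roughly like this (resource and domain
names are placeholders, untested):

    pcs stonith create fence_node1 fence_xvm port="node1-vm" pcmk_host_list="node1"
    pcs stonith create fence_node2 fence_xvm port="node2-vm" pcmk_host_list="node2"
    pcs property set stonith-enabled=true
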
>> We looked in the direction of SBD, but as far as we understand its use is not justified without shared storage in a two-node cluster:
>> http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit
> Using SBD with a watchdog (provided your virtual environment provides a
> watchdog device inside VMs) for
> self-fencing is probably better than no fencing at all.
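
As a rough sketch, assuming the guest exposes a watchdog device (e.g.
/dev/watchdog) and your distribution ships sbd (this may not apply to
CentOS 6 with cman), watchdog-only SBD amounts to something like:

    # /etc/sysconfig/sbd
    SBD_WATCHDOG_DEV=/dev/watchdog
    SBD_WATCHDOG_TIMEOUT=5

    # then, with the sbd service enabled on both nodes:
    pcs property set stonith-watchdog-timeout=10s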

You can also ask your cloud provider if they provide an API for
hard-rebooting instances. If so, there are some fence agents in the wild
for common cloud provider APIs, or you could write your own.
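
A custom agent doesn't have to be complicated: Pacemaker passes the
options as key=value lines on stdin and expects exit code 0 on success.
A minimal, untested sketch (the "cloud-cli" calls are placeholders for
whatever your provider's API offers):

    #!/bin/sh
    # Read options such as action=reboot and port=<instance> from stdin.
    while read line; do
        case "$line" in
            action=*) action=${line#action=} ;;
            port=*)   instance=${line#port=} ;;
        esac
    done

    case "$action" in
        metadata)       cat /usr/share/my-agent/metadata.xml ;;  # XML describing the options
        status|monitor) cloud-cli status "$instance" ;;
        reboot)         cloud-cli hard-reboot "$instance" ;;
        off)            cloud-cli stop "$instance" ;;
        on)             cloud-cli start "$instance" ;;
        *)              exit 1 ;;
    esac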

> Regards,
> Klaus
>> Are there any ways to do fencing?
>> Specifically for our situation, we have found another workaround: use DR instead of NAT in IPVS.
>> In the case of DR, even if both servers are active at the same time, it does not matter which of them serves the connection from the client; the web servers respond to the client directly.
>> Is this workaround viable?

I forget what happens if both ldirectord instances are up and can't
communicate with each other, but it's not that simple.
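
For reference, the difference is only the forwarding method on the
director, e.g. (addresses are placeholders):

    # NAT: the director rewrites traffic in both directions
    ipvsadm -A -t 192.0.2.10:80 -s rr
    ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.11:80 -m

    # DR: the director forwards only inbound packets; real servers reply directly
    ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.11:80 -g

With DR each real server also needs the VIP configured on a non-ARPing
interface (typically lo), and in ldirectord.cf this corresponds to the
"gate" forwarding method.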

>> Kind regards,
>>  
>> Vladimir Pavlov
>>
>> Message: 2
>> Date: Tue, 28 Jun 2016 18:53:38 +0300
>> From: "Pavlov, Vladimir" <Vladimir.Pavlov at tns-global.ru>
>> To: "'Users at clusterlabs.org'" <Users at clusterlabs.org>
>> Subject: [ClusterLabs] Default Behavior
>> Message-ID:
>> 	<B38B34EC5621E34DABCE13E8B18936E6033F0B17C556 at EXSERV.Gallup.tns>
>> Content-Type: text/plain; charset="koi8-r"
>>
>> Hello!
>> We have a two-node Active/Backup Pacemaker cluster (OS CentOS 6.7), with IPaddr2 and ldirectord resources.
>> Cluster Properties:
>> cluster-infrastructure: cman
>> dc-version: 1.1.11-97629de
>> no-quorum-policy: ignore
>> stonith-enabled: false
>> The cluster was configured according to this documentation: http://clusterlabs.org/quickstart-redhat-6.html
>> Recently, there was a communication failure between the cluster nodes, and the behavior was like this:
>>
>> -        During the network failure, each server became the Master.
>>
>> -        After the network was restored, one node killed the Pacemaker services on the second node.
>>
>> -        The second node was not available to the cluster, but all resources remained active (ldirectord, ipvs, IP address). That is, both nodes continued to be active.
>> We decided to set up a test stand and reproduce the situation, but with the current version of Pacemaker in the CentOS repos, the cluster behaves differently:
>>
>> -        During the network failure, each server became the Master.
>>
>> -        After the network was restored, all resources were stopped.
>>
>> -        Then the resources were started on only one node. This behavior seems more logical.
>> Current Cluster Properties on test stand:
>> cluster-infrastructure: cman
>> dc-version: 1.1.14-8.el6-70404b0
>> have-watchdog: false
>> no-quorum-policy: ignore
>> stonith-enabled: false
>> Has the cluster behavior changed in the new version, or did we not fully emulate the original incident?
>> Thank you.
>>
>>
>> Kind regards,
>>
>> Vladimir Pavlov
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Tue, 28 Jun 2016 12:07:36 -0500
>> From: Ken Gaillot <kgaillot at redhat.com>
>> To: users at clusterlabs.org
>> Subject: Re: [ClusterLabs] Default Behavior
>> Message-ID: <5772AED8.6060308 at redhat.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> On 06/28/2016 10:53 AM, Pavlov, Vladimir wrote:
>>> Hello!
>>>
>>> We have a two-node Active/Backup Pacemaker cluster (OS CentOS 6.7),
>>> with IPaddr2 and ldirectord resources.
>>>
>>> Cluster Properties:
>>>
>>> cluster-infrastructure: cman
>>>
>>> dc-version: 1.1.11-97629de
>>>
>>> no-quorum-policy: ignore
>>>
>>> stonith-enabled: false
>>>
>>> The cluster was configured according to this documentation:
>>> http://clusterlabs.org/quickstart-redhat-6.html
>>>
>>> Recently, there was a communication failure between the cluster nodes,
>>> and the behavior was like this:
>>>
>>> - During the network failure, each server became the Master.
>>>
>>> - After the network was restored, one node killed the Pacemaker
>>> services on the second node.
>>>
>>> - The second node was not available to the cluster, but all resources
>>> remained active (ldirectord, ipvs, IP address). That is, both nodes
>>> continued to be active.
>>>
>>> We decided to set up a test stand and reproduce the situation, but with
>>> the current version of Pacemaker in the CentOS repos, the cluster
>>> behaves differently:
>>>
>>> - During the network failure, each server became the Master.
>>>
>>> - After the network was restored, all resources were stopped.
>>>
>>> - Then the resources were started on only one node. This behavior
>>> seems more logical.
>>>
>>> Current Cluster Properties on test stand:
>>>
>>> cluster-infrastructure: cman
>>>
>>> dc-version: 1.1.14-8.el6-70404b0
>>>
>>> have-watchdog: false
>>>
>>> no-quorum-policy: ignore
>>>
>>> stonith-enabled: false
>>>
>>> Has the cluster behavior changed in the new version, or did we not
>>> fully emulate the original incident?
>> If I understand your description correctly, the situation was not
>> identical. The difference I see is that, in the original case, the
>> second node is not responding to the cluster even after the network is
>> restored. Thus, the cluster cannot communicate to carry out the behavior
>> observed in the test situation.
>>
>> Fencing (stonith) is the cluster's only recovery mechanism in such a
>> case. When the network splits, or a node becomes unresponsive, it can
>> only safely recover resources if it can ensure the other node is powered
>> off. Pacemaker supports both physical fencing devices such as an
>> intelligent power switch, and hardware watchdog devices for self-fencing
>> using sbd.
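
For example, a single IPMI-based power switch per node might be added
like this (agent parameters and credentials are placeholders, untested):

    pcs stonith create fence_node2 fence_ipmilan ipaddr="10.0.0.102" \
        login="admin" passwd="secret" pcmk_host_list="node2"
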
>>
>>> Thank you.
>>>
>>>  
>>>
>>>  
>>>
>>> Kind regards,
>>>
>>>  
>>>
>>> *Vladimir Pavlov*
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Tue, 28 Jun 2016 16:51:50 -0400
>> From: Digimer <lists at alteeve.ca>
>> To: Cluster Labs - All topics related to open-source clustering
>> 	welcomed	<users at clusterlabs.org>
>> Subject: Re: [ClusterLabs] Default Behavior
>> Message-ID: <0021409c-86ba-7ef6-875f-0defd3fc9009 at alteeve.ca>
>> Content-Type: text/plain; charset=UTF-8
>>
>> On 28/06/16 11:53 AM, Pavlov, Vladimir wrote:
>>> Hello!
>>>
>>> We have a two-node Active/Backup Pacemaker cluster (OS CentOS 6.7),
>>> with IPaddr2 and ldirectord resources.
>>>
>>> Cluster Properties:
>>>
>>> cluster-infrastructure: cman
>>>
>>> dc-version: 1.1.11-97629de
>>>
>>> no-quorum-policy: ignore
>>>
>>> stonith-enabled: false
>> You need fencing to be enabled and configured. This is always true, but
>> particularly so on RHEL 6 because it uses the cman plugin. Please
>> configure and test stonith, and then repeat your tests to see if the
>> behavior is more predictable.
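
For example, once a stonith device is configured, it can be exercised
directly before trusting it (node names are placeholders):

    pcs stonith fence node2        # ask the cluster to fence node2
    stonith_admin --reboot node2   # or test the fencing layer directly

If the target is power-cycled and rejoins cleanly, fencing is working.
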
>>
>>> The cluster was configured according to this documentation:
>>> http://clusterlabs.org/quickstart-redhat-6.html
>>>
>>> Recently, there was a communication failure between the cluster nodes,
>>> and the behavior was like this:
>>>
>>> - During the network failure, each server became the Master.
>>>
>>> - After the network was restored, one node killed the Pacemaker
>>> services on the second node.
>>>
>>> - The second node was not available to the cluster, but all resources
>>> remained active (ldirectord, ipvs, IP address). That is, both nodes
>>> continued to be active.
>>>
>>> We decided to set up a test stand and reproduce the situation, but with
>>> the current version of Pacemaker in the CentOS repos, the cluster
>>> behaves differently:
>>>
>>> - During the network failure, each server became the Master.
>>>
>>> - After the network was restored, all resources were stopped.
>>>
>>> - Then the resources were started on only one node. This behavior
>>> seems more logical.
>>>
>>> Current Cluster Properties on test stand:
>>>
>>> cluster-infrastructure: cman
>>>
>>> dc-version: 1.1.14-8.el6-70404b0
>>>
>>> have-watchdog: false
>>>
>>> no-quorum-policy: ignore
>>>
>>> stonith-enabled: false
>>>
>>> Has the cluster behavior changed in the new version, or did we not
>>> fully emulate the original incident?
>>>
>>> Thank you.
>>>
>>>  
>>>
>>>  
>>>
>>> Kind regards,
>>>
>>>  
>>>
>>> *Vladimir Pavlov*



