[ClusterLabs] Two nodes cluster issue

Tue Jul 25 08:59:20 UTC 2017

> Tomer Azran wrote:
>> I tend to agree with Klaus: I don't think that having a hook that
>> bypasses stonith is the right way. It is better not to use stonith at all.
>> I think I will try to use an iSCSI target on my qdevice and set SBD to
>> use it.
>> I still don't understand why qdevice can't take the place of SBD with
>> shared storage; correct me if I'm wrong, but it looks like both of
>> them are there for the same reason.
>
> Qdevice is there to be a third-party arbiter that decides which partition
> is quorate. It can also be seen as a quorum-only node. So for a two-node
> cluster it can be viewed as a third node (even though it is quite special
> because it cannot run resources). It does not do fencing.
>
> SBD is a fencing device. It uses a disk as a third-party arbiter.

I've talked with Klaus and he told me that 7.3 does not use the disk as a
third-party arbiter, so sorry for the confusion.

You should, however, still be able to use sbd to check whether pacemaker is
alive and whether the partition has quorum; if not, the watchdog kills the
node. So qdevice gives you a "3rd" node and sbd fences the unquorate
partition.
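
To make the division of labour concrete, here is a toy model of that
arrangement. It is only a sketch under stated assumptions: the names
(Partition, quorum_threshold, watchdog_action) are made up for illustration,
votes are simply counted as one per node plus one for the qdevice, and none
of this is the actual logic inside corosync, qdevice or sbd.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """One side of a split cluster (hypothetical model, not a real API)."""
    nodes: int              # cluster nodes that can still see each other
    has_qdevice_vote: bool  # whether the arbiter gave its vote to this side


def quorum_threshold(expected_votes: int) -> int:
    """Strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(partition: Partition, expected_votes: int) -> bool:
    """A partition is quorate if its votes reach the majority threshold."""
    votes = partition.nodes + (1 if partition.has_qdevice_vote else 0)
    return votes >= quorum_threshold(expected_votes)


def watchdog_action(partition: Partition, expected_votes: int,
                    pacemaker_alive: bool = True) -> str:
    """Assumed watchdog rule: keep running only if pacemaker is alive
    and the partition has quorum; otherwise the node resets itself."""
    if pacemaker_alive and is_quorate(partition, expected_votes):
        return "keep-running"
    return "self-fence"


if __name__ == "__main__":
    expected_votes = 3  # two nodes plus one qdevice vote
    side_a = Partition(nodes=1, has_qdevice_vote=True)
    side_b = Partition(nodes=1, has_qdevice_vote=False)
    print("side A:", watchdog_action(side_a, expected_votes))
    print("side B:", watchdog_action(side_b, expected_votes))
```

In this model the side that holds the qdevice vote has 2 of 3 votes and keeps
running, while the other side has 1 of 3 and is reset by its watchdog.
Qdevice only decides who is quorate; it is the watchdog that makes sure the
losing side actually goes down.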

Or (as mentioned previously) you can use fabric fencing.

Regards,
   Honza

>
>
>>
>> From: Klaus Wenninger [mailto:kwenning at redhat.com]
>> Sent: Monday, July 24, 2017 9:01 PM
>> To: Cluster Labs - All topics related to open-source clustering
>> welcomed <users at clusterlabs.org>; Prasad, Shashank <ssprasad at vanu.com>
>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>
>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>> Sometimes IPMI fence devices share the power supply of the node, and
>> this cannot be avoided.
>> In such scenarios the HA cluster is NOT able to handle a power
>> failure of a node, since the node shares its power with its own fence
>> device.
>> IPMI-based fencing can also fail for other reasons.
>>
>> A failure to fence the failed node will cause the cluster to be marked
>> UNCLEAN.
>> To get past this, the following command needs to be invoked on the
>> surviving node.
>>
>> pcs stonith confirm <failed_node_name> --force
>>
>> This can be automated by hooking a recovery script to the
>> Stonith resource ‘Timed Out’ event.
>> To be more specific, Pacemaker Alerts can be used to watch for
>> Stonith timeouts and failures.
>> In that script, essentially all that needs to be executed is the
>> aforementioned command.
>>
>> If I understand you correctly, you could then just disable fencing in
>> the first place.
>> Actually, quorum-based watchdog fencing is the way to do this in a
>> safe manner. This of course assumes you have a proper source of
>> quorum in your two-node setup, e.g. qdevice or a shared disk with
>> sbd (the latter is not pacemaker quorum directly, but a similar
>> mechanism handled inside sbd).
>>
>>
>> Since the alerts are issued from the ‘hacluster’ login, sudo permissions
>> for ‘hacluster’ need to be configured.
>>
>> Thanx.
>>
>>
>> From: Klaus Wenninger [mailto:kwenning at redhat.com]
>> Sent: Monday, July 24, 2017 9:24 PM
>> To: Kristián Feldsam; Cluster Labs - All topics related to open-source
>> clustering welcomed
>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>
>> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>> I personally think that powering off the node via a switched PDU is
>> safer, isn't it?
>>
>> True, if that works in your environment. If you can't build a physical
>> setup in which you never lose the connection to both your node and the
>> switch device at the same time (or you just want to cover the cases where
>> that happens), you have to come up with something else.
>>
>>
>>
>>
>> Best regards, Kristián Feldsam
>>
>> On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenning at redhat.com> wrote:
>>
>> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>> I still don't understand why the qdevice concept doesn't help in this
>> situation. Since the master node is down, I would expect the quorum to
>> declare it dead.
>> Why doesn't that happen?
>>
>> That is not how quorum works. It just limits the decision-making to
>> the quorate subset of the cluster.
>> Even so, the nodes in an unknown state are not guaranteed to be down.
>> That is why I suggested quorum-based watchdog fencing with sbd.
>> That would ensure that within a certain time all nodes of the
>> non-quorate part of the cluster are down.
>>
>>
>>
>>
>>
>>
>> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk"
>> <dmitri.maziuk at gmail.com> wrote:
>>
>> On 2017-07-24 07:51, Tomer Azran wrote:
>>
>>> We don't have the ability to use it.
>>> Is that the only solution?
>>
>> No, but I'd recommend thinking about it first. Are you sure you will
>> care about your cluster working when your server room is on fire? 'Cause
>> unless you have halon suppression, your server room is a complete
>> write-off anyway. (Think water from sprinklers hitting rich chunky volts
>> in the servers.)
>>
>> Dima
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>> --
>> Klaus Wenninger
>>
>> Senior Software Engineer, EMEA ENG Openstack Infrastructure
>> Red Hat
>>
>> kwenning at redhat.com