[ClusterLabs] Two nodes cluster issue

Kristián Feldsam admin at feldhost.cz
Mon Jul 24 16:29:21 EDT 2017


yes, I just had an idea: he probably has a managed switch or fabric...
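If so, fabric fencing could be done e.g. with fence_ifmib, which disables the node's switch port via SNMP. A rough, untested sketch - the address, community string and interface names are placeholders, and the exact parameter names depend on the fence-agents version:

    # cut off a failed node by disabling its switch port via SNMP (placeholder values)
    pcs stonith create fence_fabric fence_ifmib \
        ipaddr=192.0.2.10 community=private \
        pcmk_host_map="node1:Gi1/0/1;node2:Gi1/0/2" \
        op monitor interval=60s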

Best regards, Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail: support at feldhost.cz

www.feldhost.cz - FeldHost™ – professional hosting and server services at reasonable prices.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, 142 00
Company ID (IČ): 290 60 958, VAT ID (DIČ): CZ290 60 958
File No. C 200350, registered with the Municipal Court in Prague

Bank: Fio banka a.s.
Account number: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010 0000 0024 0033 0446

> On 24 Jul 2017, at 22:18, Klaus Wenninger <kwenning at redhat.com> wrote:
> 
> On 07/24/2017 09:46 PM, Kristián Feldsam wrote:
>> So why not use some other fencing method, like disabling the port on the switch, so nobody can access the faulty node and write data to it? That is common practice too.
> 
> Well, don't get me wrong here. I don't want to hard-sell sbd.
> I just thought that the requirements that prevent the use of
> a remote-controlled power switch will very likely make access
> to a switch for disabling the ports unusable as well.
> And if a working qdevice setup is there already, the gap between
> what he thought he would get from qdevice and what he actually
> had matches exactly quorum-based watchdog-fencing.
> 
> But you are of course right.
> I don't really know the scenario.
> Maybe fabric fencing is the perfect match - good to mention it
> here as a possibility.
> 
> Regards,
> Klaus
>   
>> 
>> 
>>> On 24 Jul 2017, at 21:16, Klaus Wenninger <kwenning at redhat.com> wrote:
>>> 
>>> On 07/24/2017 08:27 PM, Prasad, Shashank wrote:
>>>> My understanding is that SBD will need shared storage between clustered nodes.
>>>> And that SBD will need at least 3 nodes in a cluster if used without shared storage.
>>> 
>>> Haven't tried it, to be honest, but the reason for 3 nodes is that without
>>> a shared disk you need a real quorum source and not something
>>> 'faked' as with the two-node feature in corosync.
>>> But I don't see anything speaking against getting proper
>>> quorum via qdevice instead of a third full cluster node.
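>>> A rough sketch of that (untested; "qnetd-host" is a placeholder for a
>>> machine outside the cluster running corosync-qnetd, and the package
>>> names are the RHEL/CentOS ones):
>>>
>>>     # on the qnetd host (not a cluster member):
>>>     yum install corosync-qnetd
>>>     systemctl enable corosync-qnetd && systemctl start corosync-qnetd
>>>     # on the cluster nodes:
>>>     yum install corosync-qdevice
>>>     pcs quorum device add model net host=qnetd-host algorithm=ffsplit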
>>> 
>>>>  
>>>> Therefore, for systems which do NOT use shared storage between 1+1 HA clustered nodes, SBD may NOT be an option.
>>>> Correct me if I am wrong.
>>>>  
>>>> For cluster systems using the likes of iDRAC/IMM2 fencing agents, which have redundant power supply units shared with the nodes, the normal fencing mechanisms should work for all resiliency scenarios except when the IMM2/iDRAC is NOT reachable for whatever reason. To bail out of those situations in the absence of SBD, I believe using user-defined failover hooks (via scripts) into Pacemaker Alerts, with sudo permissions for ‘hacluster’, should help.
>>> 
>>> If you can't reach your fencing device, assuming after some time
>>> that the corresponding node will probably be down is quite risky
>>> in my opinion.
>>> So why not make sure it is down by using a watchdog?
>>> 
>>>>  
>>>> Thanx.
>>>>  
>>>>  
>>>> From: Klaus Wenninger [mailto:kwenning at redhat.com]
>>>> Sent: Monday, July 24, 2017 11:31 PM
>>>> To: Cluster Labs - All topics related to open-source clustering welcomed; Prasad, Shashank
>>>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>>>  
>>>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>>>> Sometimes IPMI fence devices share the power supply of the node, and that cannot be avoided.
>>>> In such scenarios the HA cluster is NOT able to handle the power failure of a node, since the power is shared with its own fence device.
>>>> IPMI-based fencing can also fail for other reasons.
>>>>  
>>>> A failure to fence the failed node will leave that node marked UNCLEAN.
>>>> To get over it, the following command needs to be invoked on the surviving node:
>>>>  
>>>> pcs stonith confirm <failed_node_name> --force
>>>>  
>>>> This can be automated by hooking in a recovery script triggered on the Stonith resource’s ‘Timed Out’ event.
>>>> To be more specific, Pacemaker Alerts can be used to watch for Stonith timeouts and failures.
>>>> In that script, essentially all that needs to be executed is the aforementioned command.
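>>>> A minimal sketch of such an alert agent (untested and to be adapted;
>>>> Pacemaker exposes the fencing event via CRM_alert_* environment variables):
>>>>
>>>>     #!/bin/sh
>>>>     # Sketch only: auto-confirm a node after a FAILED fencing attempt.
>>>>     # Dangerous if the node is in fact still alive - verify out-of-band first.
>>>>     if [ "$CRM_alert_kind" = "fencing" ] && [ "$CRM_alert_rc" != "0" ]; then
>>>>         sudo pcs stonith confirm "$CRM_alert_node" --force
>>>>     fi
>>>>
>>>> It would be registered with something like 'pcs alert create
>>>> path=/usr/local/bin/confirm-fence.sh' (the path is just an example).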
>>>> 
>>>> If I get you right here, you could just as well disable fencing in the first place.
>>>> Actually, quorum-based watchdog-fencing is the way to do this in a
>>>> safe manner. This of course assumes you have a proper source of
>>>> quorum in your 2-node setup, e.g. with qdevice, or using a shared
>>>> disk with sbd (not directly Pacemaker quorum here, but a similar thing
>>>> handled inside sbd).
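>>>> For the watchdog-only (diskless) sbd variant that is roughly the following
>>>> (untested sketch; the timeout values are examples):
>>>>
>>>>     # /etc/sysconfig/sbd on every node:
>>>>     #   SBD_WATCHDOG_DEV=/dev/watchdog
>>>>     #   SBD_WATCHDOG_TIMEOUT=5
>>>>     systemctl enable sbd      # must be enabled before the cluster starts
>>>>     pcs property set stonith-watchdog-timeout=10s   # roughly 2x the watchdog timeout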
>>>> 
>>>> 
>>>> Since the alerts are executed as the ‘hacluster’ user, sudo permissions for ‘hacluster’ need to be configured.
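>>>> For example via a sudoers drop-in like this (sketch only; the pcs path may
>>>> differ on your distribution):
>>>>
>>>>     # /etc/sudoers.d/hacluster-stonith
>>>>     hacluster ALL=(root) NOPASSWD: /usr/sbin/pcs stonith confirm *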
>>>>  
>>>> Thanx.
>>>>  
>>>>  
>>>> From: Klaus Wenninger [mailto:kwenning at redhat.com]
>>>> Sent: Monday, July 24, 2017 9:24 PM
>>>> To: Kristián Feldsam; Cluster Labs - All topics related to open-source clustering welcomed
>>>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>>>  
>>>> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>>>> I personally think that powering off the node via a switched PDU is safer, or not?
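>>>> E.g. with an SNMP-capable APC-style PDU, something like this (untested
>>>> sketch; the address, community string and outlet mapping are placeholders):
>>>>
>>>>     pcs stonith create fence_pdu fence_apc_snmp \
>>>>         ipaddr=192.0.2.20 community=private \
>>>>         pcmk_host_map="node1:1;node2:2" \
>>>>         op monitor interval=60s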
>>>> 
>>>> True if that works in your environment. If you can't do a physical setup
>>>> where you aren't simultaneously losing the connection to both your node and
>>>> the switch device (or you just want to cover cases where that happens),
>>>> you have to come up with something else.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>  
>>>> On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenning at redhat.com> wrote:
>>>>  
>>>> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>>>> I still don't understand why the qdevice concept doesn't help in this situation. Since the master node is down, I would expect quorum to declare it as dead.
>>>> Why doesn't that happen?
>>>> 
>>>> That is not how quorum works. It just limits decision-making to the quorate subset of the cluster.
>>>> The unknown nodes are still not guaranteed to be down.
>>>> That is why I suggested using quorum-based watchdog-fencing with sbd.
>>>> That would assure that within a certain time all nodes of the non-quorate part
>>>> of the cluster are down.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" <dmitri.maziuk at gmail.com> wrote:
>>>> 
>>>> On 2017-07-24 07:51, Tomer Azran wrote:
>>>> > We don't have the ability to use it.
>>>> > Is that the only solution?
>>>>  
>>>> No, but I'd recommend thinking about it first. Are you sure you will 
>>>> care about your cluster working when your server room is on fire? 'Cause 
>>>> unless you have halon suppression, your server room is a complete 
>>>> write-off anyway. (Think water from sprinklers hitting rich chunky volts 
>>>> in the servers.)
>>>>  
>>>> Dima
>>>>  
>>>>  
>>>> -- 
>>>> Klaus Wenninger
>>>>  
>>>> Senior Software Engineer, EMEA ENG Openstack Infrastructure
>>>>  
>>>> Red Hat
>>>>  
>>>> kwenning at redhat.com
>>>>   
>>> 
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> 
> 


