[ClusterLabs] Two nodes cluster issue

Klaus Wenninger kwenning at redhat.com
Mon Jul 24 16:18:48 EDT 2017


On 07/24/2017 09:46 PM, Kristián Feldsam wrote:
> So why not use some other fencing method, like disabling the port on the
> switch, so nobody can access the faulty node and write data to it? It is
> common practice too.

Well, don't get me wrong here. I don't want to hard-sell sbd.
I just thought that the requirements that prevent the use of a
remote-controlled power switch will very likely make access to a
switch for disabling ports unusable as well.
And if a working qdevice setup is already there, the gap between
what he thought he would get from qdevice and what he actually
had is matched exactly by quorum-based watchdog fencing.
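
As a rough sketch (untested; device name and timeouts are placeholders),
quorum-based watchdog fencing on top of such a qdevice setup would look
roughly like this on both nodes:

    # /etc/sysconfig/sbd - diskless sbd, watchdog only
    SBD_WATCHDOG_DEV=/dev/watchdog
    SBD_WATCHDOG_TIMEOUT=5

    # enable sbd (it is started together with the cluster) and tell
    # pacemaker it may rely on the watchdog for nodes it cannot see
    systemctl enable sbd
    pcs property set stonith-watchdog-timeout=10s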

But you are of course right.
I don't really know the scenario.
Maybe fabric fencing is the perfect match - good to mention it
here as a possibility.

Regards,
Klaus
 
>
> S pozdravem Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail.: support at feldhost.cz <mailto:support at feldhost.cz>
>
> www.feldhost.cz <http://www.feldhost.cz> - FeldHost™ – professional
> hosting and server services at reasonable prices.
>
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> Company ID (IČ): 290 60 958, VAT ID (DIČ): CZ290 60 958
> C 200350, registered with the Municipal Court in Prague
>
> Bank: Fio banka a.s.
> Account number: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010 0000 0024 0033 0446
>
>> On 24 Jul 2017, at 21:16, Klaus Wenninger <kwenning at redhat.com
>> <mailto:kwenning at redhat.com>> wrote:
>>
>> On 07/24/2017 08:27 PM, Prasad, Shashank wrote:
>>> My understanding is that SBD needs shared storage between the
>>> clustered nodes, and that SBD needs at least 3 nodes in a cluster
>>> if used without shared storage.
>>
>> To be honest I haven't tried it, but the reason for 3 nodes is that
>> without a shared disk you need a real quorum source and not something
>> 'faked' as with the two-node feature in corosync.
>> But I don't see anything speaking against getting proper quorum via
>> qdevice instead of a third full cluster node.
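>>
>> A minimal sketch of adding such a quorum device (untested; assumes
>> corosync-qnetd is running on a third host, here called "qnetd-host"
>> as a placeholder):
>>
>>     pcs quorum device add model net host=qnetd-host algorithm=ffsplit
>>     pcs quorum status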
>>
>>>  
>>> Therefore, for systems which do NOT use shared storage between 1+1
>>> HA clustered nodes, SBD may NOT be an option.
>>> Correct me if I am wrong.
>>>  
>>> For cluster systems using the likes of iDRAC/IMM2 fencing agents,
>>> which have power supply units that are redundant but shared with the
>>> nodes, the normal fencing mechanisms should work for all resiliency
>>> scenarios, except when the IMM2/iDRAC is not reachable for whatever
>>> reason. And, to bail out of those situations in the absence of SBD,
>>> I believe using user-defined failover hooks (via scripts) into
>>> Pacemaker Alerts, with sudo permissions for ‘hacluster’, should help.
>>
>> If you can't see your fencing device, assuming after some time that
>> the corresponding node is probably down is quite risky in my opinion.
>> But why not ensure it is down using a watchdog?
>>
>>>  
>>> Thanx.
>>>  
>>>  
>>> *From:* Klaus Wenninger [mailto:kwenning at redhat.com] 
>>> *Sent:* Monday, July 24, 2017 11:31 PM
>>> *To:* Cluster Labs - All topics related to open-source clustering
>>> welcomed; Prasad, Shashank
>>> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>>>  
>>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>>>
>>>     Sometimes IPMI fence devices share the node's power supply, and
>>>     that cannot be avoided.
>>>     In such scenarios the HA cluster is NOT able to handle the power
>>>     failure of a node, since the power is shared with its own fence
>>>     device.
>>>     IPMI-based fencing can also fail for other reasons.
>>>      
>>>     A failure to fence the failed node will cause it to be marked
>>>     UNCLEAN in the cluster.
>>>     To get over it, the following command needs to be invoked on the
>>>     surviving node:
>>>      
>>>     pcs stonith confirm <failed_node_name> --force
>>>      
>>>     This can be automated by hooking a recovery script into the
>>>     Stonith resource ‘Timed Out’ event.
>>>     To be more specific, Pacemaker Alerts can be used to watch for
>>>     Stonith timeouts and failures.
>>>     In that script, essentially all that needs to be executed is the
>>>     aforementioned command.
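>>>      
>>>     A minimal sketch of such an alert agent (untested; the script
>>>     path is a placeholder and matching on the description text is
>>>     only illustrative; the CRM_alert_* variables are provided by
>>>     Pacemaker to alert agents):
>>>      
>>>     #!/bin/sh
>>>     # hypothetical /usr/local/bin/fence_timeout_hook.sh
>>>     # only react to fencing events whose result looks like a timeout
>>>     if [ "$CRM_alert_kind" = "fencing" ]; then
>>>         case "$CRM_alert_desc" in
>>>         *"Timed Out"*|*timeout*)
>>>             # confirm the node as down - only safe if it really is down!
>>>             sudo pcs stonith confirm "$CRM_alert_node" --force
>>>             ;;
>>>         esac
>>>     fi
>>>     # registered e.g. with:
>>>     # pcs alert create path=/usr/local/bin/fence_timeout_hook.sh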
>>>
>>>
>>> If I get you right here, you could just as well disable fencing in
>>> the first place.
>>> Actually, quorum-based watchdog fencing is the way to do this in a
>>> safe manner. This of course assumes you have a proper source of
>>> quorum in your 2-node setup, e.g. via qdevice, or a shared disk with
>>> sbd (not Pacemaker quorum directly in that case, but a similar
>>> mechanism handled inside sbd).
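>>>
>>> For the shared-disk variant, a rough sketch could look like this
>>> (untested; the device path is a placeholder):
>>>
>>>     # /etc/sysconfig/sbd on both nodes
>>>     SBD_DEVICE=/dev/disk/by-id/shared-lun
>>>     SBD_WATCHDOG_DEV=/dev/watchdog
>>>
>>>     # one-time initialization of the sbd header on the shared LUN
>>>     sbd -d /dev/disk/by-id/shared-lun create
>>>
>>>     # stonith resource using the fence_sbd agent
>>>     pcs stonith create sbd-fence fence_sbd devices=/dev/disk/by-id/shared-lun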
>>>
>>>
>>> Since the alert agents run as the ‘hacluster’ user, sudo permissions
>>> for ‘hacluster’ need to be configured.
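>>>
>>> For example, a sudoers entry along these lines (the path to pcs may
>>> differ on your distribution) limits ‘hacluster’ to that one command:
>>>
>>>     hacluster ALL=(root) NOPASSWD: /usr/sbin/pcs stonith confirm *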
>>>  
>>> Thanx.
>>>  
>>>  
>>> *From:* Klaus Wenninger [mailto:kwenning at redhat.com] 
>>> *Sent:* Monday, July 24, 2017 9:24 PM
>>> *To:* Kristián Feldsam; Cluster Labs - All topics related to
>>> open-source clustering welcomed
>>> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>>>  
>>> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>>>
>>>     I personally think that powering off the node via a switched PDU
>>>     is safer, or not?
>>>
>>>
>>> True if that is working in your environment. If you can't do a
>>> physical setup where you aren't simultaneously losing the connection
>>> to both your node and the switch device (or you just want to cover
>>> cases where that happens), you have to come up with something else.
>>>
>>>  
>>>
>>>     On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenning at redhat.com
>>>     <mailto:kwenning at redhat.com>> wrote:
>>>      
>>>     On 07/24/2017 05:15 PM, Tomer Azran wrote:
>>>
>>>         I still don't understand why the qdevice concept doesn't
>>>         help in this situation. Since the master node is down, I
>>>         would expect quorum to declare it as dead.
>>>         Why doesn't that happen?
>>>
>>>
>>>     That is not how quorum works. It just limits decision-making
>>>     to the quorate subset of the cluster.
>>>     The unknown nodes are still not guaranteed to be down.
>>>     That is why I suggested quorum-based watchdog fencing with sbd.
>>>     That would ensure that within a certain time all nodes of the
>>>     non-quorate part of the cluster are down.
>>>
>>>
>>>     On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri
>>>     Maziuk" <dmitri.maziuk at gmail.com
>>>     <mailto:dmitri.maziuk at gmail.com>> wrote:
>>>
>>>     On 2017-07-24 07:51, Tomer Azran wrote:
>>>     > We don't have the ability to use it.
>>>     > Is that the only solution?
>>>
>>>     No, but I'd recommend thinking about it first. Are you sure you will
>>>     care about your cluster working when your server room is on fire? 'Cause
>>>     unless you have halon suppression, your server room is a complete
>>>     write-off anyway. (Think water from sprinklers hitting rich chunky volts
>>>     in the servers.)
>>>
>>>     Dima
>>>
>>>     --
>>>     Klaus Wenninger
>>>     Senior Software Engineer, EMEA ENG Openstack Infrastructure
>>>     Red Hat
>>>     kwenning at redhat.com <mailto:kwenning at redhat.com>
>>>
>>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org>
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/>
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/>
>
