[ClusterLabs] Two nodes cluster issue

Mon Jul 24 15:16:22 EDT 2017

On 07/24/2017 08:27 PM, Prasad, Shashank wrote:
>
> My understanding is that SBD needs shared storage between the
> clustered nodes.
>
> And that SBD needs at least 3 nodes in a cluster if it is used without
> shared storage.
>

To be honest, I haven't tried it, but the reason for requiring 3 nodes
is that without a shared disk you need a real quorum source and not
something 'faked' as with the two_node feature in corosync.
That said, I don't see anything speaking against getting proper
quorum via qdevice instead of with a third full cluster node.
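
For what it's worth, the vote arithmetic behind this can be sketched in
a few lines of shell. This is only an illustration, assuming the usual
majority rule (quorum = expected_votes / 2 + 1, integer division) and
one vote per node and per qdevice; the function name quorum_needed is
made up for the example and is not part of any tool.

```shell
# Hypothetical helper: votes needed for quorum under a simple majority rule.
quorum_needed() {
    expected_votes=$1
    echo $(( expected_votes / 2 + 1 ))
}

# Two plain nodes: 2 expected votes, so a lone survivor (1 vote) is not
# quorate unless corosync's two_node option 'fakes' it.
echo "2 nodes:           need $(quorum_needed 2) of 2 votes"

# Two nodes plus one qdevice vote: 3 expected votes, so the surviving
# node together with the qdevice is quorate.
echo "2 nodes + qdevice: need $(quorum_needed 3) of 3 votes"
```

With the extra qdevice vote, the node that can still reach the qdevice
holds a genuine majority, which is what quorum-based watchdog fencing
needs.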

> Therefore, for systems which do NOT use shared storage between 1+1 HA
> clustered nodes, SBD may NOT be an option.
>
> Correct me, if I am wrong.
>
> For cluster systems using fencing agents such as iDRAC/IMM2, which
> have redundant power supply units that are shared with the nodes, the
> normal fencing mechanisms should work for all resiliency scenarios,
> except when the IMM2/iDRAC is NOT reachable for whatever reason.
> And, to bail out of those situations in the absence of SBD, I believe
> using user-defined failover hooks (via scripts) in Pacemaker Alerts,
> with sudo permissions for ‘hacluster’, should help.
>

If you can't reach your fencing device, assuming after some time
that the corresponding node is probably down is quite risky
in my opinion.
Why not make sure it is down by using a watchdog instead?
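
As a rough sketch of what that could look like with sbd in
watchdog-only mode: the variable names below are the ones sbd reads
from its sysconfig file (commonly /etc/sysconfig/sbd, though the path
may differ on your distribution), but the values are only examples.
The factor of two is the commonly cited rule of thumb, not something
I have verified for your hardware. The snippet only prints the pcs
command so it can be reviewed before anyone runs it.

```shell
# Example values as they might appear in the sbd sysconfig file
# (assumed values; adjust to your watchdog hardware).
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5

# Rule of thumb (assumption): give pacemaker twice the watchdog timeout
# before it treats the lost node as safely self-fenced.
STONITH_WATCHDOG_TIMEOUT=$(( SBD_WATCHDOG_TIMEOUT * 2 ))

# Print, rather than run, the cluster property change.
echo "pcs property set stonith-watchdog-timeout=${STONITH_WATCHDOG_TIMEOUT}s"
```

This only makes sense if the watchdog device really exists on both
nodes, sbd is running there, and the cluster has a real quorum source
as discussed above.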

>  
>
> Thanx.
>
> *From:*Klaus Wenninger [mailto:kwenning at redhat.com]
> *Sent:* Monday, July 24, 2017 11:31 PM
> *To:* Cluster Labs - All topics related to open-source clustering
> welcomed; Prasad, Shashank
> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
>  
>
> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>
>     Sometimes IPMI fence devices share the node's power supply, and it
>     cannot be avoided.
>
>     In such scenarios the HA cluster is NOT able to handle the power
>     failure of a node, since the power is shared with its own fence
>     device.
>
>     IPMI-based fencing can also fail for other reasons.
>
>      
>
>     A failure to fence the failed node will cause the cluster to be
>     marked UNCLEAN.
>
>     To get over it, the following command needs to be invoked on the
>     surviving node.
>
>      
>
>     pcs stonith confirm <failed_node_name> --force
>
>      
>
>     This can be automated by hooking in a recovery script that runs on
>     the Stonith resource ‘Timed Out’ event.
>
>     To be more specific, Pacemaker Alerts can be used to watch
>     for Stonith timeouts and failures.
>
>     In that script, essentially all that needs to be executed is the
>     aforementioned command.
>
>
> If I understand you correctly, you might as well disable fencing in
> the first place.
> Actually, quorum-based watchdog fencing is the way to do this in a
> safe manner. This of course assumes you have a proper source of
> quorum in your 2-node setup, e.g. qdevice, or that you use a shared
> disk with sbd (not pacemaker quorum directly, but a similar mechanism
> handled inside sbd).
>
>
> Since the alerts are issued under the ‘hacluster’ login, sudo
> permissions for ‘hacluster’ need to be configured.
>
>  
>
> Thanx.
>
> *From:*Klaus Wenninger [mailto:kwenning at redhat.com]
> *Sent:* Monday, July 24, 2017 9:24 PM
> *To:* Kristián Feldsam; Cluster Labs - All topics related to
> open-source clustering welcomed
> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
>  
>
> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>
>     I personally think that powering off the node via a switched PDU
>     is safer, isn't it?
>
>
> True, if that works in your environment. If you can't build a physical
> setup where you never lose the connection to both your node and
> the switch device at the same time (or you just want to cover the
> cases where that happens), you have to come up with something else.
>
>     On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenning at redhat.com
>     <mailto:kwenning at redhat.com>> wrote:
>
>      
>
>     On 07/24/2017 05:15 PM, Tomer Azran wrote:
>
>         I still don't understand why the qdevice concept doesn't help
>         in this situation. Since the master node is down, I would
>         expect the quorum to declare it as dead.
>
>         Why doesn't that happen?
>
>
>     That is not how quorum works. It just limits the decision-making
>     to the quorate subset of the cluster.
>     The nodes in an unknown state are still not guaranteed to be down.
>     That is why I suggested quorum-based watchdog fencing with sbd.
>     That would ensure that, within a certain time, all nodes in the
>     non-quorate part of the cluster are down.
>
>     On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri
>     Maziuk" <dmitri.maziuk at gmail.com
>     <mailto:dmitri.maziuk at gmail.com>> wrote:
>
>     On 2017-07-24 07:51, Tomer Azran wrote:
>
>     > We don't have the ability to use it.
>     > Is that the only solution?
>
>     No, but I'd recommend thinking about it first. Are you sure you will
>     care about your cluster working when your server room is on fire? 'Cause
>     unless you have halon suppression, your server room is a complete
>     write-off anyway. (Think water from sprinklers hitting rich chunky volts
>     in the servers.)
>
>     Dima
>
>      
>
>     _______________________________________________
>
>     Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org>
>
>     http://lists.clusterlabs.org/mailman/listinfo/users
>
>      
>
>     Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/>
>
>     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
>     Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/>
>
>     -- 
>     Klaus Wenninger
>     Senior Software Engineer, EMEA ENG Openstack Infrastructure
>     Red Hat
>     kwenning at redhat.com <mailto:kwenning at redhat.com>