[ClusterLabs] Two nodes cluster issue

Klaus Wenninger kwenning at redhat.com
Mon Jul 24 18:01:07 UTC 2017


On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>
> Sometimes an IPMI fence device shares the power supply of its node, and
> that cannot be avoided.
>
> In such scenarios the HA cluster is NOT able to handle the power
> failure of a node, since the power is shared with its own fence device.
>
> IPMI-based fencing can also fail for other reasons.
>
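> (For reference, the IPMI fencing discussed here is typically just a stonith
> resource based on fence_ipmilan. The sketch below is only an illustration:
> node name, BMC address and credentials are placeholders, and parameter
> names differ between fence-agents versions.)
>
>     pcs stonith create fence-node1 fence_ipmilan \
>         ip=10.0.0.101 username=admin password=secret lanplus=1 \
>         pcmk_host_list=node1 \
>         op monitor interval=60s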
>  
>
> A failure to fence the failed node will cause it to be marked
> UNCLEAN.
>
> To get over it, the following command needs to be invoked on the
> surviving node.
>
>  
>
> pcs stonith confirm <failed_node_name> --force
>
>  
>
> This can be automated by hooking a recovery script to the Stonith
> resource's ‘Timed Out’ event.
>
> To be more specific, Pacemaker alerts can be used to watch for
> Stonith timeouts and failures.
>
> In that script, essentially all that needs to be executed is the
> aforementioned command.
>

If I get you right here, you could just as well disable fencing in the first place.
Actually, quorum-based watchdog fencing is the way to do this in a
safe manner. This of course assumes you have a proper source of
quorum in your 2-node setup, e.g. qdevice, or a shared
disk with sbd (not Pacemaker quorum directly there, but a similar
mechanism handled inside sbd).
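A rough sketch of such a setup (the qdevice host name is just a
placeholder, and the exact pcs syntax differs between versions):

    # add a third quorum vote from a separate corosync-qnetd host
    pcs quorum device add model net host=qnetd-host algorithm=ffsplit

    # enable sbd in watchdog-only mode (no shared disk) on all nodes
    pcs stonith sbd enable --watchdog=/dev/watchdog

With that in place, a node that loses quorum is rebooted by its hardware
watchdog instead of having to be power-fenced from the outside.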

> Since the alerts are issued from the ‘hacluster’ login, sudo permissions
> for ‘hacluster’ need to be configured.
>
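> As an illustration, a minimal alert agent along those lines might look like
> the sketch below (the path is made up; it relies on Pacemaker's standard
> CRM_alert_* environment variables and on the sudo rule just mentioned, and
> it is only safe if a node that stops responding is guaranteed to really be
> powered off):
>
>     #!/bin/sh
>     # hypothetical /usr/local/bin/stonith_fallback_alert.sh
>     # registered with: pcs alert create path=/usr/local/bin/stonith_fallback_alert.sh
>
>     # react only to fencing events that did not succeed
>     if [ "${CRM_alert_kind}" = "fencing" ] && [ "${CRM_alert_rc}" != "0" ]; then
>         # WARNING: confirming fencing for a node that is still running
>         # risks data corruption; only use this if the power topology
>         # guarantees the node is really off
>         sudo pcs stonith confirm "${CRM_alert_node}" --force
>     fi
>     exit 0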
>  
>
> Thanx.
>
>  
>
>  
>
> *From:* Klaus Wenninger [mailto:kwenning at redhat.com]
> *Sent:* Monday, July 24, 2017 9:24 PM
> *To:* Kristián Feldsam; Cluster Labs - All topics related to
> open-source clustering welcomed
> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
>  
>
> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>
>     I personally think that powering off the node via a switched PDU is
>     safer, or not?
>
>
> True if that is working in your environment. If you can't do a physical
> setup where you aren't simultaneously losing connection to both your node
> and the switch-device (or you just want to cover cases where that happens),
> you have to come up with something else.
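> Where a switched PDU does fit, it ends up as just another stonith resource,
> e.g. using fence_apc_snmp. The addresses, credentials and outlet mapping
> below are made-up values, and parameter names vary with the fence-agents
> version:
>
>     pcs stonith create fence-pdu fence_apc_snmp \
>         ip=10.0.0.200 username=apc password=apc \
>         pcmk_host_map="node1:1;node2:2" \
>         op monitor interval=60s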
>
>
>
> Best regards, Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail: support at feldhost.cz <mailto:support at feldhost.cz>
>
> www.feldhost.cz <http://www.feldhost.cz> - *Feld*Host™ – professional
> hosting and server services at fair prices.
>
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> Company ID: 290 60 958, VAT ID: CZ290 60 958
> File C 200350, registered with the Municipal Court in Prague
>
> Bank: Fio banka a.s.
> Account number: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010 0000 0024 0033 0446
>
>  
>
>     On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenning at redhat.com
>     <mailto:kwenning at redhat.com>> wrote:
>
>      
>
>     On 07/24/2017 05:15 PM, Tomer Azran wrote:
>
>         I still don't understand why the qdevice concept doesn't help
>         in this situation. Since the master node is down, I would
>         expect the quorum to declare it as dead.
>
>         Why doesn't it happen?
>
>
>     That is not how quorum works. It just limits the decision-making
>     to the quorate subset of the cluster.
>     The unknown nodes are still not guaranteed to be down.
>     That is why I suggested using quorum-based watchdog fencing with
>     sbd.
>     That would ensure that within a certain time all nodes of the
>     non-quorate part of the cluster are down.
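>     As an illustration of that guarantee (values are only examples): sbd's
>     own watchdog timeout, set per node, has to be shorter than the
>     cluster-wide stonith-watchdog-timeout, so the quorate side only treats
>     the other side as fenced once its hardware watchdog must already have
>     fired.
>
>         # per node, e.g. /etc/sysconfig/sbd (path differs per distribution)
>         SBD_WATCHDOG_DEV=/dev/watchdog
>         SBD_WATCHDOG_TIMEOUT=5
>
>         # cluster-wide property the quorate partition waits for
>         pcs property set stonith-watchdog-timeout=10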
>
>
>
>
>     On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri
>     Maziuk" <dmitri.maziuk at gmail.com
>     <mailto:dmitri.maziuk at gmail.com>> wrote:
>
>     On 2017-07-24 07:51, Tomer Azran wrote:
>     > We don't have the ability to use it.
>     > Is that the only solution?
>
>     No, but I'd recommend thinking about it first. Are you sure you will
>     care about your cluster working when your server room is on fire? 'Cause
>     unless you have halon suppression, your server room is a complete
>     write-off anyway. (Think water from sprinklers hitting rich chunky volts
>     in the servers.)
>
>      
>
>     Dima
>
>      
>
>     -- 
>     Klaus Wenninger
>     Senior Software Engineer, EMEA ENG Openstack Infrastructure
>     Red Hat
>     kwenning at redhat.com <mailto:kwenning at redhat.com>
>
>
>  
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>   

