[ClusterLabs] Two nodes cluster issue
Klaus Wenninger
kwenning at redhat.com
Mon Jul 24 14:01:07 EDT 2017
On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>
> Sometimes an IPMI fence device shares the node's power supply, and this
> cannot be avoided.
>
> In such scenarios the HA cluster is NOT able to handle the power
> failure of a node, since the power is shared with its own fence device.
>
> IPMI-based fencing can also fail for other reasons.
>
> A failure to fence the failed node leaves that node marked UNCLEAN.
>
> To get past this, the following command needs to be run on the
> surviving node:
>
> pcs stonith confirm <failed_node_name> --force
>
> This can be automated by hooking a recovery script to the Stonith
> resource's 'Timed Out' event.
>
> More specifically, Pacemaker alerts can be used to watch for Stonith
> timeouts and failures.
>
> That script essentially only needs to run the command above.
>
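A minimal sketch of the alert agent described above, assuming a POSIX shell and a Pacemaker release with alert support. Pacemaker hands event details to alert agents in `CRM_alert_*` environment variables; this sketch reads only `CRM_alert_kind`, `CRM_alert_rc` and `CRM_alert_node`, and it assumes that a fencing timeout or failure arrives as a fencing alert with a nonzero `CRM_alert_rc`. The script path and the `DRY_RUN` switch are hypothetical. Since confirming a node that was never actually fenced is exactly what the following reply warns about, the sketch only prints the command unless `DRY_RUN=no` is set.

```shell
#!/bin/sh
# Hypothetical alert agent, e.g. /usr/local/bin/fence_failed_alert.sh.
# Pacemaker passes event details in CRM_alert_* environment variables.
# DRY_RUN is a hypothetical switch of this sketch, not a Pacemaker option:
# unless DRY_RUN=no, the confirm command is only printed, never executed.

handle_alert() {
    # Only fencing events are of interest here.
    [ "${CRM_alert_kind:-}" = "fencing" ] || return 0

    # CRM_alert_rc is the result of the fencing operation; 0 means success.
    [ "${CRM_alert_rc:-0}" != "0" ] || return 0

    # CRM_alert_node names the node that was supposed to be fenced.
    target="${CRM_alert_node:-}"
    [ -n "$target" ] || return 0

    cmd="sudo pcs stonith confirm $target --force"
    if [ "${DRY_RUN:-yes}" = "yes" ]; then
        echo "would run: $cmd"
    else
        logger -t fence_failed_alert "confirming $target as down after failed fencing"
        sudo pcs stonith confirm "$target" --force
    fi
}

handle_alert
```

To try it, the script would need to exist and be executable on every node and be registered with something like `pcs alert create path=/usr/local/bin/fence_failed_alert.sh id=fence_failed`; the real branch additionally needs the passwordless sudo rule for `hacluster` mentioned in the message.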
If I understand you correctly, you could just as well disable fencing in
the first place. Quorum-based watchdog fencing is the way to do this
safely. This of course assumes you have a proper source of quorum in
your 2-node setup, e.g. qdevice, or a shared disk with sbd (strictly
speaking that is not Pacemaker quorum, but something similar handled
inside sbd).
> Since the alerts run as the 'hacluster' user, sudo permissions need
> to be configured for 'hacluster'.
>
> Thanx.
>
> *From:* Klaus Wenninger [mailto:kwenning at redhat.com]
> *Sent:* Monday, July 24, 2017 9:24 PM
> *To:* Kristián Feldsam; Cluster Labs - All topics related to
> open-source clustering welcomed
> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>
> I personally think that powering off the node via a switched PDU is
> safer, isn't it?
>
> True, if that works in your environment. If you can't build a physical
> setup in which you never lose the connection to both your node and
> the switch device at the same time (or you just want to cover the cases
> where that happens), you have to come up with something else.
>
> Best regards, Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail: support at feldhost.cz <mailto:support at feldhost.cz>
>
> www.feldhost.cz <http://www.feldhost.cz> - *Feld*Host™ – professional
> hosting and server services at reasonable prices.
>
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, postal code 142 00
> Company ID (IČ): 290 60 958, VAT ID (DIČ): CZ290 60 958
> C 200350, registered at the Municipal Court in Prague
>
> Bank: Fio banka a.s.
> Account number: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010 0000 0024 0033 0446
>
> On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenning at redhat.com
> <mailto:kwenning at redhat.com>> wrote:
>
>
>
> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>
> I still don't understand why the qdevice concept doesn't help
> in this situation. Since the master node is down, I would
> expect the quorum to declare it dead.
>
> Why doesn't that happen?
>
>
> That is not how quorum works. It just limits decision-making
> to the quorate subset of the cluster; the nodes whose state is
> unknown are still not guaranteed to be down.
> That is why I suggested quorum-based watchdog fencing with
> sbd: it ensures that, within a certain time, all nodes of the
> non-quorate part of the cluster are down.
>
>
>
>
> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri
> Maziuk" <dmitri.maziuk at gmail.com
> <mailto:dmitri.maziuk at gmail.com>> wrote:
>
> On 2017-07-24 07:51, Tomer Azran wrote:
> > We don't have the ability to use it.
> > Is that the only solution?
>
> No, but I'd recommend thinking about it first. Are you sure you will
> care about your cluster working when your server room is on fire? 'Cause
> unless you have halon suppression, your server room is a complete
> write-off anyway. (Think water from sprinklers hitting rich chunky volts
> in the servers.)
>
> Dima
>
>
>
> _______________________________________________
>
> Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org>
>
> http://lists.clusterlabs.org/mailman/listinfo/users
>
>
>
> Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/>
>
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
> Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/>
>
> --
> Klaus Wenninger
>
> Senior Software Engineer, EMEA ENG Openstack Infrastructure
>
> Red Hat
>
> kwenning at redhat.com <mailto:kwenning at redhat.com>
>
>
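To illustrate the quoted explanation of why qdevice does not simply declare the lost node dead, here is a small POSIX shell sketch of the majority rule that corosync's votequorum applies, quorum = floor(expected_votes / 2) + 1, for two nodes with one vote each plus one qdevice vote (as in ffsplit mode). The function names are made up for illustration. The node that still reaches the qdevice is quorate and the isolated node is not, but nothing in this arithmetic tells the quorate side that the other node has actually stopped; that guarantee comes from sbd's watchdog resetting the inquorate node within a bounded time.

```shell
#!/bin/sh
# Majority rule as used by corosync votequorum:
#   quorum = floor(expected_votes / 2) + 1
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Prints "quorate" if a partition holding $1 votes reaches quorum
# out of $2 expected votes, otherwise "inquorate".
partition_state() {
    if [ "$1" -ge "$(quorum "$2")" ]; then
        echo quorate
    else
        echo inquorate
    fi
}

# Two nodes (1 vote each) plus one qdevice vote: 3 expected votes.
# The node that still reaches the qdevice holds 2 votes.
partition_state 2 3   # quorate
# The isolated node holds only its own vote.
partition_state 1 3   # inquorate
```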