[ClusterLabs] Two nodes cluster issue
Klaus Wenninger
kwenning at redhat.com
Mon Jul 24 16:18:48 EDT 2017
On 07/24/2017 09:46 PM, Kristián Feldsam wrote:
> so why not use some other fencing method, like disabling the port on the
> switch, so nobody can access the faulty node and write data to it? It is
> common practice too.
Well don't get me wrong here. I don't want to hard-sell sbd.
I just thought that, very likely, the requirements that prevent the use
of a remote-controlled power switch will make access
to a switch to disable the ports unusable as well.
And if a working qdevice setup is already there, the gap between
what he thought he would get from qdevice and what he actually
had is exactly what quorum-based watchdog fencing covers.
But you are of course right.
I don't really know the scenario.
Maybe fabric fencing is the perfect match - good to mention it
here as a possibility.
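
For illustration, here is a minimal Python sketch of the idea behind
quorum-based watchdog fencing as discussed in this thread. It is not sbd,
corosync or pacemaker code. The names `Partition`, `is_quorate` and
`peer_assumed_down`, the timeout values and the safety factor of 2 are all
assumptions made up for this example. It assumes one vote per node plus one
qdevice vote, and that a node which loses quorum stops feeding its hardware
watchdog and is reset within the watchdog timeout.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """One side of a split cluster, as seen from a node inside it."""
    node_votes: int         # votes of the cluster nodes reachable in this partition
    has_qdevice_vote: bool  # True if the qdevice arbitrator sides with this partition


def is_quorate(partition: Partition, expected_votes: int) -> bool:
    """A partition is quorate only with a strict majority of all expected votes."""
    votes = partition.node_votes + (1 if partition.has_qdevice_vote else 0)
    return votes > expected_votes // 2


def peer_assumed_down(seconds_since_split: float,
                      watchdog_timeout: float,
                      safety_factor: float = 2.0) -> bool:
    """The quorate side may treat the lost peer as down only after waiting
    longer than the peer's watchdog needs to reset it (hypothetical margin)."""
    return seconds_since_split >= watchdog_timeout * safety_factor


# Two nodes plus one qdevice vote (illustrative numbers).
EXPECTED_VOTES = 3
survivor = Partition(node_votes=1, has_qdevice_vote=True)
isolated = Partition(node_votes=1, has_qdevice_vote=False)

survivor_quorate = is_quorate(survivor, EXPECTED_VOTES)  # keeps running resources
isolated_quorate = is_quorate(isolated, EXPECTED_VOTES)  # must let its watchdog fire
```

The point of the sketch: quorum alone only tells the survivor that it may
decide. The waiting time tied to the watchdog is what lets it assume the
other node is really down before it takes over resources.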
Regards,
Klaus
>
> Best regards, Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail: support at feldhost.cz <mailto:support at feldhost.cz>
>
> www.feldhost.cz <http://www.feldhost.cz> - FeldHost™ – professional
> hosting and server services at reasonable prices.
>
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> Company ID (IČ): 290 60 958, VAT ID (DIČ): CZ290 60 958
> File C 200350, registered at the Municipal Court in Prague
>
> Bank: Fio banka a.s.
> Account number: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010 0000 0024 0033 0446
>
>> On 24 Jul 2017, at 21:16, Klaus Wenninger <kwenning at redhat.com
>> <mailto:kwenning at redhat.com>> wrote:
>>
>> On 07/24/2017 08:27 PM, Prasad, Shashank wrote:
>>> My understanding is that SBD needs shared storage between the
>>> clustered nodes,
>>> and that SBD needs at least 3 nodes in a cluster if used without
>>> shared storage.
>>
>> I haven't tried it, to be honest, but the reason for 3 nodes is that
>> without a shared disk you need a real quorum source and not something
>> 'faked' as with the 2-node feature in corosync.
>> But I don't see anything speaking against getting proper
>> quorum via qdevice instead of a third full cluster node.
>>
>>>
>>> Therefore, for systems which do NOT use shared storage between 1+1
>>> HA clustered nodes, SBD may NOT be an option.
>>> Correct me, if I am wrong.
>>>
>>> For cluster systems using the likes of iDRAC/IMM2 fencing agents,
>>> which have redundant power supply units that are shared with the nodes,
>>> the normal fencing mechanisms should work for all resiliency
>>> scenarios, except when IMM2/iDRAC is NOT reachable for whatever
>>> reason. And, to bail out of those situations in the absence of SBD,
>>> I believe using user-defined failover hooks (via scripts) in
>>> Pacemaker Alerts, with sudo permissions for ‘hacluster’, should help.
>>
>> If you can't see your fencing device, assuming after some time
>> that the corresponding node is probably down is quite risky
>> in my opinion.
>> But why not assure that it is down by using a watchdog?
>>
>>>
>>> Thanx.
>>>
>>>
>>> *From:* Klaus Wenninger [mailto:kwenning at redhat.com]
>>> *Sent:* Monday, July 24, 2017 11:31 PM
>>> *To:* Cluster Labs - All topics related to open-source clustering
>>> welcomed; Prasad, Shashank
>>> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>>>
>>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>>>
>>> Sometimes IPMI fence devices use shared power of the node, and
>>> it cannot be avoided.
>>> In such scenarios the HA cluster is NOT able to handle the power
>>> failure of a node, since the power is shared with its own fence
>>> device.
>>> IPMI-based fencing can also fail for other reasons.
>>>
>>> A failure to fence the failed node will cause the cluster to be
>>> marked UNCLEAN.
>>> To get over it, the following command needs to be invoked on the
>>> surviving node.
>>>
>>> pcs stonith confirm <failed_node_name> --force
>>>
>>> This can be automated by hooking in a recovery script that runs on the
>>> Stonith resource ‘Timed Out’ event.
>>> To be more specific, Pacemaker Alerts can be used to watch
>>> for Stonith timeouts and failures.
>>> In that script, essentially all that needs to be executed is the
>>> aforementioned command.
>>>
>>>
>>> If I get you right here, you might as well disable fencing in the first place.
>>> Actually, quorum-based watchdog fencing is the way to do this in a
>>> safe manner. This of course assumes you have a proper source of
>>> quorum in your 2-node setup, e.g. qdevice or a shared
>>> disk with sbd (not directly pacemaker quorum here, but a similar thing
>>> handled inside sbd).
>>>
>>>
>>> Since the alerts are issued from the ‘hacluster’ login, sudo permissions
>>> for ‘hacluster’ need to be configured.
>>>
>>> Thanx.
>>>
>>>
>>> *From:* Klaus Wenninger [mailto:kwenning at redhat.com]
>>> *Sent:* Monday, July 24, 2017 9:24 PM
>>> *To:* Kristián Feldsam; Cluster Labs - All topics related to
>>> open-source clustering welcomed
>>> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>>>
>>> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>>>
>>> I personally think that powering off the node via a switched PDU is
>>> safer, or not?
>>>
>>>
>>> True, if that works in your environment. If you can't build a
>>> physical setup in which you never lose the connection to both your node
>>> and the switch device at the same time (or you just want to cover cases
>>> where that happens), you have to come up with something else.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenning at redhat.com
>>> <mailto:kwenning at redhat.com>> wrote:
>>>
>>> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>>>
>>> I still don't understand why the qdevice concept doesn't
>>> help in this situation. Since the master node is down, I
>>> would expect the quorum to declare it as dead.
>>> Why doesn't that happen?
>>>
>>>
>>> That is not how quorum works. It just limits the decision-making
>>> to the quorate subset of the cluster.
>>> The unknown nodes still can't be assumed to be down.
>>> That is why I suggested quorum-based watchdog fencing
>>> with sbd.
>>> That would assure that within a certain time all nodes of the
>>> non-quorate part of the cluster are down.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri
>>> Maziuk" <dmitri.maziuk at gmail.com
>>> <mailto:dmitri.maziuk at gmail.com>> wrote:
>>>
>>> On 2017-07-24 07:51, Tomer Azran wrote:
>>> > We don't have the ability to use it.
>>> > Is that the only solution?
>>>
>>> No, but I'd recommend thinking about it first. Are you sure you will
>>> care about your cluster working when your server room is on fire? 'Cause
>>> unless you have halon suppression, your server room is a complete
>>> write-off anyway. (Think water from sprinklers hitting rich chunky volts
>>> in the servers.)
>>>
>>> Dima
>>>
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org>
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/>
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/>
>>>
>>> --
>>> Klaus Wenninger
>>>
>>> Senior Software Engineer, EMEA ENG Openstack Infrastructure
>>>
>>> Red Hat
>>>
>>> kwenning at redhat.com <mailto:kwenning at redhat.com>
>>>
>>
>