[ClusterLabs] Two nodes cluster issue

Klaus Wenninger kwenning at redhat.com
Mon Jul 24 19:19:11 EDT 2017


On 07/24/2017 11:59 PM, Tomer Azran wrote:
>
> There is a problem with that – it seems like SBD with shared disk is
> disabled on CentOS 7.3:
>
>  
>
> When I run:
>
> # sbd -d /dev/sbd create
>
>  
>
> I get:
>
> Shared disk functionality not supported
>

Which is why I suggested going for watchdog-fencing using
your qdevice setup.
As said, I haven't tried it with qdevice-based quorum, but I don't
see a reason why that shouldn't work.
no-quorum-policy has to be set to suicide, of course.
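
Roughly, a diskless (watchdog-only) sbd setup on CentOS/RHEL 7 boils down
to something like the following - the timeout values are just placeholders
to adjust to your watchdog hardware:

In /etc/sysconfig/sbd (no SBD_DEVICE set, so sbd runs watchdog-only):

SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5

Then enable sbd (it is started together with the cluster) and tell
pacemaker to rely on the watchdog:

systemctl enable sbd
pcs property set stonith-watchdog-timeout=10s
pcs property set no-quorum-policy=suicide

stonith-watchdog-timeout is typically set to roughly twice
SBD_WATCHDOG_TIMEOUT.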

>  
>
> So I might try the software watchdog (softdog or ipmi_watchdog)
>

A reliable watchdog is really crucial for sbd, so I would
recommend going for IPMI or anything else that has real
hardware behind it.
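
If you do test with a software watchdog first, loading the module is all
that is needed for /dev/watchdog to show up (the file name below is just
an example):

modprobe softdog
echo softdog > /etc/modules-load.d/watchdog.conf   # load it on boot as well
ls -l /dev/watchdog

But again: softdog only works as long as the kernel itself is still
running, so treat it as a test setup rather than something to rely on.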

Klaus
>
>  
>
> Tomer.
>
>  
>
> *From:*Tomer Azran [mailto:tomer.azran at edp.co.il]
> *Sent:* Tuesday, July 25, 2017 12:30 AM
> *To:* kwenning at redhat.com; Cluster Labs - All topics related to
> open-source clustering welcomed <users at clusterlabs.org>; Prasad,
> Shashank <ssprasad at vanu.com>
> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
>  
>
> I tend to agree with Klaus – I don't think that having a hook that
> bypasses stonith is the right way. It is better to not use stonith at all.
>

That was of course said with a certain degree of hyperbole. Anything is of
course better than not having fencing at all.
I might be wrong, but what you were saying drew a picture in my mind of
your 2 nodes sitting in 2 quite separated sites/rooms, and in that case ...

> I think I will try to use an iSCSI target on my qdevice and set SBD to
> use it.
>
> I still don't understand why qdevice can't take the place of SBD with
> shared storage; correct me if I'm wrong, but it looks like both of
> them are there for the same reason.
>

sbd with a watchdog + qdevice can take the place of sbd with shared storage.
qdevice is there to decide which part of a cluster is quorate and which
is not - in cases where, after a split, this wouldn't otherwise be possible.
sbd (with a watchdog) is then there to reliably take down the non-quorate
part within a well-defined time.
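
Just to make that concrete, the qdevice side of such a setup typically
comes down to (the hostname is a placeholder; ffsplit is the usual
algorithm for two nodes):

On the third machine (package corosync-qnetd):
pcs qdevice setup model net --enable --start

On the cluster nodes (package corosync-qdevice):
pcs quorum device add model net host=qnetd.example.com algorithm=ffsplit
pcs quorum status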

>  
>
> *From:*Klaus Wenninger [mailto:kwenning at redhat.com]
> *Sent:* Monday, July 24, 2017 9:01 PM
> *To:* Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org <mailto:users at clusterlabs.org>>;
> Prasad, Shashank <ssprasad at vanu.com <mailto:ssprasad at vanu.com>>
> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
>  
>
> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>
>     Sometimes IPMI fence devices share the power supply of the node, and
>     that cannot be avoided.
>
>     In such scenarios the HA cluster is NOT able to handle the power
>     failure of a node, since the power is shared with its own fence
>     device.
>
>     IPMI-based fencing can also fail for other reasons.
>
>      
>
>     A failure to fence the failed node will cause it to be marked
>     UNCLEAN in the cluster.
>
>     To get over it, the following command needs to be invoked on the
>     surviving node.
>
>      
>
>     pcs stonith confirm <failed_node_name> --force
>
>      
>
>     This can be automated by hooking in a recovery script on the
>     Stonith resource’s ‘Timed Out’ event.
>
>     To be more specific, the Pacemaker Alerts can be used to watch
>     for Stonith timeouts and failures.
>
>     In that script, all that essentially needs to be executed is the
>     aforementioned command.
>
>
> If I get you right here, you could just as well disable fencing in the
> first place.
> Actually quorum-based-watchdog-fencing is the way to do this in a
> safe manner. This of course assumes you have a proper source of
> quorum in your 2-node setup, e.g. qdevice or a shared
> disk with sbd (not pacemaker quorum directly in that case, but a similar
> mechanism handled inside sbd).
>
>     Since the alerts are issued under the ‘hacluster’ login, sudo
>     permissions for ‘hacluster’ need to be configured.
>
>      
>
>     Thanx.
>
>      
>
>      
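
(For illustration only - a minimal sketch of the alert hook described
above, with a made-up script name. As pointed out above, confirming a node
that is in fact still running is a good way to corrupt your data, which is
why quorum-based watchdog-fencing is the safer route.)

#!/bin/sh
# /usr/local/bin/stonith-failure-hook.sh
# Pacemaker passes the alert context via CRM_alert_* environment variables.
if [ "$CRM_alert_kind" = "fencing" ] && [ "$CRM_alert_rc" != "0" ]; then
    # only do this if you are absolutely sure the node is really powered off
    sudo pcs stonith confirm "$CRM_alert_node" --force
fi

Registered with something like:
pcs alert create path=/usr/local/bin/stonith-failure-hook.sh id=stonith-hook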
>
>     *From:*Klaus Wenninger [mailto:kwenning at redhat.com]
>     *Sent:* Monday, July 24, 2017 9:24 PM
>     *To:* Kristián Feldsam; Cluster Labs - All topics related to
>     open-source clustering welcomed
>     *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
>      
>
>     On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>
>         I personally think that power off node by switched pdu is more
>         safe, or not?
>
>
>     True, if that works in your environment. If you can't do a physical
>     setup where you aren't simultaneously losing connection to both your
>     node and the switch device (or you just want to cover cases where
>     that happens), you have to come up with something else.
>
>
>
>     Best regards, Kristián Feldsam
>     Tel.: +420 773 303 353, +421 944 137 535
>     E-mail: support at feldhost.cz <mailto:support at feldhost.cz>
>
>     www.feldhost.cz <http://www.feldhost.cz> - *Feld*Host™ –
>     professional hosting and server services at fair prices.
>
>     FELDSAM s.r.o.
>     V rohu 434/3
>     Praha 4 – Libuš, 142 00
>     Company ID (IČ): 290 60 958, VAT ID (DIČ): CZ290 60 958
>     File C 200350, registered with the Municipal Court in Prague
>
>     Bank: Fio banka a.s.
>     Account number: 2400330446/2010
>     BIC: FIOBCZPPXX
>     IBAN: CZ82 2010 0000 0024 0033 0446
>
>      
>
>         On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenning at redhat.com
>         <mailto:kwenning at redhat.com>> wrote:
>
>          
>
>         On 07/24/2017 05:15 PM, Tomer Azran wrote:
>
>             I still don't understand why the qdevice concept doesn't
>             help in this situation. Since the master node is down, I
>             would expect the quorum to declare it as dead.
>
>             Why doesn't it happen?
>
>
>         That is not how quorum works. It just limits the
>         decision-making to the quorate subset of the cluster.
>         Still, the unknown nodes are not guaranteed to be down.
>         That is why I suggested quorum-based watchdog-fencing
>         with sbd.
>         That would ensure that, within a certain time, all nodes of the
>         non-quorate part of the cluster are down.
>
>
>
>
>         On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri
>         Maziuk" <dmitri.maziuk at gmail.com
>         <mailto:dmitri.maziuk at gmail.com>> wrote:
>
>         On 2017-07-24 07:51, Tomer Azran wrote:
>
>         > We don't have the ability to use it.
>
>         > Is that the only solution?
>
>          
>
>         No, but I'd recommend thinking about it first. Are you sure you will 
>
>         care about your cluster working when your server room is on fire? 'Cause 
>
>         unless you have halon suppression, your server room is a complete 
>
>         write-off anyway. (Think water from sprinklers hitting rich chunky volts 
>
>         in the servers.)
>
>          
>
>         Dima
>
>          
>
>
>          
>
>         -- 
>
>         Klaus Wenninger
>
>          
>
>         Senior Software Engineer, EMEA ENG Openstack Infrastructure
>
>          
>
>         Red Hat
>
>          
>
>         kwenning at redhat.com <mailto:kwenning at redhat.com>  
>
>
>      
>
>
>
>     _______________________________________________
>
>     Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org>
>
>     http://lists.clusterlabs.org/mailman/listinfo/users
>
>      
>
>     Project Home: http://www.clusterlabs.org
>
>     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
>     Bugs: http://bugs.clusterlabs.org
>
>       
>
