[ClusterLabs] Two nodes cluster issue

Prasad, Shashank ssprasad at vanu.com
Mon Jul 24 13:32:26 EDT 2017


Sometimes an IPMI fence device shares its power supply with the node it is meant to fence, and this cannot always be avoided.

In such setups the HA cluster cannot handle a power failure of that node: the fence device loses power together with the node, so fencing fails.

IPMI-based fencing can also fail for other reasons.

When the failed node cannot be fenced, the cluster marks it UNCLEAN and will not recover its resources on the surviving node until fencing succeeds.

To get past this, run the following command on the surviving node:

pcs stonith confirm <failed_node_name> --force

This can be automated by hooking a recovery script to the Stonith resource's ‘Timed Out’ event.

More specifically, Pacemaker alerts can be used to watch for Stonith timeouts and failures.

All the script essentially has to do is run the command above.

Since alert agents are run as the ‘hacluster’ user, sudo permissions need to be configured for ‘hacluster’.
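
A minimal sketch of such an alert agent as a POSIX shell script. It assumes Pacemaker's standard alert variables (CRM_alert_kind, CRM_alert_node, CRM_alert_rc, CRM_alert_desc); the script name and path, the sudoers entry, the pcs location /usr/sbin/pcs and the rule that a node never confirms fencing of itself are illustrative assumptions, not part of the setup described above. Because `pcs stonith confirm` only tells the cluster that the node is powered off without checking, acting on every failed fence is safe only where such a failure really means the node has lost power, as in the shared-power case above.

```shell
#!/bin/sh
# Hypothetical alert agent, e.g. /usr/local/bin/fence_failed_confirm.sh
# (name and path are illustrative). Pacemaker runs alert agents as the
# 'hacluster' user and passes event details in CRM_alert_* variables.
#
# Assumed sudoers entry (e.g. in /etc/sudoers.d/hacluster-confirm):
#   hacluster ALL=(root) NOPASSWD: /usr/sbin/pcs stonith confirm *
# Assumed registration (pcs versions that provide 'pcs alert'):
#   pcs alert create path=/usr/local/bin/fence_failed_confirm.sh id=fence_confirm

# Succeeds only for a failed fencing event aimed at another node.
# $1 = alert kind, $2 = fencing return code, $3 = target node, $4 = local node
should_confirm() {
    [ "$1" = "fencing" ] || return 1
    [ -n "$2" ] && [ "$2" != "0" ] || return 1
    [ -n "$3" ] && [ "$3" != "$4" ]
}

if [ "${CRM_alert_kind:-}" = "fencing" ]; then
    # crm_node -n prints this node's name as known to the cluster.
    local_node=$(crm_node -n 2>/dev/null)
    if should_confirm "$CRM_alert_kind" "${CRM_alert_rc:-}" \
        "${CRM_alert_node:-}" "$local_node"; then
        logger -t fence_confirm \
            "fencing of $CRM_alert_node failed (rc=$CRM_alert_rc): ${CRM_alert_desc:-}"
        # Tell the cluster the node is down; this does NOT verify it.
        sudo -n /usr/sbin/pcs stonith confirm "$CRM_alert_node" --force
    fi
fi
```

Keeping the filtering in a separate should_confirm function means successful fences (rc 0) and non-fencing alerts are ignored, and the decision logic can be checked without a running cluster.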

 

Thanx.

From: Klaus Wenninger [mailto:kwenning at redhat.com] 
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 05:37 PM, Kristián Feldsam wrote:

	I personally think that powering off the node with a switched PDU is safer, isn't it?


True, if that works in your environment. If you can't build a physical setup
where you don't lose connection to both your node and the switching device
at the same time (or you just want to cover cases where that happens),
you have to come up with something else.





Best regards, Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail: support at feldhost.cz

www.feldhost.cz - FeldHost™: professional hosting and server services at reasonable prices.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, postal code 142 00
Company ID: 290 60 958, VAT ID: CZ290 60 958
Registered with the Municipal Court in Prague, file C 200350

Bank: Fio banka a.s.
Account number: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010 0000 0024 0033 0446

	On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenning at redhat.com> wrote:

	On 07/24/2017 05:15 PM, Tomer Azran wrote:

		I still don't understand why the qdevice concept doesn't help in this situation. Since the master node is down, I would expect quorum to declare it dead.

		Why doesn't that happen?

	
	That is not how quorum works. It only limits decision-making to the quorate subset of the cluster.
	The nodes outside that subset are still not known to be down.
	That is why I suggested quorum-based watchdog fencing with sbd.
	That would ensure that, within a bounded time, all nodes in the non-quorate part
	of the cluster are down.
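
A minimal sketch of the quorum-based watchdog fencing suggested here, assuming watchdog-only ("diskless") sbd, a working watchdog device at /dev/watchdog, and quorum provided by the qdevice discussed in this thread; the timeout values are illustrative only. Roughly, a node that loses quorum stops feeding its watchdog and is reset once SBD_WATCHDOG_TIMEOUT expires, while the quorate side waits stonith-watchdog-timeout before treating the lost node as fenced, which is why that property is commonly set to about twice SBD_WATCHDOG_TIMEOUT.

```shell
# /etc/sysconfig/sbd (fragment). No SBD_DEVICE is set, so sbd relies on
# the watchdog plus the node's Pacemaker/quorum state only.
SBD_WATCHDOG_DEV=/dev/watchdog
# Seconds an unfed watchdog waits before resetting the node (illustrative).
SBD_WATCHDOG_TIMEOUT=5
# Integrate sbd with Pacemaker so the node's health/quorum state drives it.
SBD_PACEMAKER=yes

# Assumed follow-up steps, shown as comments only:
#   systemctl enable sbd                            (on every node)
#   pcs property set stonith-watchdog-timeout=10    (once, cluster-wide)
```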
	
	
	

	
	
	

	On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" <dmitri.maziuk at gmail.com> wrote:

	On 2017-07-24 07:51, Tomer Azran wrote:
	> We don't have the ability to use it.
	> Is that the only solution?
	 
	No, but I'd recommend thinking about it first. Are you sure you will 
	care about your cluster working when your server room is on fire? 'Cause 
	unless you have halon suppression, your server room is a complete 
	write-off anyway. (Think water from sprinklers hitting rich chunky volts 
	in the servers.)
	 
	Dima
	 
	_______________________________________________
	Users mailing list: Users at clusterlabs.org
	http://lists.clusterlabs.org/mailman/listinfo/users
	 
	Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/> 
	Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
	Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/> 

	-- 
	Klaus Wenninger
	 
	Senior Software Engineer, EMEA ENG Openstack Infrastructure
	 
	Red Hat
	 
	kwenning at redhat.com