[ClusterLabs] Problem using IPMI for fencing

Jose Manuel Martínez jose.martinez at fcsc.es
Wed Mar 4 08:28:05 UTC 2015


I understand the problem.

Fencing and reallocation of resources are not possible if the fencing 
resource cannot determine the real status of the other node. So, I'm 
going to try to remove this single point of failure by adding a second 
fencing method (thanks for the links). Our PDUs are not able to 
disconnect a single power bank, so this is not actually an option for 
us, but I think we have other options.
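
To make the idea of a second fencing method concrete, here is a minimal
Python sketch of how ordered fencing levels behave. It is only an
illustration under stated assumptions, not Pacemaker code: the names
fence_node, make_ipmi and independent_switch are hypothetical, and each
device is modeled as a function that returns True only when it can confirm
that the target node is off. Levels are tried in order, and a level counts
as successful only if every device in it succeeds.

```python
from typing import Callable, Dict, List

# A fence device is modeled as a callable that returns True when it can
# confirm the target node is off. All names here are hypothetical.
FenceDevice = Callable[[str], bool]


def fence_node(target: str, levels: List[List[FenceDevice]]) -> int:
    """Try each level in order.

    Returns the 1-based number of the first level whose devices all
    succeed, or 0 if every level fails (the node state stays unknown,
    so resources must not be recovered).
    """
    for number, devices in enumerate(levels, start=1):
        if all(device(target) for device in devices):
            return number
    return 0


def make_ipmi(has_power: Dict[str, bool]) -> FenceDevice:
    """IPMI shares power with its host: no host power, no answer."""
    def ipmi(target: str) -> bool:
        return has_power.get(target, False)
    return ipmi


def independent_switch(target: str) -> bool:
    """Assumed stand-in for a device that does not depend on the host."""
    return True


# node2 has lost power, so its IPMI interface cannot answer.
ipmi = make_ipmi({"node1": True, "node2": False})

only_ipmi = fence_node("node2", [[ipmi]])
with_fallback = fence_node("node2", [[ipmi], [independent_switch]])
print(only_ipmi, with_fallback)
```

With IPMI as the only level, fencing of the unpowered node never succeeds,
which is the situation I hit. With an independent second level, the first
level still fails but the second one can confirm the node is off.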

I'll post if I can solve the problem. Thank you so much. You have been 
very helpful.

JoseM


On 03/03/15 at 19:51, Digimer wrote:
> On 03/03/15 01:42 PM, Ken Gaillot wrote:
>> On 03/03/2015 01:14 PM, Jose Manuel Martínez wrote:
>>> Hello everybody.
>>>
>>> I'm trying to build an active/passive cluster for the Lustre
>>> filesystem. Pacemaker is working fine in most situations except
>>> one: If a node loses power in a 2-node cluster, and I am
>>> using fence_ipmilan as the fencing resource (for HP iLO2), the
>>> surviving node is not able to take over the resources of the failed
>>> node. It tries to use the fencing device to reboot the failed node,
>>> but as the node is dead (no power), the IPMI interface does not answer.
>> Correct, IPMI that shares power with its host should not be used as
>> the sole fencing device for this very reason. There is no way for
>> the cluster to be certain that the host is down and not just the
>> IPMI.
>>
>> IPMI is fine as the first-attempt fencing device, but there should
>> be a fallback fencing device that is independent of the host (such
>> as a remotely controllable power switch).
> I agree 100%, and do this myself.
>
> IPMI fencing, when it works, is best because when it returns "off", we
> can be *very* sure the node is actually off. As you said though, it is
> electrically and mechanically coupled to the host, so it's vulnerable
> to certain failure cases (e.g., total loss of power, mechanical
> destruction, etc).
>
> For this, I always use a pair of switched PDUs (APC and Raritan both
> work well, TrippLite works but is slow). I use a pair because I also
> have power redundancy (separate UPSes, PDUs, etc). So to handle this,
> you need a fairly complex stonith configuration to make it work.
>
> In 'pcs', this is called 'STONITH Levels' and you can see how to
> build this here:
>
> http://clusterlabs.org/wiki/STONITH_Levels
>
> In 'crm', this is called "Fencing Topology" and you can see how to
> configure it here:
>
> http://clusterlabs.org/wiki/Fencing_topology
>
> In my opinion, this is the only proper configuration for fencing if
> you want to remove all single points of failure.
>
> -- 
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?


-- 

*Jose Manuel Martínez García / Tel. 987 293 174 *

*Systems Coordinator*

Fundación Centro de Supercomputación de Castilla y León

Edificio CRAI-TIC, Campus de Vegazana, s/n

Universidad de León

24071 León, España

www.fcsc.es

