[ClusterLabs] Problem using IPMI for fencing
Jose Manuel Martínez
jose.martinez at fcsc.es
Wed Mar 4 08:28:05 UTC 2015
I understand the problem.
Fencing and reallocation of resources are not possible if the fencing
resource cannot determine the real status of the other node. So I'm
going to try to remove this single point of failure by adding a second
fencing method (thanks for the links). Our PDUs are not able to
disconnect a single power bank, so this is not actually an option for
us, but I think we have other options.
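
As an illustration only, the fallback behaviour described in the quoted
replies (try IPMI first, then fall back to a device that does not share
power with the host) can be sketched in Python. This is not Pacemaker
code; the function name, the device functions and the node name "node2"
are all hypothetical. It assumes that a level counts as successful only
when every device in it confirms the node is off.

```python
from typing import Callable, Dict, List, Optional

# A fence device is modelled as a function: node name -> True if the device
# confirmed the node is off, False if it failed or could not be reached.
FenceDevice = Callable[[str], bool]


def fence_with_levels(node: str, levels: Dict[int, List[FenceDevice]]) -> Optional[int]:
    """Try fencing levels in ascending order.

    A level succeeds only if every device in it confirms the node is off.
    Returns the number of the level that succeeded, or None if every level
    failed (in which case resources must not be recovered).
    """
    for level in sorted(levels):
        if all(device(node) for device in levels[level]):
            return level
    return None


# Hypothetical devices, for illustration only.
def ipmi_unreachable(node: str) -> bool:
    # The node lost power, so its IPMI interface does not answer:
    # there is no confirmation that the host is off.
    return False


def switch_a_off(node: str) -> bool:
    # Independent power switch on the first feed reports "off".
    return True


def switch_b_off(node: str) -> bool:
    # Independent power switch on the second feed reports "off".
    return True


topology = {
    1: [ipmi_unreachable],            # first attempt: IPMI
    2: [switch_a_off, switch_b_off],  # fallback: both feeds must be cut
}

result = fence_with_levels("node2", topology)
print(result)
```

With IPMI as the only level the function returns None, which matches the
situation in the original report: the surviving node cannot take over
because nothing has confirmed that the peer is off.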
I'll post back if I manage to solve the problem. Thank you so much. You
have been very helpful.
JoseM
On 03/03/15 at 19:51, Digimer wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 03/03/15 01:42 PM, Ken Gaillot wrote:
>> On 03/03/2015 01:14 PM, Jose Manuel Martínez wrote:
>>> Hello everybody.
>>>
>>> I'm trying to build an active/passive cluster for the Lustre
>>> filesystem. Pacemaker is working fine in most situations except
>>> one: if a node loses power in a 2-node cluster, and I am
>>> using fence_ipmilan as the fencing resource (for HP iLO2), the
>>> surviving node is not able to take over the resources of the failed
>>> node. It tries to reboot the failed node through the fencing device,
>>> but as the node is dead (no power), the IPMI interface does not answer.
>> Correct, IPMI that shares power with its host should not be used as
>> the sole fencing device for this very reason. There is no way for
>> the cluster to be certain that the host is down and not just the
>> IPMI.
>>
>> IPMI is fine as the first-attempt fencing device, but there should
>> be a fallback fencing device that is independent of the host (such
>> as a remotely controllable power switch).
> I agree 100%, and do this myself.
>
> IPMI fencing, when it works, is best because when it returns "off", we
> can be *very* sure the node is actually off. As you said though, it is
> electrically and mechanically coupled to the host, so it's vulnerable
> to certain failure cases (e.g. total loss of power, mechanical
> destruction, etc.).
>
> For this, I always use a pair of switched PDUs (APC and Raritan both
> work well, TrippLite works but is slow). I use a pair because I also
> have power redundancy (separate UPSes, PDUs, etc.). So to handle this,
> you need a fairly complex stonith configuration to make it work.
>
> In 'pcs', this is called 'STONITH Levels' and you can see how to
> build this here:
>
> http://clusterlabs.org/wiki/STONITH_Levels
>
> In 'crm', this is called "Fencing Topology" and you can see how to
> configure it here:
>
> http://clusterlabs.org/wiki/Fencing_topology
>
> In my opinion, this is the only proper configuration for fencing if
> you want to remove all single points of failure.
>
> - --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iQIcBAEBAgAGBQJU9gLDAAoJECChztQA3mh0SUUQAJ89YOoBH3m1/jR0hUsS5VV2
> sgwi2DuCXPKamHDzMNL8mFnozZxCD5QMs5+yDjWJZWxXCXEz6VB4aR4zVz31URi5
> iZdN0RmaUVbdVqgJrY0KH8QaxuCg440H2mE41Qj/8OKYmK9RW0fhErU59Ydud5wX
> jTuTqBRhfniMr4Qd2myYTmkm7+AwEwy1NfthimweTOTLib/11G8/esJ5AMz6Upeq
> dyKbDxoOJuPODJfglKCrytqJnWuFrfzUWSbVnpRf4pMaRIdeL/Ko9Vsi4zIB3UD3
> TWxbWUS/MM3QopzV9ruFX1yvu0B+YHKhmecgEGtXAxgyWI6zj7RQNHiJ/rBRi8Rk
> Dld5bdAnTzADQeHsvU3PIK+ilrwFjZsCoK8dgK5eSr0jQrKRGUhkTOF6LtMP7HYA
> xtWu3kXE/YbVrBT8BhdFTWSGTBvnCGIfzGNY+/wm45uLXf4lMg2fWW5OCKlgAj0K
> W/srPAU5M8tJesrPXiDY//V2DkQhAsurrNUwVjL+e6mA8LQyyH79bNcP0cN+gyIo
> LHqmK2OVEdwr7uOjijtA5y49iyreR92nfVLOZUzxjrpjXs36eSzJ+DdUulzx3cJJ
> 49BQPlT/+Hb5V7hIVUSFTneyLGrJOLG9g9hFtx4nL9sNTddzpJhXeeEyfy/q9+yy
> DEigN8nBiILHLfHdGMIh
> =1UnT
> -----END PGP SIGNATURE-----
--
*Jose Manuel Martínez García / Tel. 987 293 174 *
*Systems Coordinator*
Fundación Centro de Supercomputación de Castilla y León
Edificio CRAI-TIC, Campus de Vegazana, s/n
Universidad de León
24071 León, España
www.fcsc.es