[ClusterLabs] Problem using IPMI for fencing

Tue Mar 3 18:14:14 UTC 2015

Hello everybody.

I'm trying to build an active/passive cluster for the Lustre filesystem.
Pacemaker works fine in most situations except one: in a 2-node cluster
using fence_ipmilan as the fencing resource (for HP iLO2), if a node
loses power, the surviving node is not able to take over the resources
of the failed node. It tries to fence the failed node by rebooting it
through its IPMI interface, but because the node is dead (no power),
the IPMI interface does not answer.

The log says:
Mar 03 18:16:18 [20355] lustre03 stonith-ng:    error: remote_op_done:  Operation reboot of lustre04 by lustre03 for crmd.20359 at lustre03.7a198338: No route to host

The log shows that the cluster knows what it should do ('lustre04' is the
dead node and 'lustre03' is the surviving one):
warning: stage6:  Scheduling Node lustre04 for STONITH
Mar 03 18:16:18 [20358] lustre03    pengine:     info: native_stop_constraints: Fencing_Lustre03_stop_0 is implicit after lustre04 is fenced
Mar 03 18:16:18 [20358] lustre03    pengine:     info: native_stop_constraints: Resource_OST09_stop_0 is implicit after lustre04 is fenced
Mar 03 18:16:18 [20358] lustre03    pengine:     info: native_stop_constraints: Resource_OST06_stop_0 is implicit after lustre04 is fenced
Mar 03 18:16:18 [20358] lustre03    pengine:     info: native_stop_constraints: Resource_OST07_stop_0 is implicit after lustre04 is fenced
Mar 03 18:16:18 [20358] lustre03    pengine:     info: native_stop_constraints: Resource_OST08_stop_0 is implicit after lustre04 is fenced
Mar 03 18:16:18 [20358] lustre03    pengine:   notice: LogActions: Move    Fencing_Lustre03        (Started lustre04 -> lustre03)
Mar 03 18:16:18 [20358] lustre03    pengine:     info: LogActions: Leave   Fencing_Lustre04        (Started lustre03)
Mar 03 18:16:18 [20358] lustre03    pengine:   notice: LogActions: Move    Resource_OST09  (Started lustre04 -> lustre03)
Mar 03 18:16:18 [20358] lustre03    pengine:   notice: LogActions: Move    Resource_OST06  (Started lustre04 -> lustre03)
Mar 03 18:16:18 [20358] lustre03    pengine:     info: LogActions: Leave   Resource_OST04  (Started lustre03)
Mar 03 18:16:18 [20358] lustre03    pengine:     info: LogActions: Leave   Resource_OST05  (Started lustre03)
Mar 03 18:16:18 [20358] lustre03    pengine:   notice: LogActions: Move    Resource_OST07  (Started lustre04 -> lustre03)
Mar 03 18:16:18 [20358] lustre03    pengine:   notice: LogActions: Move    Resource_OST08  (Started lustre04 -> lustre03)

...but these operations never happen. Because the surviving node cannot
fence the dead one, it keeps retrying in an infinite loop and the
resources are never taken over.
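A toy Python model of the behaviour described above may make the dependency clearer. It is a hypothetical sketch, not Pacemaker's actual scheduler code: it only encodes what the log states, namely that each stop on lustre04 "is implicit after lustre04 is fenced", so every planned move waits on a fence action that fails while the node (and therefore its iLO2) has no power. The `Action` class, the `run_transition` function and the action names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    # Names of actions that must succeed before this one may run.
    requires: list[str] = field(default_factory=list)


def run_transition(actions, fence_ok):
    """Simulate one (hypothetical) transition; return (executed, blocked) names.

    Actions are assumed to be listed in dependency order.
    """
    succeeded = set()
    executed, blocked = [], []
    for action in actions:
        if action.name.startswith("fence "):
            # Stands in for the fence_ipmilan result; False models
            # "No route to host" from an unpowered BMC.
            ok = fence_ok
        else:
            # Runs only if every prerequisite already succeeded.
            ok = all(req in succeeded for req in action.requires)
        if ok:
            succeeded.add(action.name)
            executed.append(action.name)
        else:
            blocked.append(action.name)
    return executed, blocked


# Resources pengine planned to move in the log; each stop on lustre04
# is ordered after the fence of lustre04.
MOVED = ["Fencing_Lustre03", "Resource_OST06", "Resource_OST07",
         "Resource_OST08", "Resource_OST09"]
PLAN = [Action("fence lustre04")] + [
    Action(f"move {r} lustre04 -> lustre03", requires=["fence lustre04"])
    for r in MOVED
]

if __name__ == "__main__":
    # With the node unpowered, every retry ends the same way.
    for attempt in range(1, 4):
        executed, blocked = run_transition(PLAN, fence_ok=False)
        # prints "attempt 1: executed=[], blocked=6" (same for attempts 2 and 3)
        print(f"attempt {attempt}: executed={executed}, blocked={len(blocked)}")
```

Calling `run_transition(PLAN, fence_ok=True)` in the same model lets the fence and all five moves run, which matches the plan pengine logged; the sketch only shows that the moves are ordered after a fence that cannot report success while the node has no power.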

Is there a way to tell the cluster what to do in this case?

Best regards

-- 

Jose Manuel Martínez García / Tel. 987 293 174
Systems Coordinator
Fundación Centro de Supercomputación de Castilla y León
Edificio CRAI-TIC, Campus de Vegazana, s/n
Universidad de León
24071 León, Spain
www.fcsc.es
