[ClusterLabs] Fencing one node kill others

Fri Dec 30 15:40:57 EST 2016

Hi,

I have a four-node cluster that uses iLO as the fencing agent. When I
simulate a node crash (either by killing corosync or by running
echo c > /proc/sysrq-trigger), the node is marked UNCLEAN and the stonith
agent is asked to restart it. Every time that happens, however, another
node in the cluster is also marked UNCLEAN and rebooted. After the nodes
reboot, they are marked online again and the cluster resumes operation
without problems.

I have reviewed the corosync and pacemaker logs but found nothing that
explains why the other node is also rebooted.

Any hints on what to check or look for would be appreciated.

-----------------Cluster conf----------------------------------
node 1239211542: e1b12 \
    attributes standby=off
node 1239211543: e1b13
node 1239211581: e1b03 \
    attributes standby=off
node 1239211582: e1b07 \
    attributes standby=off
primitive fence-e1b03 stonith:fence_ilo \
    params ipaddr=e1b03-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
    op monitor interval=300 timeout=120 \
    meta migration-threshold=2 target-role=Started
primitive fence-e1b07 stonith:fence_ilo \
    params ipaddr=e1b07-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
    op monitor interval=300 timeout=120 \
    meta migration-threshold=2 target-role=Started
primitive fence-e1b12 stonith:fence_ilo \
    params ipaddr=e1b12-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
    op monitor interval=300 timeout=120 \
    meta migration-threshold=2 target-role=Started
primitive fence-e1b13 stonith:fence_ilo \
    params ipaddr=e1b13-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
    op monitor interval=300 timeout=120 \
    meta migration-threshold=2 target-role=Started
..... extra resources ......
location l-f-e1b03 fence-e1b03 \
    rule -inf: #uname eq e1b03 \
    rule 10000: #uname eq e1b07
location l-f-e1b07 fence-e1b07 \
    rule -inf: #uname eq e1b07 \
    rule 10000: #uname eq e1b03
location l-f-e1b12 fence-e1b12 \
    rule -inf: #uname eq e1b12 \
    rule 10000: #uname eq e1b13
location l-f-e1b13 fence-e1b13 \
    rule -inf: #uname eq e1b13 \
    rule 10000: #uname eq e1b12
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.15-e174ec8 \
    cluster-infrastructure=corosync \
    stonith-enabled=true \
    cluster-name=test-cluster \
    no-quorum-policy=freeze \
    last-lrm-refresh=1483125286
----------------------------------------------------------------------------------------
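
To make the intended placement of the fence devices easier to read, here
is a minimal Python sketch that tabulates the location rules from the
configuration above. It is only an illustration and not part of the
cluster setup: the `summarize` helper and the `LOCATIONS` string are
hypothetical names, and the parser assumes rules of the simple
`rule <score>: #uname eq <node>` form shown above.

```python
import re

# Location constraints copied verbatim from the configuration above.
LOCATIONS = r"""
location l-f-e1b03 fence-e1b03 \
    rule -inf: #uname eq e1b03 \
    rule 10000: #uname eq e1b07
location l-f-e1b07 fence-e1b07 \
    rule -inf: #uname eq e1b07 \
    rule 10000: #uname eq e1b03
location l-f-e1b12 fence-e1b12 \
    rule -inf: #uname eq e1b12 \
    rule 10000: #uname eq e1b13
location l-f-e1b13 fence-e1b13 \
    rule -inf: #uname eq e1b13 \
    rule 10000: #uname eq e1b12
"""


def summarize(text):
    """Map each fence resource to the nodes it is banned from or preferred on."""
    placement = {}
    current = None
    for raw in text.splitlines():
        # Drop the trailing line-continuation backslash and surrounding spaces.
        line = raw.rstrip("\\").strip()
        loc = re.match(r"location\s+\S+\s+(\S+)", line)
        if loc:
            current = loc.group(1)
            placement[current] = {"banned": [], "preferred": []}
            continue
        rule = re.match(r"rule\s+(\S+):\s+#uname\s+eq\s+(\S+)", line)
        if rule and current:
            score, node = rule.groups()
            # A score of -inf bans the resource from the node; any other
            # score in this configuration expresses a preference.
            key = "banned" if score == "-inf" else "preferred"
            placement[current][key].append(node)
    return placement


for resource, nodes in summarize(LOCATIONS).items():
    print(resource, "never on", nodes["banned"], "prefers", nodes["preferred"])
```

Read this way, each fence device is banned from the node it is named
after and prefers that node's partner (e1b03 with e1b07, e1b12 with
e1b13).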

Regards,
  Ali